commit
b17b299b32
|
@ -1,5 +0,0 @@
|
|||
---
|
||||
title: "监控、日志和排错"
|
||||
weight: 80
|
||||
description: 设置监视和日志记录以对集群进行故障排除或调试容器化应用。
|
||||
---
|
|
@ -1,453 +0,0 @@
|
|||
---
|
||||
title: 审计
|
||||
content_type: concept
|
||||
---
|
||||
<!--
|
||||
reviewers:
|
||||
- soltysh
|
||||
- sttts
|
||||
- ericchiang
|
||||
content_type: concept
|
||||
title: Auditing
|
||||
-->
|
||||
<!-- overview -->
|
||||
|
||||
{{< feature-state state="beta" >}}
|
||||
|
||||
<!--
|
||||
Kubernetes _auditing_ provides a security-relevant, chronological set of records documenting
|
||||
the sequence of actions in a cluster. The cluster audits the activities generated by users,
|
||||
by applications that use the Kubernetes API, and by the control plane itself.
|
||||
|
||||
Auditing allows cluster administrators to answer the following questions:
|
||||
-->
|
||||
Kubernetes _审计(Auditing)_ 功能提供了与安全相关的、按时间顺序排列的记录集,
|
||||
记录每个用户、使用 Kubernetes API 的应用以及控制面自身引发的活动。
|
||||
|
||||
审计功能使得集群管理员能够回答以下问题:
|
||||
|
||||
<!--
|
||||
- what happened?
|
||||
- when did it happen?
|
||||
- who initiated it?
|
||||
- on what did it happen?
|
||||
- where was it observed?
|
||||
- from where was it initiated?
|
||||
- to where was it going?
|
||||
-->
|
||||
- 发生了什么?
|
||||
- 什么时候发生的?
|
||||
- 谁触发的?
|
||||
- 活动发生在哪个(些)对象上?
|
||||
- 在哪观察到的?
|
||||
- 它从哪触发的?
|
||||
- 它将被送往何处?
|
||||
|
||||
<!-- body -->
|
||||
|
||||
<!--
|
||||
Audit records begin their lifecycle inside the
|
||||
[kube-apiserver](/docs/reference/command-line-tools-reference/kube-apiserver/)
|
||||
component. Each request on each stage
|
||||
of its execution generates an audit event, which is then pre-processed according to
|
||||
a certain policy and written to a backend. The policy determines what's recorded
|
||||
and the backends persist the records. The current backend implementations
|
||||
include logs files and webhooks.
|
||||
-->
|
||||
审计记录最初产生于
|
||||
[kube-apiserver](/zh/docs/reference/command-line-tools-reference/kube-apiserver/)
|
||||
内部。每个请求在不同执行阶段都会生成审计事件;这些审计事件会根据特定策略
|
||||
被预处理并写入后端。策略确定要记录的内容和用来存储记录的后端。
|
||||
当前的后端支持日志文件和 webhook。
|
||||
|
||||
<!--
|
||||
Each request can be recorded with an associated _stage_. The defined stages are:
|
||||
|
||||
- `RequestReceived` - The stage for events generated as soon as the audit
|
||||
handler receives the request, and before it is delegated down the handler
|
||||
chain.
|
||||
- `ResponseStarted` - Once the response headers are sent, but before the
|
||||
response body is sent. This stage is only generated for long-running requests
|
||||
(e.g. watch).
|
||||
- `ResponseComplete` - The response body has been completed and no more bytes
|
||||
will be sent.
|
||||
- `Panic` - Events generated when a panic occurred.
|
||||
-->
|
||||
每个请求都可以记录与之相关的 _阶段(stage)_。已定义的阶段有:
|
||||
|
||||
- `RequestReceived` - 此阶段对应审计处理器接收到请求后,并且在委托给
|
||||
其余处理器之前生成的事件。
|
||||
- `ResponseStarted` - 在响应消息的头部发送后,响应消息体发送前生成的事件。
|
||||
只有长时间运行的请求(例如 watch)才会生成这个阶段。
|
||||
- `ResponseComplete` - 当响应消息体完成并且没有更多数据需要传输的时候。
|
||||
- `Panic` - 当 panic 发生时生成。
|
||||
|
||||
<!--
|
||||
The configuration of an
|
||||
[Audit Event configuration](/docs/reference/config-api/apiserver-audit.v1/#audit-k8s-io-v1-Event)
|
||||
is different from the
|
||||
[Event](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#event-v1-core)
|
||||
API object.
|
||||
-->
|
||||
{{< note >}}
|
||||
[审计事件配置](/zh/docs/reference/config-api/apiserver-audit.v1/#audit-k8s-io-v1-Event)
|
||||
的配置与 [Event](/zh/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#event-v1-core)
|
||||
API 对象不同。
|
||||
{{< /note >}}
|
||||
|
||||
<!--
|
||||
The audit logging feature increases the memory consumption of the API server
|
||||
because some context required for auditing is stored for each request.
|
||||
Additionally, memory consumption depends on the audit logging configuration.
|
||||
-->
|
||||
审计日志记录功能会增加 API server 的内存消耗,因为需要为每个请求存储审计所需的某些上下文。
|
||||
此外,内存消耗取决于审计日志记录的配置。
|
||||
|
||||
<!--
|
||||
## Audit Policy
|
||||
|
||||
Audit policy defines rules about what events should be recorded and what data
|
||||
they should include. The audit policy object structure is defined in the
|
||||
[`audit.k8s.io` API group](/docs/reference/config-api/apiserver-audit.v1/#audit-k8s-io-v1-Policy).
|
||||
When an event is processed, it's
|
||||
compared against the list of rules in order. The first matching rule sets the
|
||||
_audit level_ of the event. The defined audit levels are:
|
||||
-->
|
||||
## 审计策略 {#audit-policy}
|
||||
|
||||
审计策略定义了关于应记录哪些事件以及应包含哪些数据的规则。
|
||||
审计策略对象结构定义在
|
||||
[`audit.k8s.io` API 组](/zh/docs/reference/config-api/apiserver-audit.v1/#audit-k8s-io-v1-Policy)
|
||||
中。处理事件时,事件会按顺序与规则列表进行比较。第一个匹配的规则设置事件的
|
||||
_审计级别(Audit Level)_。已定义的审计级别有:
|
||||
|
||||
<!--
|
||||
- `None` - don't log events that match this rule.
|
||||
- `Metadata` - log request metadata (requesting user, timestamp, resource,
|
||||
verb, etc.) but not request or response body.
|
||||
- `Request` - log event metadata and request body but not response body.
|
||||
This does not apply for non-resource requests.
|
||||
- `RequestResponse` - log event metadata, request and response bodies.
|
||||
This does not apply for non-resource requests.
|
||||
-->
|
||||
- `None` - 符合这条规则的事件将不会被记录。
|
||||
- `Metadata` - 记录请求的元数据(请求的用户、时间戳、资源、动词等等),
|
||||
但是不记录请求或者响应的消息体。
|
||||
- `Request` - 记录事件的元数据和请求的消息体,但是不记录响应的消息体。
|
||||
这不适用于非资源类型的请求。
|
||||
- `RequestResponse` - 记录事件的元数据,请求和响应的消息体。这不适用于非资源类型的请求。
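
作为示意,下面是一个假设的策略片段,演示如何组合使用上述级别
(其中针对 Secret 的资源选择仅为示例):

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # 对 Secret 只记录元数据,避免敏感内容进入审计日志
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets"]
  # 其余请求一概不记录
  - level: None
```

规则按顺序匹配,因此更具体的规则应放在列表前面。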
|
||||
|
||||
<!--
|
||||
You can pass a file with the policy to `kube-apiserver`
|
||||
using the `--audit-policy-file` flag. If the flag is omitted, no events are logged.
|
||||
Note that the `rules` field __must__ be provided in the audit policy file.
|
||||
A policy with no (0) rules is treated as illegal.
|
||||
|
||||
Below is an example audit policy file:
|
||||
-->
|
||||
你可以使用 `--audit-policy-file` 标志将包含策略的文件传递给 `kube-apiserver`。
|
||||
如果不设置该标志,则不记录事件。
|
||||
注意 `rules` 字段 __必须__ 在审计策略文件中提供。没有(0)规则的策略将被视为非法配置。
|
||||
|
||||
以下是一个审计策略文件的示例:
|
||||
|
||||
{{< codenew file="audit/audit-policy.yaml" >}}
|
||||
|
||||
<!--
|
||||
You can use a minimal audit policy file to log all requests at the `Metadata` level:
|
||||
-->
|
||||
你可以使用最低限度的审计策略文件在 `Metadata` 级别记录所有请求:
|
||||
|
||||
```yaml
|
||||
# 在 Metadata 级别为所有请求生成日志
|
||||
apiVersion: audit.k8s.io/v1
|
||||
kind: Policy
|
||||
rules:
|
||||
- level: Metadata
|
||||
```
|
||||
|
||||
<!--
|
||||
If you're crafting your own audit profile, you can use the audit profile for Google Container-Optimized OS as a starting point. You can check the
|
||||
[configure-helper.sh](https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/gci/configure-helper.sh)
|
||||
script, which generates the audit policy file. You can see most of the audit policy file by looking directly at the script.
|
||||
|
||||
You can also refer to the [`Policy` configuration reference](/docs/reference/config-api/apiserver-audit.v1/#audit-k8s-io-v1-Policy)
|
||||
for details about the fields defined.
|
||||
-->
|
||||
如果你在制定自己的审计配置文件,你可以使用为 Google Container-Optimized OS
|
||||
设计的审计配置作为出发点。你可以参考
|
||||
[configure-helper.sh](https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/gci/configure-helper.sh)
|
||||
脚本,该脚本能够生成审计策略文件。你可以直接在脚本中看到审计策略的绝大部分内容。
|
||||
|
||||
你也可以参考 [`Policy` 配置参考](/zh/docs/reference/config-api/apiserver-audit.v1/#audit-k8s-io-v1-Policy)
|
||||
以获取有关已定义字段的详细信息。
|
||||
|
||||
<!--
|
||||
## Audit backends
|
||||
|
||||
Audit backends persist audit events to an external storage.
|
||||
Out of the box, the kube-apiserver provides two backends:
|
||||
|
||||
- Log backend, which writes events into the filesystem
|
||||
- Webhook backend, which sends events to an external HTTP API
|
||||
|
||||
In all cases, audit events follow a structure defined by the Kubernetes API in the
|
||||
[`audit.k8s.io` API group](/docs/reference/config-api/apiserver-audit.v1/#audit-k8s-io-v1-Event).
|
||||
-->
|
||||
|
||||
## 审计后端 {#audit-backends}
|
||||
|
||||
审计后端负责将审计事件持久化到外部存储。kube-apiserver 开箱即用地提供两种后端:
|
||||
|
||||
- Log 后端,将事件写入到文件系统
|
||||
- Webhook 后端,将事件发送到外部 HTTP API
|
||||
|
||||
在所有情况下,审计事件都遵循 Kubernetes API 在
|
||||
[`audit.k8s.io` API 组](/zh/docs/reference/config-api/apiserver-audit.v1/#audit-k8s-io-v1-Event)
|
||||
中定义的结构。
|
||||
|
||||
<!--
|
||||
In case of patches, request body is a JSON array with patch operations, not a JSON object
|
||||
with an appropriate Kubernetes API object. For example, the following request body is a valid patch
|
||||
request to `/apis/batch/v1/namespaces/some-namespace/jobs/some-job-name`.
|
||||
-->
|
||||
{{< note >}}
|
||||
对于 patch 请求,请求体是包含一系列 patch 操作的 JSON 数组,
|
||||
而不是一个完整的 Kubernetes API 对象的 JSON 串。
|
||||
例如,以下的示例是一个合法的 patch 请求消息体,该请求对应
|
||||
`/apis/batch/v1/namespaces/some-namespace/jobs/some-job-name`。
|
||||
|
||||
```json
|
||||
[
|
||||
{
|
||||
"op": "replace",
|
||||
"path": "/spec/parallelism",
|
||||
"value": 0
|
||||
},
|
||||
{
|
||||
"op": "remove",
|
||||
"path": "/spec/template/spec/containers/0/terminationMessagePolicy"
|
||||
}
|
||||
]
|
||||
```
|
||||
{{< /note >}}
|
||||
|
||||
<!--
|
||||
### Log backend
|
||||
|
||||
The log backend writes audit events to a file in [JSONlines](https://jsonlines.org/) format.
|
||||
You can configure the log audit backend using the following `kube-apiserver` flags:
|
||||
-->
|
||||
### Log 后端
|
||||
|
||||
Log 后端将审计事件写入 [JSONlines](https://jsonlines.org/) 格式的文件。
|
||||
你可以使用以下 `kube-apiserver` 标志配置 Log 审计后端:
|
||||
|
||||
<!--
|
||||
- `--audit-log-path` specifies the log file path that log backend uses to write
|
||||
audit events. Not specifying this flag disables log backend. `-` means standard out
|
||||
- `--audit-log-maxage` defined the maximum number of days to retain old audit log files
|
||||
- `--audit-log-maxbackup` defines the maximum number of audit log files to retain
|
||||
- `--audit-log-maxsize` defines the maximum size in megabytes of the audit log file before it gets rotated
|
||||
-->
|
||||
- `--audit-log-path` 指定用来写入审计事件的日志文件路径。不指定此标志会禁用日志后端。`-` 表示标准输出
|
||||
- `--audit-log-maxage` 定义保留旧审计日志文件的最大天数
|
||||
- `--audit-log-maxbackup` 定义要保留的审计日志文件的最大数量
|
||||
- `--audit-log-maxsize` 定义审计日志文件的最大大小(兆字节)
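
举例而言,上述标志可以组合使用来配置日志轮转(下面的取值仅为示意,应根据环境调整):

```shell
--audit-log-path=/var/log/kubernetes/audit/audit.log
--audit-log-maxage=30      # 最多保留 30 天的旧日志
--audit-log-maxbackup=10   # 最多保留 10 个轮转文件
--audit-log-maxsize=100    # 单个文件最大 100 MB
```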
|
||||
|
||||
<!--
|
||||
If your cluster's control plane runs the kube-apiserver as a Pod, remember to mount the `hostPath`
|
||||
to the location of the policy file and log file, so that audit records are persisted. For example:
|
||||
-->
|
||||
如果你的集群控制面以 Pod 的形式运行 kube-apiserver,记得要通过 `hostPath`
|
||||
卷来访问策略文件和日志文件所在的目录,这样审计记录才会持久保存下来。例如:
|
||||
|
||||
```shell
|
||||
--audit-policy-file=/etc/kubernetes/audit-policy.yaml
|
||||
--audit-log-path=/var/log/kubernetes/audit/audit.log
|
||||
```
|
||||
|
||||
接下来挂载数据卷:
|
||||
|
||||
```yaml
|
||||
volumeMounts:
|
||||
- mountPath: /etc/kubernetes/audit-policy.yaml
|
||||
name: audit
|
||||
readOnly: true
|
||||
- mountPath: /var/log/kubernetes/audit/
|
||||
name: audit-log
|
||||
readOnly: false
|
||||
```
|
||||
|
||||
<!--
|
||||
and finally configure the `hostPath`:
|
||||
-->
|
||||
最后配置 `hostPath`:
|
||||
|
||||
```yaml
|
||||
...
|
||||
volumes:
|
||||
- name: audit
|
||||
hostPath:
|
||||
path: /etc/kubernetes/audit-policy.yaml
|
||||
type: File
|
||||
|
||||
- name: audit-log
|
||||
hostPath:
|
||||
path: /var/log/kubernetes/audit/
|
||||
type: DirectoryOrCreate
|
||||
```
|
||||
|
||||
<!--
|
||||
### Webhook backend
|
||||
|
||||
The webhook audit backend sends audit events to a remote web API, which is assumed to
|
||||
be a form of the Kubernetes API, including means of authentication. You can configure
|
||||
a webhook audit backend using the following kube-apiserver flags:
|
||||
-->
|
||||
### Webhook 后端 {#webhook-backend}
|
||||
|
||||
Webhook 后端将审计事件发送到远程 Web API,该远程 API 被假定为
|
||||
Kubernetes API 的一种形式,包括其身份认证机制。你可以使用如下 kube-apiserver 标志来配置
|
||||
Webhook 审计后端:
|
||||
|
||||
<!--
|
||||
- `--audit-webhook-config-file` specifies the path to a file with a webhook
|
||||
configuration. The webhook configuration is effectively a specialized
|
||||
[kubeconfig](/docs/tasks/access-application-cluster/configure-access-multiple-clusters).
|
||||
- `--audit-webhook-initial-backoff` specifies the amount of time to wait after the first failed
|
||||
request before retrying. Subsequent requests are retried with exponential backoff.
|
||||
|
||||
The webhook config file uses the kubeconfig format to specify the remote address of
|
||||
the service and credentials used to connect to it.
|
||||
-->
|
||||
- `--audit-webhook-config-file` 设置 Webhook 配置文件的路径。Webhook 配置文件实际上是一个
|
||||
[kubeconfig 文件](/zh/docs/concepts/configuration/organize-cluster-access-kubeconfig/)。
|
||||
- `--audit-webhook-initial-backoff` 指定在第一次失败后重发请求等待的时间。随后的请求将以指数退避重试。
|
||||
|
||||
Webhook 配置文件使用 kubeconfig 格式指定服务的远程地址和用于连接它的凭据。
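
下面是一个假设的 Webhook 配置文件示意,其中的服务器地址、证书路径和名称都是示例值:

```yaml
apiVersion: v1
kind: Config
clusters:
  - name: audit-webhook
    cluster:
      # 示例 CA 证书路径和远程服务地址
      certificate-authority: /etc/kubernetes/pki/audit-webhook-ca.crt
      server: https://audit.example.com/events
users:
  - name: audit-webhook-client
    user:
      # 用于向远程服务认证的客户端证书(示例路径)
      client-certificate: /etc/kubernetes/pki/audit-webhook.crt
      client-key: /etc/kubernetes/pki/audit-webhook.key
contexts:
  - name: webhook
    context:
      cluster: audit-webhook
      user: audit-webhook-client
current-context: webhook
```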
|
||||
|
||||
<!--
|
||||
## Event batching {#batching}
|
||||
|
||||
Both log and webhook backends support batching. Using webhook as an example, here's the list of
|
||||
available flags. To get the same flag for log backend, replace `webhook` with `log` in the flag
|
||||
name. By default, batching is enabled in `webhook` and disabled in `log`. Similarly, by default
|
||||
throttling is enabled in `webhook` and disabled in `log`.
|
||||
-->
|
||||
## 事件批处理 {#batching}
|
||||
|
||||
日志和 Webhook 后端都支持批处理。以 Webhook 为例,以下是可用参数列表。要获取日志
|
||||
后端的同样参数,请在参数名称中将 `webhook` 替换为 `log`。
|
||||
默认情况下,在 `webhook` 中批处理是被启用的,在 `log` 中批处理是被禁用的。
|
||||
同样,默认情况下,在 `webhook` 中启用限流(throttling),在 `log` 中禁用限流。
|
||||
|
||||
<!--
|
||||
- `--audit-webhook-mode` defines the buffering strategy. One of the following:
|
||||
- `batch` - buffer events and asynchronously process them in batches. This is the default.
|
||||
- `blocking` - block API server responses on processing each individual event.
|
||||
- `blocking-strict` - Same as blocking, but when there is a failure during audit logging at the
|
||||
RequestReceived stage, the whole request to the kube-apiserver fails.
|
||||
-->
|
||||
- `--audit-webhook-mode` 定义缓冲策略,可选值如下:
|
||||
- `batch` - 缓存事件并以异步批处理方式处理它们。这是默认值。
|
||||
- `blocking` - 在 API 服务器处理每个单独事件时,阻塞其响应。
|
||||
- `blocking-strict` - 与 `blocking` 相同,不过当审计日志在 RequestReceived 阶段
|
||||
失败时,对 kube-apiserver 的整个请求会失败。
|
||||
|
||||
<!--
|
||||
The following flags are used only in the `batch` mode.
|
||||
|
||||
- `--audit-webhook-batch-buffer-size` defines the number of events to buffer before batching.
|
||||
If the rate of incoming events overflows the buffer, events are dropped.
|
||||
- `--audit-webhook-batch-max-size` defines the maximum number of events in one batch.
|
||||
- `--audit-webhook-batch-max-wait` defines the maximum amount of time to wait before unconditionally
|
||||
batching events in the queue.
|
||||
- `--audit-webhook-batch-throttle-qps` defines the maximum average number of batches generated
|
||||
per second.
|
||||
- `--audit-webhook-batch-throttle-burst` defines the maximum number of batches generated at the same
|
||||
moment if the allowed QPS was underutilized previously.
|
||||
-->
|
||||
以下参数仅用于 `batch` 模式。
|
||||
|
||||
- `--audit-webhook-batch-buffer-size` 定义进行批处理之前要缓存的事件数量。
|
||||
  如果传入事件的速率溢出了缓冲区,则事件会被丢弃。
|
||||
- `--audit-webhook-batch-max-size` 定义一个 batch 中的最大事件数。
|
||||
- `--audit-webhook-batch-max-wait` 定义在无条件对队列中的事件进行批处理之前等待的最长时间。
|
||||
- `--audit-webhook-batch-throttle-qps` 定义每秒生成的最大平均批次数。
|
||||
- `--audit-webhook-batch-throttle-burst` 定义在之前允许的 QPS 未被充分利用时,同一时刻可生成的最大批次数。
|
||||
|
||||
<!--
|
||||
## Parameter tuning
|
||||
|
||||
Parameters should be set to accommodate the load on the API server.
|
||||
|
||||
For example, if kube-apiserver receives 100 requests each second, and each request is audited only
|
||||
on `ResponseStarted` and `ResponseComplete` stages, you should account for ≅200 audit
|
||||
events being generated each second. Assuming that there are up to 100 events in a batch,
|
||||
you should set throttling level at least 2 queries per second. Assuming that the backend can take up to
|
||||
5 seconds to write events, you should set the buffer size to hold up to 5 seconds of events;
|
||||
that is: 10 batches, or 1000 events.
|
||||
-->
|
||||
## 参数调整 {#parameter-tuning}
|
||||
|
||||
应该根据 API 服务器上的负载来设置这些参数。
|
||||
|
||||
例如,如果 kube-apiserver 每秒收到 100 个请求,并且每个请求仅在 `ResponseStarted`
|
||||
和 `ResponseComplete` 阶段进行审计,则应该考虑每秒生成约 200 个审计事件。
|
||||
假设批处理中最多有 100 个事件,则应将限制级别设置为每秒至少 2 个查询。
|
||||
假设后端最多需要 5 秒钟来写入事件,你应该设置缓冲区大小以容纳最多 5 秒的事件,
|
||||
即 10 个 batch,即 1000 个事件。
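
按照上面的估算,对应的标志设置大致如下(数值仅适用于该示例场景,并非推荐值):

```shell
--audit-webhook-mode=batch
--audit-webhook-batch-max-size=100      # 每批最多 100 个事件
--audit-webhook-batch-throttle-qps=2    # 每秒至少 2 个批次
--audit-webhook-batch-buffer-size=1000  # 缓冲约 5 秒的事件(10 个批次)
```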
|
||||
|
||||
<!--
|
||||
In most cases however, the default parameters should be sufficient and you don't have to worry about
|
||||
setting them manually. You can look at the following Prometheus metrics exposed by kube-apiserver
|
||||
and in the logs to monitor the state of the auditing subsystem.
|
||||
|
||||
- `apiserver_audit_event_total` metric contains the total number of audit events exported.
|
||||
- `apiserver_audit_error_total` metric contains the total number of events dropped due to an error
|
||||
during exporting.
|
||||
-->
|
||||
但是,在大多数情况下,默认参数应该足够了,你不必手动设置它们。
|
||||
你可以查看 kube-apiserver 所公开的下列 Prometheus 指标,也可以查看其日志,来监控审计子系统的状态。
|
||||
|
||||
- `apiserver_audit_event_total` 包含已导出的审计事件的总数。
|
||||
- `apiserver_audit_error_total` 包含导出过程中因出错而被丢弃的事件的总数。
|
||||
|
||||
<!--
|
||||
### Log entry truncation {#truncate}
|
||||
|
||||
Both log and webhook backends support limiting the size of events that are logged.
|
||||
As an example, the following is the list of flags available for the log backend:
|
||||
-->
|
||||
### 日志条目截断 {#truncate}
|
||||
|
||||
日志后端和 Webhook 后端都支持限制所输出的事件的尺寸。
|
||||
例如,下面是可以为日志后端配置的标志列表:
|
||||
|
||||
<!--
|
||||
- `audit-log-truncate-enabled` whether event and batch truncating is enabled.
|
||||
- `audit-log-truncate-max-batch-size` maximum size in bytes of the batch sent to the underlying backend.
|
||||
- `audit-log-truncate-max-event-size` maximum size in bytes of the audit event sent to the underlying backend.
|
||||
-->
|
||||
- `audit-log-truncate-enabled`:是否启用事件和批次的截断处理。
|
||||
- `audit-log-truncate-max-batch-size`:向下层后端发送的各批次的最大尺寸字节数。
|
||||
- `audit-log-truncate-max-event-size`:向下层后端发送的审计事件的最大尺寸字节数。
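
例如,要为日志后端启用截断并限制单个事件的尺寸,可以设置(数值仅为示意):

```shell
--audit-log-truncate-enabled
--audit-log-truncate-max-event-size=102400   # 100 KiB
```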
|
||||
|
||||
<!--
|
||||
By default truncate is disabled in both `webhook` and `log`, a cluster administrator should set
|
||||
`audit-log-truncate-enabled` or `audit-webhook-truncate-enabled` to enable the feature.
|
||||
-->
|
||||
默认情况下,截断操作在 `webhook` 和 `log` 后端都是被禁用的,集群管理员需要设置
|
||||
`audit-log-truncate-enabled` 或 `audit-webhook-truncate-enabled` 标志来启用此操作。
|
||||
|
||||
## {{% heading "whatsnext" %}}
|
||||
|
||||
<!--
|
||||
* Learn about [Mutating webhook auditing annotations](/docs/reference/access-authn-authz/extensible-admission-controllers/#mutating-webhook-auditing-annotations).
|
||||
-->
|
||||
* 了解 [Mutating webhook 审计注解](/zh/docs/reference/access-authn-authz/extensible-admission-controllers/#mutating-webhook-auditing-annotations)。
|
||||
|
|
@ -1,568 +0,0 @@
|
|||
---
|
||||
reviewers:
|
||||
- Random-Liu
|
||||
- feiskyer
|
||||
- mrunalp
|
||||
title: 使用 crictl 对 Kubernetes 节点进行调试
|
||||
content_type: task
|
||||
---
|
||||
|
||||
<!--
|
||||
reviewers:
|
||||
- Random-Liu
|
||||
- feiskyer
|
||||
- mrunalp
|
||||
title: Debugging Kubernetes nodes with crictl
|
||||
content_type: task
|
||||
-->
|
||||
|
||||
|
||||
<!-- overview -->
|
||||
|
||||
{{< feature-state for_k8s_version="v1.11" state="stable" >}}
|
||||
|
||||
<!--
|
||||
`crictl` is a command-line interface for CRI-compatible container runtimes.
|
||||
You can use it to inspect and debug container runtimes and applications on a
|
||||
Kubernetes node. `crictl` and its source are hosted in the
|
||||
[cri-tools](https://github.com/kubernetes-sigs/cri-tools) repository.
|
||||
-->
|
||||
|
||||
`crictl` 是 CRI 兼容的容器运行时命令行接口。
|
||||
你可以使用它来检查和调试 Kubernetes 节点上的容器运行时和应用程序。
|
||||
`crictl` 和它的源代码在
|
||||
[cri-tools](https://github.com/kubernetes-sigs/cri-tools) 代码库。
|
||||
|
||||
## {{% heading "prerequisites" %}}
|
||||
|
||||
<!--
|
||||
`crictl` requires a Linux operating system with a CRI runtime.
|
||||
-->
|
||||
`crictl` 需要带有 CRI 运行时的 Linux 操作系统。
|
||||
|
||||
<!-- steps -->
|
||||
|
||||
<!--
|
||||
## Installing crictl
|
||||
|
||||
You can download a compressed archive `crictl` from the cri-tools
|
||||
[release page](https://github.com/kubernetes-sigs/cri-tools/releases), for several
|
||||
different architectures. Download the version that corresponds to your version
|
||||
of Kubernetes. Extract it and move it to a location on your system path, such as
|
||||
`/usr/local/bin/`.
|
||||
-->
|
||||
## 安装 crictl
|
||||
|
||||
你可以从 cri-tools [发布页面](https://github.com/kubernetes-sigs/cri-tools/releases)
|
||||
下载适用于多种不同架构的 `crictl` 压缩归档文件。
|
||||
下载与你的 Kubernetes 版本相对应的版本。
|
||||
解压该归档文件并将其移动到系统路径上的某个位置,例如 `/usr/local/bin/`。
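
上述步骤可以用如下命令示意(其中的版本号和架构为假设值,请替换为与你的集群匹配的版本):

```shell
VERSION="v1.24.2"   # 示例版本,请按需替换
ARCH="linux-amd64"  # 示例架构
curl -LO "https://github.com/kubernetes-sigs/cri-tools/releases/download/${VERSION}/crictl-${VERSION}-${ARCH}.tar.gz"
sudo tar -C /usr/local/bin -xzf "crictl-${VERSION}-${ARCH}.tar.gz"
```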
|
||||
|
||||
<!--
|
||||
## General usage
|
||||
|
||||
The `crictl` command has several subcommands and runtime flags. Use
|
||||
`crictl help` or `crictl <subcommand> help` for more details.
|
||||
-->
|
||||
## 一般用法
|
||||
|
||||
`crictl` 命令有几个子命令和运行时参数。
|
||||
有关详细信息,请使用 `crictl help` 或 `crictl <subcommand> help` 获取帮助信息。
|
||||
|
||||
<!--
|
||||
You can set the endpoint for `crictl` by doing one of the following:
|
||||
-->
|
||||
你可以用以下方法之一来为 `crictl` 设置端点:
|
||||
|
||||
<!--
|
||||
* Set the `--runtime-endpoint` and `--image-endpoint` flags.
|
||||
* Set the `CONTAINER_RUNTIME_ENDPOINT` and `IMAGE_SERVICE_ENDPOINT` environment
|
||||
variables.
|
||||
* Set the endpoint in the configuration file `/etc/crictl.yaml`. To specify a
|
||||
different file, use the `--config=PATH_TO_FILE` flag when you run `crictl`.
|
||||
-->
|
||||
- 设置参数 `--runtime-endpoint` 和 `--image-endpoint`。
|
||||
- 设置环境变量 `CONTAINER_RUNTIME_ENDPOINT` 和 `IMAGE_SERVICE_ENDPOINT`。
|
||||
- 在配置文件 `/etc/crictl.yaml` 中设置端点。
|
||||
  要指定不同的文件,可以在运行 `crictl` 时使用 `--config=PATH_TO_FILE` 标志。
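
例如,使用环境变量方式设置端点时可以这样做(这里假设使用 containerd 的默认套接字路径):

```shell
export CONTAINER_RUNTIME_ENDPOINT=unix:///var/run/containerd/containerd.sock
export IMAGE_SERVICE_ENDPOINT=unix:///var/run/containerd/containerd.sock
crictl pods
```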
|
||||
|
||||
<!--
|
||||
You can also specify timeout values when connecting to the server and enable or
|
||||
disable debugging, by specifying `timeout` or `debug` values in the configuration
|
||||
file or using the `--timeout` and `--debug` command-line flags.
|
||||
-->
|
||||
你还可以指定连接服务器时的超时值,并启用或禁用调试:方法是在配置文件中指定
|
||||
`timeout` 或 `debug` 值,或者使用 `--timeout` 和 `--debug` 命令行参数。
|
||||
|
||||
<!--
|
||||
To view or edit the current configuration, view or edit the contents of
|
||||
`/etc/crictl.yaml`. For example, the configuration when using the `containerd`
|
||||
container runtime would be similar to this:
|
||||
-->
|
||||
要查看或编辑当前配置,请查看或编辑 `/etc/crictl.yaml` 的内容。
|
||||
例如,使用 `containerd` 容器运行时的配置会类似于这样:
|
||||
|
||||
```yaml
|
||||
runtime-endpoint: unix:///var/run/containerd/containerd.sock
|
||||
image-endpoint: unix:///var/run/containerd/containerd.sock
|
||||
timeout: 10
|
||||
debug: true
|
||||
```
|
||||
|
||||
<!--
|
||||
To learn more about `crictl`, refer to the [`crictl`
|
||||
documentation](https://github.com/kubernetes-sigs/cri-tools/blob/master/docs/crictl.md).
|
||||
-->
|
||||
要进一步了解 `crictl`,参阅
|
||||
[`crictl` 文档](https://github.com/kubernetes-sigs/cri-tools/blob/master/docs/crictl.md)。
|
||||
|
||||
<!--
|
||||
## Example crictl commands
|
||||
|
||||
The following examples show some `crictl` commands and example output.
|
||||
-->
|
||||
## crictl 命令示例
|
||||
|
||||
{{< warning >}}
|
||||
<!--
|
||||
If you use `crictl` to create pod sandboxes or containers on a running
|
||||
Kubernetes cluster, the Kubelet will eventually delete them. `crictl` is not a
|
||||
general purpose workflow tool, but a tool that is useful for debugging.
|
||||
-->
|
||||
如果使用 `crictl` 在正在运行的 Kubernetes 集群上创建 Pod 沙盒或容器,
|
||||
kubelet 最终将删除它们。
|
||||
`crictl` 不是一个通用的工作流工具,而是一个对调试有用的工具。
|
||||
{{< /warning >}}
|
||||
|
||||
<!--
|
||||
### List pods
|
||||
|
||||
List all pods:
|
||||
-->
|
||||
### 打印 Pod 清单
|
||||
|
||||
打印所有 Pod 的清单:
|
||||
|
||||
```shell
|
||||
crictl pods
|
||||
```
|
||||
|
||||
<!--
|
||||
The output is similar to this:
|
||||
-->
|
||||
输出类似于:
|
||||
|
||||
```
|
||||
POD ID CREATED STATE NAME NAMESPACE ATTEMPT
|
||||
926f1b5a1d33a About a minute ago Ready sh-84d7dcf559-4r2gq default 0
|
||||
4dccb216c4adb About a minute ago Ready nginx-65899c769f-wv2gp default 0
|
||||
a86316e96fa89 17 hours ago Ready kube-proxy-gblk4 kube-system 0
|
||||
919630b8f81f1 17 hours ago Ready nvidia-device-plugin-zgbbv kube-system 0
|
||||
```
|
||||
|
||||
<!--
|
||||
List pods by name:
|
||||
-->
|
||||
根据名称打印 Pod 清单:
|
||||
|
||||
```shell
|
||||
crictl pods --name nginx-65899c769f-wv2gp
|
||||
```
|
||||
|
||||
<!--
|
||||
The output is similar to this:
|
||||
-->
|
||||
输出类似于这样:
|
||||
|
||||
```
|
||||
POD ID CREATED STATE NAME NAMESPACE ATTEMPT
|
||||
4dccb216c4adb 2 minutes ago Ready nginx-65899c769f-wv2gp default 0
|
||||
```
|
||||
|
||||
<!--
|
||||
List pods by label:
|
||||
-->
|
||||
根据标签打印 Pod 清单:
|
||||
|
||||
```shell
|
||||
crictl pods --label run=nginx
|
||||
```
|
||||
|
||||
<!--
|
||||
The output is similar to this:
|
||||
-->
|
||||
输出类似于这样:
|
||||
|
||||
```none
|
||||
POD ID CREATED STATE NAME NAMESPACE ATTEMPT
|
||||
4dccb216c4adb 2 minutes ago Ready nginx-65899c769f-wv2gp default 0
|
||||
```
|
||||
|
||||
<!--
|
||||
### List images
|
||||
|
||||
List all images:
|
||||
-->
|
||||
### 打印镜像清单
|
||||
|
||||
打印所有镜像清单:
|
||||
|
||||
```shell
|
||||
crictl images
|
||||
```
|
||||
|
||||
<!--
|
||||
The output is similar to this:
|
||||
-->
|
||||
输出类似于这样:
|
||||
|
||||
```none
|
||||
IMAGE TAG IMAGE ID SIZE
|
||||
busybox latest 8c811b4aec35f 1.15MB
|
||||
k8s-gcrio.azureedge.net/hyperkube-amd64 v1.10.3 e179bbfe5d238 665MB
|
||||
k8s-gcrio.azureedge.net/pause-amd64 3.1 da86e6ba6ca19 742kB
|
||||
nginx latest cd5239a0906a6 109MB
|
||||
```
|
||||
|
||||
<!--
|
||||
List images by repository:
|
||||
-->
|
||||
根据仓库打印镜像清单:
|
||||
|
||||
```shell
|
||||
crictl images nginx
|
||||
```
|
||||
|
||||
<!--
|
||||
The output is similar to this:
|
||||
-->
|
||||
输出类似于这样:
|
||||
|
||||
```none
|
||||
IMAGE TAG IMAGE ID SIZE
|
||||
nginx latest cd5239a0906a6 109MB
|
||||
```
|
||||
|
||||
<!--
|
||||
Only list image IDs:
|
||||
-->
|
||||
只打印镜像 ID:
|
||||
|
||||
```shell
|
||||
crictl images -q
|
||||
```
|
||||
|
||||
<!--
|
||||
The output is similar to this:
|
||||
-->
|
||||
输出类似于这样:
|
||||
|
||||
```none
|
||||
sha256:8c811b4aec35f259572d0f79207bc0678df4c736eeec50bc9fec37ed936a472a
|
||||
sha256:e179bbfe5d238de6069f3b03fccbecc3fb4f2019af741bfff1233c4d7b2970c5
|
||||
sha256:da86e6ba6ca197bf6bc5e9d900febd906b133eaa4750e6bed647b0fbe50ed43e
|
||||
sha256:cd5239a0906a6ccf0562354852fae04bc5b52d72a2aff9a871ddb6bd57553569
|
||||
```
|
||||
|
||||
<!--
|
||||
### List containers
|
||||
|
||||
List all containers:
|
||||
-->
|
||||
### 打印容器清单
|
||||
|
||||
打印所有容器清单:
|
||||
|
||||
```shell
|
||||
crictl ps -a
|
||||
```
|
||||
|
||||
<!--
|
||||
The output is similar to this:
|
||||
-->
|
||||
输出类似于这样:
|
||||
|
||||
```none
|
||||
CONTAINER ID IMAGE CREATED STATE NAME ATTEMPT
|
||||
1f73f2d81bf98 busybox@sha256:141c253bc4c3fd0a201d32dc1f493bcf3fff003b6df416dea4f41046e0f37d47 7 minutes ago Running sh 1
|
||||
9c5951df22c78 busybox@sha256:141c253bc4c3fd0a201d32dc1f493bcf3fff003b6df416dea4f41046e0f37d47 8 minutes ago Exited sh 0
|
||||
87d3992f84f74 nginx@sha256:d0a8828cccb73397acb0073bf34f4d7d8aa315263f1e7806bf8c55d8ac139d5f 8 minutes ago Running nginx 0
|
||||
1941fb4da154f k8s-gcrio.azureedge.net/hyperkube-amd64@sha256:00d814b1f7763f4ab5be80c58e98140dfc69df107f253d7fdd714b30a714260a 18 hours ago Running kube-proxy 0
|
||||
```
|
||||
|
||||
<!--
|
||||
List running containers:
|
||||
-->
|
||||
打印正在运行的容器清单:
|
||||
|
||||
```shell
|
||||
crictl ps
|
||||
```
|
||||
|
||||
<!--
|
||||
The output is similar to this:
|
||||
-->
|
||||
输出类似于这样:
|
||||
|
||||
```none
|
||||
CONTAINER ID IMAGE CREATED STATE NAME ATTEMPT
|
||||
1f73f2d81bf98 busybox@sha256:141c253bc4c3fd0a201d32dc1f493bcf3fff003b6df416dea4f41046e0f37d47 6 minutes ago Running sh 1
|
||||
87d3992f84f74 nginx@sha256:d0a8828cccb73397acb0073bf34f4d7d8aa315263f1e7806bf8c55d8ac139d5f 7 minutes ago Running nginx 0
|
||||
1941fb4da154f k8s-gcrio.azureedge.net/hyperkube-amd64@sha256:00d814b1f7763f4ab5be80c58e98140dfc69df107f253d7fdd714b30a714260a 17 hours ago Running kube-proxy 0
|
||||
```
|
||||
|
||||
<!--
|
||||
### Execute a command in a running container
|
||||
-->
|
||||
### 在正在运行的容器上执行命令
|
||||
|
||||
```shell
|
||||
crictl exec -i -t 1f73f2d81bf98 ls
|
||||
```
|
||||
|
||||
<!--
|
||||
The output is similar to this:
|
||||
-->
|
||||
输出类似于这样:
|
||||
|
||||
```none
|
||||
bin dev etc home proc root sys tmp usr var
|
||||
```
|
||||
|
||||
<!--
|
||||
### Get a container's logs
|
||||
|
||||
Get all container logs:
|
||||
-->
|
||||
### 获取容器日志
|
||||
|
||||
获取容器的所有日志:
|
||||
|
||||
```shell
|
||||
crictl logs 87d3992f84f74
|
||||
```
|
||||
|
||||
<!--
|
||||
The output is similar to this:
|
||||
-->
|
||||
输出类似于这样:
|
||||
|
||||
```none
|
||||
10.240.0.96 - - [06/Jun/2018:02:45:49 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.47.0" "-"
|
||||
10.240.0.96 - - [06/Jun/2018:02:45:50 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.47.0" "-"
|
||||
10.240.0.96 - - [06/Jun/2018:02:45:51 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.47.0" "-"
|
||||
```
|
||||
|
||||
<!--
|
||||
Get only the latest `N` lines of logs:
|
||||
-->
|
||||
获取最近的 `N` 行日志:
|
||||
|
||||
```shell
|
||||
crictl logs --tail=1 87d3992f84f74
|
||||
```
|
||||
|
||||
<!--
|
||||
The output is similar to this:
|
||||
-->
|
||||
输出类似于这样:
|
||||
|
||||
```none
|
||||
10.240.0.96 - - [06/Jun/2018:02:45:51 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.47.0" "-"
|
||||
```
|
||||
|
||||
<!--
|
||||
### Run a pod sandbox
|
||||
|
||||
Using `crictl` to run a pod sandbox is useful for debugging container runtimes.
|
||||
On a running Kubernetes cluster, the sandbox will eventually be stopped and
|
||||
deleted by the Kubelet.
|
||||
-->
|
||||
### 运行 Pod 沙盒
|
||||
|
||||
用 `crictl` 运行 Pod 沙盒对容器运行时排错很有帮助。
|
||||
在运行的 Kubernetes 集群中,沙盒最终会被 kubelet 停止并删除。
|
||||
|
||||
<!--
|
||||
1. Create a JSON file like the following:
|
||||
-->
|
||||
1. 编写下面的 JSON 文件:
|
||||
|
||||
```json
|
||||
{
|
||||
"metadata": {
|
||||
"name": "nginx-sandbox",
|
||||
"namespace": "default",
|
||||
"attempt": 1,
|
||||
"uid": "hdishd83djaidwnduwk28bcsb"
|
||||
},
|
||||
"log_directory": "/tmp",
|
||||
"linux": {
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
<!--
|
||||
2. Use the `crictl runp` command to apply the JSON and run the sandbox.
|
||||
-->
|
||||
2. 使用 `crictl runp` 命令应用 JSON 文件并运行沙盒。
|
||||
|
||||
```shell
|
||||
crictl runp pod-config.json
|
||||
```
|
||||
|
||||
<!--
|
||||
The ID of the sandbox is returned.
|
||||
-->
|
||||
该命令会返回沙盒的 ID。
|
||||
|
||||
<!--
|
||||
### Create a container
|
||||
|
||||
Using `crictl` to create a container is useful for debugging container runtimes.
|
||||
On a running Kubernetes cluster, the sandbox will eventually be stopped and
|
||||
deleted by the Kubelet.
|
||||
-->
|
||||
### 创建容器
|
||||
|
||||
用 `crictl` 创建容器对容器运行时排错很有帮助。
|
||||
在运行的 Kubernetes 集群中,沙盒最终会被 kubelet 停止并删除。
|
||||
|
||||
<!--
|
||||
1. Pull a busybox image
|
||||
-->
|
||||
1. 拉取 busybox 镜像
|
||||
|
||||
```shell
|
||||
crictl pull busybox
|
||||
```
|
||||
```none
|
||||
Image is up to date for busybox@sha256:141c253bc4c3fd0a201d32dc1f493bcf3fff003b6df416dea4f41046e0f37d47
|
||||
```
|
||||
|
||||
<!--
|
||||
2. Create configs for the pod and the container:
|
||||
-->
|
||||
2. 创建 Pod 和容器的配置:
|
||||
|
||||
<!--
|
||||
**Pod config**:
|
||||
-->
|
||||
**Pod 配置**:
|
||||
|
||||
```json
|
||||
{
|
||||
"metadata": {
|
||||
"name": "nginx-sandbox",
|
||||
"namespace": "default",
|
||||
"attempt": 1,
|
||||
"uid": "hdishd83djaidwnduwk28bcsb"
|
||||
},
|
||||
"log_directory": "/tmp",
|
||||
"linux": {
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
<!--
|
||||
**Container config**:
|
||||
-->
|
||||
**容器配置**:
|
||||
|
||||
```json
|
||||
{
|
||||
"metadata": {
|
||||
"name": "busybox"
|
||||
},
|
||||
"image":{
|
||||
"image": "busybox"
|
||||
},
|
||||
"command": [
|
||||
"top"
|
||||
],
|
||||
"log_path":"busybox.log",
|
||||
"linux": {
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
<!--
|
||||
3. Create the container, passing the ID of the previously-created pod, the
|
||||
container config file, and the pod config file. The ID of the container is
|
||||
returned.
|
||||
-->
|
||||
3. 创建容器,传递先前创建的 Pod 的 ID、容器配置文件和 Pod 配置文件。返回容器的 ID。
|
||||
|
||||
```bash
|
||||
crictl create f84dd361f8dc51518ed291fbadd6db537b0496536c1d2d6c05ff943ce8c9a54f container-config.json pod-config.json
|
||||
```
|
||||
|
||||
<!--
|
||||
4. List all containers and verify that the newly-created container has its
|
||||
state set to `Created`.
|
||||
-->
|
||||
4. 查询所有容器并确认新创建的容器状态为 `Created`。
|
||||
|
||||
```bash
|
||||
crictl ps -a
|
||||
```
|
||||
<!--
|
||||
The output is similar to this:
|
||||
-->
|
||||
输出类似于这样:
|
||||
|
||||
```none
|
||||
CONTAINER ID IMAGE CREATED STATE NAME ATTEMPT
|
||||
3e025dd50a72d busybox 32 seconds ago Created busybox 0
|
||||
```
|
||||
|
||||
<!--
|
||||
### Start a container
|
||||
|
||||
To start a container, pass its ID to `crictl start`:
|
||||
-->
|
||||
### 启动容器
|
||||
|
||||
要启动容器,要将容器 ID 传给 `crictl start`:
|
||||
|
||||
```shell
|
||||
crictl start 3e025dd50a72d956c4f14881fbb5b1080c9275674e95fb67f965f6478a957d60
|
||||
```
|
||||
|
||||
<!--
|
||||
The output is similar to this:
|
||||
-->
|
||||
输出类似于这样:
|
||||
|
||||
```
|
||||
3e025dd50a72d956c4f14881fbb5b1080c9275674e95fb67f965f6478a957d60
|
||||
```
|
||||
|
||||
<!--
|
||||
Check the container has its state set to `Running`.
|
||||
-->
|
||||
确认容器的状态为 `Running`。
|
||||
|
||||
```shell
|
||||
crictl ps
|
||||
```
|
||||
|
||||
<!--
|
||||
The output is similar to this:
|
||||
-->
|
||||
输出类似于这样:
|
||||
|
||||
```
|
||||
CONTAINER ID IMAGE CREATED STATE NAME ATTEMPT
|
||||
3e025dd50a72d busybox About a minute ago Running busybox 0
|
||||
```
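上面 `runp`、`create`、`start` 的流程对应着 CRI 容器状态的变化。下面用一小段 Python 示意这个(大幅简化的)状态机,帮助理解 `crictl ps` 输出中 STATE 列的含义。这只是概念演示,并非 CRI 的真实实现:

```python
# 简化的 CRI 容器状态机:Created -> Running -> Exited(仅为示意)
TRANSITIONS = {
    ("CONTAINER_CREATED", "start"): "CONTAINER_RUNNING",  # crictl start
    ("CONTAINER_RUNNING", "stop"): "CONTAINER_EXITED",    # crictl stop
}

def next_state(state, action):
    """返回执行 action 之后的容器状态;不允许的转换抛出异常。"""
    key = (state, action)
    if key not in TRANSITIONS:
        raise ValueError("不允许的状态转换: %s + %s" % (state, action))
    return TRANSITIONS[key]

state = "CONTAINER_CREATED"         # crictl create 之后
state = next_state(state, "start")  # crictl start 之后
print(state)  # CONTAINER_RUNNING
```

例如,对一个已经退出的容器再次执行 `start` 在这个模型里是非法转换,这与 `crictl` 对状态的约束是一致的思路。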
## {{% heading "whatsnext" %}}

<!--
* [Learn more about `crictl`](https://github.com/kubernetes-sigs/cri-tools).
* [Map `docker` CLI commands to `crictl`](/docs/reference/tools/map-crictl-dockercli/).
-->
* [进一步了解 `crictl`](https://github.com/kubernetes-sigs/cri-tools)
* [将 `docker` CLI 命令映射到 `crictl`](/zh/docs/reference/tools/map-crictl-dockercli/)
---
reviewers:
- janetkuo
- thockin
content_type: concept
title: 应用自测与调试
---

<!-- overview -->

<!--
Once your application is running, you'll inevitably need to debug problems with it.
Earlier we described how you can use `kubectl get pods` to retrieve simple status information about
your pods. But there are a number of ways to get even more information about your application.
-->
运行应用时,不可避免地需要定位问题。
前面我们介绍了如何使用 `kubectl get pods` 来查询 Pod 的简单状态信息。
除此之外,还有一系列方法可以获取应用的更详细信息。

<!-- body -->

<!--
## Using `kubectl describe pod` to fetch details about pods
-->
## 使用 `kubectl describe pod` 命令获取 Pod 详情

<!--
For this example we'll use a Deployment to create two pods, similar to the earlier example.
-->
与之前的例子类似,我们使用一个 Deployment 来创建两个 Pod。

{{< codenew file="application/nginx-with-request.yaml" >}}

<!--
Create the Deployment by running the following command:
-->
使用如下命令创建 Deployment:

```shell
kubectl apply -f https://k8s.io/examples/application/nginx-with-request.yaml
```

```none
deployment.apps/nginx-deployment created
```

<!--
Check the pod status by running the following command:
-->
使用如下命令查看 Pod 状态:

```shell
kubectl get pods
```

```none
NAME                                READY     STATUS    RESTARTS   AGE
nginx-deployment-1006230814-6winp   1/1       Running   0          11s
nginx-deployment-1006230814-fmgu3   1/1       Running   0          11s
```

<!--
We can retrieve a lot more information about each of these pods using `kubectl describe pod`. For example:
-->
我们可以使用 `kubectl describe pod` 命令来查询每个 Pod 的更多信息,比如:

```shell
kubectl describe pod nginx-deployment-1006230814-6winp
```

```none
Name:           nginx-deployment-1006230814-6winp
Namespace:      default
Node:           kubernetes-node-wul5/10.240.0.9
Start Time:     Thu, 24 Mar 2016 01:39:49 +0000
Labels:         app=nginx,pod-template-hash=1006230814
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"nginx-deployment-1956810328","uid":"14e607e7-8ba1-11e7-b5cb-fa16" ...
Status:         Running
IP:             10.244.0.6
Controllers:    ReplicaSet/nginx-deployment-1006230814
Containers:
  nginx:
    Container ID:   docker://90315cc9f513c724e9957a4788d3e625a078de84750f244a40f97ae355eb1149
    Image:          nginx
    Image ID:       docker://6f62f48c4e55d700cf3eb1b5e33fa051802986b77b874cc351cce539e5163707
    Port:           80/TCP
    QoS Tier:
      cpu:      Guaranteed
      memory:   Guaranteed
    Limits:
      cpu:      500m
      memory:   128Mi
    Requests:
      memory:   128Mi
      cpu:      500m
    State:          Running
      Started:      Thu, 24 Mar 2016 01:39:51 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-5kdvl (ro)
Conditions:
  Type          Status
  Initialized   True
  Ready         True
  PodScheduled  True
Volumes:
  default-token-4bcbi:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-4bcbi
    Optional:   false
QoS Class:      Guaranteed
Node-Selectors: <none>
Tolerations:    <none>
Events:
  FirstSeen LastSeen Count From                          SubobjectPath          Type    Reason    Message
  --------- -------- ----- ----                          -------------          ------- ------    -------
  54s       54s      1     {default-scheduler }                                 Normal  Scheduled Successfully assigned nginx-deployment-1006230814-6winp to kubernetes-node-wul5
  54s       54s      1     {kubelet kubernetes-node-wul5} spec.containers{nginx} Normal  Pulling   pulling image "nginx"
  53s       53s      1     {kubelet kubernetes-node-wul5} spec.containers{nginx} Normal  Pulled    Successfully pulled image "nginx"
  53s       53s      1     {kubelet kubernetes-node-wul5} spec.containers{nginx} Normal  Created   Created container with docker id 90315cc9f513
  53s       53s      1     {kubelet kubernetes-node-wul5} spec.containers{nginx} Normal  Started   Started container with docker id 90315cc9f513
```

<!--
Here you can see configuration information about the container(s) and Pod (labels, resource requirements, etc.),
as well as status information about the container(s) and Pod (state, readiness, restart count, events, etc.).
-->
这里可以看到容器和 Pod 的配置信息(标签、资源需求等),
以及它们的状态信息(状态、就绪态、重启次数、事件等)。

<!--
The container state is one of Waiting, Running, or Terminated.
Depending on the state, additional information will be provided -- here you can see that for a container in Running state, the system tells you when the container started.
-->
容器状态是 Waiting、Running 和 Terminated 之一。
处于不同状态时,还会显示相应的附加信息 —— 在这里你可以看到,
对于处于 Running 状态的容器,系统会告诉你容器的启动时间。

<!--
Ready tells you whether the container passed its last readiness probe.
(In this case, the container does not have a readiness probe configured; the container is assumed to be ready if no readiness probe is configured.)
-->
Ready 指示容器是否通过了最近一次就绪态探测。
(在本例中,容器没有配置就绪态探测;如果没有配置就绪态探测,则认为容器已经就绪。)

<!--
Restart Count tells you how many times the container has been restarted;
this information can be useful for detecting crash loops in containers that are configured with a restart policy of 'always.'
-->
Restart Count 告诉你容器已重启的次数;
这一信息对于定位重启策略为 “Always” 且持续崩溃的容器非常有用。

<!--
Currently the only Condition associated with a Pod is the binary Ready condition,
which indicates that the pod is able to service requests and should be added to the load balancing pools of all matching services.
-->
目前,唯一与 Pod 关联的状况(Condition)是二元的 Ready 状况,
它表明 Pod 能够为请求提供服务,并且应该被添加到所有匹配服务的负载均衡池中。

<!--
Lastly, you see a log of recent events related to your Pod.
The system compresses multiple identical events by indicating the first and last time it was seen and the number of times it was seen.
"From" indicates the component that is logging the event,
"SubobjectPath" tells you which object (e.g. container within the pod) is being referred to,
and "Reason" and "Message" tell you what happened.
-->
最后,你还可以看到与 Pod 相关的近期事件。
对于多条相同的事件,系统会给出其首次和最后一次出现的时间以及出现次数,以此进行压缩。
“From” 标明记录事件的组件,
“SubobjectPath” 告诉你所引用的对象(例如 Pod 中的某个容器),
“Reason” 和 “Message” 告诉你发生了什么。
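上面提到的事件“压缩”可以用下面的 Python 片段来示意:相同的 (Reason, Message) 只保留一条记录,并更新首次出现时间、最后出现时间和计数。这只是概念演示,并非 kube-apiserver 的真实实现:

```python
def compress_events(events):
    """按 (reason, message) 聚合事件,记录 FirstSeen/LastSeen/Count(仅为示意)。"""
    seen = {}
    for ts, reason, message in events:
        key = (reason, message)
        if key not in seen:
            # 第一次出现:记录首次时间并把计数置 1
            seen[key] = {"first": ts, "last": ts, "count": 1,
                         "reason": reason, "message": message}
        else:
            # 重复出现:只更新最后时间和计数
            seen[key]["last"] = ts
            seen[key]["count"] += 1
    return list(seen.values())

raw = [
    (10, "Pulling", 'pulling image "nginx"'),
    (11, "Pulled", 'Successfully pulled image "nginx"'),
    (70, "Pulling", 'pulling image "nginx"'),  # 同一事件再次发生
]
compressed = compress_events(raw)
print(len(compressed))  # 2
```

三条原始事件被压缩成两条记录,其中 Pulling 事件的计数为 2,首次/最后出现时间分别是 10 和 70。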
<!--
## Example: debugging Pending Pods

A common scenario that you can detect using events is when you've created a Pod that won't fit on any node.
For example, the Pod might request more resources than are free on any node,
or it might specify a label selector that doesn't match any nodes.
Let's say we created the previous Deployment with 5 replicas (instead of 2) and requesting 600 millicores instead of 500,
on a four-node cluster where each (virtual) machine has 1 CPU.
In that case one of the Pods will not be able to schedule.
(Note that because of the cluster addon pods such as fluentd, skydns, etc., that run on each node, if we requested 1000 millicores then none of the Pods would be able to schedule.)
-->
## 示例:调试 Pending 状态的 Pod

可以使用事件来调试的一个常见场景是,你创建的 Pod 无法被调度到任何节点上。
比如,Pod 请求的资源超过了所有节点上的空闲资源,或者它指定了没有任何节点能匹配的标签选择算符。
假定我们创建之前的 Deployment 时指定副本数是 5(不再是 2),并且请求 600 毫核(不再是 500),
对于一个 4 节点、每台(虚拟)机器只有 1 个 CPU 的集群,这时会有一个 Pod 无法被调度。
(需要注意的是,由于 fluentd、skydns 等集群插件 Pod 会在每个节点上运行,
如果我们请求 1000 毫核,则这些 Pod 都无法被调度。)
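按文中的数字做一个简单演算(忽略插件 Pod 已占用的资源,仅为示意),就能看出为什么 5 个副本中恰好有 1 个会停留在 Pending 状态:

```python
# 按上文场景估算可调度的 Pod 数(示意性计算,未考虑节点上已有的插件 Pod)
node_count = 4
node_cpu_m = 1000   # 每个节点 1 个 CPU = 1000 毫核
request_m = 600     # 每个 Pod 请求 600 毫核
replicas = 5

pods_per_node = node_cpu_m // request_m               # 每节点最多容纳 1 个(600*2 > 1000)
schedulable = min(replicas, node_count * pods_per_node)
pending = replicas - schedulable
print(schedulable, pending)  # 4 1
```

4 个节点各容纳 1 个 Pod,第 5 个副本找不到有足够空闲 CPU 的节点,因此保持 Pending。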
```shell
kubectl get pods
```

```none
NAME                                READY     STATUS    RESTARTS   AGE
nginx-deployment-1006230814-6winp   1/1       Running   0          7m
nginx-deployment-1006230814-fmgu3   1/1       Running   0          7m
nginx-deployment-1370807587-6ekbw   1/1       Running   0          1m
nginx-deployment-1370807587-fg172   0/1       Pending   0          1m
nginx-deployment-1370807587-fz9sd   0/1       Pending   0          1m
```

<!--
To find out why the nginx-deployment-1370807587-fz9sd pod is not running, we can use `kubectl describe pod` on the pending Pod and look at its events:
-->
为了查找 Pod nginx-deployment-1370807587-fz9sd 没有运行的原因,我们可以对这个处于
Pending 状态的 Pod 使用 `kubectl describe pod` 命令,查看其事件:

```shell
kubectl describe pod nginx-deployment-1370807587-fz9sd
```

```none
Name:           nginx-deployment-1370807587-fz9sd
Namespace:      default
Node:           /
Labels:         app=nginx,pod-template-hash=1370807587
Status:         Pending
IP:
Controllers:    ReplicaSet/nginx-deployment-1370807587
Containers:
  nginx:
    Image:      nginx
    Port:       80/TCP
    QoS Tier:
      memory:   Guaranteed
      cpu:      Guaranteed
    Limits:
      cpu:      1
      memory:   128Mi
    Requests:
      cpu:      1
      memory:   128Mi
    Environment Variables:
Volumes:
  default-token-4bcbi:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-4bcbi
Events:
  FirstSeen LastSeen Count From                SubobjectPath  Type     Reason           Message
  --------- -------- ----- ----                -------------  -------- ------           -------
  1m        48s      7     {default-scheduler }               Warning  FailedScheduling pod (nginx-deployment-1370807587-fz9sd) failed to fit in any node
  fit failure on node (kubernetes-node-6ta5): Node didn't have enough resource: CPU, requested: 1000, used: 1420, capacity: 2000
  fit failure on node (kubernetes-node-wul5): Node didn't have enough resource: CPU, requested: 1000, used: 1100, capacity: 2000
```

<!--
Here you can see the event generated by the scheduler saying that the Pod failed to schedule for reason `FailedScheduling` (and possibly others).
The message tells us that there were not enough resources for the Pod on any of the nodes.
-->
这里你可以看到调度器生成的事件,它表明 Pod 因 `FailedScheduling`(还可能有其他原因)而调度失败。
消息告诉我们,所有节点都没有足够的资源来运行这个 Pod。

<!--
To correct this situation, you can use `kubectl scale` to update your Deployment to specify four or fewer replicas. (Or you could leave the one Pod pending, which is harmless.)
-->
要纠正这种情况,可以使用 `kubectl scale` 更新 Deployment,指定 4 个或更少的副本。
(或者你也可以让这个 Pod 保持 Pending 状态,这样做是无害的。)

<!--
Events such as the ones you saw at the end of `kubectl describe pod` are persisted in etcd and
provide high-level information on what is happening in the cluster.
To list all events you can use
-->
你在 `kubectl describe pod` 结尾处看到的事件都保存在 etcd 中,
并提供关于集群中正在发生的事情的高层次信息。
如果需要列出所有事件,可使用命令:

```shell
kubectl get events
```

<!--
but you have to remember that events are namespaced.
This means that if you're interested in events for some namespaced object
(e.g. what happened with Pods in namespace `my-namespace`) you need to explicitly provide a namespace to the command:
-->
但是,需要注意的是,事件是区分名字空间的。
如果你对某些名字空间域的对象(比如 `my-namespace` 名字空间下的 Pod)的事件感兴趣,
你需要显式地在命令中指定名字空间:

```shell
kubectl get events --namespace=my-namespace
```

<!--
To see events from all namespaces, you can use the `--all-namespaces` argument.
-->
要查看所有名字空间的事件,可使用 `--all-namespaces` 参数。

<!--
In addition to `kubectl describe pod`, another way to get extra information about a pod (beyond what is provided by `kubectl get pod`) is
to pass the `-o yaml` output format flag to `kubectl get pod`.
This will give you, in YAML format, even more information than `kubectl describe pod`--essentially all of the information the system has about the Pod.
Here you will see things like annotations (which are key-value metadata without the label restrictions, that is used internally by Kubernetes system components),
restart policy, ports, and volumes.
-->
除了 `kubectl describe pod` 以外,另一种获取 Pod 额外信息(超出 `kubectl get pod` 所给出的)的方法是
给 `kubectl get pod` 增加 `-o yaml` 输出格式参数。
该命令将以 YAML 格式输出比 `kubectl describe pod` 更多的信息 —— 实际上是系统拥有的关于该 Pod 的所有信息。
在这里,你将看到注解(不受标签那些限制的键值元数据,由 Kubernetes 系统组件在内部使用)、
重启策略、端口和卷等。
```shell
kubectl get pod nginx-deployment-1006230814-6winp -o yaml
```

```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/created-by: |
      {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"nginx-deployment-1006230814","uid":"4c84c175-f161-11e5-9a78-42010af00005","apiVersion":"extensions","resourceVersion":"133434"}}
  creationTimestamp: 2016-03-24T01:39:50Z
  generateName: nginx-deployment-1006230814-
  labels:
    app: nginx
    pod-template-hash: "1006230814"
  name: nginx-deployment-1006230814-6winp
  namespace: default
  resourceVersion: "133447"
  uid: 4c879808-f161-11e5-9a78-42010af00005
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: nginx
    ports:
    - containerPort: 80
      protocol: TCP
    resources:
      limits:
        cpu: 500m
        memory: 128Mi
      requests:
        cpu: 500m
        memory: 128Mi
    terminationMessagePath: /dev/termination-log
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-4bcbi
      readOnly: true
  dnsPolicy: ClusterFirst
  nodeName: kubernetes-node-wul5
  restartPolicy: Always
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  volumes:
  - name: default-token-4bcbi
    secret:
      secretName: default-token-4bcbi
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2016-03-24T01:39:51Z
    status: "True"
    type: Ready
  containerStatuses:
  - containerID: docker://90315cc9f513c724e9957a4788d3e625a078de84750f244a40f97ae355eb1149
    image: nginx
    imageID: docker://6f62f48c4e55d700cf3eb1b5e33fa051802986b77b874cc351cce539e5163707
    lastState: {}
    name: nginx
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: 2016-03-24T01:39:51Z
  hostIP: 10.240.0.9
  phase: Running
  podIP: 10.244.0.6
  startTime: 2016-03-24T01:39:49Z
```
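作为示意,拿到 `-o json`(与 `-o yaml` 等价的 JSON 形式)的输出后,可以用几行 Python 提取这些字段。下面的字段名取自上面的示例输出,脚本本身仅为演示:

```python
import json

# 取自上文示例输出的一个极简片段(仅保留要提取的字段)
pod_json = """
{
  "spec": {
    "restartPolicy": "Always",
    "containers": [{"name": "nginx",
                    "ports": [{"containerPort": 80, "protocol": "TCP"}]}],
    "volumes": [{"name": "default-token-4bcbi"}]
  },
  "status": {"phase": "Running", "podIP": "10.244.0.6"}
}
"""

pod = json.loads(pod_json)
summary = {
    # 重启策略、端口和卷,正是上文提到要在输出中查看的内容
    "restartPolicy": pod["spec"]["restartPolicy"],
    "ports": [p["containerPort"] for c in pod["spec"]["containers"]
              for p in c.get("ports", [])],
    "volumes": [v["name"] for v in pod["spec"]["volumes"]],
    "phase": pod["status"]["phase"],
}
print(summary)
```

实际使用时可以把 `kubectl get pod <name> -o json` 的输出通过管道交给这样的脚本来做批量检查。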
<!--
## Example: debugging a down/unreachable node

Sometimes when debugging it can be useful to look at the status of a node
-- for example, because you've noticed strange behavior of a Pod that's running on the node,
or to find out why a Pod won't schedule onto the node.
As with Pods, you can use `kubectl describe node` and `kubectl get node -o yaml` to
retrieve detailed information about nodes.
For example, here's what you'll see if a node is down
(disconnected from the network, or kubelet dies and won't restart, etc.).
Notice the events that show the node is NotReady, and
also notice that the pods are no longer running
(they are evicted after five minutes of NotReady status).
-->
## 示例:调试宕机或无法联系的节点

有时候,在调试时查看节点的状态会很有用 —— 例如,你注意到节点上运行的某个 Pod 行为异常,
或者想了解为什么 Pod 不会被调度到某节点上。
与 Pod 一样,你可以使用 `kubectl describe node` 和 `kubectl get node -o yaml` 来查询节点的详细信息。
例如,如果某个节点宕机(与网络断开连接,或者 kubelet 挂掉且无法重启等),你将看到以下情况。
请注意显示节点处于 NotReady 状态的事件,也请注意 Pod 不再运行
(它们在节点处于 NotReady 状态 5 分钟后被驱逐)。
```shell
kubectl get nodes
```

```none
NAME                     STATUS       ROLES     AGE     VERSION
kubernetes-node-861h     NotReady     <none>    1h      v1.13.0
kubernetes-node-bols     Ready        <none>    1h      v1.13.0
kubernetes-node-st6x     Ready        <none>    1h      v1.13.0
kubernetes-node-unaj     Ready        <none>    1h      v1.13.0
```

```shell
kubectl describe node kubernetes-node-861h
```

```none
Name:                   kubernetes-node-861h
Role
Labels:                 kubernetes.io/arch=amd64
                        kubernetes.io/os=linux
                        kubernetes.io/hostname=kubernetes-node-861h
Annotations:            node.alpha.kubernetes.io/ttl=0
                        volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:                 <none>
CreationTimestamp:      Mon, 04 Sep 2017 17:13:23 +0800
Phase:
Conditions:
  Type            Status    LastHeartbeatTime                 LastTransitionTime                Reason             Message
  ----            ------    -----------------                 ------------------                ------             -------
  OutOfDisk       Unknown   Fri, 08 Sep 2017 16:04:28 +0800   Fri, 08 Sep 2017 16:20:58 +0800   NodeStatusUnknown  Kubelet stopped posting node status.
  MemoryPressure  Unknown   Fri, 08 Sep 2017 16:04:28 +0800   Fri, 08 Sep 2017 16:20:58 +0800   NodeStatusUnknown  Kubelet stopped posting node status.
  DiskPressure    Unknown   Fri, 08 Sep 2017 16:04:28 +0800   Fri, 08 Sep 2017 16:20:58 +0800   NodeStatusUnknown  Kubelet stopped posting node status.
  Ready           Unknown   Fri, 08 Sep 2017 16:04:28 +0800   Fri, 08 Sep 2017 16:20:58 +0800   NodeStatusUnknown  Kubelet stopped posting node status.
Addresses:      10.240.115.55,104.197.0.26
Capacity:
 cpu:           2
 hugePages:     0
 memory:        4046788Ki
 pods:          110
Allocatable:
 cpu:           1500m
 hugePages:     0
 memory:        1479263Ki
 pods:          110
System Info:
 Machine ID:                    8e025a21a4254e11b028584d9d8b12c4
 System UUID:                   349075D1-D169-4F25-9F2A-E886850C47E3
 Boot ID:                       5cd18b37-c5bd-4658-94e0-e436d3f110e0
 Kernel Version:                4.4.0-31-generic
 OS Image:                      Debian GNU/Linux 8 (jessie)
 Operating System:              linux
 Architecture:                  amd64
 Container Runtime Version:     docker://1.12.5
 Kubelet Version:               v1.6.9+a3d1dfa6f4335
 Kube-Proxy Version:            v1.6.9+a3d1dfa6f4335
ExternalID:                     15233045891481496305
Non-terminated Pods:            (9 in total)
  Namespace     Name            CPU Requests    CPU Limits      Memory Requests Memory Limits
  ---------     ----            ------------    ----------      --------------- -------------
......
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits      Memory Requests Memory Limits
  ------------  ----------      --------------- -------------
  900m (60%)    2200m (146%)    1009286400 (66%)        5681286400 (375%)
Events:         <none>
```

```shell
kubectl get node kubernetes-node-861h -o yaml
```

```yaml
apiVersion: v1
kind: Node
metadata:
  creationTimestamp: 2015-07-10T21:32:29Z
  labels:
    kubernetes.io/hostname: kubernetes-node-861h
  name: kubernetes-node-861h
  resourceVersion: "757"
  selfLink: /api/v1/nodes/kubernetes-node-861h
  uid: 2a69374e-274b-11e5-a234-42010af0d969
spec:
  externalID: "15233045891481496305"
  podCIDR: 10.244.0.0/24
  providerID: gce://striped-torus-760/us-central1-b/kubernetes-node-861h
status:
  addresses:
  - address: 10.240.115.55
    type: InternalIP
  - address: 104.197.0.26
    type: ExternalIP
  capacity:
    cpu: "1"
    memory: 3800808Ki
    pods: "100"
  conditions:
  - lastHeartbeatTime: 2015-07-10T21:34:32Z
    lastTransitionTime: 2015-07-10T21:35:15Z
    reason: Kubelet stopped posting node status.
    status: Unknown
    type: Ready
  nodeInfo:
    bootID: 4e316776-b40d-4f78-a4ea-ab0d73390897
    containerRuntimeVersion: docker://Unknown
    kernelVersion: 3.16.0-0.bpo.4-amd64
    kubeProxyVersion: v0.21.1-185-gffc5a86098dc01
    kubeletVersion: v0.21.1-185-gffc5a86098dc01
    machineID: ""
    osImage: Debian GNU/Linux 7 (wheezy)
    systemUUID: ABE5F6B4-D44B-108B-C46A-24CCE16C8B6E
```
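上面的输出中,节点的 Ready 状况变为 `Unknown`,原因是 “Kubelet stopped posting node status.”:kubelet 停止上报心跳后,节点控制器会在宽限期过后把节点状况标记为 `Unknown`。下面的 Python 片段粗略示意这个判断逻辑(宽限期数值只是假设,真实值由 kube-controller-manager 的配置决定):

```python
from datetime import datetime, timedelta

# 假设的宽限期;真实集群中由 kube-controller-manager 配置
GRACE = timedelta(seconds=40)

def ready_condition(last_heartbeat, now):
    """根据最近一次心跳时间粗略判断节点 Ready 状况(仅为示意)。"""
    if now - last_heartbeat > GRACE:
        return "Unknown"  # 对应 "Kubelet stopped posting node status."
    return "True"

now = datetime(2017, 9, 8, 16, 21, 0)
# 最后一次心跳在 16:04:28,远超宽限期 -> Unknown
print(ready_condition(datetime(2017, 9, 8, 16, 4, 28), now))   # Unknown
# 最后一次心跳在 15 秒前,仍在宽限期内 -> True
print(ready_condition(datetime(2017, 9, 8, 16, 20, 45), now))  # True
```

这也解释了为什么上面 `LastHeartbeatTime` 和 `LastTransitionTime` 相差约 16 分钟:状况是在心跳停止一段时间之后才发生转换的。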
## {{% heading "whatsnext" %}}

<!--
Learn about additional debugging tools, including:

* [Logging](/docs/concepts/cluster-administration/logging/)
* [Monitoring](/docs/tasks/debug-application-cluster/resource-usage-monitoring/)
* [Getting into containers via `exec`](/docs/tasks/debug-application-cluster/get-shell-running-container/)
* [Connecting to containers via proxies](/docs/tasks/extend-kubernetes/http-proxy-access-api/)
* [Connecting to containers via port forwarding](/docs/tasks/access-application-cluster/port-forward-access-application-cluster/)
* [Inspect Kubernetes node with crictl](/docs/tasks/debug-application-cluster/crictl/)
-->
了解更多的调试工具,包括:

* [日志](/zh/docs/concepts/cluster-administration/logging/)
* [监控](/zh/docs/tasks/debug-application-cluster/resource-usage-monitoring/)
* [使用 `exec` 进入容器](/zh/docs/tasks/debug-application-cluster/get-shell-running-container/)
* [使用代理连接容器](/zh/docs/tasks/extend-kubernetes/http-proxy-access-api/)
* [使用端口转发连接容器](/zh/docs/tasks/access-application-cluster/port-forward-access-application-cluster/)
* [使用 crictl 检查节点](/zh/docs/tasks/debug-application-cluster/crictl/)
---
title: 集群故障排查
content_type: concept
---
<!--
reviewers:
- davidopp
title: Troubleshoot Clusters
content_type: concept
-->

<!-- overview -->

<!--
This doc is about cluster troubleshooting; we assume you have already ruled out your application as the root cause of the
problem you are experiencing. See
the [application troubleshooting guide](/docs/tasks/debug-application-cluster/debug-application) for tips on application debugging.
You may also visit [troubleshooting document](/docs/tasks/debug-application-cluster/troubleshooting/) for more information.
-->
本篇文档介绍集群层面的故障排查;我们假设对于你碰到的问题,你已经排除了应用程序方面的原因。
对于应用的调试,请参阅[应用故障排查指南](/zh/docs/tasks/debug-application-cluster/debug-application/)。
你也可以访问[故障排查文档](/zh/docs/tasks/debug-application-cluster/troubleshooting/)获取更多的信息。

<!-- body -->

<!--
## Listing your cluster

The first thing to debug in your cluster is if your nodes are all registered correctly.

Run
-->
## 列举集群节点

调试的第一步是查看所有的节点是否都已正确注册。

运行如下命令:

```shell
kubectl get nodes
```

<!--
And verify that all of the nodes you expect to see are present and that they are all in the `Ready` state.

To get detailed information about the overall health of your cluster, you can run:
-->
验证你期望看到的所有节点都存在,并且都处于 `Ready` 状态。

要了解集群的整体健康状况详情,你可以运行:

```shell
kubectl cluster-info dump
```

<!--
## Looking at logs

For now, digging deeper into the cluster requires logging into the relevant machines. Here are the locations
of the relevant log files. (note that on systemd-based systems, you may need to use `journalctl` instead)
-->
## 查看日志

目前,要进一步挖掘集群的深层信息,需要登录到相关的机器上。下面是相关日志文件所在的位置。
(注意,对于基于 systemd 的系统,你可能需要使用 `journalctl`。)

<!--
### Master

* `/var/log/kube-apiserver.log` - API Server, responsible for serving the API
* `/var/log/kube-scheduler.log` - Scheduler, responsible for making scheduling decisions
* `/var/log/kube-controller-manager.log` - Controller that manages replication controllers
-->
### 主控节点

* `/var/log/kube-apiserver.log` - API 服务器,负责提供 API 服务
* `/var/log/kube-scheduler.log` - 调度器,负责做出调度决策
* `/var/log/kube-controller-manager.log` - 控制器管理器,负责管理副本控制器等控制器

<!--
### Worker Nodes

* `/var/log/kubelet.log` - Kubelet, responsible for running containers on the node
* `/var/log/kube-proxy.log` - Kube Proxy, responsible for service load balancing
-->
### 工作节点

* `/var/log/kubelet.log` - `kubelet`,负责在节点上运行容器
* `/var/log/kube-proxy.log` - `kube-proxy`,负责服务的负载均衡
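可以把上述组件与日志位置的对应关系整理成一个小的查询函数,如下面的 Python 片段所示。路径取自上文;`journalctl` 的单元名只是常见约定,具体以你的发行版和部署方式为准:

```python
# 组件 -> 传统日志文件路径(取自上文;systemd 系统请改用 journalctl)
LOG_FILES = {
    "kube-apiserver": "/var/log/kube-apiserver.log",
    "kube-scheduler": "/var/log/kube-scheduler.log",
    "kube-controller-manager": "/var/log/kube-controller-manager.log",
    "kubelet": "/var/log/kubelet.log",
    "kube-proxy": "/var/log/kube-proxy.log",
}

def log_command(component, systemd=False):
    """返回查看某组件日志的建议命令(仅为示意,单元名以实际部署为准)。"""
    if systemd:
        return "journalctl -u %s" % component
    return "cat %s" % LOG_FILES[component]

print(log_command("kubelet"))                # cat /var/log/kubelet.log
print(log_command("kubelet", systemd=True))  # journalctl -u kubelet
```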
|
||||
|
||||
|
||||
<!--
|
||||
## A general overview of cluster failure modes
|
||||
|
||||
This is an incomplete list of things that could go wrong, and how to adjust your cluster setup to mitigate the problems.
|
||||
-->
|
||||
## 集群故障模式的一般性概述
|
||||
|
||||
下面是一个不完整的列表,列举了一些可能的出错场景,以及通过调整集群配置来解决相关问题的方法。
|
||||
|
||||
<!--
|
||||
### Root causes:
|
||||
|
||||
- VM(s) shutdown
|
||||
- Network partition within cluster, or between cluster and users
|
||||
- Crashes in Kubernetes software
|
||||
- Data loss or unavailability of persistent storage (e.g. GCE PD or AWS EBS volume)
|
||||
- Operator error, for example misconfigured Kubernetes software or application software
|
||||
-->
|
||||
### 根本原因
|
||||
|
||||
- VM(s) 关机
|
||||
- 集群之间,或者集群和用户之间网络分裂
|
||||
- Kubernetes 软件本身崩溃
|
||||
- 数据丢失或者持久化存储不可用(如:GCE PD 或 AWS EBS 卷)
|
||||
- 操作错误,如:Kubernetes 或者应用程序配置错误
|
||||
|
||||
<!--
|
||||
### Specific scenarios:
|
||||
|
||||
- Apiserver VM shutdown or apiserver crashing
|
||||
- Results
|
||||
- unable to stop, update, or start new pods, services, replication controller
|
||||
- existing pods and services should continue to work normally, unless they depend on the Kubernetes API
|
||||
- Apiserver backing storage lost
|
||||
- Results
|
||||
- apiserver should fail to come up
|
||||
- kubelets will not be able to reach it but will continue to run the same pods and provide the same service proxying
|
||||
- manual recovery or recreation of apiserver state necessary before apiserver is restarted
|
||||
-->
|
||||
### 具体情况
|
||||
|
||||
- API 服务器所在的 VM 关机或者 API 服务器崩溃
|
||||
- 结果
|
||||
- 不能停止、更新或者启动新的 Pod、服务或副本控制器
|
||||
- 现有的 Pod 和服务在不依赖 Kubernetes API 的情况下应该能继续正常工作
|
||||
- API 服务器的后端存储丢失
|
||||
- 结果
|
||||
- API 服务器应该不能启动
|
||||
- kubelet 将不能访问 API 服务器,但是能够继续运行之前的 Pod 和提供相同的服务代理
|
||||
- 在 API 服务器重启之前,需要手动恢复或者重建 API 服务器的状态
|
||||
<!--
|
||||
- Supporting services (node controller, replication controller manager, scheduler, etc) VM shutdown or crashes
|
||||
- currently those are colocated with the apiserver, and their unavailability has similar consequences as apiserver
|
||||
- in future, these will be replicated as well and may not be co-located
|
||||
- they do not have their own persistent state
|
||||
- Individual node (VM or physical machine) shuts down
|
||||
- Results
|
||||
- pods on that Node stop running
|
||||
- Network partition
|
||||
- Results
|
||||
- partition A thinks the nodes in partition B are down; partition B thinks the apiserver is down. (Assuming the master VM ends up in partition A.)
|
||||
-->
|
||||
- Kubernetes 服务组件(节点控制器、副本控制器管理器、调度器等)所在的 VM 关机或者崩溃
|
||||
- 当前,这些控制器是和 API 服务器在一起运行的,它们不可用的现象是与 API 服务器类似的
|
||||
- 将来,这些控制器也会复制为多份,并且可能不在运行于同一节点上
|
||||
- 它们没有自己的持久状态
|
||||
- 单个节点(VM 或者物理机)关机
|
||||
- 结果
|
||||
- 此节点上的所有 Pod 都停止运行
|
||||
- 网络分裂
|
||||
- 结果
|
||||
- 分区 A 认为分区 B 中所有的节点都已宕机;分区 B 认为 API 服务器宕机
|
||||
(假定主控节点所在的 VM 位于分区 A 内)。
|
||||
<!--
|
||||
- Kubelet software fault
|
||||
- Results
|
||||
- crashing kubelet cannot start new pods on the node
|
||||
- kubelet might delete the pods or not
|
||||
- node marked unhealthy
|
||||
- replication controllers start new pods elsewhere
|
||||
- Cluster operator error
|
||||
- Results
|
||||
- loss of pods, services, etc
|
||||
- lost of apiserver backing store
|
||||
- users unable to read API
|
||||
- etc.
|
||||
-->
|
||||
- kubelet 软件故障
|
||||
- 结果
|
||||
- 崩溃的 kubelet 就不能在其所在的节点上启动新的 Pod
|
||||
- kubelet 可能删掉 Pod 或者不删
|
||||
- 节点被标识为非健康态
|
||||
- 副本控制器会在其它的节点上启动新的 Pod
|
||||
- 集群操作错误
|
||||
- 结果
|
||||
- 丢失 Pod 或服务等等
|
||||
- 丢失 API 服务器的后端存储
|
||||
- 用户无法读取API
|
||||
- 等等
|
||||
|
||||
<!--
|
||||
### Mitigations:
|
||||
|
||||
- Action: Use IaaS provider's automatic VM restarting feature for IaaS VMs
|
||||
- Mitigates: Apiserver VM shutdown or apiserver crashing
|
||||
- Mitigates: Supporting services VM shutdown or crashes
|
||||
|
||||
- Action: Use IaaS providers reliable storage (e.g. GCE PD or AWS EBS volume) for VMs with apiserver+etcd
|
||||
- Mitigates: Apiserver backing storage lost
|
||||
|
||||
- Action: Use [high-availability](/docs/setup/production-environment/tools/kubeadm/high-availability/) configuration
|
||||
- Mitigates: Control plane node shutdown or control plane components (scheduler, API server, controller-manager) crashing
|
||||
- Will tolerate one or more simultaneous node or component failures
|
||||
- Mitigates: API server backing storage (i.e., etcd's data directory) lost
|
||||
- Assumes HA (highly-available) etcd configuration
|
||||
-->
|
||||
### 缓解措施
|
||||
|
||||
- 措施:对于 IaaS 上的 VMs,使用 IaaS 的自动 VM 重启功能
|
||||
- 缓解:API 服务器 VM 关机或 API 服务器崩溃
|
||||
- 缓解:Kubernetes 服务组件所在的 VM 关机或崩溃
|
||||
|
||||
- 措施: 对于运行 API 服务器和 etcd 的 VM,使用 IaaS 提供的可靠的存储(例如 GCE PD 或者 AWS EBS 卷)
|
||||
- 缓解:API 服务器后端存储的丢失
|
||||
|
||||
- 措施:使用[高可用性](/zh/docs/setup/production-environment/tools/kubeadm/high-availability/)的配置
|
||||
- 缓解:主控节点 VM 关机或者主控节点组件(调度器、API 服务器、控制器管理器)崩溃
|
||||
- 将容许一个或多个节点或组件同时出现故障
|
||||
- 缓解:API 服务器后端存储(例如 etcd 的数据目录)丢失
|
||||
- 假定你使用了高可用的 etcd 配置
|
||||
|
||||
<!--
|
||||
- Action: Snapshot apiserver PDs/EBS-volumes periodically
|
||||
- Mitigates: Apiserver backing storage lost
|
||||
- Mitigates: Some cases of operator error
|
||||
- Mitigates: Some cases of Kubernetes software fault
|
||||
|
||||
- Action: use replication controller and services in front of pods
|
||||
- Mitigates: Node shutdown
|
||||
- Mitigates: Kubelet software fault
|
||||
|
||||
- Action: applications (containers) designed to tolerate unexpected restarts
|
||||
- Mitigates: Node shutdown
|
||||
- Mitigates: Kubelet software fault
|
||||
-->
|
||||
- 措施:定期对 API 服务器的 PDs/EBS 卷执行快照操作
|
||||
- 缓解:API 服务器后端存储丢失
|
||||
- 缓解:一些操作错误的场景
|
||||
- 缓解:一些 Kubernetes 软件本身故障的场景
|
||||
|
||||
- 措施:在 Pod 的前面使用副本控制器或服务
|
||||
- 缓解:节点关机
|
||||
- 缓解:kubelet 软件故障
|
||||
|
||||
- 措施:应用(容器)设计成容许异常重启
|
||||
- 缓解:节点关机
|
||||
- 缓解:kubelet 软件故障
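"定期对后端存储执行快照"这一措施也可以在 etcd 层面实施。下面是一个假设性的示意(假定你可以直接访问 etcd 端点、安装了 v3 版本的 `etcdctl`,示例中的端点地址与证书路径均为示意,需按你的集群实际情况替换):

```shell
# 假设性示例:为 etcd 创建快照(端点与证书路径仅为示意)
ETCDCTL_API=3 etcdctl snapshot save /var/backups/etcd-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# 检查快照的完整性与基本信息
ETCDCTL_API=3 etcdctl snapshot status /var/backups/etcd-snapshot.db
```

与 IaaS 层面的磁盘快照相比,etcd 快照只包含集群状态数据,通常更小,也更便于跨环境恢复。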
|
||||
|
|
@ -1,525 +0,0 @@
|
|||
---
|
||||
title: 调试运行中的 Pod
|
||||
content_type: task
|
||||
---
|
||||
|
||||
<!-- overview -->
|
||||
<!--
|
||||
This page explains how to debug Pods running (or crashing) on a Node.
|
||||
-->
|
||||
本页解释如何在节点上调试运行中(或崩溃)的 Pod。
|
||||
|
||||
## {{% heading "prerequisites" %}}
|
||||
|
||||
<!--
|
||||
* Your {{< glossary_tooltip text="Pod" term_id="pod" >}} should already be
|
||||
scheduled and running. If your Pod is not yet running, start with [Troubleshoot
|
||||
Applications](/docs/tasks/debug-application-cluster/debug-application/).
|
||||
* For some of the advanced debugging steps you need to know on which Node the
|
||||
Pod is running and have shell access to run commands on that Node. You don't
|
||||
need that access to run the standard debug steps that use `kubectl`.
|
||||
-->
|
||||
* 你的 {{< glossary_tooltip text="Pod" term_id="pod" >}} 应该已经被调度并正在运行中,
|
||||
如果你的 Pod 还没有运行,请参阅
|
||||
[应用问题排查](/zh/docs/tasks/debug-application-cluster/debug-application/)。
|
||||
|
||||
* 对于一些高级调试步骤,你需要知道 Pod 具体运行在哪个节点上,并且能够通过 shell 访问该节点来运行命令。
|
||||
运行那些使用 `kubectl` 的标准调试步骤时,你并不需要这种访问权限。
|
||||
|
||||
<!-- steps -->
|
||||
|
||||
<!--
|
||||
## Examining pod logs {#examine-pod-logs}
|
||||
|
||||
First, look at the logs of the affected container:
|
||||
|
||||
```shell
|
||||
kubectl logs ${POD_NAME} ${CONTAINER_NAME}
|
||||
```
|
||||
|
||||
If your container has previously crashed, you can access the previous container's crash log with:
|
||||
|
||||
```shell
|
||||
kubectl logs --previous ${POD_NAME} ${CONTAINER_NAME}
|
||||
```
|
||||
-->
|
||||
## 检查 Pod 的日志 {#examine-pod-logs}
|
||||
|
||||
首先,查看受到影响的容器的日志:
|
||||
|
||||
```shell
|
||||
kubectl logs ${POD_NAME} ${CONTAINER_NAME}
|
||||
```
|
||||
|
||||
如果你的容器之前崩溃过,你可以通过下面命令访问之前容器的崩溃日志:
|
||||
|
||||
```shell
|
||||
kubectl logs --previous ${POD_NAME} ${CONTAINER_NAME}
|
||||
```
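如果 Pod 中有多个容器,而你还不确定问题出在哪个容器上,也可以一次性查看所有容器的日志(`--all-containers` 和 `--timestamps` 都是 `kubectl logs` 的标准参数):

```shell
# 查看 Pod 中所有容器的日志,并为每行加上时间戳
kubectl logs ${POD_NAME} --all-containers=true --timestamps
```

带时间戳的合并日志有助于在多容器 Pod 中对照各容器事件发生的先后顺序。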
|
||||
|
||||
<!--
|
||||
## Debugging with container exec {#container-exec}
|
||||
|
||||
```shell
|
||||
kubectl exec ${POD_NAME} -c ${CONTAINER_NAME} -- ${CMD} ${ARG1} ${ARG2} ... ${ARGN}
|
||||
```
|
||||
|
||||
As an example, to look at the logs from a running Cassandra pod, you might run
|
||||
|
||||
```shell
|
||||
kubectl exec cassandra -- cat /var/log/cassandra/system.log
|
||||
```
|
||||
|
||||
You can run a shell that's connected to your terminal using the `-i` and `-t`
|
||||
arguments to `kubectl exec`, for example:
|
||||
|
||||
```shell
|
||||
kubectl exec -it cassandra -- sh
|
||||
```
|
||||
|
||||
For more details, see [Get a Shell to a Running Container](
|
||||
/docs/tasks/debug-application-cluster/get-shell-running-container/).
|
||||
-->
|
||||
## 使用容器 exec 进行调试 {#container-exec}
|
||||
|
||||
如果 {{< glossary_tooltip text="容器镜像" term_id="image" >}} 包含调试程序,
|
||||
比如从 Linux 和 Windows 操作系统基础镜像构建的镜像,你可以使用 `kubectl exec` 命令
|
||||
在特定的容器中运行一些命令:
|
||||
|
||||
```shell
|
||||
kubectl exec ${POD_NAME} -c ${CONTAINER_NAME} -- ${CMD} ${ARG1} ${ARG2} ... ${ARGN}
|
||||
```
|
||||
{{< note >}}
|
||||
`-c ${CONTAINER_NAME}` 是可选的。如果 Pod 中仅包含一个容器,就可以忽略它。
|
||||
{{< /note >}}
|
||||
|
||||
例如,要查看正在运行的 Cassandra Pod 中的日志,可以运行:
|
||||
|
||||
```shell
|
||||
kubectl exec cassandra -- cat /var/log/cassandra/system.log
|
||||
```
|
||||
|
||||
你可以在 `kubectl exec` 命令后面加上 `-i` 和 `-t` 来运行一个连接到你的终端的 Shell,比如:
|
||||
|
||||
```shell
|
||||
kubectl exec -it cassandra -- sh
|
||||
```
|
||||
|
||||
若要了解更多内容,可查看[获取正在运行容器的 Shell](/zh/docs/tasks/debug-application-cluster/get-shell-running-container/)。
|
||||
|
||||
<!--
|
||||
## Debugging with an ephemeral debug container {#ephemeral-container}
|
||||
|
||||
{{< feature-state state="beta" for_k8s_version="v1.23" >}}
|
||||
|
||||
{{< glossary_tooltip text="Ephemeral containers" term_id="ephemeral-container" >}}
|
||||
are useful for interactive troubleshooting when `kubectl exec` is insufficient
|
||||
because a container has crashed or a container image doesn't include debugging
|
||||
utilities, such as with [distroless images](
|
||||
https://github.com/GoogleContainerTools/distroless).
|
||||
-->
|
||||
## 使用临时调试容器来进行调试 {#ephemeral-container}
|
||||
|
||||
{{< feature-state state="beta" for_k8s_version="v1.23" >}}
|
||||
|
||||
当由于容器崩溃或容器镜像不包含调试程序(例如[无发行版镜像](https://github.com/GoogleContainerTools/distroless)等)
|
||||
而导致 `kubectl exec` 无法满足需求时,{{< glossary_tooltip text="临时容器" term_id="ephemeral-container" >}}对于交互式故障排查很有用。
|
||||
|
||||
<!--
|
||||
### Example debugging using ephemeral containers {#ephemeral-container-example}
|
||||
|
||||
You can use the `kubectl debug` command to add ephemeral containers to a
|
||||
running Pod. First, create a pod for the example:
|
||||
|
||||
```shell
|
||||
kubectl run ephemeral-demo --image=k8s.gcr.io/pause:3.1 --restart=Never
|
||||
```
|
||||
|
||||
This section use the `pause` container image in examples because it does not
|
||||
contain debugging utilities, but this method works with all container
|
||||
images.
|
||||
-->
|
||||
### 使用临时容器来调试的例子 {#ephemeral-container-example}
|
||||
|
||||
你可以使用 `kubectl debug` 命令来给正在运行中的 Pod 增加一个临时容器。
|
||||
首先,创建一个示例 Pod:
|
||||
|
||||
```shell
|
||||
kubectl run ephemeral-demo --image=k8s.gcr.io/pause:3.1 --restart=Never
|
||||
```
|
||||
|
||||
{{< note >}}
|
||||
本节示例中使用 `pause` 容器镜像,因为它不包含调试程序,但是这个方法适用于所有容器镜像。
|
||||
{{< /note >}}
|
||||
|
||||
<!--
|
||||
If you attempt to use `kubectl exec` to create a shell you will see an error
|
||||
because there is no shell in this container image.
|
||||
|
||||
```shell
|
||||
kubectl exec -it ephemeral-demo -- sh
|
||||
```
|
||||
|
||||
```
|
||||
OCI runtime exec failed: exec failed: container_linux.go:346: starting container process caused "exec: \"sh\": executable file not found in $PATH": unknown
|
||||
```
|
||||
|
||||
You can instead add a debugging container using `kubectl debug`. If you
|
||||
specify the `-i`/`--interactive` argument, `kubectl` will automatically attach
|
||||
to the console of the Ephemeral Container.
|
||||
|
||||
```shell
|
||||
kubectl debug -it ephemeral-demo --image=busybox:1.28 --target=ephemeral-demo
|
||||
```
|
||||
|
||||
```
|
||||
Defaulting debug container name to debugger-8xzrl.
|
||||
If you don't see a command prompt, try pressing enter.
|
||||
/ #
|
||||
```
|
||||
-->
|
||||
如果你尝试使用 `kubectl exec` 来创建一个 shell,你将会看到一个错误,因为这个容器镜像中没有 shell。
|
||||
|
||||
```shell
|
||||
kubectl exec -it ephemeral-demo -- sh
|
||||
```
|
||||
|
||||
```
|
||||
OCI runtime exec failed: exec failed: container_linux.go:346: starting container process caused "exec: \"sh\": executable file not found in $PATH": unknown
|
||||
```
|
||||
|
||||
你可以改为使用 `kubectl debug` 添加调试容器。
|
||||
如果你指定 `-i` 或者 `--interactive` 参数,`kubectl` 将自动挂接到临时容器的控制台。
|
||||
|
||||
```shell
|
||||
kubectl debug -it ephemeral-demo --image=busybox:1.28 --target=ephemeral-demo
|
||||
```
|
||||
|
||||
```
|
||||
Defaulting debug container name to debugger-8xzrl.
|
||||
If you don't see a command prompt, try pressing enter.
|
||||
/ #
|
||||
```
|
||||
|
||||
<!--
|
||||
This command adds a new busybox container and attaches to it. The `--target`
|
||||
parameter targets the process namespace of another container. It's necessary
|
||||
here because `kubectl run` does not enable [process namespace sharing](
|
||||
/docs/tasks/configure-pod-container/share-process-namespace/) in the pod it
|
||||
creates.
|
||||
|
||||
The `--target` parameter must be supported by the {{< glossary_tooltip
|
||||
text="Container Runtime" term_id="container-runtime" >}}. When not supported,
|
||||
the Ephemeral Container may not be started, or it may be started with an
|
||||
isolated process namespace so that `ps` does not reveal processes in other containers.
|
||||
|
||||
You can view the state of the newly created ephemeral container using `kubectl describe`:
|
||||
-->
|
||||
此命令添加一个新的 busybox 容器并挂接到其中。`--target` 参数指定另一个容器的进程命名空间。
|
||||
这是必需的,因为 `kubectl run` 不会在它创建的 Pod 中启用
|
||||
[共享进程命名空间](/zh/docs/tasks/configure-pod-container/share-process-namespace/)。
|
||||
|
||||
{{< note >}}
|
||||
{{< glossary_tooltip text="容器运行时" term_id="container-runtime" >}} 必须支持 `--target` 参数。
|
||||
如果不支持,则临时容器可能不会启动,或者可能使用隔离的进程命名空间启动,
|
||||
以便 `ps` 不显示其他容器内的进程。
|
||||
{{< /note >}}
|
||||
|
||||
你可以使用 `kubectl describe` 查看新创建的临时容器的状态:
|
||||
|
||||
```shell
|
||||
kubectl describe pod ephemeral-demo
|
||||
```
|
||||
|
||||
```
|
||||
...
|
||||
Ephemeral Containers:
|
||||
debugger-8xzrl:
|
||||
Container ID: docker://b888f9adfd15bd5739fefaa39e1df4dd3c617b9902082b1cfdc29c4028ffb2eb
|
||||
Image: busybox
|
||||
Image ID: docker-pullable://busybox@sha256:1828edd60c5efd34b2bf5dd3282ec0cc04d47b2ff9caa0b6d4f07a21d1c08084
|
||||
Port: <none>
|
||||
Host Port: <none>
|
||||
State: Running
|
||||
Started: Wed, 12 Feb 2020 14:25:42 +0100
|
||||
Ready: False
|
||||
Restart Count: 0
|
||||
Environment: <none>
|
||||
Mounts: <none>
|
||||
...
|
||||
```
|
||||
|
||||
<!--
|
||||
Use `kubectl delete` to remove the Pod when you're finished:
|
||||
-->
|
||||
当你完成调试后,使用 `kubectl delete` 移除该 Pod:
|
||||
|
||||
```shell
|
||||
kubectl delete pod ephemeral-demo
|
||||
```
|
||||
|
||||
<!--
|
||||
## Debugging using a copy of the Pod
|
||||
-->
|
||||
## 通过 Pod 副本调试
|
||||
|
||||
<!--
|
||||
Sometimes Pod configuration options make it difficult to troubleshoot in certain
|
||||
situations. For example, you can't run `kubectl exec` to troubleshoot your
|
||||
container if your container image does not include a shell or if your application
|
||||
crashes on startup. In these situations you can use `kubectl debug` to create a
|
||||
copy of the Pod with configuration values changed to aid debugging.
|
||||
-->
|
||||
有些时候 Pod 的配置参数使得在某些情况下很难执行故障排查。
|
||||
例如,在容器镜像中不包含 shell 或者你的应用程序在启动时崩溃的情况下,
|
||||
就不能通过运行 `kubectl exec` 来排查容器故障。
|
||||
在这些情况下,你可以使用 `kubectl debug` 来创建 Pod 的副本,通过更改配置帮助调试。
|
||||
|
||||
<!--
|
||||
### Copying a Pod while adding a new container
|
||||
-->
|
||||
### 在添加新的容器时创建 Pod 副本
|
||||
|
||||
<!--
|
||||
Adding a new container can be useful when your application is running but not
|
||||
behaving as you expect and you'd like to add additional troubleshooting
|
||||
utilities to the Pod.
|
||||
-->
|
||||
当应用程序正在运行但其表现不符合预期时,你会希望在 Pod 中添加额外的调试工具,
|
||||
这时添加新容器是很有用的。
|
||||
|
||||
<!--
|
||||
For example, maybe your application's container images are built on `busybox`
|
||||
but you need debugging utilities not included in `busybox`. You can simulate
|
||||
this scenario using `kubectl run`:
|
||||
-->
|
||||
例如,应用的容器镜像是建立在 `busybox` 的基础上,
|
||||
但是你需要 `busybox` 中并不包含的调试工具。
|
||||
你可以使用 `kubectl run` 模拟这个场景:
|
||||
|
||||
```shell
|
||||
kubectl run myapp --image=busybox:1.28 --restart=Never -- sleep 1d
|
||||
```
|
||||
<!--
|
||||
Run this command to create a copy of `myapp` named `myapp-debug` that adds a
|
||||
new Ubuntu container for debugging:
|
||||
-->
|
||||
通过运行以下命令,建立 `myapp` 的一个名为 `myapp-debug` 的副本,
|
||||
新增一个用于调试的 Ubuntu 容器:
|
||||
|
||||
```shell
|
||||
kubectl debug myapp -it --image=ubuntu --share-processes --copy-to=myapp-debug
|
||||
```
|
||||
|
||||
```
|
||||
Defaulting debug container name to debugger-w7xmf.
|
||||
If you don't see a command prompt, try pressing enter.
|
||||
root@myapp-debug:/#
|
||||
```
|
||||
<!--
|
||||
{{< note >}}
|
||||
* `kubectl debug` automatically generates a container name if you don't choose
|
||||
one using the `--container` flag.
|
||||
* The `-i` flag causes `kubectl debug` to attach to the new container by
|
||||
default. You can prevent this by specifying `--attach=false`. If your session
|
||||
becomes disconnected you can reattach using `kubectl attach`.
|
||||
* The `--share-processes` allows the containers in this Pod to see processes
|
||||
from the other containers in the Pod. For more information about how this
|
||||
works, see [Share Process Namespace between Containers in a Pod](
|
||||
/docs/tasks/configure-pod-container/share-process-namespace/).
|
||||
{{< /note >}}
|
||||
-->
|
||||
{{< note >}}
|
||||
* 如果你没有使用 `--container` 指定新的容器名,`kubectl debug` 会自动生成一个容器名。
|
||||
* 默认情况下,`-i` 标志使 `kubectl debug` 附加到新容器上。
|
||||
你可以通过指定 `--attach=false` 来防止这种情况。
|
||||
如果你的会话断开连接,你可以使用 `kubectl attach` 重新连接。
|
||||
* `--share-processes` 允许此 Pod 中的容器查看 Pod 内其他容器中的进程。
|
||||
参阅[在 Pod 中的容器之间共享进程命名空间](/zh/docs/tasks/configure-pod-container/share-process-namespace/)
|
||||
获取更多信息。
|
||||
{{< /note >}}
|
||||
|
||||
<!--
|
||||
Don't forget to clean up the debugging Pod when you're finished with it:
|
||||
-->
|
||||
不要忘了清理调试 Pod:
|
||||
|
||||
```shell
|
||||
kubectl delete pod myapp myapp-debug
|
||||
```
|
||||
|
||||
<!--
|
||||
### Copying a Pod while changing its command
|
||||
-->
|
||||
### 在改变 Pod 命令时创建 Pod 副本
|
||||
|
||||
<!--
|
||||
Sometimes it's useful to change the command for a container, for example to
|
||||
add a debugging flag or because the application is crashing.
|
||||
-->
|
||||
有时更改容器的命令很有用,例如为了添加调试标志,或者因为应用正在崩溃。
|
||||
|
||||
<!--
|
||||
To simulate a crashing application, use `kubectl run` to create a container
|
||||
that immediately exits:
|
||||
-->
|
||||
为了模拟应用崩溃的场景,使用 `kubectl run` 命令创建一个立即退出的容器:
|
||||
|
||||
```
|
||||
kubectl run --image=busybox:1.28 myapp -- false
|
||||
```
|
||||
|
||||
<!--
|
||||
You can see using `kubectl describe pod myapp` that this container is crashing:
|
||||
-->
|
||||
使用 `kubectl describe pod myapp` 命令,你可以看到容器崩溃了:
|
||||
|
||||
```
|
||||
Containers:
|
||||
myapp:
|
||||
Image: busybox
|
||||
...
|
||||
Args:
|
||||
false
|
||||
State: Waiting
|
||||
Reason: CrashLoopBackOff
|
||||
Last State: Terminated
|
||||
Reason: Error
|
||||
Exit Code: 1
|
||||
```
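在脚本中,你也可以不靠肉眼查找 `kubectl describe` 的文本输出,而是用简单的文本处理提取退出码。下面先用一段本地的示例文本(内容为虚构)演示提取逻辑:

```shell
# 从 `kubectl describe` 风格的示例输出中提取上一次的退出码(示例文本为虚构)
describe_output='State:          Waiting
  Reason:       CrashLoopBackOff
Last State:     Terminated
  Reason:       Error
  Exit Code:    1'

# 以"冒号加若干空格"为分隔符,取匹配行的第二个字段
exit_code=$(printf '%s\n' "$describe_output" | awk -F': *' '/Exit Code/ {print $2}')
echo "exit code: ${exit_code}"
# 输出:exit code: 1
```

在真实集群中,更稳妥的等价做法是直接用 JSONPath 读取结构化字段,例如
`kubectl get pod myapp -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'`。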
|
||||
|
||||
<!--
|
||||
You can use `kubectl debug` to create a copy of this Pod with the command
|
||||
changed to an interactive shell:
|
||||
-->
|
||||
你可以使用 `kubectl debug` 命令创建该 Pod 的一个副本,
|
||||
在该副本中命令改变为交互式 shell:
|
||||
|
||||
```
|
||||
kubectl debug myapp -it --copy-to=myapp-debug --container=myapp -- sh
|
||||
```
|
||||
|
||||
```
|
||||
If you don't see a command prompt, try pressing enter.
|
||||
/ #
|
||||
```
|
||||
|
||||
<!--
|
||||
Now you have an interactive shell that you can use to perform tasks like
|
||||
checking filesystem paths or running the container command manually.
|
||||
-->
|
||||
现在你有了一个可以执行类似检查文件系统路径或者手动运行容器命令的交互式 shell。
|
||||
|
||||
<!--
|
||||
{{< note >}}
|
||||
* To change the command of a specific container you must
|
||||
specify its name using `--container` or `kubectl debug` will instead
|
||||
create a new container to run the command you specified.
|
||||
* The `-i` flag causes `kubectl debug` to attach to the container by default.
|
||||
You can prevent this by specifying `--attach=false`. If your session becomes
|
||||
disconnected you can reattach using `kubectl attach`.
|
||||
{{< /note >}}
|
||||
-->
|
||||
{{< note >}}
|
||||
* 要更改指定容器的命令,你必须用 `--container` 命令指定容器的名字,
|
||||
否则 `kubectl debug` 将建立一个新的容器运行你指定的命令。
|
||||
* 默认情况下,标志 `-i` 使 `kubectl debug` 附加到容器。
|
||||
你可通过指定 `--attach=false` 来防止这种情况。
|
||||
如果你的会话断开连接,可以使用 `kubectl attach` 重新连接。
|
||||
{{< /note >}}
|
||||
|
||||
<!--
|
||||
Don't forget to clean up the debugging Pod when you're finished with it:
|
||||
-->
|
||||
不要忘了清理调试 Pod:
|
||||
|
||||
```shell
|
||||
kubectl delete pod myapp myapp-debug
|
||||
```
|
||||
<!--
|
||||
### Copying a Pod while changing container images
|
||||
|
||||
In some situations you may want to change a misbehaving Pod from its normal
|
||||
production container images to an image containing a debugging build or
|
||||
additional utilities.
|
||||
|
||||
As an example, create a Pod using `kubectl run`:
|
||||
-->
|
||||
### 在更改容器镜像时创建 Pod 副本
|
||||
|
||||
在某些情况下,你可能想把行为异常的 Pod 所使用的正常生产容器镜像
|
||||
替换为包含调试构建版本或附加工具的镜像。
|
||||
|
||||
下面的例子中,用 `kubectl run` 创建一个 Pod:
|
||||
|
||||
```
|
||||
kubectl run myapp --image=busybox:1.28 --restart=Never -- sleep 1d
|
||||
```
|
||||
<!--
|
||||
Now use `kubectl debug` to make a copy and change its container image
|
||||
to `ubuntu`:
|
||||
-->
|
||||
现在可以使用 `kubectl debug` 创建一个副本
|
||||
并改变容器镜像为 `ubuntu`:
|
||||
|
||||
```
|
||||
kubectl debug myapp --copy-to=myapp-debug --set-image=*=ubuntu
|
||||
```
|
||||
|
||||
<!--
|
||||
The syntax of `--set-image` uses the same `container_name=image` syntax as
|
||||
`kubectl set image`. `*=ubuntu` means change the image of all containers
|
||||
to `ubuntu`.
|
||||
|
||||
Don't forget to clean up the debugging Pod when you're finished with it:
|
||||
-->
|
||||
`--set-image` 使用与 `kubectl set image` 相同的 `container_name=image` 语法。
|
||||
`*=ubuntu` 表示把所有容器的镜像改为 `ubuntu`。

完成调试后,不要忘记清理调试 Pod:
|
||||
|
||||
```shell
|
||||
kubectl delete pod myapp myapp-debug
|
||||
```
|
||||
|
||||
<!--
|
||||
## Debugging via a shell on the node {#node-shell-session}
|
||||
|
||||
If none of these approaches work, you can find the Node on which the Pod is
|
||||
running and create a privileged Pod running in the host namespaces. To create
|
||||
an interactive shell on a node using `kubectl debug`, run:
|
||||
-->
|
||||
## 在节点上通过 shell 来进行调试 {#node-shell-session}
|
||||
|
||||
如果这些方法都不起作用,你可以找到运行 Pod 的节点,然后在节点上部署一个运行在宿主名字空间的特权 Pod。
|
||||
|
||||
你可以通过 `kubectl debug` 在节点上创建一个交互式 shell:
|
||||
|
||||
```shell
|
||||
kubectl debug node/mynode -it --image=ubuntu
|
||||
```
|
||||
|
||||
```
|
||||
Creating debugging pod node-debugger-mynode-pdx84 with container debugger on node mynode.
|
||||
If you don't see a command prompt, try pressing enter.
|
||||
root@ek8s:/#
|
||||
```
|
||||
|
||||
<!--
|
||||
When creating a debugging session on a node, keep in mind that:
|
||||
|
||||
* `kubectl debug` automatically generates the name of the new Pod based on
|
||||
the name of the Node.
|
||||
* The container runs in the host IPC, Network, and PID namespaces.
|
||||
* The root filesystem of the Node will be mounted at `/host`.
|
||||
|
||||
Don't forget to clean up the debugging Pod when you're finished with it:
|
||||
-->
|
||||
当在节点上创建调试会话,注意以下要点:
|
||||
* `kubectl debug` 基于节点的名字自动生成新的 Pod 的名字。
|
||||
* 调试容器运行在宿主的 IPC、网络和 PID 命名空间中。
|
||||
* 节点的根文件系统会被挂载在 `/host`。
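由于节点的根文件系统挂载在 `/host`,你可以在该调试会话的 shell 中直接读取节点上的文件;如果调试镜像中带有 `chroot`(例如 `ubuntu` 镜像),还可以切换到宿主机的根文件系统。以下命令在节点调试会话内运行,仅为示意:

```shell
# 查看宿主机上的文件(节点根文件系统挂载于 /host)
ls /host/etc/kubernetes

# 切换到宿主机根文件系统,之后的命令就像直接在节点上执行一样
chroot /host
```

注意这是一个特权会话,对 `/host` 下文件的任何修改都会直接作用于节点本身,操作时需要格外小心。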
|
||||
|
||||
当你完成节点调试时,不要忘记清理调试 Pod:
|
||||
|
||||
```shell
|
||||
kubectl delete pod node-debugger-mynode-pdx84
|
||||
```
|
|
@ -1,240 +0,0 @@
|
|||
---
|
||||
title: 获取正在运行容器的 Shell
|
||||
content_type: task
|
||||
---
|
||||
|
||||
<!--
|
||||
---
|
||||
reviewers:
|
||||
- caesarxuchao
|
||||
- mikedanese
|
||||
title: Get a Shell to a Running Container
|
||||
content_type: task
|
||||
---
|
||||
-->
|
||||
|
||||
<!-- overview -->
|
||||
|
||||
<!--
|
||||
This page shows how to use `kubectl exec` to get a shell to a
|
||||
running Container.
|
||||
-->
|
||||
|
||||
本文介绍怎样使用 `kubectl exec` 命令获取正在运行容器的 Shell。
|
||||
|
||||
|
||||
|
||||
|
||||
## {{% heading "prerequisites" %}}
|
||||
|
||||
|
||||
{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}
|
||||
|
||||
|
||||
|
||||
|
||||
<!-- steps -->
|
||||
|
||||
<!--
|
||||
## Getting a shell to a Container
|
||||
-->
|
||||
|
||||
## 获取容器的 Shell
|
||||
|
||||
<!--
|
||||
In this exercise, you create a Pod that has one Container. The Container
|
||||
runs the nginx image. Here is the configuration file for the Pod:
|
||||
-->
|
||||
|
||||
在本练习中,你将创建包含一个容器的 Pod。容器运行 nginx 镜像。下面是 Pod 的配置文件:
|
||||
|
||||
{{< codenew file="application/shell-demo.yaml" >}}
|
||||
|
||||
<!--
|
||||
Create the Pod:
|
||||
-->
|
||||
|
||||
创建 Pod:
|
||||
|
||||
```shell
|
||||
kubectl create -f https://k8s.io/examples/application/shell-demo.yaml
|
||||
```
|
||||
|
||||
<!--
|
||||
Verify that the Container is running:
|
||||
-->
|
||||
|
||||
检查容器是否运行正常:
|
||||
|
||||
```shell
|
||||
kubectl get pod shell-demo
|
||||
```
|
||||
|
||||
<!--
|
||||
Get a shell to the running Container:
|
||||
-->
|
||||
|
||||
获取正在运行容器的 Shell:
|
||||
|
||||
```shell
|
||||
kubectl exec -it shell-demo -- /bin/bash
|
||||
```
|
||||
{{< note >}}
|
||||
|
||||
<!--
|
||||
The double dash symbol "--" is used to separate the arguments you want to pass to the command from the kubectl arguments.
|
||||
-->
|
||||
双破折号 "--" 用于将要传递给命令的参数与 kubectl 的参数分开。
|
||||
{{< /note >}}
|
||||
|
||||
<!--
|
||||
In your shell, list the root directory:
|
||||
-->
|
||||
|
||||
在 shell 中,打印根目录:
|
||||
|
||||
```shell
|
||||
root@shell-demo:/# ls /
|
||||
```
|
||||
|
||||
<!--
|
||||
In your shell, experiment with other commands. Here are
|
||||
some examples:
|
||||
-->
|
||||
|
||||
在 shell 中,实验其他命令。下面是一些示例:
|
||||
|
||||
```shell
|
||||
root@shell-demo:/# ls /
|
||||
root@shell-demo:/# cat /proc/mounts
|
||||
root@shell-demo:/# cat /proc/1/maps
|
||||
root@shell-demo:/# apt-get update
|
||||
root@shell-demo:/# apt-get install -y tcpdump
|
||||
root@shell-demo:/# tcpdump
|
||||
root@shell-demo:/# apt-get install -y lsof
|
||||
root@shell-demo:/# lsof
|
||||
root@shell-demo:/# apt-get install -y procps
|
||||
root@shell-demo:/# ps aux
|
||||
root@shell-demo:/# ps aux | grep nginx
|
||||
```
|
||||
|
||||
<!--
|
||||
## Writing the root page for nginx
|
||||
-->
|
||||
|
||||
## 编写 nginx 的根页面
|
||||
|
||||
<!--
|
||||
Look again at the configuration file for your Pod. The Pod
|
||||
has an `emptyDir` volume, and the Container mounts the volume
|
||||
at `/usr/share/nginx/html`.
|
||||
-->
|
||||
|
||||
再看一下 Pod 的配置文件。该 Pod 有一个 `emptyDir` 卷,容器将该卷挂载到 `/usr/share/nginx/html`。
|
||||
|
||||
<!--
|
||||
In your shell, create an `index.html` file in the `/usr/share/nginx/html`
|
||||
directory:
|
||||
-->
|
||||
|
||||
在 shell 中,在 `/usr/share/nginx/html` 目录创建一个 `index.html` 文件:
|
||||
|
||||
```shell
|
||||
root@shell-demo:/# echo Hello shell demo > /usr/share/nginx/html/index.html
|
||||
```
|
||||
|
||||
<!--
|
||||
In your shell, send a GET request to the nginx server:
|
||||
-->
|
||||
|
||||
在 shell 中,向 nginx 服务器发送 GET 请求:
|
||||
|
||||
```shell
|
||||
root@shell-demo:/# apt-get update
|
||||
root@shell-demo:/# apt-get install curl
|
||||
root@shell-demo:/# curl localhost
|
||||
```
|
||||
|
||||
<!--
|
||||
The output shows the text that you wrote to the `index.html` file:
|
||||
-->
|
||||
|
||||
输出结果显示了你在 `index.html` 中写入的文本。
|
||||
|
||||
```
|
||||
Hello shell demo
|
||||
```
|
||||
|
||||
<!--
|
||||
When you are finished with your shell, enter `exit`.
|
||||
-->
|
||||
|
||||
当用完 shell 后,输入 `exit` 退出。
|
||||
|
||||
<!--
|
||||
## Running individual commands in a Container
|
||||
-->
|
||||
|
||||
## 在容器中运行单个命令
|
||||
|
||||
<!--
|
||||
In an ordinary command window, not your shell, list the environment
|
||||
variables in the running Container:
|
||||
-->
|
||||
|
||||
在普通的命令窗口(而不是上述 shell)中,列出正在运行的容器中的环境变量:
|
||||
|
||||
```shell
|
||||
kubectl exec shell-demo -- env
|
||||
```
|
||||
|
||||
<!--
|
||||
Experiment running other commands. Here are some examples:
|
||||
-->
|
||||
|
||||
实验运行其他命令。下面是一些示例:
|
||||
|
||||
```shell
|
||||
kubectl exec shell-demo -- ps aux
|
||||
kubectl exec shell-demo -- ls /
|
||||
kubectl exec shell-demo -- cat /proc/1/mounts
|
||||
```
|
||||
|
||||
|
||||
|
||||
<!-- discussion -->
|
||||
|
||||
<!--
|
||||
## Opening a shell when a Pod has more than one Container
|
||||
-->
|
||||
|
||||
## 当 Pod 包含多个容器时打开 shell
|
||||
|
||||
<!--
|
||||
If a Pod has more than one Container, use `--container` or `-c` to
|
||||
specify a Container in the `kubectl exec` command. For example,
|
||||
suppose you have a Pod named my-pod, and the Pod has two containers
|
||||
named main-app and helper-app. The following command would open a
|
||||
shell to the main-app Container.
|
||||
-->
|
||||
|
||||
如果 Pod 有多个容器,`--container` 或者 `-c` 可以在 `kubectl exec` 命令中指定容器。
|
||||
例如,假设你有一个名为 my-pod 的 Pod,该 Pod 有两个容器,分别为 main-app 和 helper-app。
|
||||
下面的命令将会打开一个 shell 访问 main-app 容器。
|
||||
|
||||
```shell
|
||||
kubectl exec -it my-pod --container main-app -- /bin/bash
|
||||
```
|
||||
|
||||
|
||||
|
||||
|
||||
## {{% heading "whatsnext" %}}
|
||||
|
||||
|
||||
* [kubectl exec](/docs/reference/generated/kubectl/kubectl-commands/#exec)
|
||||
|
||||
|
||||
|
||||
|
||||
|
|
@ -1,113 +0,0 @@
|
|||
---
|
||||
content_type: concept
|
||||
title: 资源监控工具
|
||||
---
|
||||
<!--
|
||||
reviewers:
|
||||
- mikedanese
|
||||
content_type: concept
|
||||
title: Tools for Monitoring Resources
|
||||
-->
|
||||
|
||||
<!-- overview -->
|
||||
|
||||
<!--
|
||||
To scale an application and provide a reliable service, you need to
|
||||
understand how the application behaves when it is deployed. You can examine
|
||||
application performance in a Kubernetes cluster by examining the containers,
|
||||
[pods](/docs/concepts/workloads/pods/),
|
||||
[services](/docs/concepts/services-networking/service/), and
|
||||
the characteristics of the overall cluster. Kubernetes provides detailed
|
||||
information about an application's resource usage at each of these levels.
|
||||
This information allows you to evaluate your application's performance and
|
||||
where bottlenecks can be removed to improve overall performance.
|
||||
-->
|
||||
要扩展应用程序并提供可靠的服务,你需要了解应用程序在部署时的行为。
|
||||
你可以通过检查容器、
|
||||
[Pod](/zh/docs/concepts/workloads/pods/)、
|
||||
[服务](/zh/docs/concepts/services-networking/service/)
|
||||
和整个集群的特征,来检查 Kubernetes 集群中应用程序的性能。
|
||||
Kubernetes 在每个级别上提供有关应用程序资源使用情况的详细信息。
|
||||
此信息使你可以评估应用程序的性能,以及在何处可以消除瓶颈以提高整体性能。
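例如,在安装了 metrics-server 的集群上,可以用 `kubectl top` 快速查看节点和 Pod 两个级别的资源用量:

```shell
# 查看各节点的 CPU 和内存用量
kubectl top node

# 查看某个命名空间中各 Pod 的资源用量
kubectl top pod --namespace=kube-system
```

这些命令依赖下文介绍的资源度量管道;如果集群中尚未部署 metrics-server,它们会报错。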
|
||||
|
||||
<!-- body -->
|
||||
|
||||
<!--
|
||||
In Kubernetes, application monitoring does not depend on a single monitoring solution.
|
||||
On new clusters, you can use [resource metrics](#resource-metrics-pipeline) or
|
||||
[full metrics](#full-metrics-pipeline) pipelines to collect monitoring statistics.
|
||||
-->
|
||||
在 Kubernetes 中,应用程序监控不依赖单个监控解决方案。
|
||||
在新集群上,你可以使用[资源度量](#resource-metrics-pipeline)或
|
||||
[完整度量](#full-metrics-pipeline)管道来收集监视统计信息。
|
||||
|
||||
<!--
|
||||
## Resource metrics pipeline
|
||||
|
||||
The resource metrics pipeline provides a limited set of metrics related to
|
||||
cluster components such as the
|
||||
[Horizontal Pod Autoscaler](/docs/tasks/run-application/horizontal-pod-autoscale)
|
||||
controller, as well as the `kubectl top` utility.
|
||||
These metrics are collected by the lightweight, short-term, in-memory
|
||||
[metrics-server](https://github.com/kubernetes-sigs/metrics-server) and
|
||||
are exposed via the `metrics.k8s.io` API.
|
||||
-->
|
||||
## 资源度量管道 {#resource-metrics-pipeline}
|
||||
|
||||
资源度量管道提供一组与集群组件相关的有限度量,这些组件包括
|
||||
[Horizontal Pod Autoscaler](/zh/docs/tasks/run-application/horizontal-pod-autoscale/)
|
||||
控制器以及 `kubectl top` 实用程序。
|
||||
这些指标是由轻量级的、短期、内存存储的
|
||||
[metrics-server](https://github.com/kubernetes-sigs/metrics-server) 收集的,
|
||||
并通过 `metrics.k8s.io` API 公开。
|
||||
|
||||
<!--
|
||||
metrics-server discovers all nodes on the cluster and
|
||||
queries each node's
|
||||
[kubelet](/docs/reference/command-line-tools-reference/kubelet/) for CPU and
|
||||
memory usage. The kubelet acts as a bridge between the Kubernetes master and
|
||||
the nodes, managing the pods and containers running on a machine. The kubelet
|
||||
translates each pod into its constituent containers and fetches individual
|
||||
container usage statistics from the container runtime through the container
|
||||
runtime interface. The kubelet fetches this information from the integrated
|
||||
cAdvisor for the legacy Docker integration. It then exposes the aggregated pod
|
||||
resource usage statistics through the metrics-server Resource Metrics API.
|
||||
This API is served at `/metrics/resource/v1beta1` on the kubelet's authenticated and
|
||||
read-only ports.
|
||||
-->
|
||||
度量服务器发现集群中的所有节点,并且查询每个节点的
|
||||
[kubelet](/zh/docs/reference/command-line-tools-reference/kubelet/)
|
||||
以获取 CPU 和内存使用情况。
|
||||
Kubelet 充当 Kubernetes 主节点与节点之间的桥梁,管理机器上运行的 Pod 和容器。
|
||||
kubelet 将每个 Pod 转换为其组成的各个容器,并通过容器运行时接口从容器运行时
|
||||
获取各个容器的使用统计信息。
|
||||
对于旧式的 Docker 集成,kubelet 则从内置的 cAdvisor 获取此信息。
|
||||
然后,它通过 metrics-server 的 Resource Metrics API 公开聚合后的 Pod 资源使用统计信息。
|
||||
该 API 在 kubelet 的经过身份验证和只读的端口上的 `/metrics/resource/v1beta1` 中提供。
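你也可以绕过 `kubectl top`,借助 API 聚合层直接查询 Resource Metrics API(假定集群中已部署 metrics-server):

```shell
# 直接读取各节点的度量数据
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"

# 读取 default 命名空间中各 Pod 的度量数据
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods"
```

返回的是 JSON 格式的原始数据,适合在脚本中进一步处理。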
|
||||
|
||||
<!--
|
||||
## Full metrics pipeline
|
||||
|
||||
A full metrics pipeline gives you access to richer metrics. Kubernetes can
|
||||
respond to these metrics by automatically scaling or adapting the cluster
|
||||
based on its current state, using mechanisms such as the Horizontal Pod
|
||||
Autoscaler. The monitoring pipeline fetches metrics from the kubelet and
|
||||
then exposes them to Kubernetes via an adapter by implementing either the
|
||||
`custom.metrics.k8s.io` or `external.metrics.k8s.io` API.
|
||||
-->
|
||||
## 完整度量管道 {#full-metrics-pipeline}
|
||||
|
||||
一个完整度量管道可以让你访问更丰富的度量。
|
||||
Kubernetes 还可以根据集群的当前状态,使用 Pod 水平自动扩缩器等机制,
|
||||
自动扩缩或调整集群来响应这些度量。
|
||||
监控管道从 kubelet 获取度量值,然后通过适配器将它们公开给 Kubernetes,
|
||||
方法是实现 `custom.metrics.k8s.io` 或 `external.metrics.k8s.io` API。
|
||||
|
||||
<!--
|
||||
[Prometheus](https://prometheus.io), a CNCF project, can natively monitor Kubernetes, nodes, and Prometheus itself.
|
||||
Full metrics pipeline projects that are not part of the CNCF are outside the scope of Kubernetes documentation.
|
||||
-->
|
||||
[Prometheus](https://prometheus.io) 是一个 CNCF 项目,可以原生监控 Kubernetes、
|
||||
节点和 Prometheus 本身。
|
||||
不属于 CNCF 的完整度量管道项目不在 Kubernetes 文档的讨论范围之内。
|
||||
|
|
@ -1,17 +1,19 @@
|
|||
---
|
||||
title: 监控、日志和调试
|
||||
description: 设置监控和日志记录以对集群进行故障排除或调试容器化应用程序。
|
||||
weight: 20
|
||||
content_type: concept
|
||||
no_list: true
|
||||
---
|
||||
<!--
|
||||
title: "Monitoring, Logging, and Debugging"
|
||||
description: Set up monitoring and logging to troubleshoot a cluster, or debug a containerized application.
|
||||
weight: 20
|
||||
reviewers:
|
||||
- brendandburns
|
||||
- davidopp
|
||||
content_type: concept
|
||||
title: 故障诊断
|
||||
---
|
||||
|
||||
<!--
|
||||
reviewers:
|
||||
- brendandburns
|
||||
- davidopp
|
||||
content_type: concept
|
||||
title: Troubleshooting
|
||||
no_list: true
|
||||
-->
|
||||
|
||||
<!-- overview -->
|
||||
|
有时候事情会出错。本指南旨在解决这些问题。它包含两个部分:

<!--
* [Debugging your application](/docs/tasks/debug/debug-application/) - Useful
  for users who are deploying code into Kubernetes and wondering why it is not working.
* [Debugging your cluster](/docs/tasks/debug/debug-cluster/) - Useful
  for cluster administrators and people whose Kubernetes cluster is unhappy.
-->

* [应用排错](/zh/docs/tasks/debug/debug-application/) -
  针对部署代码到 Kubernetes 并想知道代码为什么不能正常运行的用户。
* [集群排错](/zh/docs/tasks/debug/debug-cluster/) -
  针对集群管理员以及 Kubernetes 集群表现异常的用户。

<!--
<!--
This doc contains a set of resources for fixing issues with containerized applications. It covers things like common issues with Kubernetes resources (like Pods, Services, or StatefulSets), advice on making sense of container termination messages, and ways to debug running containers.
-->
该文档包含一组用于解决容器化应用程序问题的资源。
它涵盖了诸如 Kubernetes 资源(如 Pod、Service 或 StatefulSets)的常见问题、
关于理解容器终止消息的建议以及调试正在运行的容器的方法。
|
@ -1,6 +1,7 @@
|
|||
---
|
||||
title: 调试 Init 容器
|
||||
content_type: task
|
||||
weight: 40
|
||||
---
|
||||
|
||||
<!--
|
||||
|
@ -14,6 +15,7 @@ reviewers:
|
|||
- smarterclayton
|
||||
title: Debug Init Containers
|
||||
content_type: task
|
||||
weight: 40
|
||||
-->
|
||||
|
||||
<!-- overview -->
|
|
---
title: 调试 Pod
content_type: concept
weight: 10
---

<!--
reviewers:
- mikedanese
- thockin
title: Debug Pods
content_type: concept
weight: 10
-->

<!-- overview -->
<!--
This guide is to help users debug applications that are deployed into Kubernetes and not behaving correctly.
This is *not* a guide for people who want to debug their cluster. For that you should check out
[this guide](/docs/tasks/debug/debug-cluster).
-->

本指南帮助用户调试那些部署到 Kubernetes 上后没有正常运行的应用。
本指南 **并非** 指导用户如何调试集群。
如果想调试集群的话,请参阅[这里](/zh/docs/tasks/debug/debug-cluster)。

<!-- body -->
-->
## 诊断问题 {#diagnosing-the-problem}

故障排查的第一步是先给问题分类。问题是什么?是关于 Pod、Replication Controller 还是 Service?

* [调试 Pod](#debugging-pods)
* [调试 Replication Controller](#debugging-replication-controllers)
* [调试 Service](#debugging-services)

<!--
### Debugging Pods

The first step in debugging a Pod is taking a look at it. Check the current state of the Pod and recent events with the following command:
-->
### 调试 Pod {#debugging-pods}

调试 Pod 的第一步是查看 Pod 信息。用如下命令查看 Pod 的当前状态和最近的事件:
<!--
* **You don't have enough resources**: You may have exhausted the supply of CPU or Memory in your cluster, in this case
  you need to delete Pods, adjust resource requests, or add new nodes to your cluster. See
  [Compute Resources document](/docs/concepts/configuration/manage-resources-containers/) for more information.

* **You are using `hostPort`**: When you bind a Pod to a `hostPort` there are a limited number of places that pod can be
  scheduled. In most cases, `hostPort` is unnecessary, try using a Service object to expose your Pod. If you do require
You can view this resource with:
-->
### 调试 Service {#debugging-services}

服务支持在多个 Pod 间负载均衡。
有一些常见的问题可以造成服务无法正常工作。
IP addresses in the Service's endpoints.
-->
确保 Endpoints 与服务成员 Pod 个数一致。
例如,如果你的 Service 用来运行 3 个副本的 nginx 容器,你应该会在 Service 的 Endpoints
中看到 3 个不同的 IP 地址。

<!--
<!--
#### Network traffic is not forwarded

Please see [debugging service](/docs/tasks/debug/debug-application/debug-service/) for more information.
-->
#### 网络流量未被转发

请参阅[调试 Service](/zh/docs/tasks/debug/debug-application/debug-service/)了解更多信息。
## {{% heading "whatsnext" %}}

<!--
If none of the above solves your problem, follow the instructions in
[Debugging Service document](/docs/tasks/debug/debug-application/debug-service/)
to make sure that your `Service` is running, has `Endpoints`, and your `Pods` are
actually serving; you have DNS working, iptables rules installed, and kube-proxy
does not seem to be misbehaving.

You may also visit [troubleshooting document](/docs/tasks/debug/overview/) for more information.
-->
如果上述方法都不能解决你的问题,
请按照[调试 Service 文档](/zh/docs/tasks/debug/debug-application/debug-service/)中的介绍,
确保你的 `Service` 正在运行,有 `Endpoints` 被创建,`Pod` 真的在提供服务;
DNS 服务已配置并正常工作,iptables 规则也已安装,并且 `kube-proxy` 也没有异常行为。

你也可以访问[故障排查文档](/zh/docs/tasks/debug/overview/)来获取更多信息。
content_type: task
---

<!--
reviewers:
- verb
- soltysh
title: Debug Running Pods
content_type: task
-->

<!-- overview -->
<!--
This page explains how to debug Pods running (or crashing) on a Node.
<!--
* Your {{< glossary_tooltip text="Pod" term_id="pod" >}} should already be
  scheduled and running. If your Pod is not yet running, start with [Debugging
  Pods](/docs/tasks/debug/debug-application/).
* For some of the advanced debugging steps you need to know on which Node the
  Pod is running and have shell access to run commands on that Node. You don't
  need that access to run the standard debug steps that use `kubectl`.
-->
* 你的 {{< glossary_tooltip text="Pod" term_id="pod" >}} 应该已经被调度并正在运行中,
  如果你的 Pod 还没有运行,请参阅[调试 Pod](/zh/docs/tasks/debug/debug-application/)。

* 对于一些高级调试步骤,你应该知道 Pod 具体运行在哪个节点上,并具有在该节点上运行命令的 shell 访问权限。
  运行标准调试步骤(使用 `kubectl`)时,你不需要这种访问权限。
<!-- steps -->
<!--
## Using `kubectl describe pod` to fetch details about pods
-->
## 使用 `kubectl describe pod` 命令获取 Pod 详情

<!--
For this example we'll use a Deployment to create two pods, similar to the earlier example.
-->
与之前的例子类似,我们使用一个 Deployment 来创建两个 Pod。

{{< codenew file="application/nginx-with-request.yaml" >}}
<!--
Create the deployment by running the following command:
-->
使用如下命令创建 Deployment:

```shell
kubectl apply -f https://k8s.io/examples/application/nginx-with-request.yaml
```

```
deployment.apps/nginx-deployment created
```
<!--
Check the pod status with the following command:
-->
使用如下命令查看 Pod 状态:

```shell
kubectl get pods
```

```
NAME                                READY   STATUS    RESTARTS   AGE
nginx-deployment-67d4bdd6f5-cx2nz   1/1     Running   0          13s
nginx-deployment-67d4bdd6f5-w6kd7   1/1     Running   0          13s
```
<!--
We can retrieve a lot more information about each of these pods using `kubectl describe pod`. For example:
-->
我们可以使用 `kubectl describe pod` 命令来查询每个 Pod 的更多信息,比如:

```shell
kubectl describe pod nginx-deployment-67d4bdd6f5-w6kd7
```

```none
Name:         nginx-deployment-67d4bdd6f5-w6kd7
Namespace:    default
Priority:     0
Node:         kube-worker-1/192.168.0.113
Start Time:   Thu, 17 Feb 2022 16:51:01 -0500
Labels:       app=nginx
              pod-template-hash=67d4bdd6f5
Annotations:  <none>
Status:       Running
IP:           10.88.0.3
IPs:
  IP:  10.88.0.3
  IP:  2001:db8::1
Controlled By:  ReplicaSet/nginx-deployment-67d4bdd6f5
Containers:
  nginx:
    Container ID:   containerd://5403af59a2b46ee5a23fb0ae4b1e077f7ca5c5fb7af16e1ab21c00e0e616462a
    Image:          nginx
    Image ID:       docker.io/library/nginx@sha256:2834dc507516af02784808c5f48b7cbe38b8ed5d0f4837f16e78d00deb7e7767
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Thu, 17 Feb 2022 16:51:05 -0500
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     500m
      memory:  128Mi
    Requests:
      cpu:        500m
      memory:     128Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-bgsgp (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-bgsgp:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Guaranteed
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  34s   default-scheduler  Successfully assigned default/nginx-deployment-67d4bdd6f5-w6kd7 to kube-worker-1
  Normal  Pulling    31s   kubelet            Pulling image "nginx"
  Normal  Pulled     30s   kubelet            Successfully pulled image "nginx" in 1.146417389s
  Normal  Created    30s   kubelet            Created container nginx
  Normal  Started    30s   kubelet            Started container nginx
```
<!--
Here you can see configuration information about the container(s) and Pod (labels, resource requirements, etc.), as well as status information about the container(s) and Pod (state, readiness, restart count, events, etc.).
-->
在这里,你可以看到有关容器和 Pod 的配置信息(标签、资源需求等),
以及有关容器和 Pod 的状态信息(状态、就绪态、重启计数、事件等)。

<!--
The container state is one of Waiting, Running, or Terminated. Depending on the state, additional information will be provided -- here you can see that for a container in Running state, the system tells you when the container started.
-->
容器状态是 Waiting、Running 和 Terminated 之一。
根据状态的不同,还会给出对应的额外信息 —— 在这里你可以看到,
对于处于 Running 状态的容器,系统会告诉你容器的启动时间。

<!--
Ready tells you whether the container passed its last readiness probe. (In this case, the container does not have a readiness probe configured; the container is assumed to be ready if no readiness probe is configured.)
-->
Ready 指示容器是否通过了最后一次就绪态探测。
(在本例中,容器没有配置就绪态探测;如果没有配置就绪态探测,则假定容器已经就绪。)

<!--
Restart Count tells you how many times the container has been restarted; this information can be useful for detecting crash loops in containers that are configured with a restart policy of 'always.'
-->
Restart Count 告诉你容器已重启的次数;
这些信息对于定位重启策略为 "Always" 的容器持续崩溃的问题非常有用。

<!--
Currently the only Condition associated with a Pod is the binary Ready condition, which indicates that the pod is able to service requests and should be added to the load balancing pools of all matching services.
-->
目前,与 Pod 关联的唯一状况(Condition)是二值的 Ready 状况,
它表明 Pod 能够为请求提供服务,并且应该被添加到所有匹配 Service 的负载均衡池中。

<!--
Lastly, you see a log of recent events related to your Pod. The system compresses multiple identical events by indicating the first and last time it was seen and the number of times it was seen. "From" indicates the component that is logging the event, "SubobjectPath" tells you which object (e.g. container within the pod) is being referred to, and "Reason" and "Message" tell you what happened.
-->
最后,你还可以看到与 Pod 相关的近期事件。
系统通过给出事件第一次和最后一次出现的时间以及出现次数,将多个相同的事件压缩在一起。
“From” 标明记录事件的组件,
“SubobjectPath” 告诉你引用了哪个对象(例如 Pod 中的容器),
“Reason” 和 “Message” 告诉你发生了什么。
<!--
## Example: debugging Pending Pods

A common scenario that you can detect using events is when you've created a Pod that won't fit on any node. For example, the Pod might request more resources than are free on any node, or it might specify a label selector that doesn't match any nodes. Let's say we created the previous Deployment with 5 replicas (instead of 2) and requesting 600 millicores instead of 500, on a four-node cluster where each (virtual) machine has 1 CPU. In that case one of the Pods will not be able to schedule. (Note that because of the cluster addon pods such as fluentd, skydns, etc., that run on each node, if we requested 1000 millicores then none of the Pods would be able to schedule.)
-->
## 例子:调试 Pending 状态的 Pod

可以使用事件来调试的一个常见场景是,你创建的 Pod 无法被调度到任何节点上。
比如,Pod 请求的资源超过了所有节点的空闲资源,或者它指定了一个没有节点能匹配的标签选择算符。
假定我们创建之前的 Deployment 时指定副本数是 5(不再是 2),并且请求 600 毫核(不再是 500),
对于一个 4 个节点的集群,若每个(虚拟)机器只有 1 个 CPU,这时至少有一个 Pod 无法被调度。
(需要注意的是,由于集群插件 Pod,比如 fluentd、skydns 等等会在每个节点上运行,
如果我们请求 1000 毫核,将没有任何 Pod 能被调度。)
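上一段的容量计算可以用下面这个纯演示性质的 shell 片段验算(其中的数值全部取自上文的假设,并非真实集群数据):

```shell
# 纯演示:验算上文的调度容量(数值为文中假设)
node_milli=1000   # 每个节点的 CPU 容量(毫核)
pod_milli=600     # 每个 Pod 的请求(毫核)
nodes=4           # 节点数
per_node=$((node_milli / pod_milli))       # 每个节点最多容纳 1 个这样的 Pod
echo "schedulable: $((per_node * nodes))"  # 输出 schedulable: 4
```

因此 5 个副本中会有 1 个停留在 Pending 状态。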
```shell
kubectl get pods
```

```
NAME                                READY     STATUS    RESTARTS   AGE
nginx-deployment-1006230814-6winp   1/1       Running   0          7m
nginx-deployment-1006230814-fmgu3   1/1       Running   0          7m
nginx-deployment-1370807587-6ekbw   1/1       Running   0          1m
nginx-deployment-1370807587-fg172   0/1       Pending   0          1m
nginx-deployment-1370807587-fz9sd   0/1       Pending   0          1m
```
<!--
To find out why the nginx-deployment-1370807587-fz9sd pod is not running, we can use `kubectl describe pod` on the pending Pod and look at its events:
-->
为了查找 Pod nginx-deployment-1370807587-fz9sd 没有运行的原因,我们可以在这个处于 Pending
状态的 Pod 上使用 `kubectl describe pod` 命令,查看其事件:

```shell
kubectl describe pod nginx-deployment-1370807587-fz9sd
```

```none
Name:            nginx-deployment-1370807587-fz9sd
Namespace:       default
Node:            /
Labels:          app=nginx,pod-template-hash=1370807587
Status:          Pending
IP:
Controllers:     ReplicaSet/nginx-deployment-1370807587
Containers:
  nginx:
    Image:       nginx
    Port:        80/TCP
    QoS Tier:
      memory:    Guaranteed
      cpu:       Guaranteed
    Limits:
      cpu:       1
      memory:    128Mi
    Requests:
      cpu:       1
      memory:    128Mi
    Environment Variables:
Volumes:
  default-token-4bcbi:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-4bcbi
Events:
  FirstSeen  LastSeen  Count  From                 SubobjectPath  Type     Reason            Message
  ---------  --------  -----  ----                 -------------  -------  ------            -------
  1m         48s       7      {default-scheduler }                Warning  FailedScheduling  pod (nginx-deployment-1370807587-fz9sd) failed to fit in any node
  fit failure on node (kubernetes-node-6ta5): Node didn't have enough resource: CPU, requested: 1000, used: 1420, capacity: 2000
  fit failure on node (kubernetes-node-wul5): Node didn't have enough resource: CPU, requested: 1000, used: 1100, capacity: 2000
```
<!--
Here you can see the event generated by the scheduler saying that the Pod failed to schedule for reason `FailedScheduling` (and possibly others). The message tells us that there were not enough resources for the Pod on any of the nodes.
-->
这里你可以看到由调度器记录的事件,它表明了 Pod 不能被调度的原因是 `FailedScheduling`(也可能是其他值)。
其 message 部分表明没有任何节点拥有足够多的资源。

<!--
To correct this situation, you can use `kubectl scale` to update your Deployment to specify four or fewer replicas. (Or you could leave the one Pod pending, which is harmless.)
-->
要纠正这种情况,可以使用 `kubectl scale` 更新 Deployment,将副本数指定为 4 个或更少。
(或者你也可以让这个 Pod 保持 Pending 状态,这是无害的。)
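按照上文的建议,可以用 `kubectl scale` 把副本数降到 4。下面的片段只是把要执行的命令拼出来并打印(Deployment 名称沿用文中示例;在真实集群上去掉 `echo`、直接执行该命令即可):

```shell
# 演示:构造缩容命令;这里只打印、不真正执行
REPLICAS=4
CMD="kubectl scale deployment nginx-deployment --replicas=${REPLICAS}"
echo "${CMD}"   # 输出 kubectl scale deployment nginx-deployment --replicas=4
```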
<!--
Events such as the ones you saw at the end of `kubectl describe pod` are persisted in etcd and provide high-level information on what is happening in the cluster. To list all events you can use
-->
你在 `kubectl describe pod` 结尾处看到的事件都保存在 etcd 中,
并提供关于集群中正在发生的事情的高层次信息。
如果需要列出所有事件,可使用命令:

```shell
kubectl get events
```
<!--
but you have to remember that events are namespaced. This means that if you're interested in events for some namespaced object (e.g. what happened with Pods in namespace `my-namespace`) you need to explicitly provide a namespace to the command:
-->
但是需要注意,事件是区分名字空间的。
这意味着如果你关注某个名字空间域的对象(比如名字空间 `my-namespace` 中的 Pod)的事件,
你需要显式地在命令中指定名字空间:

```shell
kubectl get events --namespace=my-namespace
```

<!--
To see events from all namespaces, you can use the `--all-namespaces` argument.
-->
要查看所有名字空间的事件,可使用 `--all-namespaces` 参数。
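排查问题时通常只关心 Warning 类型的事件。在真实集群上可以用 `kubectl get events --field-selector type=Warning` 直接过滤;下面用一段假设的示例输出演示同样的过滤思路:

```shell
# 假设的事件输出,仅用于演示按类型过滤
events='LAST SEEN  TYPE     REASON            OBJECT
2m         Normal   Scheduled         pod/nginx-a
1m         Warning  FailedScheduling  pod/nginx-b'
echo "$events" | grep 'Warning'
```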
<!--
In addition to `kubectl describe pod`, another way to get extra information about a pod (beyond what is provided by `kubectl get pod`) is to pass the `-o yaml` output format flag to `kubectl get pod`. This will give you, in YAML format, even more information than `kubectl describe pod`--essentially all of the information the system has about the Pod. Here you will see things like annotations (which are key-value metadata without the label restrictions, that is used internally by Kubernetes system components), restart policy, ports, and volumes.
-->
除了 `kubectl describe pod` 以外,另一种获取 Pod 额外信息(超出 `kubectl get pod` 所给出的)的方法是
给 `kubectl get pod` 增加 `-o yaml` 输出格式参数。
该命令将以 YAML 格式为你提供比 `kubectl describe pod` 更多的信息 ——
实际上是系统所拥有的关于 Pod 的所有信息。
在这里,你将看到注解(没有标签数量限制的键值元数据,被 Kubernetes 系统组件在内部使用)、
重启策略、端口和卷等。

```shell
kubectl get pod nginx-deployment-1006230814-6winp -o yaml
```
```yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2022-02-17T21:51:01Z"
  generateName: nginx-deployment-67d4bdd6f5-
  labels:
    app: nginx
    pod-template-hash: 67d4bdd6f5
  name: nginx-deployment-67d4bdd6f5-w6kd7
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: nginx-deployment-67d4bdd6f5
    uid: 7d41dfd4-84c0-4be4-88ab-cedbe626ad82
  resourceVersion: "1364"
  uid: a6501da1-0447-4262-98eb-c03d4002222e
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: nginx
    ports:
    - containerPort: 80
      protocol: TCP
    resources:
      limits:
        cpu: 500m
        memory: 128Mi
      requests:
        cpu: 500m
        memory: 128Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-bgsgp
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: kube-worker-1
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: kube-api-access-bgsgp
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-02-17T21:51:01Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-02-17T21:51:06Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-02-17T21:51:06Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-02-17T21:51:01Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://5403af59a2b46ee5a23fb0ae4b1e077f7ca5c5fb7af16e1ab21c00e0e616462a
    image: docker.io/library/nginx:latest
    imageID: docker.io/library/nginx@sha256:2834dc507516af02784808c5f48b7cbe38b8ed5d0f4837f16e78d00deb7e7767
    lastState: {}
    name: nginx
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2022-02-17T21:51:05Z"
  hostIP: 192.168.0.113
  phase: Running
  podIP: 10.88.0.3
  podIPs:
  - ip: 10.88.0.3
  - ip: 2001:db8::1
  qosClass: Guaranteed
  startTime: "2022-02-17T21:51:01Z"
```
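如果只需要其中个别字段,真实集群上可以用 JSONPath 直接提取,例如 `kubectl get pod ${POD_NAME} -o jsonpath='{.status.phase}'`。下面用上面输出的一个假设节选演示同样的提取思路:

```shell
# 纯演示:从 YAML 输出的节选中提取 phase 字段
# (真实集群上:kubectl get pod ${POD_NAME} -o jsonpath='{.status.phase}')
yaml='hostIP: 192.168.0.113
phase: Running
podIP: 10.88.0.3'
echo "$yaml" | awk -F': ' '$1=="phase"{print $2}'   # 输出 Running
```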
<!--
## Examining pod logs {#examine-pod-logs}

First, look at the logs of the affected container:

```shell
kubectl logs ${POD_NAME} ${CONTAINER_NAME}
```

If your container has previously crashed, you can access the previous container's crash log with:

```shell
kubectl logs --previous ${POD_NAME} ${CONTAINER_NAME}
```
-->
## 检查 Pod 的日志 {#examine-pod-logs}

首先,查看受影响容器的日志:

```shell
kubectl logs ${POD_NAME} ${CONTAINER_NAME}
```

如果你的容器之前崩溃过,你可以通过下面命令访问之前容器的崩溃日志:

```shell
kubectl logs --previous ${POD_NAME} ${CONTAINER_NAME}
```
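拿到日志之后,常见的下一步是筛选出错误行。在真实集群上可以运行 `kubectl logs ${POD_NAME} | grep -i error`;下面用一段假设的日志文本演示同样的过滤思路:

```shell
# 纯演示:从假设的日志文本中过滤错误行
logs='2022-02-17 INFO  server started
2022-02-17 ERROR connection refused'
echo "$logs" | grep -i 'error'   # 输出 2022-02-17 ERROR connection refused
```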
<!--
## Debugging with container exec {#container-exec}

If the {{< glossary_tooltip text="container image" term_id="image" >}} includes
debugging utilities, as is the case with images built from Linux and Windows OS
base images, you can run commands inside a specific container with
`kubectl exec`:
-->
## 使用容器 exec 进行调试 {#container-exec}
```shell
kubectl exec ${POD_NAME} -c ${CONTAINER_NAME} -- ${CMD} ${ARG1} ${ARG2} ... ${ARGN}
```

<!--
`-c ${CONTAINER_NAME}` is optional. You can omit it for Pods that only contain a single container.
-->
{{< note >}}
`-c ${CONTAINER_NAME}` 是可选的。如果 Pod 中仅包含一个容器,就可以忽略它。
{{< /note >}}

<!--
As an example, to look at the logs from a running Cassandra pod, you might run
-->
例如,要查看正在运行的 Cassandra Pod 中的日志,可以运行:

```shell
kubectl exec cassandra -- cat /var/log/cassandra/system.log
```

<!--
You can run a shell that's connected to your terminal using the `-i` and `-t`
arguments to `kubectl exec`, for example:
-->
你可以在 `kubectl exec` 命令后面加上 `-i` 和 `-t` 来运行一个连接到你的终端的 Shell,比如:

```shell
kubectl exec -it cassandra -- sh
```

<!--
For more details, see [Get a Shell to a Running Container](
/docs/tasks/debug/debug-application/get-shell-running-container/).
-->
若要了解更多内容,可查看[获取正在运行容器的 Shell](/zh/docs/tasks/debug/debug-application/get-shell-running-container/)。
<!--
## Debugging with an ephemeral debug container {#ephemeral-container}

You can use the `kubectl debug` command to add ephemeral containers to a
running Pod. First, create a pod for the example:

```shell
kubectl run ephemeral-demo --image=k8s.gcr.io/pause:3.1 --restart=Never
```

The examples in this section use the `pause` container image because it does not
contain debugging utilities, but this method works with all container
images.
-->
## 使用临时容器来调试的例子 {#ephemeral-container-example}

你可以使用 `kubectl debug` 命令来给正在运行中的 Pod 增加一个临时容器。
首先,像下例一样创建一个 Pod:

```shell
kubectl run ephemeral-demo --image=k8s.gcr.io/pause:3.1 --restart=Never
```

本节示例中使用 `pause` 容器镜像,因为它不包含调试工具,
但是这个方法适用于所有容器镜像。
<!--
If you attempt to use `kubectl exec` to create a shell you will see an error
because there is no shell in this container image.

```
OCI runtime exec failed: exec failed: container_linux.go:346: starting container process caused "exec: \"sh\": executable file not found in $PATH": unknown
```

You can instead add a debugging container using `kubectl debug`. If you
specify the `-i`/`--interactive` argument, `kubectl` will automatically attach
to the console of the Ephemeral Container.

```
Defaulting debug container name to debugger-8xzrl.
If you don't see a command prompt, try pressing enter.
/ #
```
-->
如果你尝试使用 `kubectl exec` 来创建一个 shell,你将会看到一个错误,因为这个容器镜像中没有 shell。
OCI runtime exec failed: exec failed: container_linux.go:346: starting container process caused "exec: \"sh\": executable file not found in $PATH": unknown
```

<!--
You can instead add a debugging container using `kubectl debug`. If you
specify the `-i`/`--interactive` argument, `kubectl` will automatically attach
to the console of the Ephemeral Container.
-->
你可以改为使用 `kubectl debug` 添加调试容器。
如果你指定 `-i` 或者 `--interactive` 参数,`kubectl` 将自动挂接到临时容器的控制台。
here because `kubectl run` does not enable [process namespace sharing](
/docs/tasks/configure-pod-container/share-process-namespace/) in the pod it
creates.

{{< note >}}
The `--target` parameter must be supported by the {{< glossary_tooltip
text="Container Runtime" term_id="container-runtime" >}}. When not supported,
the Ephemeral Container may not be started, or it may be started with an
isolated process namespace so that `ps` does not reveal processes in other
containers.
{{< /note >}}

You can view the state of the newly created ephemeral container using `kubectl describe`:
-->
此命令添加一个新的 busybox 容器并将其挂接到该容器。`--target` 参数指定另一个容器的进程命名空间。
这是必需的,因为 `kubectl run` 不能在它创建的 Pod
中启用[共享进程命名空间](/zh/docs/tasks/configure-pod-container/share-process-namespace/)。

{{< note >}}
{{< glossary_tooltip text="容器运行时" term_id="container-runtime" >}}必须支持 `--target` 参数。
如果不支持,则临时容器可能不会启动,或者可能使用隔离的进程命名空间启动,
以便 `ps` 不显示其他容器内的进程。
{{< /note >}}

你可以使用 `kubectl describe` 查看新创建的临时容器的状态:
<!--
Use `kubectl delete` to remove the Pod when you're finished:
-->
完成后,使用 `kubectl delete` 来移除 Pod:

```shell
kubectl delete pod ephemeral-demo
```
copy of the Pod with configuration values changed to aid debugging.
-->
有些时候 Pod 的配置参数使得在某些情况下很难执行故障排查。
例如,在容器镜像中不包含 shell 或者你的应用程序在启动时崩溃的情况下,
就不能通过运行 `kubectl exec` 来排查容器故障。
在这些情况下,你可以使用 `kubectl debug` 来创建 Pod 的副本,通过更改配置帮助调试。
behaving as you expect and you'd like to add additional troubleshooting
utilities to the Pod.
-->
当应用程序正在运行但其表现不符合预期时,你会希望在 Pod 中添加额外的调试工具,
这时添加新容器是很有用的。

<!--
For example, maybe your application's container images are built on `busybox`
but you need debugging utilities not included in `busybox`. You can simulate
this scenario using `kubectl run`:
-->
例如,假设你的应用的容器镜像基于 `busybox` 构建,
但是你需要 `busybox` 中并不包含的调试工具。
你可以使用 `kubectl run` 模拟这个场景:
/docs/tasks/configure-pod-container/share-process-namespace/).
-->
{{< note >}}
* 如果你没有使用 `--container` 指定新的容器名,`kubectl debug` 会自动生成容器名称。
* 默认情况下,`-i` 标志使 `kubectl debug` 附加到新容器上。
  你可以通过指定 `--attach=false` 来防止这种情况。
  如果你的会话断开连接,你可以使用 `kubectl attach` 重新连接。
* `--share-processes` 允许在此 Pod 中的其他容器中查看该容器的进程。
  参阅[在 Pod 中的容器之间共享进程命名空间](/zh/docs/tasks/configure-pod-container/share-process-namespace/)获取更多信息。
{{< /note >}}
<!--
Don't forget to clean up the debugging Pod when you're finished with it:
-->
结束之后,不要忘了清理调试 Pod:

```shell
kubectl delete pod myapp myapp-debug
```
现在你有了一个可以执行类似检查文件系统路径或者手动运行容器命令的交互式 shell。

<!--
{{< note >}}
* To change the command of a specific container you must
  specify its name using `--container` or `kubectl debug` will instead
  create a new container to run the command you specified.
* The `-i` flag causes `kubectl debug` to attach to the container by default.
  You can prevent this by specifying `--attach=false`. If your session becomes
  disconnected you can reattach using `kubectl attach`.
{{< /note >}}
-->
{{< note >}}
* 要更改指定容器的命令,你必须用 `--container` 参数指定容器的名字,
  否则 `kubectl debug` 将创建一个新的容器来运行你指定的命令。
* 默认情况下,标志 `-i` 使 `kubectl debug` 附加到容器。
  你可通过指定 `--attach=false` 来防止这种情况。
  如果你的会话断开连接,可以使用 `kubectl attach` 重新连接。
{{< /note >}}
<!--
Don't forget to clean up the debugging Pod when you're finished with it:
-->
结束之后,不要忘了清理调试 Pod:

```shell
kubectl delete pod myapp myapp-debug
```
-->
### 在更改容器镜像时创建 Pod 副本

在某些情况下,你可能想改变行为异常的 Pod,
将其中的正常生产容器镜像更改为包含调试版本或者额外工具的镜像。

作为示例,用 `kubectl run` 创建一个 Pod:

```
kubectl run myapp --image=busybox:1.28 --restart=Never -- sleep 1d
```
<!--
Now use `kubectl debug` to make a copy and change its container image
to `ubuntu`:
-->
现在可以使用 `kubectl debug` 创建一个副本并改变容器镜像为 `ubuntu`:

```
kubectl debug myapp --copy-to=myapp-debug --set-image=*=ubuntu
```

Don't forget to clean up the debugging Pod when you're finished with it:
-->
`--set-image` 使用与 `kubectl set image` 相同的 `container_name=image` 语法。
`*=ubuntu` 表示把所有容器的镜像改为 `ubuntu`。

结束之后,不要忘了清理调试 Pod:

```shell
kubectl delete pod myapp myapp-debug
```
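`--set-image` 也可以只替换某个命名容器的镜像,其余容器保持不变。下面是一个示意(这里假设 Pod 中有一个名为 `app` 的容器,容器名仅为举例):

```shell
# 仅把名为 app 的容器镜像改为 ubuntu,其余容器不变(容器名 app 为假设示例)
kubectl debug myapp --copy-to=myapp-debug --set-image=app=ubuntu
```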
running and create a privileged Pod running in the host namespaces. To create
an interactive shell on a node using `kubectl debug`, run:
-->
## 通过节点上的 Shell 来进行调试 {#node-shell-session}

如果这些方法都不起作用,你可以找到运行 Pod 的节点,然后在节点上部署一个运行在宿主名字空间的特权 Pod。

你可以通过 `kubectl debug` 在节点上创建一个交互式 Shell:

```shell
kubectl debug node/mynode -it --image=ubuntu
```
Don't forget to clean up the debugging Pod when you're finished with it:
-->
在节点上创建调试会话时,注意以下要点:

* `kubectl debug` 基于节点的名字自动生成新的 Pod 的名字。
* 新的调试容器运行在宿主 IPC、宿主网络、宿主 PID 名字空间内。
* 节点的根文件系统会被挂载在 `/host`。

当你完成节点调试时,不要忘记清理调试 Pod:

---
content_type: concept
title: 调试 Service
weight: 20
---

<!--
reviewers:
- bowei
content_type: concept
title: Debug Services
weight: 20
-->

<!-- overview -->
The "RESTARTS" column says that these pods are not crashing frequently or being
restarted. Frequent restarts could lead to intermittent connectivity issues.
If the restart count is high, read more about how to [debug pods](/docs/tasks/debug/debug-application/debug-pods).

Inside the Kubernetes system is a control loop which evaluates the selector of
every Service and saves the results into a corresponding Endpoints object.
-->
"AGE" 列表明这些 Pod 已经启动一个小时了,这意味着它们运行良好,而未崩溃。

"RESTARTS" 列表明 Pod 没有经常崩溃或重启。经常性崩溃可能导致间歇性连接问题。
如果重启次数过大,请通过[调试 Pod](/zh/docs/tasks/debug/debug-application/debug-pods)
了解相关技术。

在 Kubernetes 系统中有一个控制回路,它评估每个 Service 的选择算符,并将结果保存到 Endpoints 对象中。
investigate!

Contact us on
[Slack](/docs/tasks/debug/overview/#slack) or
[Forum](https://discuss.kubernetes.io) or
[GitHub](https://github.com/kubernetes/kubernetes).
-->
然而 Service 还是没有正常工作。这种情况下,请告诉我们,以便我们可以帮助调查!

通过
[Slack](/zh/docs/tasks/debug/overview/#slack) 或者
[Forum](https://discuss.kubernetes.io) 或者
[GitHub](https://github.com/kubernetes/kubernetes)
联系我们。
## {{% heading "whatsnext" %}}

<!--
Visit [troubleshooting document](/docs/tasks/debug/overview/) for more information.
-->
访问[故障排查文档](/zh/docs/tasks/debug/overview/)获取更多信息。

---
title: 调试 StatefulSet
content_type: task
weight: 30
---

<!--
reviewers:
- bprashanth
- enisoc
- erictune
- foxish
- janetkuo
- kow3ns
- smarterclayton
title: Debug a StatefulSet
content_type: task
weight: 30
-->

<!-- overview -->
<!--
refer to the [Deleting StatefulSet Pods](/docs/tasks/run-application/delete-stateful-set/) task for
instructions on how to deal with them.
You can debug individual Pods in a StatefulSet using the
[Debugging Pods](/docs/tasks/debug/debug-application/debug-pods/) guide.
-->
如果你发现列出的任何 Pod 长时间处于 `Unknown` 或 `Terminating` 状态,请参阅
[删除 StatefulSet Pod](/zh/docs/tasks/run-application/delete-stateful-set/)
了解如何处理它们的说明。
你可以参考[调试 Pod](/zh/docs/tasks/debug/debug-application/debug-pods/)
来调试 StatefulSet 中的各个 Pod。

## {{% heading "whatsnext" %}}

<!--
Learn more about [debugging an init-container](/docs/tasks/debug/debug-application/debug-init-containers/).
-->
* 进一步了解如何[调试 Init 容器](/zh/docs/tasks/debug/debug-application/debug-init-containers/)。
本文介绍如何编写和读取容器的终止消息。

<!--
Termination messages provide a way for containers to write
information about fatal events to a location where it can
be easily retrieved and surfaced by tools like dashboards
and monitoring software. In most cases, information that you
put in a termination message should also be written to
the general
[Kubernetes logs](/docs/concepts/cluster-administration/logging/).
-->
终止消息为容器提供了一种方法,可以将有关致命事件的信息写入某个位置,
{{< codenew file="debug/termination.yaml" >}}

<!-- 1. Create a Pod based on the YAML configuration file: -->
1. 基于 YAML 配置文件创建 Pod:

   ```shell
   kubectl apply -f https://k8s.io/examples/debug/termination.yaml
   ```

   <!--
   In the YAML file, in the `command` and `args` fields, you can see that the
   container sleeps for 10 seconds and then writes "Sleep expired" to
   the `/dev/termination-log` file. After the container writes
   the "Sleep expired" message, it terminates.
   -->
   YAML 文件中,在 `command` 和 `args` 字段,你可以看到容器休眠 10 秒然后将 "Sleep expired"
   写入 `/dev/termination-log` 文件。
   容器写完 "Sleep expired" 消息后就终止了。

<!-- 1. Display information about the Pod: -->
1. 显示 Pod 的信息:

   ```shell
   kubectl get pod termination-demo
   ```

   <!-- Repeat the preceding command until the Pod is no longer running. -->
   重复前面的命令直到 Pod 不再运行。

<!-- 1. Display detailed information about the Pod: -->
1. 显示 Pod 的详细信息:

   ```shell
   kubectl get pod termination-demo --output=yaml
   ```

   <!-- The output includes the "Sleep expired" message: -->
   输出结果包含 "Sleep expired" 消息:

   ```yaml
   apiVersion: v1
   kind: Pod
   ...
       lastState:
         terminated:
           containerID: ...
           exitCode: 0
           finishedAt: ...
           message: |
             Sleep expired
   ...
   ```

<!--
1. Use a Go template to filter the output so that it includes
   only the termination message:
-->
1. 使用 Go 模板过滤输出结果,使其只含有终止消息:

   ```shell
   kubectl get pod termination-demo -o go-template="{{range .status.containerStatuses}}{{.lastState.terminated.message}}{{end}}"
   ```

<!--
If you are running a multi-container pod, you can use a Go template to include the container's name. By doing so, you can discover which of the containers is failing:
-->
如果你正在运行多容器 Pod,则可以使用 Go 模板来包含容器的名称。这样,你可以发现哪些容器出现故障:
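这样的模板大致如下(示意,其中 Pod 名 `multi-container-pod` 为假设的示例名;模板对每个容器打印 "容器名:终止消息"):

```shell
# 对 Pod 中的每个容器,打印 "容器名:终止消息"(Pod 名为假设示例)
kubectl get pod multi-container-pod -o go-template='{{range .status.containerStatuses}}{{printf "%s:%s\n" .name .lastState.terminated.message}}{{end}}'
```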

---
title: 集群故障排查
description: 调试常见的集群问题。
weight: 20
no_list: true
---

<!--
reviewers:
- davidopp
title: "Troubleshooting Clusters"
description: Debugging common cluster issues.
weight: 20
no_list: true
-->

<!-- overview -->

<!--
This doc is about cluster troubleshooting; we assume you have already ruled out your application as the root cause of the
problem you are experiencing. See
the [application troubleshooting guide](/docs/tasks/debug/debug-application/) for tips on application debugging.
You may also visit the [troubleshooting overview document](/docs/tasks/debug/) for more information.
-->
本篇文档介绍集群故障排查;我们假设对于你碰到的问题,你已经排除了应用程序本身的原因。
对于应用的调试,请参阅[应用故障排查指南](/zh/docs/tasks/debug/debug-application/)。
你也可以访问[故障排查概述](/zh/docs/tasks/debug/)来获取更多的信息。

<!-- body -->
<!--
## Listing your cluster

The first thing to debug in your cluster is if your nodes are all registered correctly.

Run the following command:
-->
## 列举集群节点

调试的第一步是查看所有的节点是否都已正确注册。

运行以下命令:

```shell
kubectl get nodes
```

<!--
And verify that all of the nodes you expect to see are present and that they are all in the `Ready` state.

To get detailed information about the overall health of your cluster, you can run:
-->
验证你所希望看见的所有节点都能够显示出来,并且都处于 `Ready` 状态。

为了了解你的集群的总体健康状况详情,你可以运行:

```shell
kubectl cluster-info dump
```
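排查时通常只需要关注未就绪的节点。下面是一个最小示意(不需要真实集群,用一段截取的 `kubectl get nodes` 输出演示如何筛选出非 `Ready` 的节点):

```shell
# 用一段示例输出代替真实的 kubectl get nodes 结果(仅为示意)
nodes='NAME            STATUS     ROLES    AGE   VERSION
kube-worker-1   NotReady   <none>   1h    v1.23.3
kube-worker-2   Ready      <none>   1h    v1.23.3'

# 跳过表头,打印 STATUS 不是 Ready 的节点名
echo "$nodes" | awk 'NR > 1 && $2 != "Ready" { print $1 }'
```

在真实集群上,可以把示例文本换成 `kubectl get nodes` 的实际输出,再通过同样的管道筛选。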
<!--
### Example: debugging a down/unreachable node

Sometimes when debugging it can be useful to look at the status of a node -- for example, because you've noticed strange behavior of a Pod that's running on the node, or to find out why a Pod won't schedule onto the node. As with Pods, you can use `kubectl describe node` and `kubectl get node -o yaml` to retrieve detailed information about nodes. For example, here's what you'll see if a node is down (disconnected from the network, or kubelet dies and won't restart, etc.). Notice the events that show the node is NotReady, and also notice that the pods are no longer running (they are evicted after five minutes of NotReady status).
-->
### 示例:调试关闭/无法访问的节点

有时在调试时查看节点的状态很有用,例如,你注意到在节点上运行的 Pod 的奇怪行为,
或者想找出为什么 Pod 不会调度到节点上。与 Pod 一样,你可以使用 `kubectl describe node`
和 `kubectl get node -o yaml` 来检索有关节点的详细信息。
例如,如果节点关闭(与网络断开连接,或者 kubelet 进程挂起并且不会重新启动等),
你将看到以下内容。请注意显示节点为 NotReady 的事件,并注意 Pod 不再运行
(它们在 NotReady 状态五分钟后被驱逐)。

```shell
kubectl get nodes
```

```none
NAME                   STATUS     ROLES    AGE   VERSION
kube-worker-1          NotReady   <none>   1h    v1.23.3
kubernetes-node-bols   Ready      <none>   1h    v1.23.3
kubernetes-node-st6x   Ready      <none>   1h    v1.23.3
kubernetes-node-unaj   Ready      <none>   1h    v1.23.3
```

```shell
kubectl describe node kube-worker-1
```
```none
Name:               kube-worker-1
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=kube-worker-1
                    kubernetes.io/os=linux
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 17 Feb 2022 16:46:30 -0500
Taints:             node.kubernetes.io/unreachable:NoExecute
                    node.kubernetes.io/unreachable:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  kube-worker-1
  AcquireTime:     <unset>
  RenewTime:       Thu, 17 Feb 2022 17:13:09 -0500
Conditions:
  Type                 Status    LastHeartbeatTime                 LastTransitionTime                Reason              Message
  ----                 ------    -----------------                 ------------------                ------              -------
  NetworkUnavailable   False     Thu, 17 Feb 2022 17:09:13 -0500   Thu, 17 Feb 2022 17:09:13 -0500   WeaveIsUp           Weave pod has set this
  MemoryPressure       Unknown   Thu, 17 Feb 2022 17:12:40 -0500   Thu, 17 Feb 2022 17:13:52 -0500   NodeStatusUnknown   Kubelet stopped posting node status.
  DiskPressure         Unknown   Thu, 17 Feb 2022 17:12:40 -0500   Thu, 17 Feb 2022 17:13:52 -0500   NodeStatusUnknown   Kubelet stopped posting node status.
  PIDPressure          Unknown   Thu, 17 Feb 2022 17:12:40 -0500   Thu, 17 Feb 2022 17:13:52 -0500   NodeStatusUnknown   Kubelet stopped posting node status.
  Ready                Unknown   Thu, 17 Feb 2022 17:12:40 -0500   Thu, 17 Feb 2022 17:13:52 -0500   NodeStatusUnknown   Kubelet stopped posting node status.
Addresses:
  InternalIP:  192.168.0.113
  Hostname:    kube-worker-1
Capacity:
  cpu:                2
  ephemeral-storage:  15372232Ki
  hugepages-2Mi:      0
  memory:             2025188Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  14167048988
  hugepages-2Mi:      0
  memory:             1922788Ki
  pods:               110
System Info:
  Machine ID:                 9384e2927f544209b5d7b67474bbf92b
  System UUID:                aa829ca9-73d7-064d-9019-df07404ad448
  Boot ID:                    5a295a03-aaca-4340-af20-1327fa5dab5c
  Kernel Version:             5.13.0-28-generic
  OS Image:                   Ubuntu 21.10
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.5.9
  Kubelet Version:            v1.23.3
  Kube-Proxy Version:         v1.23.3
Non-terminated Pods:          (4 in total)
  Namespace     Name                                CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------     ----                                ------------  ----------  ---------------  -------------  ---
  default       nginx-deployment-67d4bdd6f5-cx2nz   500m (25%)    500m (25%)  128Mi (6%)       128Mi (6%)     23m
  default       nginx-deployment-67d4bdd6f5-w6kd7   500m (25%)    500m (25%)  128Mi (6%)       128Mi (6%)     23m
  kube-system   kube-proxy-dnxbz                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         28m
  kube-system   weave-net-gjxxp                     100m (5%)     0 (0%)      200Mi (10%)      0 (0%)         28m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                1100m (55%)  1 (50%)
  memory             456Mi (24%)  256Mi (13%)
  ephemeral-storage  0 (0%)       0 (0%)
  hugepages-2Mi      0 (0%)       0 (0%)
Events:
...
```
```shell
kubectl get node kube-worker-1 -o yaml
```

```yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    kubeadm.alpha.kubernetes.io/cri-socket: /run/containerd/containerd.sock
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2022-02-17T21:46:30Z"
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: kube-worker-1
    kubernetes.io/os: linux
  name: kube-worker-1
  resourceVersion: "4026"
  uid: 98efe7cb-2978-4a0b-842a-1a7bf12c05f8
spec: {}
status:
  addresses:
  - address: 192.168.0.113
    type: InternalIP
  - address: kube-worker-1
    type: Hostname
  allocatable:
    cpu: "2"
    ephemeral-storage: "14167048988"
    hugepages-2Mi: "0"
    memory: 1922788Ki
    pods: "110"
  capacity:
    cpu: "2"
    ephemeral-storage: 15372232Ki
    hugepages-2Mi: "0"
    memory: 2025188Ki
    pods: "110"
  conditions:
  - lastHeartbeatTime: "2022-02-17T22:20:32Z"
    lastTransitionTime: "2022-02-17T22:20:32Z"
    message: Weave pod has set this
    reason: WeaveIsUp
    status: "False"
    type: NetworkUnavailable
  - lastHeartbeatTime: "2022-02-17T22:20:15Z"
    lastTransitionTime: "2022-02-17T22:13:25Z"
    message: kubelet has sufficient memory available
    reason: KubeletHasSufficientMemory
    status: "False"
    type: MemoryPressure
  - lastHeartbeatTime: "2022-02-17T22:20:15Z"
    lastTransitionTime: "2022-02-17T22:13:25Z"
    message: kubelet has no disk pressure
    reason: KubeletHasNoDiskPressure
    status: "False"
    type: DiskPressure
  - lastHeartbeatTime: "2022-02-17T22:20:15Z"
    lastTransitionTime: "2022-02-17T22:13:25Z"
    message: kubelet has sufficient PID available
    reason: KubeletHasSufficientPID
    status: "False"
    type: PIDPressure
  - lastHeartbeatTime: "2022-02-17T22:20:15Z"
    lastTransitionTime: "2022-02-17T22:15:15Z"
    message: kubelet is posting ready status. AppArmor enabled
    reason: KubeletReady
    status: "True"
    type: Ready
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  nodeInfo:
    architecture: amd64
    bootID: 22333234-7a6b-44d4-9ce1-67e31dc7e369
    containerRuntimeVersion: containerd://1.5.9
    kernelVersion: 5.13.0-28-generic
    kubeProxyVersion: v1.23.3
    kubeletVersion: v1.23.3
    machineID: 9384e2927f544209b5d7b67474bbf92b
    operatingSystem: linux
    osImage: Ubuntu 21.10
    systemUUID: aa829ca9-73d7-064d-9019-df07404ad448
```
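上面 YAML 中的 `status.conditions` 是判断节点健康的关键。下面是一个最小示意(不需要集群,用一段截取的 YAML 演示如何提取每个 Condition 的类型和状态):

```shell
# 截取自上面节点 YAML 的 conditions 部分(仅保留关心的字段,作演示用)
conditions='- status: "Unknown"
  type: Ready
- status: "False"
  type: NetworkUnavailable'

# 遇到 status 行时先记下取值,遇到 type 行时把两者一起打印
echo "$conditions" | awk '/status:/ { s=$NF } /type:/ { print $NF, s }'
```

在真实集群上,可以用 `kubectl get node <节点名> -o yaml` 的输出替换上面的示例文本。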
<!--
## Looking at logs

For now, digging deeper into the cluster requires logging into the relevant machines. Here are the locations
of the relevant log files. On systemd-based systems, you may need to use `journalctl` instead of examining log files.
-->
## 查看日志

目前,深入挖掘集群需要登录相关机器。以下是相关日志文件的位置。
在基于 systemd 的系统上,你可能需要使用 `journalctl` 而不是检查日志文件。

<!--
### Control Plane nodes

* `/var/log/kube-apiserver.log` - API Server, responsible for serving the API
* `/var/log/kube-scheduler.log` - Scheduler, responsible for making scheduling decisions
* `/var/log/kube-controller-manager.log` - a component that runs most Kubernetes built-in {{<glossary_tooltip text="controllers" term_id="controller">}}, with the notable exception of scheduling (the kube-scheduler handles scheduling).
-->
### 控制平面节点

* `/var/log/kube-apiserver.log` —— API 服务器,负责提供 API 服务
* `/var/log/kube-scheduler.log` —— 调度器,负责制定调度决策
* `/var/log/kube-controller-manager.log` —— 运行大多数 Kubernetes
  内置{{<glossary_tooltip text="控制器" term_id="controller">}}的组件,除了调度(kube-scheduler 处理调度)

<!--
### Worker Nodes

* `/var/log/kubelet.log` - logs from the kubelet, responsible for running containers on the node
* `/var/log/kube-proxy.log` - logs from `kube-proxy`, which is responsible for directing traffic to Service endpoints
-->
### 工作节点

* `/var/log/kubelet.log` —— 来自 kubelet 的日志,负责在节点上运行容器
* `/var/log/kube-proxy.log` —— 来自 `kube-proxy` 的日志,负责将流量转发到服务端点
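在使用 systemd 的节点上,这些组件的日志通常由 journald 管理。下面是一个示意(假设 kubelet 以名为 `kubelet` 的 systemd 服务运行):

```shell
# 查看 kubelet 最近一小时的日志(假设 systemd 服务名为 kubelet)
journalctl -u kubelet --since "1 hour ago"

# 持续跟踪 kubelet 新产生的日志
journalctl -u kubelet -f
```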

<!--
## Cluster failure modes

This is an incomplete list of things that could go wrong, and how to adjust your cluster setup to mitigate the problems.
-->
## 集群故障模式

下面是一个不完整的列表,列举了可能出错的情况,以及如何调整集群设置以缓解这些问题。
<!--
### Contributing causes

- VM(s) shutdown
- Network partition within cluster, or between cluster and users
- Crashes in Kubernetes software
- Data loss or unavailability of persistent storage (e.g. GCE PD or AWS EBS volume)
- Operator error, for example misconfigured Kubernetes software or application software
-->
### 诱因

- 虚拟机关闭
- 集群内部或集群与用户之间出现网络分裂
- Kubernetes 软件崩溃
- 持久存储(例如 GCE PD 或 AWS EBS 卷)的数据丢失或不可用
- 操作员错误,例如配置错误的 Kubernetes 软件或应用程序软件
<!--
### Specific scenarios:

- Apiserver VM shutdown or apiserver crashing
  - Results
    - unable to stop, update, or start new pods, services, replication controller
    - existing pods and services should continue to work normally, unless they depend on the Kubernetes API
- Apiserver backing storage lost
  - Results
    - apiserver should fail to come up
    - kubelets will not be able to reach it but will continue to run the same pods and provide the same service proxying
    - manual recovery or recreation of apiserver state necessary before apiserver is restarted
-->
### 具体情况

- API 服务器所在的 VM 关机或者 API 服务器崩溃
  - 结果
    - 不能停止、更新或者启动新的 Pod、服务或副本控制器
    - 现有的 Pod 和服务在不依赖 Kubernetes API 的情况下应该能继续正常工作
- API 服务器的后端存储丢失
  - 结果
    - API 服务器应该不能启动
    - kubelet 将不能访问 API 服务器,但是能够继续运行之前的 Pod 和提供相同的服务代理
    - 在 API 服务器重启之前,需要手动恢复或者重建 API 服务器的状态
<!--
- Supporting services (node controller, replication controller manager, scheduler, etc) VM shutdown or crashes
  - currently those are colocated with the apiserver, and their unavailability has similar consequences as apiserver
  - in future, these will be replicated as well and may not be co-located
  - they do not have their own persistent state
- Individual node (VM or physical machine) shuts down
  - Results
    - pods on that Node stop running
- Network partition
  - Results
    - partition A thinks the nodes in partition B are down; partition B thinks the apiserver is down. (Assuming the master VM ends up in partition A.)
-->
- Kubernetes 服务组件(节点控制器、副本控制器管理器、调度器等)所在的 VM 关机或者崩溃
  - 当前,这些控制器是和 API 服务器一起运行的,它们不可用时的现象与 API 服务器不可用类似
  - 将来,这些控制器也会复制为多份,并且可能不运行在同一节点上
  - 它们没有自己的持久状态
- 单个节点(VM 或者物理机)关机
  - 结果
    - 此节点上的所有 Pod 都停止运行
- 网络分裂
  - 结果
    - 分区 A 认为分区 B 中所有的节点都已宕机;分区 B 认为 API 服务器宕机
      (假定主控节点所在的 VM 位于分区 A 内)。
<!--
- Kubelet software fault
  - Results
    - crashing kubelet cannot start new pods on the node
    - kubelet might delete the pods or not
    - node marked unhealthy
    - replication controllers start new pods elsewhere
- Cluster operator error
  - Results
    - loss of pods, services, etc
    - loss of apiserver backing store
    - users unable to read API
    - etc.
-->
- kubelet 软件故障
  - 结果
    - 崩溃的 kubelet 不能在其所在的节点上启动新的 Pod
    - kubelet 可能会也可能不会删除 Pod
    - 节点被标识为非健康态
    - 副本控制器会在其它的节点上启动新的 Pod
- 集群操作员错误
  - 结果
    - 丢失 Pod 或服务等
    - 丢失 API 服务器的后端存储
    - 用户无法读取 API
    - 等等
<!--
### Mitigations:

- Action: Use IaaS provider's automatic VM restarting feature for IaaS VMs
  - Mitigates: Apiserver VM shutdown or apiserver crashing
  - Mitigates: Supporting services VM shutdown or crashes

- Action: Use IaaS providers reliable storage (e.g. GCE PD or AWS EBS volume) for VMs with apiserver+etcd
  - Mitigates: Apiserver backing storage lost

- Action: Use [high-availability](/docs/setup/production-environment/tools/kubeadm/high-availability/) configuration
  - Mitigates: Control plane node shutdown or control plane components (scheduler, API server, controller-manager) crashing
    - Will tolerate one or more simultaneous node or component failures
  - Mitigates: API server backing storage (i.e., etcd's data directory) lost
    - Assumes HA (highly-available) etcd configuration
-->
### 缓解措施

- 措施:对于 IaaS 上的 VM,使用 IaaS 的自动 VM 重启功能
  - 缓解:API 服务器 VM 关机或 API 服务器崩溃
  - 缓解:Kubernetes 服务组件所在的 VM 关机或崩溃

- 措施:对于运行 API 服务器和 etcd 的 VM,使用 IaaS 提供的可靠的存储(例如 GCE PD 或者 AWS EBS 卷)
  - 缓解:API 服务器后端存储丢失

- 措施:使用[高可用性](/zh/docs/setup/production-environment/tools/kubeadm/high-availability/)配置
  - 缓解:控制平面节点关机或者控制平面组件(调度器、API 服务器、控制器管理器)崩溃
    - 将容许一个或多个节点或组件同时出现故障
  - 缓解:API 服务器后端存储(例如 etcd 的数据目录)丢失
    - 假定你使用了高可用的 etcd 配置
<!--
- Action: Snapshot apiserver PDs/EBS-volumes periodically
  - Mitigates: Apiserver backing storage lost
  - Mitigates: Some cases of operator error
  - Mitigates: Some cases of Kubernetes software fault

- Action: use replication controller and services in front of pods
  - Mitigates: Node shutdown
  - Mitigates: Kubelet software fault

- Action: applications (containers) designed to tolerate unexpected restarts
  - Mitigates: Node shutdown
  - Mitigates: Kubelet software fault
-->
- 措施:定期对 API 服务器的 PD/EBS 卷执行快照操作
  - 缓解:API 服务器后端存储丢失
  - 缓解:一些操作员错误的场景
  - 缓解:一些 Kubernetes 软件本身故障的场景

- 措施:在 Pod 的前面使用副本控制器或服务
  - 缓解:节点关机
  - 缓解:kubelet 软件故障

- 措施:将应用(容器)设计为能够容忍异常重启
  - 缓解:节点关机
  - 缓解:kubelet 软件故障
## {{% heading "whatsnext" %}}

<!--
* Learn about the metrics available in the [Resource Metrics Pipeline](resource-metrics-pipeline)
* Discover additional tools for [monitoring resource usage](resource-usage-monitoring)
* Use Node Problem Detector to [monitor node health](monitor-node-health)
* Use `crictl` to [debug Kubernetes nodes](crictl)
* Get more information about [Kubernetes auditing](audit)
* Use `telepresence` to [develop and debug services locally](local-debugging)
-->
* 了解[资源指标管道](resource-metrics-pipeline)中可用的指标
* 发现用于[监控资源使用](resource-usage-monitoring)的其他工具
* 使用节点问题检测器[监控节点健康](monitor-node-health)
* 使用 `crictl` 来[调试 Kubernetes 节点](crictl)
* 获取更多关于 [Kubernetes 审计](audit)的信息
* 使用 `telepresence` [本地开发和调试服务](local-debugging)
{{% thirdparty-content %}}

<!--
Kubernetes applications usually consist of multiple, separate services, each running in its own container. Developing and debugging these services on a remote Kubernetes cluster can be cumbersome, requiring you to [get a shell on a running container](/docs/tasks/debug/debug-application/get-shell-running-container/) in order to run debugging tools.
-->
Kubernetes 应用程序通常由多个独立的服务组成,每个服务都在自己的容器中运行。
在远端的 Kubernetes 集群上开发和调试这些服务可能很麻烦,
需要[在运行的容器上打开 Shell](/zh/docs/tasks/debug/debug-application/get-shell-running-container/),
以运行调试工具。

<!--
CPU and memory metrics to enable automatic scaling using HPA and / or VPA.
If you would like to provide a more complete set of metrics, you can complement
the simpler Metrics API by deploying a second
[metrics pipeline](/docs/tasks/debug/debug-cluster/resource-usage-monitoring/#full-metrics-pipeline)
that uses the _Custom Metrics API_.
-->
Metrics API 及其启用的指标管道仅提供最少的 CPU 和内存指标,以支持使用 HPA 和/或 VPA 完成自动扩缩。
如果你想提供更完整的指标集,可以通过部署使用 _Custom Metrics API_ 的第二个
[指标管道](/zh/docs/tasks/debug/debug-cluster/resource-usage-monitoring/#full-metrics-pipeline)来作为简单的 Metrics API 的补充。
{{< /note >}}
<!--
Reference in New Issue