[zh] Sync troubleshooting-cni-plugin-related-errors and security-agent

pull/49930/head
windsonsea 2025-02-27 14:55:20 +08:00
parent 040f4b8dc9
commit 6c88d0e753
2 changed files with 67 additions and 85 deletions


@@ -26,8 +26,8 @@ alternative runtimes.
-->
Kubernetes 对与 Docker Engine 直接集成的支持已被弃用且已经被删除。
大多数应用程序不直接依赖于托管容器的运行时。但是,仍然有大量的遥测和监控代理依赖
docker 来收集容器元数据、日志和指标。
本文汇总了一些信息和链接:信息用于阐述如何探查这些依赖,链接用于解释如何迁移这些代理去使用通用的工具或其他容器运行时。
Docker 来收集容器元数据、日志和指标。
本文汇总了一些如何探查这些依赖的信息,以及如何迁移这些代理去使用通用工具或其他容器运行时的参考链接。
<!--
## Telemetry and security agents
@@ -49,13 +49,14 @@ directly on nodes.
<!--
Historically, Kubernetes was written to work specifically with Docker Engine.
Kubernetes took care of networking and scheduling, relying on Docker Engine for launching
and running containers (within Pods) on a node. Some information that is relevant to telemetry,
such as a pod name, is only available from Kubernetes components. Other data, such as container
metrics, is not the responsibility of the container runtime. Early telemetry agents needed to query the
container runtime **and** Kubernetes to report an accurate picture. Over time, Kubernetes gained
the ability to support multiple runtimes, and now supports any runtime that is compatible with
the [container runtime interface](/docs/concepts/architecture/cri/).
Kubernetes took care of networking and scheduling, relying on Docker Engine for
launching and running containers (within Pods) on a node. Some information that
is relevant to telemetry, such as a pod name, is only available from Kubernetes
components. Other data, such as container metrics, is not the responsibility of
the container runtime. Early telemetry agents needed to query the container
runtime *and* Kubernetes to report an accurate picture. Over time, Kubernetes
gained the ability to support multiple runtimes, and now supports any runtime
that is compatible with the [container runtime interface](/docs/concepts/architecture/cri/).
-->
从历史上看Kubernetes 是专门为与 Docker Engine 一起工作而编写的。
Kubernetes 负责网络和调度,依靠 Docker Engine
@@ -124,13 +125,13 @@ kubectl get pods --all-namespaces \
| grep '/var/run/docker.sock'
```
{{< note >}}
<!--
There are alternative ways for a pod to access Docker on the host. For instance, the parent
directory `/var/run` may be mounted instead of the full path (like in [this
example](https://gist.github.com/itaysk/7bc3e56d69c4d72a549286d98fd557dd)).
The script above only detects the most common uses.
-->
{{< note >}}
对于 Pod 来说,访问宿主机上的 Docker 还有其他方式。
例如,可以挂载 `/var/run` 的父目录而非其完整路径
(就像[这个例子](https://gist.github.com/itaysk/7bc3e56d69c4d72a549286d98fd557dd))。
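作为补充,下面是一个示意性的检测方式(非官方脚本),用于找出挂载了以 `/var/run` 开头的 hostPath 卷的 Pod,从而覆盖上述挂载父目录的情形。筛选逻辑为前缀匹配,可能产生误报;需要本机安装 `jq`:

```shell
# 示意:匹配 hostPath 以 /var/run 开头的 Pod 的 jq 筛选器(可能有误报)
FILTER='
  .items[]
  | select(any(.spec.volumes[]?; (.hostPath.path // "") | startswith("/var/run")))
  | .metadata.namespace + "/" + .metadata.name'

# 实际使用:kubectl get pods --all-namespaces -o json | jq -r "$FILTER"
# 下面用一个内联样例演示效果:
printf '%s' '{"items":[
  {"metadata":{"namespace":"default","name":"agent-1"},
   "spec":{"volumes":[{"hostPath":{"path":"/var/run"}}]}},
  {"metadata":{"namespace":"default","name":"web-1"},
   "spec":{"volumes":[{"emptyDir":{}}]}}]}' | jq -r "$FILTER"
```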
@@ -165,8 +166,7 @@ Please contact the vendor to get up to date instructions for migrating from dockershim.
-->
本节旨在汇总有关可能依赖于容器运行时的各种遥测和安全代理的信息。
我们通过
[谷歌文档](https://docs.google.com/document/d/1ZFi4uKit63ga5sxEiZblfb-c23lFhvy6RXVPikS8wf0/edit#)
我们通过[谷歌文档](https://docs.google.com/document/d/1ZFi4uKit63ga5sxEiZblfb-c23lFhvy6RXVPikS8wf0/edit#)
提供了为各类遥测和安全代理供应商准备的持续更新的迁移指导。
请与供应商联系,获取从 dockershim 迁移的最新说明。
@@ -193,8 +193,8 @@ The pod that accesses Docker Engine may have a name containing any of:
- `datadog`
- `dd-agent`
-->
如何迁移:
[Kubernetes 中对于 Docker 的弃用](https://docs.datadoghq.com/agent/guide/docker-deprecation/)
如何迁移:
[Kubernetes 中对于 Docker 的弃用](https://docs.datadoghq.com/agent/guide/docker-deprecation/)
名字中包含以下字符串的 Pod 可能访问 Docker Engine
- `datadog-agent`
@@ -219,10 +219,12 @@ The pod accessing Docker may have name containing:
如何迁移:
[在 Dynatrace 上从 Docker-only 迁移到通用容器指标](https://community.dynatrace.com/t5/Best-practices/Migrating-from-Docker-only-to-generic-container-metrics-in/m-p/167030#M49)
Containerd 支持公告:[在基于 containerd 的 Kubernetes 环境的获取容器的自动化全栈可见性](https://www.dynatrace.com/news/blog/get-automated-full-stack-visibility-into-containerd-based-kubernetes-environments/)
containerd 支持公告:[在基于 containerd 的 Kubernetes 环境中获取容器的自动化全栈可见性](https://www.dynatrace.com/news/blog/get-automated-full-stack-visibility-into-containerd-based-kubernetes-environments/)
CRI-O 支持公告:[在基于 CRI-O 的 Kubernetes 环境中获取容器的自动化全栈可见性(测试版)](https://www.dynatrace.com/news/blog/get-automated-full-stack-visibility-into-your-cri-o-kubernetes-containers-beta/)
名字中包含以下字符串的 Pod 可能访问 Docker
- `dynatrace-oneagent`
### [Falco](https://falco.org)
@@ -236,12 +238,12 @@ The pod accessing Docker may have name containing:
- `falco`
-->
如何迁移:
[迁移 Falco 从 dockershim](https://falco.org/docs/getting-started/deployment/#docker-deprecation-in-kubernetes)
[将 Falco 从 dockershim 迁移](https://falco.org/docs/getting-started/deployment/#docker-deprecation-in-kubernetes)
Falco 支持任何与 CRI 兼容的运行时(默认配置中使用 containerd),该文档解释了所有细节。
名字中包含以下字符串的 Pod 可能访问 Docker
- `falco`
- `falco`
### [Prisma Cloud Compute](https://docs.paloaltonetworks.com/prisma/prisma-cloud.html)
@@ -258,7 +260,6 @@ The pod accessing Docker may be named like:
- `twistlock-defender-ds`
### [SignalFx (Splunk)](https://www.splunk.com/en_us/investor-relations/acquisitions/signalfx.html)
<!--
@@ -267,20 +268,6 @@ The SignalFx Smart Agent (deprecated) uses several different monitors for Kubernetes
The `kubelet-stats` monitor was previously deprecated by the vendor, in favor of `kubelet-metrics`.
The `docker-container-stats` monitor is the one affected by dockershim removal.
Do not use the `docker-container-stats` with container runtimes other than Docker Engine.
How to migrate from dockershim-dependent agent:
1. Remove `docker-container-stats` from the list of [configured monitors](https://github.com/signalfx/signalfx-agent/blob/main/docs/monitor-config.md).
Note, keeping this monitor enabled with non-dockershim runtime will result in incorrect metrics
being reported when docker is installed on node and no metrics when docker is not installed.
2. [Enable and configure `kubelet-metrics`](https://github.com/signalfx/signalfx-agent/blob/main/docs/monitors/kubelet-metrics.md) monitor.
{{< note >}}
The set of collected metrics will change. Review your alerting rules and dashboards.
{{< /note >}}
The Pod accessing Docker may be named something like:
- `signalfx-agent`
-->
SignalFx Smart Agent(已弃用)在 Kubernetes 集群上使用了多种不同的监视器,
包括 `kubernetes-cluster`、`kubelet-stats/kubelet-metrics` 和 `docker-container-stats`。
@@ -288,17 +275,29 @@ SignalFx Smart Agent(已弃用)在 Kubernetes 集群上使用了多种不同
`docker-container-stats` 监视器受 dockershim 移除的影响。
不要在 Docker Engine 之外的容器运行时上使用 `docker-container-stats` 监视器。
<!--
How to migrate from dockershim-dependent agent:
1. Remove `docker-container-stats` from the list of [configured monitors](https://github.com/signalfx/signalfx-agent/blob/main/docs/monitor-config.md).
Note, keeping this monitor enabled with non-dockershim runtime will result in incorrect metrics
being reported when docker is installed on node and no metrics when docker is not installed.
2. [Enable and configure `kubelet-metrics`](https://github.com/signalfx/signalfx-agent/blob/main/docs/monitors/kubelet-metrics.md) monitor.
-->
如何从依赖 dockershim 的代理迁移:
1. 从[所配置的监视器](https://github.com/signalfx/signalfx-agent/blob/main/docs/monitor-config.md)中移除 `docker-container-stats`
注意,若节点上已经安装了 Docker在非 dockershim 环境中启用此监视器后会导致报告错误的指标;
如果节点未安装 Docker则无法获得指标。
2. [启用和配置 `kubelet-metrics`](https://github.com/signalfx/signalfx-agent/blob/main/docs/monitors/kubelet-metrics.md) 监视器。
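按上述步骤修改后的监视器配置大致如下。这是一个示意性片段,字段名以 SignalFx Smart Agent 的官方文档为准,此处仅保留与本节相关的监视器:

```yaml
# 示意:agent.yaml 中移除 docker-container-stats,改用 kubelet-metrics
monitors:
  - type: kubernetes-cluster
  - type: kubelet-metrics
  # 不要再保留:
  # - type: docker-container-stats
```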
{{< note >}}
<!--
The set of collected metrics will change. Review your alerting rules and dashboards.
-->
收集的指标集合将发生变化。请检查你的告警规则和仪表盘。
{{< /note >}}
名字中包含以下字符串的 Pod 可能访问 Docker
- `signalfx-agent`
### Yahoo Kubectl Flame
@@ -307,5 +306,5 @@ SignalFx Smart Agent(已弃用)在 Kubernetes 集群上使用了多种不同
Flame does not support container runtimes other than Docker. See
[https://github.com/yahoo/kubectl-flame/issues/51](https://github.com/yahoo/kubectl-flame/issues/51)
-->
Flame 不支持 Docker 以外的容器运行时,具体可见 [https://github.com/yahoo/kubectl-flame/issues/51](https://github.com/yahoo/kubectl-flame/issues/51)
Flame 不支持 Docker 以外的容器运行时,具体参见
[https://github.com/yahoo/kubectl-flame/issues/51](https://github.com/yahoo/kubectl-flame/issues/51)


@@ -19,38 +19,35 @@ To avoid CNI plugin-related errors, verify that you are using or upgrading to a
container runtime that has been tested to work correctly with your version of
Kubernetes.
-->
为了避免 CNI 插件相关的错误,需要验证你正在使用或升级到一个经过测试的容器运行时,
该容器运行时能够在你的 Kubernetes 版本上正常工作。
为了避免 CNI 插件相关的错误,需要验证你正在使用或升级到的容器运行时经过测试能够在你的
Kubernetes 版本上正常工作。
<!--
## About the "Incompatible CNI versions" and "Failed to destroy network for sandbox" errors
-->
## 关于 "Incompatible CNI versions" 和 "Failed to destroy network for sandbox" 错误 {#about-the-incompatible-cni-versions-and-failed-to-destroy-network-for-sandbox-errors}
<!--
Service issues exist for pod CNI network setup and tear down in containerd
v1.6.0-v1.6.3 when the CNI plugins have not been upgraded and/or the CNI config
version is not declared in the CNI config files. The containerd team reports, "these issues are resolved in containerd v1.6.4."
version is not declared in the CNI config files. The containerd team reports,
"these issues are resolved in containerd v1.6.4."
With containerd v1.6.0-v1.6.3, if you do not upgrade the CNI plugins and/or
declare the CNI config version, you might encounter the following "Incompatible
CNI versions" or "Failed to destroy network for sandbox" error conditions.
-->
在 containerd v1.6.0-v1.6.3 中,当配置或清除 Pod CNI 网络时,如果 CNI 插件没有升级和/或
CNI 配置文件中没有声明 CNI 配置版本时会出现服务问题。containerd 团队报告说:
在 containerd v1.6.0 到 v1.6.3 中,当配置或清除 Pod CNI 网络时,如果 CNI 插件没有升级和/或
CNI 配置文件中没有声明 CNI 配置版本,就会出现服务问题。containerd 团队报告说:
“这些问题在 containerd v1.6.4 中得到了解决。”
在使用 containerd v1.6.0-v1.6.3 时,如果你不升级 CNI 插件和/或声明 CNI 配置版本,
在使用 containerd v1.6.0 到 v1.6.3 时,如果你不升级 CNI 插件和/或声明 CNI 配置版本,
你可能会遇到以下 "Incompatible CNI versions" 或 "Failed to destroy network for sandbox"
错误状况。
<!--
### Incompatible CNI versions error
-->
### Incompatible CNI versions 错误 {#incompatible-cni-versions-error}
<!--
@@ -59,8 +56,7 @@ the config because the config version is later than the plugin version, the
containerd log will likely show an error message on startup of a pod similar
to:
-->
如果因为配置版本比插件版本新,导致你的 CNI 插件版本与配置中的插件版本无法正确匹配时,
如果因为配置版本比插件版本新,导致你的 CNI 插件版本与配置中的插件版本无法正确匹配,
在启动 Pod 时,containerd 日志可能会显示类似以下的错误信息:
```
@@ -70,22 +66,19 @@ incompatible CNI versions; config is \"1.0.0\", plugin supports [\"0.1.0\" \"0.2
<!--
To fix this issue, [update your CNI plugins and CNI config files](#updating-your-cni-plugins-and-cni-config-files).
-->
为了解决这个问题,需要[更新你的 CNI 插件和 CNI 配置文件](#updating-your-cni-plugins-and-cni-config-files)。
<!--
### Failed to destroy network for sandbox error
-->
### Failed to destroy network for sandbox 错误 {#failed-to-destroy-network-for-sandbox-error}
<!--
If the version of the plugin is missing in the CNI plugin config, the pod may
run. However, stopping the pod generates an error similar to:
-->
如果 CNI 插件配置中未给出插件的版本,
Pod 可能可以运行。但是,停止 Pod 时会产生类似于以下错误:
Pod 可能仍可运行。但是,停止 Pod 时会产生类似以下的错误:
```
ERROR[2022-04-26T00:43:24.518165483Z] StopPodSandbox for "b" failed
@@ -98,7 +91,6 @@ attached. To recover from this problem, [edit the CNI config file](#updating-you
the missing version information. The next attempt to stop the pod should
be successful.
-->
此错误使 Pod 处于未就绪状态,且仍然挂接到某网络名字空间上。
为修复这一问题,[编辑 CNI 配置文件](#updating-your-cni-plugins-and-cni-config-files)以添加缺失的版本信息。
下一次尝试停止 Pod 应该会成功。
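作为示意(并非官方给出的步骤),可以用 `jq` 为缺少版本声明的配置补上 `cniVersion` 字段。下面在一个临时文件上演示;实际配置文件通常位于 `/etc/cni/net.d/`:

```shell
# 示意:为缺少 cniVersion 的 CNI 配置补上版本声明(在临时文件上演示)
conf="$(mktemp)"
printf '%s' '{"name":"demo-net","plugins":[{"type":"bridge"}]}' > "$conf"
# jq 读取原配置,写入带 cniVersion 的新副本后再替换原文件
jq '.cniVersion = "1.0.0"' "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
cat "$conf"
```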
@@ -106,7 +98,6 @@ be successful.
<!--
### Updating your CNI plugins and CNI config files
-->
### 更新你的 CNI 插件和 CNI 配置文件 {#updating-your-cni-plugins-and-cni-config-files}
<!--
@@ -116,55 +107,50 @@ your CNI plugins and editing the CNI config files.
Here's an overview of the typical steps for each node:
-->
如果你使用 containerd v1.6.0-v1.6.3 并遇到 "Incompatible CNI versions" 或者
如果你使用 containerd v1.6.0 到 v1.6.3 并遇到 "Incompatible CNI versions" 或者
"Failed to destroy network for sandbox" 错误,考虑更新你的 CNI 插件并编辑 CNI 配置文件。
以下是针对各节点要执行的典型步骤的概述:
<!--
1. [Safely drain and cordon the
node](/docs/tasks/administer-cluster/safely-drain-node/).
1. [Safely drain and cordon the node](/docs/tasks/administer-cluster/safely-drain-node/).
-->
1. [安全地腾空并隔离节点](/zh-cn/docs/tasks/administer-cluster/safely-drain-node/)。
<!--
2. After stopping your container runtime and kubelet services, perform the
following upgrade operations:
- If you're running CNI plugins, upgrade them to the latest version.
- If you're using non-CNI plugins, replace them with CNI plugins. Use the
latest version of the plugins.
- Update the plugin configuration file to specify or match a version of the
CNI specification that the plugin supports, as shown in the following ["An
example containerd configuration
file"](#an-example-containerd-configuration-file) section.
- For `containerd`, ensure that you have installed the latest version (v1.0.0
or later) of the CNI loopback plugin.
- Upgrade node components (for example, the kubelet) to Kubernetes v1.24
- Upgrade to or install the most current version of the container runtime.
-->
1. After stopping your container runtime and kubelet services, perform the
following upgrade operations:
- If you're running CNI plugins, upgrade them to the latest version.
- If you're using non-CNI plugins, replace them with CNI plugins. Use the
latest version of the plugins.
- Update the plugin configuration file to specify or match a version of the
CNI specification that the plugin supports, as shown in the following
["An example containerd configuration file"](#an-example-containerd-configuration-file) section.
- For `containerd`, ensure that you have installed the latest version (v1.0.0 or later)
of the CNI loopback plugin.
- Upgrade node components (for example, the kubelet) to Kubernetes v1.24
- Upgrade to or install the most current version of the container runtime.
-->
2. 停止容器运行时和 kubelet 服务后,执行以下升级操作:
- 如果你正在运行 CNI 插件,请将它们升级到最新版本。
- 如果你使用的是非 CNI 插件,请将它们替换为 CNI 插件,并使用最新版本的插件。
- 更新插件配置文件以指定或匹配 CNI 规范支持的插件版本,
如后文 ["containerd 配置文件示例"](#an-example-containerd-configuration-file)章节所示。
- 对于 `containerd`,请确保你已安装 CNI loopback 插件的最新版本v1.0.0 或更高版本)。
- 将节点组件(例如 kubelet升级到 Kubernetes v1.24
- 升级到或安装最新版本的容器运行时。
- 如果你正在运行 CNI 插件,请将它们升级到最新版本。
- 如果你使用的是非 CNI 插件,请将它们替换为 CNI 插件,并使用最新版本的插件。
- 更新插件配置文件以指定或匹配 CNI 规范支持的插件版本,
如后文 ["containerd 配置文件示例"](#an-example-containerd-configuration-file)章节所示。
- 对于 `containerd`,请确保你已安装 CNI loopback 插件的最新版本v1.0.0 或更高版本)。
- 将节点组件(例如 kubelet升级到 Kubernetes v1.24
- 升级到或安装最新版本的容器运行时。
<!--
3. Bring the node back into your cluster by restarting your container runtime
and kubelet. Uncordon the node (`kubectl uncordon <nodename>`).
1. Bring the node back into your cluster by restarting your container runtime
and kubelet. Uncordon the node (`kubectl uncordon <nodename>`).
-->
3. 通过重新启动容器运行时和 kubelet 将节点重新加入到集群。取消节点隔离(`kubectl uncordon <nodename>`)。
<!--
## An example containerd configuration file
-->
## containerd 配置文件示例 {#an-example-containerd-configuration-file}
<!--
@@ -174,7 +160,6 @@ which supports a recent version of the CNI specification (v1.0.0).
Please see the documentation from your plugin and networking provider for
further instructions on configuring your system.
-->
以下示例显示了 `containerd` 运行时 v1.6.x 的配置,
它支持较新版本的 CNI 规范(v1.0.0)。
请参阅你的插件和网络提供商的文档,以获取有关你系统配置的进一步说明。
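配置文件正文未包含在本次差异中;下面是一个示意性的 `containerd` v1.6.x CRI 插件 CNI 配置片段(`bin_dir`、`conf_dir` 为常见默认值,实际取值请以你的发行版和环境为准):

```toml
# /etc/containerd/config.toml(节选,路径为假设的常见默认值)
version = 2

[plugins."io.containerd.grpc.v1.cri".cni]
  bin_dir = "/opt/cni/bin"
  conf_dir = "/etc/cni/net.d"
```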
@@ -190,7 +175,6 @@ internally by containerd, and is set to use CNI v1.0.0. This also means that the
version of the `loopback` plugin must be v1.0.0 or later when this newer version
`containerd` is started.
-->
在 Kubernetes 中作为其默认行为containerd 运行时为 Pod 添加一个本地回路接口:`lo`。
containerd 运行时通过 CNI 插件 `loopback` 配置本地回路接口。
`loopback` 插件作为 `containerd` 发布包的一部分,扮演 `cni` 角色。
@@ -203,7 +187,6 @@ The following bash command generates an example CNI config. Here, the 1.0.0
value for the config version is assigned to the `cniVersion` field for use when
`containerd` invokes the CNI bridge plugin.
-->
以下 Bash 命令生成一个 CNI 配置示例。这里,`cniVersion` 字段被设置为配置版本值 1.0.0
以供 `containerd` 调用 CNI 桥接插件时使用。
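该命令的正文未包含在本次差异中;下面是一个与上述说明一致的示意性命令(文件名与网段均为假设值,演示时写入临时目录,实际环境中应写入 `/etc/cni/net.d/`):

```shell
# 示意:生成一个声明 cniVersion 1.0.0 的 CNI 桥接插件配置
conf_dir="$(mktemp -d)"   # 实际环境中通常为 /etc/cni/net.d
cat << EOF | tee "${conf_dir}/10-containerd-net.conflist"
{
  "cniVersion": "1.0.0",
  "name": "containerd-net",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "ranges": [[{"subnet": "10.88.0.0/16"}]],
        "routes": [{"dst": "0.0.0.0/0"}]
      }
    },
    {
      "type": "portmap",
      "capabilities": {"portMappings": true}
    }
  ]
}
EOF
```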