Update disruptions.md

pull/35949/head
yanrongshi 2022-08-14 14:27:12 +08:00
parent fac477b9e4
commit f257e5116c
1 changed files with 56 additions and 54 deletions

@ -18,12 +18,12 @@ weight: 60
<!--
This guide is for application owners who want to build
highly available applications, and thus need to understand
what types of Disruptions can happen to Pods.
what types of disruptions can happen to Pods.
-->
本指南针对的是希望构建高可用性应用程序的应用所有者,他们有必要了解可能发生在 Pod 上的干扰类型。
本指南针对的是希望构建高可用性应用的应用所有者,他们有必要了解可能发生在 Pod 上的干扰类型。
<!--
It is also for Cluster Administrators who want to perform automated
It is also for cluster administrators who want to perform automated
cluster actions, like upgrading and autoscaling clusters.
-->
文档同样适用于想要执行自动化集群操作(例如升级和自动扩展集群)的集群管理员。
@ -31,7 +31,7 @@ cluster actions, like upgrading and autoscaling clusters.
<!-- body -->
<!--
## Voluntary and Involuntary Disruptions
## Voluntary and involuntary disruptions
Pods do not disappear until someone (a person or a controller) destroys them, or
there is an unavoidable hardware or system software error.
@ -44,7 +44,7 @@ Pod 不会消失,除非有人(用户或控制器)将其销毁,或者出
We call these unavoidable cases *involuntary disruptions* to
an application. Examples are:
-->
我们把这些不可避免的情况称为应用的*非自愿干扰（Involuntary Disruptions）*。例如:
我们把这些不可避免的情况称为应用的**非自愿干扰（Involuntary Disruptions）**。例如:
<!--
- a hardware failure of the physical machine backing the node
@ -74,9 +74,9 @@ We call other cases *voluntary disruptions*. These include both
actions initiated by the application owner and those initiated by a Cluster
Administrator. Typical application owner actions include:
-->
我们称其他情况为*自愿干扰（Voluntary Disruptions）*。
包括由应用程序所有者发起的操作和由集群管理员发起的操作。典型的应用程序所有者的操
作包括:
我们称其他情况为**自愿干扰（Voluntary Disruptions）**。
包括由应用所有者发起的操作和由集群管理员发起的操作。
典型的应用所有者的操作包括:
<!--
- deleting the deployment or other controller that manages the pod
@ -88,7 +88,7 @@ Administrator. Typical application owner actions include:
- 直接删除 Pod（例如，因为误操作）
<!--
Cluster Administrator actions include:
Cluster administrator actions include:
- [Draining a node](/docs/tasks/administer-cluster/safely-drain-node/) for repair or upgrade.
- Draining a node from a cluster to scale the cluster down (learn about
@ -126,7 +126,7 @@ deleting deployments or pods bypasses Pod Disruption Budgets.
{{< /caution >}}
<!--
## Dealing with Disruptions
## Dealing with disruptions
Here are some ways to mitigate involuntary disruptions:
-->
@ -135,7 +135,7 @@ Here are some ways to mitigate involuntary disruptions:
以下是减轻非自愿干扰的一些方法:
<!--
- Ensure your pod [requests the resources](/docs/tasks/configure-pod-container/assign-cpu-ram-container) it needs.
- Ensure your pod [requests the resources](/docs/tasks/configure-pod-container/assign-memory-resource) it needs.
- Replicate your application if you need higher availability. (Learn about running replicated
[stateless](/docs/tasks/run-application/run-stateless-application-deployment/)
and [stateful](/docs/tasks/run-application/run-replicated-stateful-application/) applications.)
@ -146,12 +146,12 @@ and [stateful](/docs/tasks/run-application/run-replicated-stateful-application/)
[multi-zone cluster](/docs/setup/multiple-zones).)
-->
- 确保 Pod 在请求中给出[所需资源](/zh-cn/docs/tasks/configure-pod-container/assign-memory-resource/)。
- 如果需要更高的可用性,请复制应用程序
- 如果需要更高的可用性,请复制应用。
(了解有关运行多副本的[无状态](/zh-cn/docs/tasks/run-application/run-stateless-application-deployment/)
和[有状态](/zh-cn/docs/tasks/run-application/run-replicated-stateful-application/)应用程序的信息。)
- 为了在运行复制应用程序时获得更高的可用性,请跨机架(使用
和[有状态](/zh-cn/docs/tasks/run-application/run-replicated-stateful-application/)应用的信息。)
- 为了在运行复制应用时获得更高的可用性,请跨机架(使用
[反亲和性](/zh-cn/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity)
或跨区域(如果使用[多区域集群](/zh-cn/docs/setup/best-practices/multiple-zones/))扩展应用程序
或跨区域(如果使用[多区域集群](/zh-cn/docs/setup/best-practices/multiple-zones/))扩展应用。
<!--
The frequency of voluntary disruptions varies. On a basic Kubernetes cluster, there are
@ -178,8 +178,8 @@ Kubernetes offers features to help run highly available applications at the same
time as frequent voluntary disruptions. We call this set of features
*Disruption Budgets*.
-->
Kubernetes 提供特性来满足在出现频繁自愿干扰的同时运行高可用的应用程序。我们称这些特性为
*干扰预算（Disruption Budget）*。
Kubernetes 提供特性来满足在出现频繁自愿干扰的同时运行高可用的应用。我们称这些特性为
**干扰预算（Disruption Budget）**。
<!--
## Pod disruption budgets
@ -187,9 +187,9 @@ Kubernetes 提供特性来满足在出现频繁自愿干扰的同时运行高可
Kubernetes offers features to help you run highly available applications even when you
introduce frequent voluntary disruptions.
An Application Owner can create a `PodDisruptionBudget` object (PDB) for each application.
A PDB limits the number of pods of a replicated application that are down simultaneously from
voluntary disruptions. For example, a quorum-based application would
As an application owner, you can create a PodDisruptionBudget (PDB) for each application.
A PDB limits the number of Pods of a replicated application that are down simultaneously from
voluntary disruptions. For example, a quorum-based application would
like to ensure that the number of replicas running is never brought below the
number needed for a quorum. A web front end might want to
ensure that the number of replicas serving load never falls below a certain
@ -199,18 +199,17 @@ percentage of the total.
{{< feature-state for_k8s_version="v1.21" state="stable" >}}
即使你会经常引入自愿性干扰，Kubernetes 也能够支持你运行高度可用的应用。
即使你会经常引入自愿性干扰，Kubernetes 提供的功能也能够支持你运行高度可用的应用。
应用程序所有者可以为每个应用程序创建 `PodDisruptionBudget` 对象（PDB）。
PDB 将限制在同一时间因自愿干扰导致的复制应用程序中宕机的 pod 数量。
例如,基于票选机制的应用程序希望确保运行的副本数永远不会低于仲裁所需的数量。
作为一个应用的所有者，你可以为每个应用创建一个 `PodDisruptionBudget`（PDB）。
PDB 将限制在同一时间因自愿干扰导致的多副本应用中发生宕机的 Pod 数量。
例如,基于票选机制的应用希望确保运行中的副本数永远不会低于票选所需的数量。
Web 前端可能希望确保提供负载的副本数量永远不会低于总数的某个百分比。
<!--
Cluster managers and hosting providers should use tools which
respect PodDisruptionBudgets by calling the [Eviction API](/docs/tasks/administer-cluster/safely-drain-node/#eviction-api)
instead of directly deleting pods or deployments. Examples are the `kubectl drain` command
and the Kubernetes-on-GCE cluster upgrade script (`cluster/gce/upgrade.sh`).
instead of directly deleting pods or deployments.
-->
集群管理员和托管提供商应该使用遵循 PodDisruptionBudgets 的接口
(通过调用[Eviction API](/zh-cn/docs/tasks/administer-cluster/safely-drain-node/#the-eviction-api)
@ -219,38 +218,41 @@ and the Kubernetes-on-GCE cluster upgrade script (`cluster/gce/upgrade.sh`).
<!--
For example, the `kubectl drain` subcommand lets you mark a node as going out of
service. When you run `kubectl drain`, the tool tries to evict all of the Pods on
the Node you'are taking out of service. The eviction request may be temporarily rejected,
and the tool periodically retries all failed requests until all pods
are terminated, or until a configurable timeout is reached.
the Node you're taking out of service. The eviction request that `kubectl` submits on
your behalf may be temporarily rejected, so the tool periodically retries all failed
requests until all Pods on the target node are terminated, or until a configurable timeout is reached.
-->
例如,`kubectl drain` 命令可以用来标记某个节点即将停止服务。
运行 `kubectl drain` 命令时,工具会尝试驱逐机器上的所有 Pod。
`kubectl` 所提交的驱逐请求可能会暂时被拒绝,所以该工具会定时重试失败的请求,
直到所有的 Pod 都被终止,或者达到配置的超时时间。
运行 `kubectl drain` 命令时,工具会尝试驱逐你所停服的节点上的所有 Pod。
`kubectl` 代表你所提交的驱逐请求可能会暂时被拒绝,
所以该工具会周期性地重试所有失败的请求,
直到目标节点上的所有的 Pod 都被终止,或者达到配置的超时时间。
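Reviewer note: the retry behaviour described in this paragraph can be pictured with a small simulation. This is only a sketch of the idea, not how `kubectl drain` is implemented. The function `drain`, its parameters, and the `try_evict` callback are hypothetical names introduced here for illustration; `try_evict` stands in for submitting one eviction request and returns `False` when the request is temporarily rejected.

```python
import time
from typing import Callable, Iterable


def drain(pods: Iterable[str],
          try_evict: Callable[[str], bool],
          timeout_s: float = 30.0,
          retry_interval_s: float = 5.0,
          clock: Callable[[], float] = time.monotonic,
          sleep: Callable[[float], None] = time.sleep) -> list[str]:
    """Retry rejected evictions until none remain or the timeout expires.

    Returns the pods that were still not evicted when the loop stopped.
    All names here are illustrative assumptions, not a real kubectl API.
    """
    pending = list(pods)
    deadline = clock() + timeout_s
    while pending:
        # Keep only the pods whose eviction was rejected on this pass.
        pending = [pod for pod in pending if not try_evict(pod)]
        if not pending or clock() >= deadline:
            break
        # Wait before retrying the rejected requests.
        sleep(retry_interval_s)
    return pending
```

The `clock` and `sleep` parameters exist only so the loop can be exercised with a fake clock; an empty return value means every Pod on the node was evicted before the timeout.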
<!--
A PDB specifies the number of replicas that an application can tolerate having, relative to how
many it is intended to have. For example, a Deployment which has a `.spec.replicas: 5` is
supposed to have 5 pods at any given time. If its PDB allows for there to be 4 at a time,
then the Eviction API will allow voluntary disruption of one, but not two pods, at a time.
then the Eviction API will allow voluntary disruption of one (but not two) pods at a time.
-->
PDB 指定应用程序可以容忍的副本数量(相当于应该有多少副本)。
PDB 指定应用可以容忍的副本数量(相当于应该有多少副本)。
例如,具有 `.spec.replicas: 5` 的 Deployment 在任何时间都应该有 5 个 Pod。
如果 PDB 允许其在某一时刻有 4 个副本,那么驱逐 API 将允许同一时刻仅有一个而不是两个 Pod 自愿干扰。
如果 PDB 允许其在某一时刻有 4 个副本,那么驱逐 API 将允许同一时刻仅有一个(而不是两个)Pod 自愿干扰。
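Reviewer note: the arithmetic in this paragraph (5 intended replicas, a budget that requires 4, so one eviction is allowed but not two) can be checked with a short sketch. It is a simplified model under stated assumptions, not the actual logic of the Eviction API: `process_evictions` is a hypothetical helper, `min_available` mirrors the role of the PDB's `.spec.minAvailable` field, and Pods are assumed not to be replaced between requests.

```python
def process_evictions(healthy: int, min_available: int, requests: int) -> list[bool]:
    """Answer a sequence of eviction requests against a simple budget.

    An eviction is allowed only if it leaves at least `min_available`
    healthy pods. Hypothetical model for illustration only.
    """
    results = []
    for _ in range(requests):
        allowed = healthy - min_available > 0
        if allowed:
            # An allowed eviction takes one healthy pod out of service.
            healthy -= 1
        results.append(allowed)
    return results


# Deployment with .spec.replicas: 5 and a budget that requires 4 available.
print(process_evictions(healthy=5, min_available=4, requests=2))  # [True, False]
```

The second request is refused because granting it would leave 3 healthy Pods, below the 4 the budget requires.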
<!--
The group of pods that comprise the application is specified using a label selector, the same
as the one used by the application's controller (deployment, stateful-set, etc).
-->
使用标签选择器来指定构成应用程序的一组 Pod，这与应用程序的控制器（Deployment、StatefulSet 等）
使用标签选择器来指定构成应用的一组 Pod，这与应用的控制器（Deployment、StatefulSet 等）
选择 Pod 的逻辑一样。
<!--
The "intended" number of pods is computed from the `.spec.replicas` of the pods controller.
The controller is discovered from the pods using the `.metadata.ownerReferences` of the object.
The "intended" number of pods is computed from the `.spec.replicas` of the workload resource
that is managing those pods. The control plane discovers the owning workload resource by
examining the `.metadata.ownerReferences` of the Pod.
-->
Pod 控制器的 `.spec.replicas` 计算“预期的” Pod 数量。
根据 Pod 对象的 `.metadata.ownerReferences` 字段来发现控制器。
Pod 的“预期”数量由管理这些 Pod 的工作负载资源的 `.spec.replicas` 参数计算出来的。
控制平面通过检查 Pod 的
`.metadata.ownerReferences` 来发现关联的工作负载资源。
<!--
[Involuntary disruptions](#voluntary-and-involuntary-disruptions) cannot be prevented by PDBs; however they
@ -262,13 +264,14 @@ PDB 无法防止[非自愿干扰](#voluntary-and-involuntary-disruptions)
<!--
Pods which are deleted or unavailable due to a rolling upgrade to an application do count
against the disruption budget, but controllers (like deployment and stateful-set)
are not limited by PDBs when doing rolling upgrades - the handling of failures
during application updates is configured in spec for the specific workload resource.
against the disruption budget, but workload resources (such as Deployment and StatefulSet)
are not limited by PDBs when doing rolling upgrades. Instead, the handling of failures
during application updates is configured in the spec for the specific workload resource.
-->
由于应用程序的滚动升级而被删除或不可用的 Pod 确实会计入干扰预算,
但是控制器（如 Deployment 和 StatefulSet）在进行滚动升级时不受 PDB
的限制。应用程序更新期间的故障处理方式是在对应的工作负载资源的 `spec` 中配置的。
由于应用的滚动升级而被删除或不可用的 Pod 确实会计入干扰预算,
但是工作负载资源（如 Deployment 和 StatefulSet）
在进行滚动升级时不受 PDB 的限制。
应用更新期间的故障处理方式是在对应的工作负载资源的 `spec` 中配置的。
<!--
When a pod is evicted using the eviction API, it is gracefully
@ -282,14 +285,13 @@ hornoring the
中的 `terminationGracePeriodSeconds` 配置值。
<!--
## PDB Example
## PodDisruptionBudget example {#pdb-example}
Consider a cluster with 3 nodes, `node-1` through `node-3`.
The cluster is running several applications. One of them has 3 replicas initially called
`pod-a`, `pod-b`, and `pod-c`. Another, unrelated pod without a PDB, called `pod-x`, is also shown.
Initially, the pods are laid out as follows:
-->
## PDB 例子 {#pdb-example}
## PodDisruptionBudget 例子 {#pdb-example}
假设集群有 3 个节点,`node-1` 到 `node-3`。集群上运行了一些应用。
其中一个应用有 3 个副本,分别是 `pod-a`、`pod-b` 和 `pod-c`
@ -316,7 +318,7 @@ This puts the cluster in this state:
-->
例如,假设集群管理员想要重启系统,升级内核版本来修复内核中的缺陷。
集群管理员首先使用 `kubectl drain` 命令尝试`node-1` 节点。
集群管理员首先使用 `kubectl drain` 命令尝试`node-1` 节点。
命令尝试驱逐 `pod-a` 和 `pod-x`。操作立即就成功了。
两个 Pod 同时进入 `terminating` 状态。这时的集群处于下面的状态:
@ -426,7 +428,7 @@ can happen, according to:
- the type of controller
- the cluster's resource capacity
-->
- 应用程序需要多少个副本
- 应用需要多少个副本
- 优雅关闭应用实例需要多长时间
- 启动应用新实例需要多长时间
- 控制器的类型
@ -449,7 +451,7 @@ may make sense in these scenarios:
there is natural specialization of roles
- when third-party tools or services are used to automate cluster management
-->
- 当有许多应用程序团队共用一个 Kubernetes 集群,并且有自然的专业角色
- 当有许多应用团队共用一个 Kubernetes 集群,并且有自然的专业角色
- 当第三方工具或服务用于集群自动化管理
<!--
@ -491,11 +493,11 @@ the nodes in your cluster, such as a node or system software upgrade, here are s
- 接受升级期间的停机时间。
- 故障转移到另一个完整的副本集群。
- 没有停机时间,但是对于重复的节点和人工协调成本可能是昂贵的。
- 编写可容忍干扰的应用程序和使用 PDB。
- 编写可容忍干扰的应用和使用 PDB。
- 不停机。
- 最小的资源重复。
- 允许更多的集群管理自动化。
- 编写可容忍干扰的应用程序是棘手的,但对于支持容忍自愿干扰所做的工作,和支持自动扩缩和容忍非
- 编写可容忍干扰的应用是棘手的,但对于支持容忍自愿干扰所做的工作,和支持自动扩缩和容忍非
自愿干扰所做工作相比,有大量的重叠
## {{% heading "whatsnext" %}}