[zh] sync pod-scheduling-readiness.md
parent 1bb518d9e0
commit 8e89830e92

@@ -3,7 +3,6 @@ title: Pod Scheduling Readiness
content_type: concept
weight: 40
---

@@ -27,7 +26,7 @@ to be considered for scheduling.
Pods were considered ready for scheduling once created. The Kubernetes scheduler
does its due diligence to find nodes to place all pending Pods. However, in a
real-world case, some Pods may stay in a "miss-essential-resources" state for a
long period. These Pods actually churn the scheduler (and downstream integrators
like Cluster AutoScaler) in an unnecessary manner.

You can control when a Pod is ready to be considered for scheduling by specifying
or removing the Pod's `.spec.schedulingGates`.

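For illustration, a Pod that is gated from scheduling could be declared with a manifest roughly like the sketch below (the Pod name `test-pod` and the gate names `example.com/foo` and `example.com/bar` are chosen to match the output shown later on this page; the container image is just a placeholder):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  # While any scheduling gate is present, the scheduler does not attempt to place this Pod.
  schedulingGates:
  - name: example.com/foo
  - name: example.com/bar
  containers:
  - name: pause
    image: registry.k8s.io/pause:3.6
```
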
@@ -47,7 +46,8 @@ each schedulingGate can be removed in arbitrary order, but addition of a new sch
This field can be initialized only when a Pod is created (either by the client,
or mutated during admission). After creation, each schedulingGate can be removed
in arbitrary order, but addition of a new scheduling gate is disallowed.

{{< figure src="/docs/images/podSchedulingGates.svg" alt="pod-scheduling-gates-diagram" caption="<!--Figure. Pod SchedulingGates-->图:Pod SchedulingGates" class="diagram-large" link="https://mermaid.live/edit#pako:eNplkktTwyAUhf8KgzuHWpukaYszutGlK3caFxQuCVMCGSDVTKf_XfKyPlhxz4HDB9wT5lYAptgHFuBRsdKxenFMClMYFIdfUdRYgbiD6ItJTEbR8wpEq5UpUfnDTf-5cbPoJjcbXdcaE61RVJIiqJvQ_Y30D-OCt-t3tFjcR5wZayiVnIGmkv4NiEfX9jijKTmmRH5jf0sRugOP0HyHUc1m6KGMFP27cM28fwSJDluPpNKaXqVJzmFNfHD2APRKSjnNFx9KhIpmzSfhVls3eHdTRrwG8QnxKfEZUUNeYTDBNbiaKRF_5dSfX-BQQQ0FpnEqQLJWhwIX5hyXsjbYl85wTINrgeC2EZd_xFQy7b_VJ6GCdd-itkxALE84dE3fAqXyIUZya6Qqe711OspVCI2ny2Vv35QqVO3-htt66ZWomAvVcZcv8yTfsiSFfJOydZoKvl_ttjLJVlJsblcJw-czwQ0zr9ZeqGDgeR77b2jD8xdtjtDn" >}}

## Usage example

@@ -93,7 +93,7 @@ The output is:
The output is:

```none
[{"name":"example.com/foo"},{"name":"example.com/bar"}]
```

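To make the Pod eligible for scheduling, the gates have to be removed. Assuming the Pod was created from a manifest like the sketch earlier on this page, one way to do that is to re-apply the same manifest with the `schedulingGates` field dropped:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  # schedulingGates removed: the Pod is now ready to be considered for scheduling.
  containers:
  - name: pause
    image: registry.k8s.io/pause:3.6
```
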
@@ -126,7 +126,8 @@ kubectl get pod test-pod -o wide
Given the test-pod doesn't request any CPU/memory resources, it's expected that this
Pod's state gets transited from the previous `SchedulingGated` to `Running`:

```none
NAME       READY   STATUS    RESTARTS   AGE   IP         NODE
```

@@ -146,9 +147,61 @@ scheduling. You can use `scheduler_pending_pods{queue="gated"}` to check the met
to distinguish whether a Pod has been tried for scheduling but claimed to be unschedulable,
or explicitly marked as not ready for scheduling. You can use
`scheduler_pending_pods{queue="gated"}` to check the metric result.

## Mutable Pod Scheduling Directives {#mutable-pod-scheduling-directives}

{{< feature-state for_k8s_version="v1.27" state="beta" >}}

You can mutate scheduling directives of Pods while they have scheduling gates, with certain constraints.
At a high level, you can only tighten the scheduling directives of a Pod. In other words, the updated
directives would cause the Pods to only be able to be scheduled on a subset of the nodes that they
would previously match. More concretely, the rules for updating a Pod's scheduling directives are as
follows (a sketch illustrating these rules follows the list):

1. For `.spec.nodeSelector`, only additions are allowed. If absent, it will be allowed to be set.

2. For `spec.affinity.nodeAffinity`, if nil, then setting anything is allowed.

3. If `NodeSelectorTerms` was empty, it will be allowed to be set.
   If not empty, then only additions of `NodeSelectorRequirements` to `matchExpressions`
   or `fieldExpressions` are allowed, and no changes to existing `matchExpressions`
   and `fieldExpressions` will be allowed. This is because the terms in
   `.requiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms` are ORed
   while the expressions in `nodeSelectorTerms[].matchExpressions` and
   `nodeSelectorTerms[].fieldExpressions` are ANDed.

4. For `.preferredDuringSchedulingIgnoredDuringExecution`, all updates are allowed.
   This is because preferred terms are not authoritative, and so policy controllers
   don't validate those terms.

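As an illustration of rules 1 and 3, the sketch below shows a still-gated Pod whose scheduling directives have been tightened while it waits; apart from the well-known `kubernetes.io/arch` and `topology.kubernetes.io/zone` labels, the names and values are made up for the example:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gated-pod            # hypothetical Pod, still gated
spec:
  schedulingGates:
  - name: example.com/foo
  nodeSelector:
    topology.kubernetes.io/zone: antarctica-east1   # present at creation
    disktype: ssd                                   # added while gated; additions only tighten
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/arch                 # present at creation
            operator: In
            values: ["amd64"]
          - key: node-role.example.com/batch        # added while gated; ANDed with the expression above
            operator: Exists
  containers:
  - name: pause
    image: registry.k8s.io/pause:3.6
```
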
## {{% heading "whatsnext" %}}

* Read the [PodSchedulingReadiness KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-scheduling/3521-pod-scheduling-readiness) for more details

@@ -54,7 +54,8 @@ For example,
## Concepts {#concepts}

You can add a taint to a node using [kubectl taint](/docs/reference/generated/kubectl/kubectl-commands#taint).
For example:

```shell
kubectl taint nodes node1 key1=value1:NoSchedule
```

@@ -82,7 +83,7 @@ to schedule onto `node1`:
You specify a toleration for a Pod in the Pod spec. Both of the following tolerations
"match" the taint created by the `kubectl taint` command above, and thus a Pod with
either toleration would be able to schedule onto `node1`:

```yaml
tolerations:
```

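Spelled out in full, the two tolerations that match the `key1=value1:NoSchedule` taint would look roughly like the following sketch (one uses the `Equal` operator, the other `Exists`):

```yaml
tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"
```

```yaml
tolerations:
- key: "key1"
  operator: "Exists"
  effect: "NoSchedule"
```
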
@@ -119,11 +120,10 @@ A toleration "matches" a taint if the keys are the same and the effects are the
A toleration "matches" a taint if the keys are the same and the effects are the same, and:

* the `operator` is `Exists` (in which case no `value` should be specified), or
* the `operator` is `Equal` and the values are equal.

{{< note >}}

There are two special cases:

@@ -182,7 +182,7 @@ scheduled onto the node (if it is not yet running on the node).
For example, imagine you taint a node like this:

```shell
kubectl taint nodes node1 key1=value1:NoSchedule
```

@@ -279,7 +279,7 @@ onto nodes labeled with `dedicated=groupName`.
can be done quite easily).
Pods with the above toleration can be scheduled onto the dedicated nodes, but they can also
be scheduled onto any other node in the cluster. If you want these Pods to be scheduled only
onto the dedicated nodes, then you should additionally add a label similar to the taint to
those nodes (for example `dedicated=groupName`), and have the admission controller also add
a node affinity requirement so that the Pods can only be scheduled onto nodes carrying the
`dedicated=groupName` label.

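A sketch of the node affinity that such an admission controller might inject into the Pod spec, assuming the `dedicated=groupName` label from the example above:

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: dedicated          # label added to the dedicated nodes
          operator: In
          values:
          - groupName
```
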
@@ -310,7 +310,7 @@ manually add tolerations to your pods.
we want Pods that don't need this hardware to stay off those nodes, thus leaving room for
later-arriving Pods that do need the specialized hardware. This can be done by tainting the
nodes that have the specialized hardware (for example
`kubectl taint nodes nodename special=true:NoSchedule` or
`kubectl taint nodes nodename special=true:PreferNoSchedule`) and then adding a matching
toleration to the Pods that use the special hardware.
As in the dedicated-nodes use case, the easiest way to add the toleration is to use a custom
[admission controller](/zh-cn/docs/reference/access-authn-authz/admission-controllers/).

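The matching toleration that gets added to those Pods might look like this sketch, mirroring the `special=true:NoSchedule` taint above:

```yaml
tolerations:
- key: "special"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"
```
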
@@ -333,7 +333,7 @@ when there are node problems, which is described in the next section.
## Taint based Evictions {#taint-based-evictions}

{{< feature-state for_k8s_version="v1.18" state="stable" >}}

@@ -347,7 +347,7 @@ running on the node as follows
The `NoExecute` taint effect, mentioned above, affects Pods that are already running on
the node as follows:

* Pods that do not tolerate the taint are evicted immediately.
* Pods that tolerate the taint without specifying `tolerationSeconds` in their
  toleration specification remain bound forever.
* Pods that tolerate the taint with a specified `tolerationSeconds` remain
  bound for the specified amount of time.

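For example, a Pod that should stay bound to a failing node for an hour before being evicted might carry a toleration like the sketch below; `node.kubernetes.io/unreachable` is one of the taints the node controller adds automatically, and the one-hour value is arbitrary:

```yaml
tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 3600   # stay bound for up to one hour after the taint is added
```
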
@@ -384,8 +384,8 @@ are true. The following taints are built in:
* `node.kubernetes.io/network-unavailable`: the node's network is unavailable.
* `node.kubernetes.io/unschedulable`: the node is unschedulable.
* `node.cloudprovider.kubernetes.io/uninitialized`: when the kubelet is started with an
  "external" cloud provider, this taint is set on the node to mark it as unusable. After a
  controller from the cloud-controller-manager initializes this node, the kubelet removes
  this taint.

@@ -395,6 +395,16 @@ controller can remove the relevant taint(s).
In case a node is to be evicted, the node controller or the kubelet adds the relevant
taints with the `NoExecute` effect. If the fault condition returns to normal, the kubelet
or the node controller can remove the relevant taint(s).

In some cases when the node is unreachable, the API server is unable to communicate
with the kubelet on the node. The decision to delete the pods cannot be communicated to
the kubelet until communication with the API server is re-established. In the meantime,
the pods that are scheduled for deletion may continue to run on the partitioned node.

{{< note >}}

The control plane limits the rate of adding new taints to nodes. This rate limiting

@@ -518,7 +528,6 @@ tolerations to all daemons, to prevent DaemonSets from breaking.
The DaemonSet controller automatically adds the following `NoSchedule` tolerations to all
daemons, to prevent DaemonSets from breaking:

* `node.kubernetes.io/memory-pressure`
* `node.kubernetes.io/unschedulable` (1.10 or later)
* `node.kubernetes.io/network-unavailable` (*host network only*)

@@ -531,7 +540,6 @@ DaemonSet controller automatically adds the following NoSchedule tolerations
Adding these tolerations ensures backward compatibility. You can also add
arbitrary tolerations to DaemonSets.

## {{% heading "whatsnext" %}}