Clarify pod scheduling during node graceful termination (#41061)
* clarify the pods scheduling during graceful termination: * Update content/en/docs/concepts/architecture/nodes.md Co-authored-by: Qiming Teng <tengqm@outlook.com> --------- Co-authored-by: Qiming Teng <tengqm@outlook.com>pull/41159/head
parent
b9c88e7ffe
commit
d22f3b970b
|
|
@ -396,7 +396,8 @@ The kubelet attempts to detect node system shutdown and terminates pods running
|
|||
|
||||
Kubelet ensures that pods follow the normal
|
||||
[pod termination process](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination)
|
||||
during the node shutdown.
|
||||
during the node shutdown. During node shutdown, the kubelet does not accept new
|
||||
Pods (even if those Pods are already bound to the node).
|
||||
|
||||
The Graceful node shutdown feature depends on systemd since it takes advantage of
|
||||
[systemd inhibitor locks](https://www.freedesktop.org/wiki/Software/systemd/inhibit/) to
|
||||
|
|
@ -412,6 +413,20 @@ thus not activating the graceful node shutdown functionality.
|
|||
To activate the feature, the two kubelet config settings should be configured appropriately and
|
||||
set to non-zero values.
|
||||
|
||||
Once systemd detects or notifies node shutdown, the kubelet sets a `NotReady` condition on
|
||||
the Node, with the `reason` set to `"node is shutting down"`. The kube-scheduler honors this condition
|
||||
and does not schedule any Pods onto the affected node; other third-party schedulers are
|
||||
expected to follow the same logic. This means that new Pods won't be scheduled onto that node
|
||||
and therefore none will start.
|
||||
|
||||
The kubelet **also** rejects Pods during the `PodAdmission` phase if an ongoing
|
||||
node shutdown has been detected, so that even Pods with a
|
||||
{{< glossary_tooltip text="toleration" term_id="toleration" >}} for
|
||||
`node.kubernetes.io/not-ready:NoSchedule` do not start there.
|
||||
|
||||
At the same time when kubelet is setting that condition on its Node via the API, the kubelet also begins
|
||||
terminating any Pods that are running locally.
|
||||
|
||||
During a graceful shutdown, kubelet terminates pods in two phases:
|
||||
|
||||
1. Terminate regular pods running on the node.
|
||||
|
|
@ -430,6 +445,16 @@ Graceful node shutdown feature is configured with two
|
|||
[critical pods](/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical)
|
||||
during a node shutdown. This value should be less than `shutdownGracePeriod`.
|
||||
|
||||
{{< note >}}
|
||||
|
||||
There are cases when Node termination was cancelled by the system (or perhaps manually
|
||||
by an administrator). In either of those situations the
|
||||
Node will return to the `Ready` state. However Pods which already started the process
|
||||
of termination
|
||||
will not be restored by kubelet and will need to be re-scheduled.
|
||||
|
||||
{{< /note >}}
|
||||
|
||||
For example, if `shutdownGracePeriod=30s`, and
|
||||
`shutdownGracePeriodCriticalPods=10s`, kubelet will delay the node shutdown by
|
||||
30 seconds. During the shutdown, the first 20 (30-10) seconds would be reserved
|
||||
|
|
|
|||
Loading…
Reference in New Issue