Clarify pod scheduling during node graceful termination (#41061)

* Clarify Pod scheduling during graceful termination

* Update content/en/docs/concepts/architecture/nodes.md

Co-authored-by: Qiming Teng <tengqm@outlook.com>

---------

Co-authored-by: Qiming Teng <tengqm@outlook.com>
Sergey Kanzhelev 2023-05-15 13:39:35 -07:00 committed by GitHub
parent b9c88e7ffe
commit d22f3b970b
1 changed file with 26 additions and 1 deletion


@@ -396,7 +396,8 @@ The kubelet attempts to detect node system shutdown and terminates pods running
Kubelet ensures that pods follow the normal
[pod termination process](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination)
during the node shutdown. During node shutdown, the kubelet does not accept new
Pods (even if those Pods are already bound to the node).
The Graceful node shutdown feature depends on systemd since it takes advantage of
[systemd inhibitor locks](https://www.freedesktop.org/wiki/Software/systemd/inhibit/) to
@@ -412,6 +413,20 @@ thus not activating the graceful node shutdown functionality.
To activate the feature, the two kubelet config settings should be configured appropriately and
set to non-zero values.
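As a sketch, the two settings can be placed in the kubelet configuration file; the field names are the real `KubeletConfiguration` fields, while the values below are purely illustrative:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Total time by which the node shutdown is delayed (illustrative value).
shutdownGracePeriod: "30s"
# Portion of shutdownGracePeriod reserved for critical pods;
# must be less than shutdownGracePeriod (illustrative value).
shutdownGracePeriodCriticalPods: "10s"
```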
Once systemd detects or notifies node shutdown, the kubelet sets a `NotReady` condition on
the Node, with the `reason` set to `"node is shutting down"`. The kube-scheduler honors this condition
and does not schedule any Pods onto the affected node; other third-party schedulers are
expected to follow the same logic. This means that new Pods won't be scheduled onto that node
and therefore none will start.
The kubelet **also** rejects Pods during the `PodAdmission` phase if an ongoing
node shutdown has been detected, so that even Pods with a
{{< glossary_tooltip text="toleration" term_id="toleration" >}} for
`node.kubernetes.io/not-ready:NoSchedule` do not start there.
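For illustration, even a Pod carrying such a toleration is rejected at admission while the shutdown is in progress. A hypothetical sketch (the Pod name and container are made up for this example):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tolerant-pod   # hypothetical example name
spec:
  tolerations:
  # Tolerating the not-ready taint does not bypass kubelet admission
  # during an ongoing node shutdown.
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoSchedule"
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
```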
At the point when the kubelet is setting that condition on its Node via the API, the kubelet also begins
terminating any Pods that are running locally.
During a graceful shutdown, kubelet terminates pods in two phases:
1. Terminate regular pods running on the node.
@@ -430,6 +445,16 @@ Graceful node shutdown feature is configured with two
[critical pods](/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical)
during a node shutdown. This value should be less than `shutdownGracePeriod`.
{{< note >}}
There are cases when node shutdown is cancelled by the system (or perhaps manually
by an administrator). In either of those situations the Node will return to the
`Ready` state. However, Pods that have already started the termination process
will not be restored by the kubelet and will need to be re-scheduled.
{{< /note >}}
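The split between the two shutdown phases follows directly from the two settings; a minimal sketch of the arithmetic (the function name is my own, not part of any Kubernetes API):

```python
def shutdown_phase_windows(shutdown_grace_period: float,
                           critical_pods_grace_period: float) -> tuple[float, float]:
    """Return (regular_pod_window, critical_pod_window) in seconds.

    Sketch of the documented behavior: the first part of
    shutdownGracePeriod is reserved for terminating regular pods, and
    the final shutdownGracePeriodCriticalPods seconds for critical pods.
    """
    if not 0 < critical_pods_grace_period < shutdown_grace_period:
        raise ValueError("shutdownGracePeriodCriticalPods must be "
                         "non-zero and less than shutdownGracePeriod")
    regular = shutdown_grace_period - critical_pods_grace_period
    return regular, critical_pods_grace_period

# With shutdownGracePeriod=30s and shutdownGracePeriodCriticalPods=10s,
# regular pods get the first 20 seconds, critical pods the last 10.
print(shutdown_phase_windows(30, 10))  # → (20, 10)
```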
For example, if `shutdownGracePeriod=30s`, and
`shutdownGracePeriodCriticalPods=10s`, kubelet will delay the node shutdown by
30 seconds. During the shutdown, the first 20 (30-10) seconds would be reserved