document non graceful node shutdown feature (#32406)

Signed-off-by: Ashutosh Kumar <sonasingh46@gmail.com>
pull/32888/head
Ashutosh Kumar 2022-04-12 19:57:24 +05:30 committed by GitHub
parent 15e978d8db
commit 3005445d34
2 changed files with 65 additions and 0 deletions


@@ -450,6 +450,56 @@ Reason: Terminated
Message: Pod was terminated in response to imminent node shutdown.
```
{{< /note >}}
## Non-graceful node shutdown {#non-graceful-node-shutdown}

{{< feature-state state="alpha" for_k8s_version="v1.24" >}}

A node shutdown action may not be detected by kubelet's Node Shutdown Manager,
either because the command does not trigger the inhibitor locks mechanism used by
the kubelet, or because of a user error, for example, when `ShutdownGracePeriod` and
`ShutdownGracePeriodCriticalPods` are not configured properly. Refer to the section
[Graceful Node Shutdown](#graceful-node-shutdown) above for more details.
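
For reference, both settings live in the kubelet configuration file; a minimal
sketch follows (the durations are illustrative, not recommendations):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Graceful node shutdown is active only when both values are non-zero
shutdownGracePeriod: "30s"
shutdownGracePeriodCriticalPods: "10s"
```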

When a node is shut down but not detected by kubelet's Node Shutdown Manager, the
pods that are part of a StatefulSet will be stuck in terminating status on the
shutdown node and cannot move to a new running node. This is because the kubelet
on the shutdown node is not available to delete the pods, so the StatefulSet
cannot create a new pod with the same name. If there are volumes used by the pods,
the VolumeAttachments will not be deleted from the original shutdown node, so the
volumes used by these pods cannot be attached to a new running node. As a result,
the application running on the StatefulSet cannot function properly. If the
original shutdown node comes up, the pods will be deleted by the kubelet and new
pods will be created on a different running node. If the original shutdown node
does not come up, these pods will be stuck in terminating status on the shutdown
node forever.
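
You can observe this state with commands like the following (illustrative;
`my-node` is a placeholder for the shut-down node's name):

```shell
# Pods stuck in Terminating on the shut-down node
kubectl get pods --all-namespaces --field-selector spec.nodeName=my-node

# VolumeAttachments still bound to the shut-down node
kubectl get volumeattachments
```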

To mitigate the above situation, a user can manually add the taint
`node.kubernetes.io/out-of-service` with either `NoExecute` or `NoSchedule` effect
to a Node, marking it out-of-service.
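
For example (`my-node` is a placeholder; the taint value shown is one possible
choice, since tolerating the taint can be done on the key and effect alone):

```shell
kubectl taint nodes my-node node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
```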

If the `NodeOutOfServiceVolumeDetach`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is
enabled on `kube-controller-manager`, and a Node is marked out-of-service with this
taint, the pods on the node will be forcefully deleted if there are no matching
tolerations on them, and volume detach operations for the pods terminating on the
node will happen immediately. This allows the Pods on the out-of-service node to
recover quickly on a different node.
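
As this is an alpha feature in v1.24, the gate is off by default. Enabling it is a
matter of passing the `--feature-gates` flag to `kube-controller-manager`; how you
set that flag depends on how your control plane is deployed (for example, in a
static Pod manifest):

```shell
kube-controller-manager --feature-gates=NodeOutOfServiceVolumeDetach=true
```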

During a non-graceful shutdown, Pods are terminated in two phases:

1. Force delete the Pods that do not have matching `out-of-service` tolerations
   (see the toleration sketch after this list).
2. Immediately perform detach volume operation for such pods.
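
As a sketch, a Pod that should survive phase 1 would need a toleration matching the
taint, for example (hypothetical Pod; only the `tolerations` stanza is essential):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: out-of-service-tolerant   # hypothetical name
spec:
  containers:
  - name: app
    image: nginx
  tolerations:
  # Matches the out-of-service taint regardless of its value,
  # so this Pod is not force deleted
  - key: node.kubernetes.io/out-of-service
    operator: "Exists"
    effect: "NoExecute"
```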
{{< note >}}
- Before adding the taint `node.kubernetes.io/out-of-service`, verify that the node
  is already in a shutdown or powered-off state (not in the middle of restarting).
- The user who added the out-of-service taint is also required to remove it
  manually after the pods have moved to a new node and the user has confirmed that
  the shutdown node has been recovered (see the example command after this note).
{{< /note >}}
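
Removing a taint uses the same `kubectl taint` syntax with a trailing `-`
(placeholder node name and taint value as in the earlier example):

```shell
kubectl taint nodes my-node node.kubernetes.io/out-of-service=nodeshutdown:NoExecute-
```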
### Pod Priority based graceful node shutdown {#pod-priority-graceful-node-shutdown}


@@ -381,6 +381,21 @@ Example: `node.kubernetes.io/pid-pressure:NoSchedule`
The kubelet checks the D-value (difference) between the size of `/proc/sys/kernel/pid_max` and the PIDs consumed by Kubernetes on a node to get the number of available PIDs, reported as the `pid.available` metric. The metric is then compared to the corresponding threshold that can be set on the kubelet, to determine if the node condition and taint should be added or removed.
### node.kubernetes.io/out-of-service

Example: `node.kubernetes.io/out-of-service:NoExecute`

A user can manually add the taint to a Node marking it out-of-service. If the
`NodeOutOfServiceVolumeDetach`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is
enabled on `kube-controller-manager`, and a Node is marked out-of-service with this
taint, the pods on the node will be forcefully deleted if there are no matching
tolerations on them, and volume detach operations for the pods terminating on the
node will happen immediately. This allows the Pods on the out-of-service node to
recover quickly on a different node.
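
To confirm whether a node currently carries this taint, one option is (placeholder
node name):

```shell
kubectl get node my-node -o jsonpath='{.spec.taints}'
```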
{{< caution >}}
Refer to
[Non-graceful node shutdown](/docs/concepts/architecture/nodes/#non-graceful-node-shutdown)
for further details about when and how to use this taint.
{{< /caution >}}
### node.cloudprovider.kubernetes.io/uninitialized
Example: `node.cloudprovider.kubernetes.io/uninitialized:NoSchedule`