Merge pull request #41297 from tengqm/clarify-prestop-hook
Clarify prestop hook invocation conditionpull/41917/head
commit
3342cbf645
|
@ -38,8 +38,8 @@ If a {{< glossary_tooltip term_id="node" >}} dies, the Pods scheduled to that no
|
|||
are [scheduled for deletion](#pod-garbage-collection) after a timeout period.
|
||||
|
||||
Pods do not, by themselves, self-heal. If a Pod is scheduled to a
|
||||
{{< glossary_tooltip text="node" term_id="node" >}} that then fails, the Pod is deleted; likewise, a Pod won't
|
||||
survive an eviction due to a lack of resources or Node maintenance. Kubernetes uses a
|
||||
{{< glossary_tooltip text="node" term_id="node" >}} that then fails, the Pod is deleted; likewise,
|
||||
a Pod won't survive an eviction due to a lack of resources or Node maintenance. Kubernetes uses a
|
||||
higher-level abstraction, called a
|
||||
{{< glossary_tooltip term_id="controller" text="controller" >}}, that handles the work of
|
||||
managing the relatively disposable Pod instances.
|
||||
|
@ -57,8 +57,8 @@ created anew.
|
|||
|
||||
{{< figure src="/images/docs/pod.svg" title="Pod diagram" class="diagram-medium" >}}
|
||||
|
||||
*A multi-container Pod that contains a file puller and a
|
||||
web server that uses a persistent volume for shared storage between the containers.*
|
||||
A multi-container Pod that contains a file puller and a
|
||||
web server that uses a persistent volume for shared storage between the containers.
|
||||
|
||||
## Pod phase
|
||||
|
||||
|
@ -91,9 +91,9 @@ A Pod is granted a term to terminate gracefully, which defaults to 30 seconds.
|
|||
You can use the flag `--force` to [terminate a Pod by force](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination-forced).
|
||||
{{< /note >}}
|
||||
|
||||
Since Kubernetes 1.27, the kubelet transitions deleted pods, except for
|
||||
[static pods](/docs/tasks/configure-pod-container/static-pod/) and
|
||||
[force-deleted pods](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination-forced)
|
||||
Since Kubernetes 1.27, the kubelet transitions deleted Pods, except for
|
||||
[static Pods](/docs/tasks/configure-pod-container/static-pod/) and
|
||||
[force-deleted Pods](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination-forced)
|
||||
without a finalizer, to a terminal phase (`Failed` or `Succeeded` depending on
|
||||
the exit statuses of the pod containers) before their deletion from the API server.
|
||||
|
||||
|
@ -219,13 +219,13 @@ status:
|
|||
...
|
||||
```
|
||||
|
||||
The Pod conditions you add must have names that meet the Kubernetes [label key format](/docs/concepts/overview/working-with-objects/labels/#syntax-and-character-set).
|
||||
|
||||
The Pod conditions you add must have names that meet the Kubernetes
|
||||
[label key format](/docs/concepts/overview/working-with-objects/labels/#syntax-and-character-set).
|
||||
|
||||
### Status for Pod readiness {#pod-readiness-status}
|
||||
|
||||
The `kubectl patch` command does not support patching object status.
|
||||
To set these `status.conditions` for the pod, applications and
|
||||
To set these `status.conditions` for the Pod, applications and
|
||||
{{< glossary_tooltip term_id="operator-pattern" text="operators">}} should use
|
||||
the `PATCH` action.
|
||||
You can use a [Kubernetes client library](/docs/reference/using-api/client-libraries/) to
|
||||
|
@ -247,20 +247,22 @@ When a Pod's containers are Ready but at least one custom condition is missing o
|
|||
After a Pod gets scheduled on a node, it needs to be admitted by the Kubelet and
|
||||
have any volumes mounted. Once these phases are complete, the Kubelet works with
|
||||
a container runtime (using {{< glossary_tooltip term_id="cri" >}}) to set up a
|
||||
runtime sandbox and configure networking for the Pod. If the
|
||||
`PodHasNetworkCondition` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled,
|
||||
Kubelet reports whether a pod has reached this initialization milestone through
|
||||
runtime sandbox and configure networking for the Pod. If the `PodHasNetworkCondition`
|
||||
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled,
|
||||
Kubelet reports whether a Pod has reached this initialization milestone through
|
||||
the `PodHasNetwork` condition in the `status.conditions` field of a Pod.
|
||||
|
||||
The `PodHasNetwork` condition is set to `False` by the Kubelet when it detects a
|
||||
Pod does not have a runtime sandbox with networking configured. This occurs in
|
||||
the following scenarios:
|
||||
* Early in the lifecycle of the Pod, when the kubelet has not yet begun to set up a sandbox for the Pod using the container runtime.
|
||||
* Later in the lifecycle of the Pod, when the Pod sandbox has been destroyed due
|
||||
to either:
|
||||
* the node rebooting, without the Pod getting evicted
|
||||
* for container runtimes that use virtual machines for isolation, the Pod
|
||||
sandbox virtual machine rebooting, which then requires creating a new sandbox and fresh container network configuration.
|
||||
|
||||
- Early in the lifecycle of the Pod, when the kubelet has not yet begun to set up a sandbox for
|
||||
the Pod using the container runtime.
|
||||
- Later in the lifecycle of the Pod, when the Pod sandbox has been destroyed due to either:
|
||||
- the node rebooting, without the Pod getting evicted
|
||||
- for container runtimes that use virtual machines for isolation, the Pod
|
||||
sandbox virtual machine rebooting, which then requires creating a new sandbox and
|
||||
fresh container network configuration.
|
||||
|
||||
The `PodHasNetwork` condition is set to `True` by the kubelet after the
|
||||
successful completion of sandbox creation and network configuration for the Pod
|
||||
|
@ -277,16 +279,14 @@ condition to `True` before sandbox creation and network configuration starts.
|
|||
|
||||
{{< feature-state for_k8s_version="v1.26" state="alpha" >}}
|
||||
|
||||
See [Pod Scheduling Readiness](/docs/concepts/scheduling-eviction/pod-scheduling-readiness/) for more information.
|
||||
See [Pod Scheduling Readiness](/docs/concepts/scheduling-eviction/pod-scheduling-readiness/)
|
||||
for more information.
|
||||
|
||||
## Container probes
|
||||
|
||||
A _probe_ is a diagnostic
|
||||
performed periodically by the
|
||||
[kubelet](/docs/reference/command-line-tools-reference/kubelet/)
|
||||
on a container. To perform a diagnostic,
|
||||
the kubelet either executes code within the container, or makes
|
||||
a network request.
|
||||
A _probe_ is a diagnostic performed periodically by the [kubelet](/docs/reference/command-line-tools-reference/kubelet/)
|
||||
on a container. To perform a diagnostic, the kubelet either executes code within the container,
|
||||
or makes a network request.
|
||||
|
||||
### Check mechanisms {#probe-check-methods}
|
||||
|
||||
|
@ -364,8 +364,6 @@ see [Configure Liveness, Readiness and Startup Probes](/docs/tasks/configure-pod
|
|||
|
||||
#### When should you use a liveness probe?
|
||||
|
||||
{{< feature-state for_k8s_version="v1.0" state="stable" >}}
|
||||
|
||||
If the process in your container is able to crash on its own whenever it
|
||||
encounters an issue or becomes unhealthy, you do not necessarily need a liveness
|
||||
probe; the kubelet will automatically perform the correct action in accordance
|
||||
|
@ -376,8 +374,6 @@ specify a liveness probe, and specify a `restartPolicy` of Always or OnFailure.
|
|||
|
||||
#### When should you use a readiness probe?
|
||||
|
||||
{{< feature-state for_k8s_version="v1.0" state="stable" >}}
|
||||
|
||||
If you'd like to start sending traffic to a Pod only when a probe succeeds,
|
||||
specify a readiness probe. In this case, the readiness probe might be the same
|
||||
as the liveness probe, but the existence of the readiness probe in the spec means
|
||||
|
@ -410,8 +406,6 @@ to stop.
|
|||
|
||||
#### When should you use a startup probe?
|
||||
|
||||
{{< feature-state for_k8s_version="v1.20" state="stable" >}}
|
||||
|
||||
Startup probes are useful for Pods that have containers that take a long time to
|
||||
come into service. Rather than set a long liveness interval, you can configure
|
||||
a separate configuration for probing the container as it starts up, allowing
|
||||
|
@ -440,60 +434,69 @@ shutdown.
|
|||
Typically, the container runtime sends a TERM signal to the main process in each
|
||||
container. Many container runtimes respect the `STOPSIGNAL` value defined in the container
|
||||
image and send this instead of TERM.
|
||||
Once the grace period has expired, the KILL signal is sent to any remaining
|
||||
processes, and the Pod is then deleted from the
|
||||
{{< glossary_tooltip text="API Server" term_id="kube-apiserver" >}}. If the kubelet or the
|
||||
container runtime's management service is restarted while waiting for processes to terminate, the
|
||||
cluster retries from the start including the full original grace period.
|
||||
Once the grace period has expired, the KILL signal is sent to any remaining processes, and the Pod
|
||||
is then deleted from the {{< glossary_tooltip text="API Server" term_id="kube-apiserver" >}}.
|
||||
If the kubelet or the container runtime's management service is restarted while waiting for
|
||||
processes to terminate, the cluster retries from the start including the full original grace period.
|
||||
|
||||
An example flow:
|
||||
|
||||
1. You use the `kubectl` tool to manually delete a specific Pod, with the default grace period
|
||||
(30 seconds).
|
||||
|
||||
1. The Pod in the API server is updated with the time beyond which the Pod is considered "dead"
|
||||
along with the grace period.
|
||||
If you use `kubectl describe` to check on the Pod you're deleting, that Pod shows up as
|
||||
"Terminating".
|
||||
If you use `kubectl describe` to check the Pod you're deleting, that Pod shows up as "Terminating".
|
||||
On the node where the Pod is running: as soon as the kubelet sees that a Pod has been marked
|
||||
as terminating (a graceful shutdown duration has been set), the kubelet begins the local Pod
|
||||
shutdown process.
|
||||
|
||||
1. If one of the Pod's containers has defined a `preStop`
|
||||
[hook](/docs/concepts/containers/container-lifecycle-hooks), the kubelet
|
||||
runs that hook inside of the container. If the `preStop` hook is still running after the
|
||||
grace period expires, the kubelet requests a small, one-off grace period extension of 2
|
||||
seconds.
|
||||
[hook](/docs/concepts/containers/container-lifecycle-hooks) and the `terminationGracePeriodSeconds`
|
||||
in the Pod spec is not set to 0, the kubelet runs that hook inside of the container.
|
||||
The default `terminationGracePeriodSeconds` setting is 30 seconds.
|
||||
|
||||
If the `preStop` hook is still running after the grace period expires, the kubelet requests
|
||||
a small, one-off grace period extension of 2 seconds.
|
||||
|
||||
{{< note >}}
|
||||
If the `preStop` hook needs longer to complete than the default grace period allows,
|
||||
you must modify `terminationGracePeriodSeconds` to suit this.
|
||||
{{< /note >}}
|
||||
|
||||
1. The kubelet triggers the container runtime to send a TERM signal to process 1 inside each
|
||||
container.
|
||||
{{< note >}}
|
||||
The containers in the Pod receive the TERM signal at different times and in an arbitrary
|
||||
order. If the order of shutdowns matters, consider using a `preStop` hook to synchronize.
|
||||
{{< /note >}}
|
||||
1. At the same time as the kubelet is starting graceful shutdown of the Pod, the control plane evaluates whether to remove that shutting-down Pod from EndpointSlice (and Endpoints) objects, where those objects represent
|
||||
a {{< glossary_tooltip term_id="service" text="Service" >}} with a configured
|
||||
{{< glossary_tooltip text="selector" term_id="selector" >}}.
|
||||
|
||||
1. At the same time as the kubelet is starting graceful shutdown of the Pod, the control plane
|
||||
evaluates whether to remove that shutting-down Pod from EndpointSlice (and Endpoints) objects,
|
||||
where those objects represent a {{< glossary_tooltip term_id="service" text="Service" >}}
|
||||
with a configured {{< glossary_tooltip text="selector" term_id="selector" >}}.
|
||||
{{< glossary_tooltip text="ReplicaSets" term_id="replica-set" >}} and other workload resources
|
||||
no longer treat the shutting-down Pod as a valid, in-service replica. Pods that shut down slowly
|
||||
should not continue to serve regular traffic and should start terminating and finish processing open connections.
|
||||
Some applications need to go beyond finishing open connections and need more graceful termination -
|
||||
for example: session draining and completion. Any endpoints that represent the terminating pods
|
||||
are not immediately removed from EndpointSlices,
|
||||
and a status indicating [terminating state](/docs/concepts/services-networking/endpoint-slices/#conditions)
|
||||
is exposed from the EndpointSlice API (and the legacy Endpoints API). Terminating
|
||||
endpoints always have their `ready` status
|
||||
as `false` (for backward compatibility with versions before 1.26),
|
||||
so load balancers will not use it for regular traffic.
|
||||
If traffic draining on terminating pod is needed, the actual readiness can be checked as a condition `serving`.
|
||||
You can find more details on how to implement connections draining
|
||||
in the tutorial [Pods And Endpoints Termination Flow](/docs/tutorials/services/pods-and-endpoint-termination-flow/)
|
||||
no longer treat the shutting-down Pod as a valid, in-service replica.
|
||||
|
||||
Pods that shut down slowly should not continue to serve regular traffic and should start
|
||||
terminating and finish processing open connections. Some applications need to go beyond
|
||||
finishing open connections and need more graceful termination, for example, session draining
|
||||
and completion.
|
||||
|
||||
Any endpoints that represent the terminating Pods are not immediately removed from
|
||||
EndpointSlices, and a status indicating [terminating state](/docs/concepts/services-networking/endpoint-slices/#conditions)
|
||||
is exposed from the EndpointSlice API (and the legacy Endpoints API).
|
||||
Terminating endpoints always have their `ready` status as `false` (for backward compatibility
|
||||
with versions before 1.26), so load balancers will not use it for regular traffic.
|
||||
|
||||
If traffic draining on terminating Pod is needed, the actual readiness can be checked as a
|
||||
condition `serving`. You can find more details on how to implement connections draining in the
|
||||
tutorial [Pods And Endpoints Termination Flow](/docs/tutorials/services/pods-and-endpoint-termination-flow/)
|
||||
|
||||
{{<note>}}
|
||||
If you don't have the `EndpointSliceTerminatingCondition` feature gate enabled
|
||||
in your cluster (the gate is on by default from Kubernetes 1.22, and locked to default in 1.26), then the Kubernetes control
|
||||
plane removes a Pod from any relevant EndpointSlices as soon as the Pod's
|
||||
in your cluster (the gate is on by default from Kubernetes 1.22, and locked to default in 1.26),
|
||||
then the Kubernetes control plane removes a Pod from any relevant EndpointSlices as soon as the Pod's
|
||||
termination grace period _begins_. The behavior above is described when the
|
||||
feature gate `EndpointSliceTerminatingCondition` is enabled.
|
||||
{{</note>}}
|
||||
|
@ -501,7 +504,7 @@ feature gate `EndpointSliceTerminatingCondition` is enabled.
|
|||
1. When the grace period expires, the kubelet triggers forcible shutdown. The container runtime sends
|
||||
`SIGKILL` to any processes still running in any container in the Pod.
|
||||
The kubelet also cleans up a hidden `pause` container if that container runtime uses one.
|
||||
1. The kubelet transitions the pod into a terminal phase (`Failed` or `Succeeded` depending on
|
||||
1. The kubelet transitions the Pod into a terminal phase (`Failed` or `Succeeded` depending on
|
||||
the end state of its containers). This step is guaranteed since version 1.27.
|
||||
1. The kubelet triggers forcible removal of Pod object from the API server, by setting grace period
|
||||
to 0 (immediate deletion).
|
||||
|
@ -518,11 +521,12 @@ the `--grace-period=<seconds>` option which allows you to override the default a
|
|||
own value.
|
||||
|
||||
Setting the grace period to `0` forcibly and immediately deletes the Pod from the API
|
||||
server. If the pod was still running on a node, that forcible deletion triggers the kubelet to
|
||||
server. If the Pod was still running on a node, that forcible deletion triggers the kubelet to
|
||||
begin immediate cleanup.
|
||||
|
||||
{{< note >}}
|
||||
You must specify an additional flag `--force` along with `--grace-period=0` in order to perform force deletions.
|
||||
You must specify an additional flag `--force` along with `--grace-period=0`
|
||||
in order to perform force deletions.
|
||||
{{< /note >}}
|
||||
|
||||
When a force deletion is performed, the API server does not wait for confirmation
|
||||
|
@ -532,7 +536,8 @@ name. On the node, Pods that are set to terminate immediately will still be give
|
|||
a small grace period before being force killed.
|
||||
|
||||
{{< caution >}}
|
||||
Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
|
||||
Immediate deletion does not wait for confirmation that the running resource has been terminated.
|
||||
The resource may continue to run on the cluster indefinitely.
|
||||
{{< /caution >}}
|
||||
|
||||
If you need to force-delete Pods that are part of a StatefulSet, refer to the task
|
||||
|
@ -545,21 +550,24 @@ For failed Pods, the API objects remain in the cluster's API until a human or
|
|||
{{< glossary_tooltip term_id="controller" text="controller" >}} process
|
||||
explicitly removes them.
|
||||
|
||||
The Pod garbage collector (PodGC), which is a controller in the control plane, cleans up terminated Pods (with a phase of `Succeeded` or
|
||||
`Failed`), when the number of Pods exceeds the configured threshold
|
||||
(determined by `terminated-pod-gc-threshold` in the kube-controller-manager).
|
||||
The Pod garbage collector (PodGC), which is a controller in the control plane, cleans up
|
||||
terminated Pods (with a phase of `Succeeded` or `Failed`), when the number of Pods exceeds the
|
||||
configured threshold (determined by `terminated-pod-gc-threshold` in the kube-controller-manager).
|
||||
This avoids a resource leak as Pods are created and terminated over time.
|
||||
|
||||
Additionally, PodGC cleans up any Pods which satisfy any of the following conditions:
|
||||
1. are orphan pods - bound to a node which no longer exists,
|
||||
2. are unscheduled terminating pods,
|
||||
3. are terminating pods, bound to a non-ready node tainted with [`node.kubernetes.io/out-of-service`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-out-of-service), when the `NodeOutOfServiceVolumeDetach` feature gate is enabled.
|
||||
|
||||
1. are orphan Pods - bound to a node which no longer exists,
|
||||
1. are unscheduled terminating Pods,
|
||||
1. are terminating Pods, bound to a non-ready node tainted with
|
||||
[`node.kubernetes.io/out-of-service`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-out-of-service),
|
||||
when the `NodeOutOfServiceVolumeDetach` feature gate is enabled.
|
||||
|
||||
When the `PodDisruptionConditions` feature gate is enabled, along with
|
||||
cleaning up the pods, PodGC will also mark them as failed if they are in a non-terminal
|
||||
phase. Also, PodGC adds a pod disruption condition when cleaning up an orphan
|
||||
pod (see also:
|
||||
[Pod disruption conditions](/docs/concepts/workloads/pods/disruptions#pod-disruption-conditions)).
|
||||
cleaning up the Pods, PodGC will also mark them as failed if they are in a non-terminal
|
||||
phase. Also, PodGC adds a Pod disruption condition when cleaning up an orphan Pod.
|
||||
See [Pod disruption conditions](/docs/concepts/workloads/pods/disruptions#pod-disruption-conditions)
|
||||
for more details.
|
||||
|
||||
## {{% heading "whatsnext" %}}
|
||||
|
||||
|
@ -573,4 +581,4 @@ pod (see also:
|
|||
|
||||
* For detailed information about Pod and container status in the API, see
|
||||
the API reference documentation covering
|
||||
[`.status`](/docs/reference/kubernetes-api/workload-resources/pod-v1/#PodStatus) for Pod.
|
||||
[`status`](/docs/reference/kubernetes-api/workload-resources/pod-v1/#PodStatus) for Pod.
|
||||
|
|
Loading…
Reference in New Issue