Merge pull request #41108 from Zhuzhenghao/clean
Cleanup page garbage-collection and nodes

commit c6ff7b40db

@@ -8,21 +8,21 @@ weight: 70
{{<glossary_definition term_id="garbage-collection" length="short">}} This
allows the cleanup of resources like the following:

* [Terminated pods](/docs/concepts/workloads/pods/pod-lifecycle/#pod-garbage-collection)
* [Completed Jobs](/docs/concepts/workloads/controllers/ttlafterfinished/)
* [Objects without owner references](#owners-dependents)
* [Unused containers and container images](#containers-images)
* [Dynamically provisioned PersistentVolumes with a StorageClass reclaim policy of Delete](/docs/concepts/storage/persistent-volumes/#delete)
* [Stale or expired CertificateSigningRequests (CSRs)](/docs/reference/access-authn-authz/certificate-signing-requests/#request-signing-process)
* {{<glossary_tooltip text="Nodes" term_id="node">}} deleted in the following scenarios:
  * On a cloud when the cluster uses a [cloud controller manager](/docs/concepts/architecture/cloud-controller/)
  * On-premises when the cluster uses an addon similar to a cloud controller
    manager
* [Node Lease objects](/docs/concepts/architecture/nodes/#heartbeats)

## Owners and dependents {#owners-dependents}

Many objects in Kubernetes link to each other through [*owner references*](/docs/concepts/overview/working-with-objects/owners-dependents/).
Owner references tell the control plane which objects are dependent on others.
Kubernetes uses owner references to give the control plane, and other API
clients, the opportunity to clean up related resources before deleting an

@@ -49,7 +49,7 @@ In v1.20+, if a cluster-scoped dependent specifies a namespaced kind as an owner
it is treated as having an unresolvable owner reference, and is not able to be garbage collected.

In v1.20+, if the garbage collector detects an invalid cross-namespace `ownerReference`,
or a cluster-scoped dependent with an `ownerReference` referencing a namespaced kind, a warning Event
with a reason of `OwnerRefInvalidNamespace` and an `involvedObject` of the invalid dependent is reported.
You can check for that kind of Event by running
`kubectl get events -A --field-selector=reason=OwnerRefInvalidNamespace`.
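
For illustration, a valid owner reference on a namespaced dependent might look like
the following sketch (the names and UID are placeholders). The owner must be either
cluster-scoped or in the same namespace as the dependent:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod                 # hypothetical dependent
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    kind: ReplicaSet
    name: example-replicaset        # hypothetical owner in the same namespace
    uid: d9607e19-f88f-11e6-a518-42010a800195   # placeholder UID of the owner
    controller: true
```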

@@ -61,31 +61,31 @@ Kubernetes checks for and deletes objects that no longer have owner
references, like the pods left behind when you delete a ReplicaSet. When you
delete an object, you can control whether Kubernetes deletes the object's
dependents automatically, in a process called *cascading deletion*. There are
two types of cascading deletion, as follows:

* Foreground cascading deletion
* Background cascading deletion

You can also control how and when garbage collection deletes resources that have
owner references using Kubernetes {{<glossary_tooltip text="finalizers" term_id="finalizer">}}.
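
As a sketch of how a client selects one of these behaviors, a DELETE request can
carry a `propagationPolicy` in its `DeleteOptions` body (shown here as YAML for
readability); `kubectl delete --cascade=foreground` or `--cascade=orphan` sets the
same field for you:

```yaml
# Illustrative DeleteOptions payload requesting foreground cascading deletion.
kind: DeleteOptions
apiVersion: v1
propagationPolicy: Foreground   # other accepted values: Background, Orphan
```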

### Foreground cascading deletion {#foreground-deletion}

In foreground cascading deletion, the owner object you're deleting first enters
a *deletion in progress* state. In this state, the following happens to the
owner object:

* The Kubernetes API server sets the object's `metadata.deletionTimestamp`
  field to the time the object was marked for deletion.
* The Kubernetes API server also sets the `metadata.finalizers` field to
  `foregroundDeletion`.
* The object remains visible through the Kubernetes API until the deletion
  process is complete.

After the owner object enters the deletion in progress state, the controller
deletes the dependents. After deleting all the dependent objects, the controller
deletes the owner object. At this point, the object is no longer visible in the
Kubernetes API.

During foreground cascading deletion, the only dependents that block owner
deletion are those that have the `ownerReference.blockOwnerDeletion=true` field.
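
A minimal sketch of such a blocking owner reference on a dependent object (the
name and UID are placeholders) might look like this:

```yaml
metadata:
  ownerReferences:
  - apiVersion: apps/v1
    kind: ReplicaSet
    name: example-replicaset       # hypothetical owner
    uid: d9607e19-f88f-11e6-a518-42010a800195   # placeholder UID
    controller: true
    blockOwnerDeletion: true       # this dependent blocks foreground deletion of its owner
```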

@@ -113,7 +113,7 @@ to override this behaviour, see [Delete owner objects and orphan dependents](/do
The {{<glossary_tooltip text="kubelet" term_id="kubelet">}} performs garbage
collection on unused images every five minutes and on unused containers every
minute. You should avoid using external garbage collection tools, as these can
break the kubelet behavior and remove containers that should exist.

To configure options for unused container and image garbage collection, tune the
kubelet using a [configuration file](/docs/tasks/administer-cluster/kubelet-config-file/)

@@ -124,13 +124,13 @@ resource type.
### Container image lifecycle

Kubernetes manages the lifecycle of all images through its *image manager*,
which is part of the kubelet, with the cooperation of
{{< glossary_tooltip text="cadvisor" term_id="cadvisor" >}}. The kubelet
considers the following disk usage limits when making garbage collection
decisions:

* `HighThresholdPercent`
* `LowThresholdPercent`

Disk usage above the configured `HighThresholdPercent` value triggers garbage
collection, which deletes images in order based on the last time they were used,

@@ -140,17 +140,17 @@ until disk usage reaches the `LowThresholdPercent` value.
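
In the kubelet [configuration file](/docs/tasks/administer-cluster/kubelet-config-file/),
these thresholds correspond to the `imageGCHighThresholdPercent` and
`imageGCLowThresholdPercent` fields; the values below are illustrative, not
recommendations:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
imageGCHighThresholdPercent: 85   # start image garbage collection above this disk usage
imageGCLowThresholdPercent: 80    # delete images until disk usage drops back below this
```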

### Container garbage collection {#container-image-garbage-collection}

The kubelet garbage collects unused containers based on the following variables,
which you can define:

* `MinAge`: the minimum age at which the kubelet can garbage collect a
  container. Disable by setting to `0`.
* `MaxPerPodContainer`: the maximum number of dead containers each Pod
  can have. Disable by setting to less than `0`.
* `MaxContainers`: the maximum number of dead containers the cluster can have.
  Disable by setting to less than `0`.

In addition to these variables, the kubelet garbage collects unidentified and
deleted containers, typically starting with the oldest first.

`MaxPerPodContainer` and `MaxContainers` can conflict with each other
in situations where retaining the maximum number of containers per Pod

@@ -171,8 +171,8 @@ You can tune garbage collection of resources by configuring options specific to
the controllers managing those resources. The following pages show you how to
configure garbage collection:

* [Configuring cascading deletion of Kubernetes objects](/docs/tasks/administer-cluster/use-cascading-deletion/)
* [Configuring cleanup of finished Jobs](/docs/concepts/workloads/controllers/ttlafterfinished/)

<!-- * [Configuring unused container and image garbage collection](/docs/tasks/administer-cluster/reconfigure-kubelet/) -->

@@ -81,7 +81,7 @@ first and re-added after the update.
### Self-registration of Nodes

When the kubelet flag `--register-node` is true (the default), the kubelet will attempt to
register itself with the API server. This is the preferred pattern, used by most distros.

For self-registration, the kubelet is started with the following options:

@@ -122,7 +122,7 @@ Pods already scheduled on the Node may misbehave or cause issues if the Node
configuration is changed on kubelet restart. For example, an already running
Pod may be tainted against the new labels assigned to the Node, while other
Pods that are incompatible with that Pod will be scheduled based on this new
label. Node re-registration ensures all Pods will be drained and properly
re-scheduled.
{{< /note >}}

@@ -225,9 +225,9 @@ of the Node resource. For example, the following JSON structure describes a heal

When problems occur on nodes, the Kubernetes control plane automatically creates
[taints](/docs/concepts/scheduling-eviction/taint-and-toleration/) that match the conditions
affecting the node. An example of this is when the `status` of the Ready condition
remains `Unknown` or `False` for longer than the kube-controller-manager's `NodeMonitorGracePeriod`,
which defaults to 40 seconds. This will cause either a `node.kubernetes.io/unreachable` taint, for an `Unknown` status,
or a `node.kubernetes.io/not-ready` taint, for a `False` status, to be added to the Node.
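
For example, the automatically managed taint on a Node whose Ready condition is
`Unknown` might appear on the Node object like the following sketch (the timestamp
is a placeholder; you do not add this taint yourself):

```yaml
spec:
  taints:
  - key: node.kubernetes.io/unreachable
    effect: NoExecute
    timeAdded: "2023-05-24T12:00:00Z"   # placeholder; populated by the control plane
```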

These taints affect pending pods as the scheduler takes the Node's taints into consideration when

@@ -321,7 +321,7 @@ This period can be configured using the `--node-monitor-period` flag on the

### Rate limits on eviction

In most cases, the node controller limits the eviction rate to
`--node-eviction-rate` (default 0.1) per second, meaning it won't evict pods
from more than 1 node per 10 seconds.

@@ -345,7 +345,7 @@ then the eviction mechanism does not take per-zone unavailability into account.
A key reason for spreading your nodes across availability zones is so that the
workload can be shifted to healthy zones when one entire zone goes down.
Therefore, if all nodes in a zone are unhealthy, then the node controller evicts at
the normal rate of `--node-eviction-rate`. The corner case is when all zones are
completely unhealthy (none of the nodes in the cluster are healthy). In such a
case, the node controller assumes that there is some problem with connectivity
between the control plane and the nodes, and doesn't perform any evictions.

@@ -550,36 +550,36 @@ are emitted under the kubelet subsystem to monitor node shutdowns.

{{< feature-state state="beta" for_k8s_version="v1.26" >}}

A node shutdown action may not be detected by kubelet's Node Shutdown Manager,
either because the command does not trigger the inhibitor locks mechanism used by
the kubelet, or because of a user error, for example if `ShutdownGracePeriod` and
`ShutdownGracePeriodCriticalPods` are not configured properly. Refer to the
[Graceful Node Shutdown](#graceful-node-shutdown) section above for more details.

When a node is shut down but not detected by kubelet's Node Shutdown Manager, the pods
that are part of a {{< glossary_tooltip text="StatefulSet" term_id="statefulset" >}} will be stuck in terminating status on
the shutdown node and cannot move to a new running node. This is because the kubelet on
the shutdown node is not available to delete the pods, so the StatefulSet cannot
create a new pod with the same name. If there are volumes used by the pods, the
VolumeAttachments will not be deleted from the original shutdown node, so the volumes
used by these pods cannot be attached to a new running node. As a result, the
application running on the StatefulSet cannot function properly. If the original
shutdown node comes up, the pods will be deleted by the kubelet and new pods will be
created on a different running node. If the original shutdown node does not come up,
these pods will be stuck in terminating status on the shutdown node forever.

To mitigate the above situation, a user can manually add the taint `node.kubernetes.io/out-of-service` with either a `NoExecute`
or `NoSchedule` effect to a Node, marking it out-of-service.
If the `NodeOutOfServiceVolumeDetach` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
is enabled on {{< glossary_tooltip text="kube-controller-manager" term_id="kube-controller-manager" >}}, and a Node is marked out-of-service with this taint, the
pods on the node will be forcefully deleted if there are no matching tolerations on it, and volume
detach operations for the pods terminating on the node will happen immediately. This allows the
Pods on the out-of-service node to recover quickly on a different node.
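
Applied to a Node object, the out-of-service taint might look like the following
sketch (equivalent to running `kubectl taint nodes <node-name>
node.kubernetes.io/out-of-service=nodeshutdown:NoExecute`):

```yaml
spec:
  taints:
  - key: node.kubernetes.io/out-of-service
    value: nodeshutdown       # value as commonly shown in examples
    effect: NoExecute
```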

During a non-graceful shutdown, Pods are terminated in two phases:

1. Force delete the Pods that do not have matching `out-of-service` tolerations.
2. Immediately perform detach volume operation for such pods.

{{< note >}}
- Before adding the taint `node.kubernetes.io/out-of-service`, it should be verified

@@ -641,10 +641,9 @@ see [KEP-2400](https://github.com/kubernetes/enhancements/issues/2400) and its
## {{% heading "whatsnext" %}}

Learn more about the following:
* [Components](/docs/concepts/overview/components/#node-components) that make up a node.
* [API definition for Node](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#node-v1-core).
* [Node](https://git.k8s.io/design-proposals-archive/architecture/architecture.md#the-kubernetes-node) section of the architecture design document.
* [Taints and Tolerations](/docs/concepts/scheduling-eviction/taint-and-toleration/).
* [Node Resource Managers](/docs/concepts/policy/node-resource-managers/).
* [Resource Management for Windows nodes](/docs/concepts/configuration/windows-resource-management/).