The PodOverhead feature is GA

pull/32328/head
Qiming Teng 2022-03-18 08:46:40 +08:00
parent d200945499
commit 0bc8468bfa
4 changed files with 89 additions and 80 deletions

View File

@ -59,12 +59,15 @@ The RuntimeClass resource currently only has 2 significant fields: the RuntimeCl
(`metadata.name`) and the handler (`handler`). The object definition looks like this:
```yaml
# RuntimeClass is defined in the node.k8s.io API group
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  # The name the RuntimeClass will be referenced by.
  # RuntimeClass is a non-namespaced resource.
  name: myclass
# The name of the corresponding CRI configuration
handler: myconfiguration
```
The name of a RuntimeClass object must be a valid
@ -72,14 +75,14 @@ The name of a RuntimeClass object must be a valid
{{< note >}}
It is recommended that RuntimeClass write operations (create/update/patch/delete) be
restricted to the cluster administrator. This is typically the default. See
[Authorization Overview](/docs/reference/access-authn-authz/authorization/) for more details.
{{< /note >}}
## Usage
Once RuntimeClasses are configured for the cluster, you can specify a
`runtimeClassName` in the Pod spec to use it. For example:
```yaml
apiVersion: v1
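kind: Pod
metadata:
  name: mypod  # hypothetical Pod name, for illustration
spec:
  runtimeClassName: myclass  # refers to the RuntimeClass defined above
```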
@ -113,14 +116,14 @@ Runtime handlers are configured through containerd's configuration at
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.${HANDLER_NAME}]
```
See containerd's [config documentation](https://github.com/containerd/cri/blob/master/docs/config.md)
for more details.
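As an illustrative sketch, a handler entry for the `myconfiguration` handler named in the
RuntimeClass above might look like this (the runc shim is an assumption; substitute the
`runtime_type` of your chosen runtime):

```
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.myconfiguration]
  runtime_type = "io.containerd.runc.v2"
```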
#### {{< glossary_tooltip term_id="cri-o" >}}
Runtime handlers are configured through CRI-O's configuration at `/etc/crio/crio.conf`. Valid
handlers are configured under the
[crio.runtime table](https://github.com/cri-o/cri-o/blob/master/docs/crio.conf.5.md#crioruntime-table):
```
[crio.runtime.runtimes.${HANDLER_NAME}]
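  runtime_path = "${PATH_TO_BINARY}"
```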
@ -148,19 +151,17 @@ can add `tolerations` to the RuntimeClass. As with the `nodeSelector`, the toler
with the pod's tolerations in admission, effectively taking the union of the set of nodes tolerated
by each.
To learn more about configuring the node selector and tolerations, see
[Assigning Pods to Nodes](/docs/concepts/scheduling-eviction/assign-pod-node/).
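For instance, a sketch of a RuntimeClass that restricts scheduling might look like this
(the label and taint keys are hypothetical placeholders, not Kubernetes conventions):

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: myclass
handler: myconfiguration
scheduling:
  nodeSelector:
    # merged with each pod's nodeSelector at admission
    example.com/runtime: myconfiguration
  tolerations:
  # merged with each pod's tolerations at admission
  - key: example.com/dedicated-runtime
    operator: Exists
    effect: NoSchedule
```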
### Pod Overhead
{{< feature-state for_k8s_version="v1.18" state="beta" >}}
{{< feature-state for_k8s_version="v1.24" state="stable" >}}
You can specify _overhead_ resources that are associated with running a Pod. Declaring overhead allows
the cluster (including the scheduler) to account for it when making decisions about Pods and resources.
Pod overhead is defined in RuntimeClass through the `overhead` field. Through the use of this field,
you can specify the overhead of running pods utilizing this RuntimeClass and ensure these overheads
are accounted for in Kubernetes.
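For illustration, a minimal `overhead` stanza in a RuntimeClass might look like this
(the values are placeholders, not recommendations):

```yaml
overhead:
  podFixed:
    memory: "120Mi"
    cpu: "250m"
```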
@ -170,3 +171,4 @@ are accounted for in Kubernetes.
- [RuntimeClass Scheduling Design](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/585-runtime-class/README.md#runtimeclass-scheduling)
- Read about the [Pod Overhead](/docs/concepts/scheduling-eviction/pod-overhead/) concept
- [PodOverhead Feature Design](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/688-pod-overhead)

View File

@ -10,17 +10,12 @@ weight: 30
<!-- overview -->
{{< feature-state for_k8s_version="v1.18" state="beta" >}}
{{< feature-state for_k8s_version="v1.24" state="stable" >}}
When you run a Pod on a Node, the Pod itself takes an amount of system resources. These
resources are additional to the resources needed to run the container(s) inside the Pod.
In Kubernetes, _Pod Overhead_ is a way to account for the resources consumed by the Pod
infrastructure on top of the container requests & limits.
<!-- body -->
@ -29,33 +24,30 @@ In Kubernetes, the Pod's overhead is set at
[admission](/docs/reference/access-authn-authz/extensible-admission-controllers/#what-are-admission-webhooks)
time according to the overhead associated with the Pod's
[RuntimeClass](/docs/concepts/containers/runtime-class/).
A pod's overhead is considered in addition to the sum of container resource requests when
scheduling a Pod. Similarly, the kubelet will include the Pod overhead when sizing the Pod cgroup,
and when carrying out Pod eviction ranking.
## Configuring Pod overhead {#set-up}
You need to make sure a `RuntimeClass` is utilized which defines the `overhead` field.
## Usage example
To work with Pod overhead, you need a RuntimeClass that defines the `overhead` field. As
an example, you could use the following RuntimeClass definition with a virtualization container
runtime that uses around 120MiB per Pod for the virtual machine and the guest OS:
```yaml
---
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata-fc
handler: kata-fc
overhead:
  podFixed:
    memory: "120Mi"
    cpu: "250m"
```
Workloads that specify the `kata-fc` RuntimeClass handler will take the memory and
@ -92,13 +84,15 @@ updates the workload's PodSpec to include the `overhead` as described in the Run
the Pod will be rejected. In the given example, since only the RuntimeClass name is specified, the admission controller mutates the Pod
to include an `overhead`.
After the RuntimeClass admission controller has made modifications, you can check the updated
Pod overhead value:
```bash
kubectl get pod test-pod -o jsonpath='{.spec.overhead}'
```
The output is:
```
map[cpu:250m memory:120Mi]
```
@ -110,44 +104,50 @@ When the kube-scheduler is deciding which node should run a new Pod, the schedul
`overhead` as well as the sum of container requests for that Pod. For this example, the scheduler adds the
requests and the overhead, then looks for a node that has 2.25 CPU and 320 MiB of memory available.
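Spelled out for this example (the per-container figures come from the workload queried later
on this page):

```
container requests:  500m + 1500m = 2000m CPU,  100Mi + 100Mi = 200Mi memory
pod overhead:        250m CPU, 120Mi memory
scheduler accounts:  2250m CPU, 320Mi memory
```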
Once a Pod is scheduled to a node, the kubelet on that node creates a new {{< glossary_tooltip
text="cgroup" term_id="cgroup" >}} for the Pod. It is within this cgroup that the underlying
container runtime will create containers.
If a resource has a limit defined for each container (Guaranteed QoS or Burstable QoS with limits
defined), the kubelet will set an upper limit for the pod cgroup associated with that resource
(`cpu.cfs_quota_us` for CPU and `memory.limit_in_bytes` for memory). This upper limit is based on
the sum of the container limits plus the `overhead` defined in the PodSpec.
For CPU, if the Pod is Guaranteed or Burstable QoS, the kubelet will set `cpu.shares` based on the
sum of container requests plus the `overhead` defined in the PodSpec.
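As a sketch of that calculation, assuming the kubelet's usual cgroup v1 conversion of
`shares = milliCPU × 1024 / 1000`:

```
cpu.shares = 2250 × 1024 / 1000 = 2304
```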
Looking at our example, verify the container requests for the workload:
```bash
kubectl get pod test-pod -o jsonpath='{.spec.containers[*].resources.limits}'
```
The total container requests are 2000m CPU and 200MiB of memory:
```
map[cpu:500m memory:100Mi] map[cpu:1500m memory:100Mi]
```
Check this against what is observed by the node:
```bash
kubectl describe node | grep test-pod -B2
```
The output shows requests for 2250m CPU and 320MiB of memory. The requests include Pod overhead:
```
Namespace    Name       CPU Requests  CPU Limits   Memory Requests  Memory Limits  AGE
---------    ----       ------------  ----------   ---------------  -------------  ---
default      test-pod   2250m (56%)   2250m (56%)  320Mi (1%)       320Mi (1%)     36m
```
## Verify Pod cgroup limits
Check the Pod's memory cgroups on the node where the workload is running. In the following example,
[`crictl`](https://github.com/kubernetes-sigs/cri-tools/blob/master/docs/crictl.md)
is used on the node, which provides a CLI for CRI-compatible container runtimes. This is an
advanced example to show Pod overhead behavior, and it is not expected that users should need to check
cgroups directly on the node.
First, on the particular node, determine the Pod identifier:
@ -158,17 +158,21 @@ POD_ID="$(sudo crictl pods --name test-pod -q)"
```
From this, you can determine the cgroup path for the Pod:
```bash
# Run this on the node where the Pod is scheduled
sudo crictl inspectp -o=json $POD_ID | grep cgroupsPath
```
The resulting cgroup path includes the Pod's `pause` container. The Pod level cgroup is one directory above.
```
"cgroupsPath": "/kubepods/podd7f4b509-cf94-4951-9417-d1087c92a5b2/7ccf55aee35dd16aca4189c952d83487297f3cd760f1bbf09620e206e7d0c27a"
"cgroupsPath": "/kubepods/podd7f4b509-cf94-4951-9417-d1087c92a5b2/7ccf55aee35dd16aca4189c952d83487297f3cd760f1bbf09620e206e7d0c27a"
```
In this specific case, the pod cgroup path is `kubepods/podd7f4b509-cf94-4951-9417-d1087c92a5b2`.
Verify the Pod level cgroup setting for memory:
```bash
# Run this on the node where the Pod is scheduled.
# Also, change the name of the cgroup to match the cgroup allocated for your pod.
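cat /sys/fs/cgroup/memory/kubepods/podd7f4b509-cf94-4951-9417-d1087c92a5b2/memory.limit_in_bytes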
@ -176,22 +180,20 @@ In this specific case, the pod cgroup path is `kubepods/podd7f4b509-cf94-4951-94
```
This is 320 MiB (320 × 1024 × 1024 = 335544320 bytes), as expected:
```
335544320
```
### Observability
Some `kube_pod_overhead_*` metrics are available in [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics)
to help identify when Pod overhead is being utilized and to help observe stability of workloads
running with a defined overhead.
## {{% heading "whatsnext" %}}
* Learn more about [RuntimeClass](/docs/concepts/containers/runtime-class/)
* Read the [PodOverhead Design](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/688-pod-overhead)
  enhancement proposal for extra context

View File

@ -666,6 +666,7 @@ plugins:
{{< /tabs >}}
#### Configuration Annotation Format
`PodNodeSelector` uses the annotation key `scheduler.alpha.kubernetes.io/node-selector` to assign node selectors to namespaces.
```yaml
@ -678,6 +679,7 @@ metadata:
```
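A minimal sketch of a Namespace carrying that annotation (the namespace name and selector
value here are hypothetical):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: example-namespace
  annotations:
    scheduler.alpha.kubernetes.io/node-selector: env=test
```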
#### Internal Behavior
This admission controller has the following behavior:
1. If the `Namespace` has an annotation with a key `scheduler.alpha.kubernetes.io/node-selector`, use its value as the
@ -746,7 +748,8 @@ metadata:
### Priority {#priority}
The priority admission controller uses the `priorityClassName` field and populates the integer value of the priority.
If the priority class is not found, the Pod is rejected.
### ResourceQuota {#resourcequota}
@ -754,19 +757,20 @@ This admission controller will observe the incoming request and ensure that it d
enumerated in the `ResourceQuota` object in a `Namespace`. If you are using `ResourceQuota`
objects in your Kubernetes deployment, you MUST use this admission controller to enforce quota constraints.
See the [resourceQuota design doc](https://git.k8s.io/community/contributors/design-proposals/resource-management/admission_control_resource_quota.md)
and the [example of Resource Quota](/docs/concepts/policy/resource-quotas/) for more details.
### RuntimeClass {#runtimeclass}
{{< feature-state for_k8s_version="v1.20" state="stable" >}}
If you define a RuntimeClass with [Pod overhead](/docs/concepts/scheduling-eviction/pod-overhead/)
configured, this admission controller checks incoming Pods.
When enabled, this admission controller rejects any Pod create requests
that have the overhead already set.
For Pods that have a RuntimeClass configured and selected in their `.spec`,
this admission controller sets `.spec.overhead` in the Pod based on the value
defined in the corresponding RuntimeClass.

See also [Pod Overhead](/docs/concepts/scheduling-eviction/pod-overhead/)
for more information.
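As a sketch, a Pod submitted with only `runtimeClassName: kata-fc` (the RuntimeClass from the
Pod overhead example) would have its spec mutated to include:

```yaml
spec:
  runtimeClassName: kata-fc
  overhead:
    podFixed:
      memory: "120Mi"
      cpu: "250m"
```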
@ -823,11 +827,11 @@ If you disable the ValidatingAdmissionWebhook, you must also disable the
group/version via the `--runtime-config` flag (both are on by default in
versions 1.9 and later).
## Is there a recommended set of admission controllers to use?
Yes. The recommended admission controllers are enabled by default
(shown [here](/docs/reference/command-line-tools-reference/kube-apiserver/#options)),
so you do not need to explicitly specify them.
You can enable additional admission controllers beyond the default set using the
`--enable-admission-plugins` flag (**order doesn't matter**).
{{< note >}}
`--admission-control` was deprecated in 1.10 and replaced with `--enable-admission-plugins`.
{{< /note >}}

View File

@ -163,8 +163,6 @@ different Kubernetes components.
| `PodAndContainerStatsFromCRI` | `false` | Alpha | 1.23 | |
| `PodDeletionCost` | `false` | Alpha | 1.21 | 1.21 |
| `PodDeletionCost` | `true` | Beta | 1.22 | |
| `PodSecurity` | `false` | Alpha | 1.22 | 1.22 |
| `PodSecurity` | `true` | Beta | 1.23 | |
| `ProbeTerminationGracePeriod` | `false` | Alpha | 1.21 | 1.21 |
@ -411,6 +409,9 @@ different Kubernetes components.
| `PodDisruptionBudget` | `false` | Alpha | 1.3 | 1.4 |
| `PodDisruptionBudget` | `true` | Beta | 1.5 | 1.20 |
| `PodDisruptionBudget` | `true` | GA | 1.21 | - |
| `PodOverhead` | `false` | Alpha | 1.16 | 1.17 |
| `PodOverhead` | `true` | Beta | 1.18 | 1.23 |
| `PodOverhead` | `true` | GA | 1.24 | - |
| `PodPriority` | `false` | Alpha | 1.8 | 1.10 |
| `PodPriority` | `true` | Beta | 1.11 | 1.13 |
| `PodPriority` | `true` | GA | 1.14 | - |