From 4568e6d9a403242dd8b8acc46b64bef30cfb2882 Mon Sep 17 00:00:00 2001
From: Itamar Holder
Date: Thu, 3 Aug 2023 21:24:29 +0300
Subject: [PATCH 1/2] Document NodeSwap graduation to Beta1

Signed-off-by: Itamar Holder
---
 .../reference/command-line-tools-reference/feature-gates.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md
index 820cb088dc..59d6df201f 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md
@@ -153,7 +153,8 @@ For a reference to old feature gates that are removed, please refer to
 | `NodeInclusionPolicyInPodTopologySpread` | `false` | Alpha | 1.25 | 1.25 |
 | `NodeInclusionPolicyInPodTopologySpread` | `true` | Beta | 1.26 | |
 | `NodeLogQuery` | `false` | Alpha | 1.27 | |
-| `NodeSwap` | `false` | Alpha | 1.22 | |
+| `NodeSwap` | `false` | Alpha | 1.22 | 1.27 |
+| `NodeSwap` | `false` | Beta1 | 1.28 | |
 | `OpenAPIEnums` | `false` | Alpha | 1.23 | 1.23 |
 | `OpenAPIEnums` | `true` | Beta | 1.24 | |
 | `PDBUnhealthyPodEvictionPolicy` | `false` | Alpha | 1.26 | 1.26 |

From b56c82c714283df3239d254c8eeb85bc85571ed5 Mon Sep 17 00:00:00 2001
From: Itamar Holder
Date: Wed, 19 Jul 2023 18:03:40 +0300
Subject: [PATCH 2/2] Update content/en/docs/concepts/architecture/nodes.md

Signed-off-by: Itamar Holder
---
 .../en/docs/concepts/architecture/nodes.md | 43 +++++++++++--------
 1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/content/en/docs/concepts/architecture/nodes.md b/content/en/docs/concepts/architecture/nodes.md
index 370f58ac16..4f2f7983c9 100644
--- a/content/en/docs/concepts/architecture/nodes.md
+++ b/content/en/docs/concepts/architecture/nodes.md
@@ -617,11 +617,7 @@ During a non-graceful shutdown, Pods are terminated in the two phases:
 
 ## Swap memory management {#swap-memory}
 
-{{< feature-state state="alpha" for_k8s_version="v1.22" >}}
-
-Prior to Kubernetes 1.22, nodes did not support the use of swap memory, and a
-kubelet would by default fail to start if swap was detected on a node. In 1.22
-onwards, swap memory support can be enabled on a per-node basis.
+{{< feature-state state="beta" for_k8s_version="v1.28" >}}
 
 To enable swap on a node, the `NodeSwap` feature gate must be enabled on
 the kubelet, and the `--fail-swap-on` command line flag or `failSwapOn`
@@ -638,29 +634,40 @@ specify how a node will use swap memory. For example,
 
 ```yaml
 memorySwap:
-  swapBehavior: LimitedSwap
+  swapBehavior: UnlimitedSwap
 ```
 
-The available configuration options for `swapBehavior` are:
-
-- `LimitedSwap`: Kubernetes workloads are limited in how much swap they can
-  use. Workloads on the node not managed by Kubernetes can still swap.
-- `UnlimitedSwap`: Kubernetes workloads can use as much swap memory as they
+- `UnlimitedSwap` (default): Kubernetes workloads can use as much swap memory as they
   request, up to the system limit.
+- `LimitedSwap`: The utilization of swap memory by Kubernetes workloads is subject to limitations. Only Pods of Burstable QoS are permitted to employ swap.
 
 If configuration for `memorySwap` is not specified and the feature gate is
 enabled, by default the kubelet will apply the same behaviour as the
-`LimitedSwap` setting.
+`UnlimitedSwap` setting.
 
-The behaviour of the `LimitedSwap` setting depends if the node is running with
-v1 or v2 of control groups (also known as "cgroups"):
+With `LimitedSwap`, Pods that do not fall under the Burstable QoS classification (i.e.
+`BestEffort`/`Guaranteed` Qos Pods) are prohibited from utilizing swap memory.
+To maintain the aforementioned security and node
+health guarantees, these Pods are not permitted to use swap memory when `LimitedSwap` is
+in effect.
 
-- **cgroupsv1:** Kubernetes workloads can use any combination of memory and
-  swap, up to the pod's memory limit, if set.
-- **cgroupsv2:** Kubernetes workloads cannot use swap memory.
+Prior to detailing the calculation of the swap limit, it is necessary to define the following terms:
+* `nodeTotalMemory`: The total amount of physical memory available on the node.
+* `totalPodsSwapAvailable`: The total amount of swap memory on the node that is available for use by Pods (some swap memory may be reserved for system use).
+* `containerMemoryRequest`: The container's memory request.
+
+Swap limitation is configured as:
+`(containerMemoryRequest / nodeTotalMemory) * totalPodsSwapAvailable`.
+
+It is important to note that, for containers within Burstable QoS Pods, it is possible to
+opt-out of swap usage by specifying memory requests that are equal to memory limits.
+Containers configured in this manner will not have access to swap memory.
+
+Swap is supported only with **cgroup v2**, cgroup v1 is not supported.
 
 For more information, and to assist with testing and provide feedback, please
-see [KEP-2400](https://github.com/kubernetes/enhancements/issues/2400) and its
+see the blog-post about [Kubernetes 1.28: NodeSwap graduates to Beta1](/blog/2023/07/18/swap-beta1-1.28-2023/),
+[KEP-2400](https://github.com/kubernetes/enhancements/issues/4128) and its
 [design proposal](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md).
 
 ## {{% heading "whatsnext" %}}
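
Review note: the `LimitedSwap` formula added in PATCH 2/2, `(containerMemoryRequest / nodeTotalMemory) * totalPodsSwapAvailable`, may be easier to check with concrete numbers. The following is a minimal Python sketch of that calculation and of the two exclusions the new text describes (non-Burstable Pods, and containers whose memory request equals their memory limit). It is not the kubelet's implementation. The function name, the `qos_class` and `container_memory_limit` parameters, the byte units, and the floor rounding are all assumptions made for illustration.

```python
GiB = 1024 ** 3  # bytes per gibibyte


def limited_swap_limit(container_memory_request, node_total_memory,
                       total_pods_swap_available, qos_class="Burstable",
                       container_memory_limit=None):
    """Return a container's swap limit in bytes under LimitedSwap.

    Hypothetical helper: mirrors the formula in the patch text, not kubelet code.
    """
    # Only Burstable QoS Pods may use swap under LimitedSwap.
    if qos_class != "Burstable":
        return 0
    # A container whose memory request equals its limit opts out of swap.
    if container_memory_limit is not None and \
            container_memory_limit == container_memory_request:
        return 0
    if node_total_memory <= 0:
        raise ValueError("node_total_memory must be positive")
    # (containerMemoryRequest / nodeTotalMemory) * totalPodsSwapAvailable,
    # computed with integer arithmetic; floor rounding is an assumption.
    return container_memory_request * total_pods_swap_available // node_total_memory


# A container requesting 4 GiB on a 64 GiB node with 16 GiB of swap
# available to Pods gets 4/64 of that swap, i.e. 1 GiB.
example_limit = limited_swap_limit(4 * GiB, 64 * GiB, 16 * GiB)
```

With these inputs the container's share is proportional to its memory request, so doubling the request to 8 GiB would double the swap limit to 2 GiB.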