Apply suggestions from code review

Co-authored-by: Shannon Kularathna <ax3shannonkularathna@gmail.com>
Signed-off-by: David Porter <david@porter.me>
pull/35180/head
David Porter 2022-08-08 12:39:08 -07:00 committed by David Porter
parent ecc7ed5a74
commit 9dee6a0491
2 changed files with 95 additions and 108 deletions

@ -1,5 +1,5 @@
--- ---
title: Cgroup V2 title: About cgroup v2
content_type: concept content_type: concept
weight: 50 weight: 50
--- ---
@ -7,126 +7,118 @@ weight: 50
<!-- overview --> <!-- overview -->
On Linux, {{< glossary_tooltip text="control groups" term_id="cgroup" >}} On Linux, {{< glossary_tooltip text="control groups" term_id="cgroup" >}}
are used to constrain resources that are allocated to processes. constrain resources that are allocated to processes.
{{< glossary_tooltip text="kubelet" term_id="kubelet" >}} and the The {{< glossary_tooltip text="kubelet" term_id="kubelet" >}} and the
underlying container runtime need to interface with control groups to enforce underlying container runtime need to interface with cgroups to enforce
[resource mangement for pods and [resource mangement for pods and containers](/docs/concepts/configuration/manage-resources-containers/) which
containers](/docs/concepts/configuration/manage-resources-containers/) and set includes cpu/memory requests and limits for containerized workloads.
resources such as cpu/memory requests and limits.
There are two versions of cgroups in linux: cgroupv1 and cgroupv2. Cgroupv2 is There are two versions of cgroups in Linux: cgroup v1 and cgroup v2. cgroup v2 is
the new generation of the cgroup API. the new generation of the `cgroup` API.
<!-- body --> <!-- body -->
## Cgroup version 2 {#cgroup-v2} ## What is cgroup v2? {#cgroup-v2}
{{< feature-state for_k8s_version="v1.25" state="stable" >}} {{< feature-state for_k8s_version="v1.25" state="stable" >}}
Cgroup v2 is the next version of the cgroup Linux API. Cgroup v2 provides a cgroup v2 is the next version of the Linux `cgroup` API. cgroup v2 provides a
unified control system, which provides enhanced resource management unified control system with enhanced resource management
capabilities. capabilities.
The new version offers several improvements over cgroup v1, some of these improvements are: cgroup v2 offers several improvements over cgroup v1, such as the following:
- cleaner and easier to use API with a unified hierarchy - Single unified hierarchy design in API
- safe sub-tree delegation to containers - Safer sub-tree delegation to containers
- newer features like Pressure Stall Information - Newer features like [Pressure Stall Information](https://www.kernel.org/doc/html/latest/accounting/psi.html)
- enhanced accounting and isolation across multiple resources - Enhanced resource allocation management and isolation across multiple resources
- accounting for network memory - Unified accounting for different types of memory allocations (network memory, kernel memory, etc)
- Accounting for non-immediate resource changes such as page cache write backs
Some Kubernetes features exclusively use cgroup v2 for enhanced resource
Some kubernetes features exclusively rely on on cgroupv2 for enhanced resource
management and isolation. For example, the management and isolation. For example, the
[MemoryQoS](/blog/2021/11/26/qos-memory-resources/) feature improves memory QoS [MemoryQoS](/blog/2021/11/26/qos-memory-resources/) feature improves memory QoS
and relies on cgroupv2 primitives. New upcoming resource management and relies on cgroup v2 primitives.
capabilities in kubelet will depend on cgroupv2 as well.
## Using cgroupv2 ## Using cgroup v2 {#using-cgroupv2}
To use cgroupv2, it is recommended to use a Linux distribution which enables The recommended way to use cgroup v2 is to use a Linux distribution that
cgroupv2 out of the box. Most new modern linux distributions have switched over enables and uses cgroup v2 by default.
to cgroupv2 by default.
To check if your distribution is using cgroupv2, follow the steps [below](#check-cgroup-version). To check if your distribution uses cgroup v2, refer to [Identify cgroup version on Linux nodes](#check-cgroup-version).
To use cgroupv2 the following requirements must be met: ### Requirements
* OS distribution enables cgroupv2 cgroup v2 has the following requirements:
* Linux Kernel version is >= 5.8
* Container runtime supports cgroupv2
* [containerd](https://containerd.io/) since 1.4
* [cri-o](https://cri-o.io/) since 1.20
* Kubelet and container runtime are configured to use the [systemd cgroup driver](/docs/setup/production-environment/container-runtimes#systemd-cgroup-driver)
### Linux Distribution cgroupv2 support * OS distribution enables cgroup v2
* Linux Kernel version is 5.8 or later
* Container runtime supports cgroup v2. For example:
* [containerd](https://containerd.io/) v1.4 and later
* [cri-o](https://cri-o.io/) v1.20 and later
* The kubelet and the container runtime are configured to use the [systemd cgroup driver](/docs/setup/production-environment/container-runtimes#systemd-cgroup-driver)
Many Linux Distributions have already switched over to use cgroupv2 by default, for example: ### Linux Distribution cgroup v2 support
For a list of Linux distributions that use cgroup v2, refer to the [cgroup v2 documentation](https://github.com/opencontainers/runc/blob/main/docs/cgroup-v2.md)
<!-- the list should be kept in sync with https://github.com/opencontainers/runc/blob/main/docs/cgroup-v2.md --> <!-- the list should be kept in sync with https://github.com/opencontainers/runc/blob/main/docs/cgroup-v2.md -->
* Container Optimized OS M97 * Container Optimized OS (since M97)
* Ubuntu (since 21.10, 22.04+ recommended) * Ubuntu (since 21.10, 22.04+ recommended)
* Debian GNU/Linux (since Debian 11 buster) * Debian GNU/Linux (since Debian 11 bullseye)
* Fedora (since 31) * Fedora (since 31)
* Arch Linux (since April 2021) * Arch Linux (since April 2021)
* RHEL and RHEL-like distributions (since 9) * RHEL and RHEL-like distributions (since 9)
To check if your distribution is using cgroupv2, refer to your distribution's To check if your distribution is using cgroup v2, refer to your distribution's
documentation or follow the steps [below](#check-cgroup-version) to verify the documentation or follow the instructions in [Identify the cgroup version on Linux nodes](#check-cgroup-version).
configuration.
You can also enable cgroupv2 manually on your Linux distribution by modifying You can also enable cgroup v2 manually on your Linux distribution by modifying
the kernel boot arguments in the GRUB command line, and setting the kernel cmdline boot arguments. If your distribution uses GRUB,
`systemd.unified_cgroup_hierarchy=1`, however it's recommended to use a `systemd.unified_cgroup_hierarchy=1` should be added in `GRUB_CMDLINE_LINUX`
distribution that already enables cgroupv2 by default. under `/etc/default/grub`, followed by `sudo update-grub`. However, the
recommended approach is to use a distribution that already enables cgroup v2 by
default.
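To confirm after a reboot that the manual flag was actually passed to the kernel, something like the following sketch may help. The function name `has_unified_cgroup_flag` is hypothetical, and the check only looks for the boot argument named above; distributions that enable cgroup v2 by default do not need the flag, so its absence does not by itself mean the node runs cgroup v1.

```shell
# Hypothetical helper: succeeds if the given kernel command line string
# contains the systemd.unified_cgroup_hierarchy=1 boot argument.
has_unified_cgroup_flag() {
  # Pad with spaces so the argument only matches as a whole word.
  case " $1 " in
    *" systemd.unified_cgroup_hierarchy=1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# On a running Linux node, /proc/cmdline holds the arguments used at boot.
if [ -r /proc/cmdline ]; then
  if has_unified_cgroup_flag "$(cat /proc/cmdline)"; then
    echo "boot argument present"
  else
    echo "boot argument not present"
  fi
fi
```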
### Migrating to cgroup v2 {#migrating-cgroupv2}
### Migrating to cgroupv2 To migrate to cgroup v2, ensure that you meet the [requirements](#requirements), then upgrade
to a kernel version that enables cgroup v2 by default.
To migrate to cgroupv2, update to a newer kernel version that enables cgroupv2 The kubelet automatically detects that the OS is running on cgroup v2 and
by default, ensure your container runtime supports cgroupv2, and configure performs accordingly with no additional configuration required.
kubelet and container runtime are configured to use the [systemd cgroup
driver](/docs/setup/production-environment/container-runtimes#systemd-cgroup-driver).
Kubelet will automatically detect that the OS is running on cgroupv2 and will
perform accordingly, no additional configuration is required.
There should not be any noticeable difference in the user experience when There should not be any noticeable difference in the user experience when
switching to cgroup v2, unless users are accessing the cgroup file system switching to cgroup v2, unless users are accessing the cgroup file system
directly, either on the node or from within the containers. directly, either on the node or from within the containers.
Cgroup V2 uses a new API as compared to cgroup V1, so if there are any cgroup v2 uses a different API than cgroup v1, so if there are any
applications that directly access the cgroup file system, they need to be applications that directly access the cgroup file system, they need to be
updated to newer versions that support cgroupv2. For example: updated to newer versions that support cgroup v2. For example:
* Some third party monitoring and security agents may be dependent on cgroup filesystem. * Some third-party monitoring and security agents may depend on the cgroup filesystem.
Update them to the latest versions that support cgroupv2 Update these agents to versions that support cgroup v2.
* If you are running [cAdvisor](https://github.com/google/cadvisor) as a * If you run [cAdvisor](https://github.com/google/cadvisor) as a stand-alone
daemonset for monitoring pods and containers, update it to latest version (v0.45.0) DaemonSet for monitoring pods and containers, update it to v0.43.0 or later.
* If you use JDK (Java workload), prefer to use JDK 11.0.16 and later or JDK 15 * If you use JDK, prefer to use JDK 11.0.16 and later or JDK 15 and later, which [fully support cgroup v2](https://bugs.openjdk.org/browse/JDK-8230305).
and later, which [fully support
cgroupv2](https://bugs.openjdk.org/browse/JDK-8230305)
## Identify the cgroup version on Linux Nodes {#check-cgroup-version}
## Identifying cgroup version used on Linux Nodes {#check-cgroup-version} The cgroup version depends on on the Linux distribution being used and the
The cgroup version is dependent on the Linux distribution being used and the
default cgroup version configured on the OS. To check which cgroup version your default cgroup version configured on the OS. To check which cgroup version your
OS Distro is using, you can run the `stat -fc %T /sys/fs/cgroup/` command on distribution uses, run the `stat -fc %T /sys/fs/cgroup/` command on
the node and check if the output is `cgroup2fs`: the node:
```shell ```shell
# On a cgroupv2 node: stat -fc %T /sys/fs/cgroup/
$ stat -fc %T /sys/fs/cgroup/
cgroup2fs
# On a cgroupv1 node:
$ stat -fc %T /sys/fs/cgroup/
tmpfs
``` ```
For cgroup v2, the output is `cgroup2fs`.
For cgroup v1, the output is `tmpfs.`
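When the check needs to run on many nodes, the two outputs above can be mapped to a version label in a small script. This is a minimal sketch, not part of the documented procedure; the function name `cgroup_version_for_fstype` is hypothetical, and any filesystem type other than the two listed above is reported as `unknown`.

```shell
# Hypothetical helper: maps the filesystem type reported by
# `stat -fc %T /sys/fs/cgroup/` to a cgroup version label.
cgroup_version_for_fstype() {
  case "$1" in
    cgroup2fs) echo "v2" ;;
    tmpfs)     echo "v1" ;;
    *)         echo "unknown" ;;
  esac
}

# On a Linux node, pass in the output of the stat command shown above.
if [ -d /sys/fs/cgroup ]; then
  cgroup_version_for_fstype "$(stat -fc %T /sys/fs/cgroup/)"
fi
```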
## {{% heading "whatsnext" %}} ## {{% heading "whatsnext" %}}
- Learn more about [cgroups](https://man7.org/linux/man-pages/man7/cgroups.7.html) - Learn more about [cgroups](https://man7.org/linux/man-pages/man7/cgroups.7.html)

@ -89,12 +89,11 @@ are used to constrain resources that are allocated to processes.
Both {{< glossary_tooltip text="kubelet" term_id="kubelet" >}} and the Both {{< glossary_tooltip text="kubelet" term_id="kubelet" >}} and the
underlying container runtime need to interface with control groups to enforce underlying container runtime need to interface with control groups to enforce
[resource mangement for pods and [resource management for pods and containers](/docs/concepts/configuration/manage-resources-containers/) and set
containers](/docs/concepts/configuration/manage-resources-containers/) and set
resources such as cpu/memory requests and limits. To interface with control resources such as cpu/memory requests and limits. To interface with control
groups, kubelet and container runtime need to use a "cgroup driver". It's groups, the kubelet and the container runtime need to use a *cgroup driver*.
critical that both kubelet and the container runtime cgroup driver match and It's critical that the kubelet and the container runtime uses the same cgroup
are configured the same. driver and are configured the same.
There are two cgroup drivers available: There are two cgroup drivers available:
@ -103,15 +102,15 @@ There are two cgroup drivers available:
### cgroupfs driver {#cgroupfs-cgroup-driver} ### cgroupfs driver {#cgroupfs-cgroup-driver}
The `cgroupfs` driver is the default cgroup driver in kubelet. When `cgroupfs` The `cgroupfs` driver is the default cgroup driver in the kubelet. When the `cgroupfs`
driver is used, kubelet and the container runtime will directly interface with driver is used, the kubelet and the container runtime directly interface with
the cgroup filesystem to configure cgroups. the cgroup filesystem to configure cgroups.
The `cgroupfs` is **not** recommended to be used when The `cgroupfs` driver is **not** recommended when
[systemd](https://www.freedesktop.org/wiki/Software/systemd/) is choosen as the [systemd](https://www.freedesktop.org/wiki/Software/systemd/) is the
init system since systemd expects there to only be a single cgroup manager on init system because systemd expects a single cgroup manager on
the system. Additionally, if [cgroupv2](/docs/concepts/architecture/cgroups) is the system. Additionally, if you use [cgroup v2](/docs/concepts/architecture/cgroups)
used, it's also recommended to use the `systemd` cgroup driver instead of , use the `systemd` cgroup driver instead of
`cgroupfs`. `cgroupfs`.
### systemd cgroup driver {#systemd-cgroup-driver} ### systemd cgroup driver {#systemd-cgroup-driver}
@ -120,39 +119,35 @@ When [systemd](https://www.freedesktop.org/wiki/Software/systemd/) is chosen as
system for a Linux distribution, the init process generates and consumes a root control group system for a Linux distribution, the init process generates and consumes a root control group
(`cgroup`) and acts as a cgroup manager. (`cgroup`) and acts as a cgroup manager.
Systemd has a tight integration with cgroups and allocates a cgroup per systemd systemd has a tight integration with cgroups and allocates a cgroup per systemd
unit. As a result, when using `systemd` as the init system, but `cgroupfs` unit. As a result, if you use `systemd` as the init system with the `cgroupfs`
driver, there will be two different cpu managers on the system which is driver, the system gets two different cgroup managers.
undesirable.
A single cgroup manager simplifies the view of what resources are being Two cgroup managers result in two views of the available and in-use resources in
allocated and will by default have a more consistent view of the available and the system. In some cases, nodes that are configured to use `cgroupfs` for the
in-use resources. When there are two cgroup managers on a system, you end up kubelet and container runtime, but use `systemd` for the rest of the processes become
with two views of those resources. In the field, people have reported cases unstable under resource pressure.
where nodes that are configured to use `cgroupfs` for the kubelet and container
runtime, but `systemd` for the rest of the processes, become unstable under
resource pressure. Changing the settings such that your container runtime and
kubelet use `systemd` as the cgroup driver stabilized the system.
Additionally, if your OS distribution is using [cgroupv2](/docs/concepts/architecture/cgroups), it is highly The approach to mitigate this instability is to use `systemd` as the cgroup driver for
recommended to use the `systemd` cgroup driver. the kubelet and the container runtime when systemd is the selected init system.
To set `systemd` as the cgroup driver edit the To set `systemd` as the cgroup driver, edit the
[`KubeletConfiguration`](/docs/tasks/administer-cluster/kubelet-config-file/) [`KubeletConfiguration`](/docs/tasks/administer-cluster/kubelet-config-file/)
option of `cgroupDriver` and set it to `systemd`. For example: option of `cgroupDriver` and set it to `systemd`. For example:
```yaml ```yaml
apiVersion: kubelet.config.k8s.io/v1beta1 apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration kind: KubeletConfiguration
... rest of config ... ...
cgroupDriver: systemd cgroupDriver: systemd
``` ```
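To read back which driver a kubelet configuration file sets, a sketch along these lines may be enough. The function name `kubelet_cgroup_driver` is hypothetical, and the default path `/var/lib/kubelet/config.yaml` is an assumption (a common location on kubeadm-provisioned nodes); the pattern only handles an unquoted, top-level `cgroupDriver` key.

```shell
# Hypothetical helper: prints the value of a top-level cgroupDriver key
# from the kubelet configuration file given as the first argument.
kubelet_cgroup_driver() {
  sed -n 's/^cgroupDriver:[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$1"
}

# Assumed default path; override with KUBELET_CONFIG if your file lives elsewhere.
config_file="${KUBELET_CONFIG:-/var/lib/kubelet/config.yaml}"
if [ -r "$config_file" ]; then
  echo "kubelet cgroupDriver: $(kubelet_cgroup_driver "$config_file")"
fi
```

If the function prints nothing, the key is not set in that file and the kubelet falls back to its default driver.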
If kubelet is configured with `systemd` as cgroupDriver, the container runtime If you configure `systemd` as the cgroup driver for the kubelet, you must also
must also be configured to use the `systemd` as the cgroup driver. If using configure `systemd` as the cgroup driver for the container runtime. Refer to
containerd, it can be configured to use systemd cgroup driver as described the documentation for your container runtime for instructions. For example:
[here](#containerd-systemd). [CRI-O](#cri-o) already defaults to systemd cgroup
driver. For other container runtimes, refer to their specific documentation. * [containerd](#containerd-systemd)
* [CRI-O](#cri-o)
{{< caution >}} {{< caution >}}
Changing the cgroup driver of a Node that has joined a cluster is a sensitive operation. Changing the cgroup driver of a Node that has joined a cluster is a sensitive operation.
@ -213,7 +208,7 @@ To use the `systemd` cgroup driver in `/etc/containerd/config.toml` with `runc`,
SystemdCgroup = true SystemdCgroup = true
``` ```
`systemd` cgroup driver is recommended to set if using [cgroupv2](/docs/concepts/architecture/cgroups). The `systemd` cgroup driver is recommended if you use [cgroup v2](/docs/concepts/architecture/cgroups).
{{< note >}} {{< note >}}
If you installed containerd from a package (for example, RPM or `.deb`), you may find If you installed containerd from a package (for example, RPM or `.deb`), you may find