Apply suggestions from code review

Co-authored-by: Shannon Kularathna <ax3shannonkularathna@gmail.com>
Signed-off-by: David Porter <david@porter.me>
pull/35180/head
David Porter 2022-08-08 12:39:08 -07:00 committed by David Porter
parent ecc7ed5a74
commit 9dee6a0491
2 changed files with 95 additions and 108 deletions

@ -1,5 +1,5 @@
---
title: Cgroup V2
title: About cgroup v2
content_type: concept
weight: 50
---
@ -7,126 +7,118 @@ weight: 50
<!-- overview -->
On Linux, {{< glossary_tooltip text="control groups" term_id="cgroup" >}}
are used to constrain resources that are allocated to processes.
constrain resources that are allocated to processes.
{{< glossary_tooltip text="kubelet" term_id="kubelet" >}} and the
underlying container runtime need to interface with control groups to enforce
[resource mangement for pods and
containers](/docs/concepts/configuration/manage-resources-containers/) and set
resources such as cpu/memory requests and limits.
The {{< glossary_tooltip text="kubelet" term_id="kubelet" >}} and the
underlying container runtime need to interface with cgroups to enforce
[resource management for pods and containers](/docs/concepts/configuration/manage-resources-containers/), which
includes CPU/memory requests and limits for containerized workloads.
There are two versions of cgroups in linux: cgroupv1 and cgroupv2. Cgroupv2 is
the new generation of the cgroup API.
There are two versions of cgroups in Linux: cgroup v1 and cgroup v2. cgroup v2 is
the new generation of the `cgroup` API.
<!-- body -->
## Cgroup version 2 {#cgroup-v2}
## What is cgroup v2? {#cgroup-v2}
{{< feature-state for_k8s_version="v1.25" state="stable" >}}
Cgroup v2 is the next version of the cgroup Linux API. Cgroup v2 provides a
unified control system, which provides enhanced resource management
cgroup v2 is the next version of the Linux `cgroup` API. cgroup v2 provides a
unified control system with enhanced resource management
capabilities.
The new version offers several improvements over cgroup v1, some of these improvements are:
cgroup v2 offers several improvements over cgroup v1, such as the following:
- cleaner and easier to use API with a unified hierarchy
- safe sub-tree delegation to containers
- newer features like Pressure Stall Information
- enhanced accounting and isolation across multiple resources
- accounting for network memory
- Single unified hierarchy design in API
- Safer sub-tree delegation to containers
- Newer features like [Pressure Stall Information](https://www.kernel.org/doc/html/latest/accounting/psi.html)
- Enhanced resource allocation management and isolation across multiple resources
- Unified accounting for different types of memory allocations (network memory, kernel memory, etc.)
- Accounting for non-immediate resource changes such as page cache write-backs
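As an illustration of Pressure Stall Information, the following shell sketch reads the `some` `avg10` value from a PSI file. It assumes a kernel built with PSI support; under cgroup v2 each cgroup directory exposes `cpu.pressure`, `memory.pressure` and `io.pressure`. The function name `psi_some_avg10` is hypothetical.

```shell
# Minimal sketch: print the "some" avg10 value from a PSI file, that is the
# share of time over the last 10 seconds in which at least some tasks were
# stalled on the resource.
# Example paths: /sys/fs/cgroup/<group>/memory.pressure (per cgroup, v2)
# or /proc/pressure/memory (system-wide).
psi_some_avg10() {
  awk '$1 == "some" {
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^avg10=/) {
        sub(/^avg10=/, "", $i)
        print $i
      }
    }
  }' "$1"
}
```

For example, `psi_some_avg10 /proc/pressure/memory` prints the system-wide memory stall percentage.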
Some kubernetes features exclusively rely on on cgroupv2 for enhanced resource
Some Kubernetes features exclusively use cgroup v2 for enhanced resource
management and isolation. For example, the
[MemoryQoS](/blog/2021/11/26/qos-memory-resources/) feature improves memory QoS
and relies on cgroupv2 primitives. New upcoming resource management
capabilities in kubelet will depend on cgroupv2 as well.
and relies on cgroup v2 primitives.
## Using cgroupv2
## Using cgroup v2 {#using-cgroupv2}
To use cgroupv2, it is recommended to use a Linux distribution which enables
cgroupv2 out of the box. Most new modern linux distributions have switched over
to cgroupv2 by default.
The recommended way to use cgroup v2 is to use a Linux distribution that
enables and uses cgroup v2 by default.
To check if your distribution is using cgroupv2, follow the steps [below](#check-cgroup-version).
To check if your distribution uses cgroup v2, refer to [Identify the cgroup version on Linux nodes](#check-cgroup-version).
To use cgroupv2 the following requirements must be met:
### Requirements
* OS distribution enables cgroupv2
* Linux Kernel version is >= 5.8
* Container runtime supports cgroupv2
* [containerd](https://containerd.io/) since 1.4
* [cri-o](https://cri-o.io/) since 1.20
* Kubelet and container runtime are configured to use the [systemd cgroup driver](/docs/setup/production-environment/container-runtimes#systemd-cgroup-driver)
cgroup v2 has the following requirements:
### Linux Distribution cgroupv2 support
* OS distribution enables cgroup v2
* Linux kernel version is 5.8 or later
* Container runtime supports cgroup v2. For example:
* [containerd](https://containerd.io/) v1.4 and later
* [cri-o](https://cri-o.io/) v1.20 and later
* The kubelet and the container runtime are configured to use the [systemd cgroup driver](/docs/setup/production-environment/container-runtimes#systemd-cgroup-driver)
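To check the kernel requirement on a node, you could compare `uname -r` against 5.8. This is a minimal sketch that assumes GNU `sort -V` is available; `kernel_at_least_5_8` is a hypothetical helper name, not part of any Kubernetes tool.

```shell
# Minimal sketch: succeed if a kernel release is 5.8 or later.
# Defaults to the running kernel; pass a release string to check another one.
kernel_at_least_5_8() {
  local release="${1:-$(uname -r)}"
  # Drop any suffix after the first "-" (for example "-generic").
  local version="${release%%-*}"
  local lowest
  # Version-sort both values; if 5.8 sorts first, the kernel is new enough.
  lowest="$(printf '%s\n%s\n' "5.8" "$version" | sort -V | head -n1)"
  [ "$lowest" = "5.8" ]
}

if kernel_at_least_5_8; then
  echo "kernel $(uname -r) meets the 5.8 minimum"
else
  echo "kernel $(uname -r) is older than 5.8"
fi
```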
Many Linux Distributions have already switched over to use cgroupv2 by default, for example:
### Linux Distribution cgroup v2 support
For a list of Linux distributions that use cgroup v2, refer to the [cgroup v2 documentation](https://github.com/opencontainers/runc/blob/main/docs/cgroup-v2.md).
<!-- the list should be kept in sync with https://github.com/opencontainers/runc/blob/main/docs/cgroup-v2.md -->
* Container Optimized OS M97
* Container Optimized OS (since M97)
* Ubuntu (since 21.10, 22.04+ recommended)
* Debian GNU/Linux (since Debian 11 buster)
* Debian GNU/Linux (since Debian 11 bullseye)
* Fedora (since 31)
* Arch Linux (since April 2021)
* RHEL and RHEL-like distributions (since 9)
To check if your distribution is using cgroupv2, refer to your distribution's
documentation or follow the steps [below](#check-cgroup-version) to verify the
configuration.
To check if your distribution is using cgroup v2, refer to your distribution's
documentation or follow the instructions in [Identify the cgroup version on Linux nodes](#check-cgroup-version).
You can also enable cgroupv2 manually on your Linux distribution by modifying
the kernel boot arguments in the GRUB command line, and setting
`systemd.unified_cgroup_hierarchy=1`, however it's recommended to use a
distribution that already enables cgroupv2 by default.
You can also enable cgroup v2 manually on your Linux distribution by modifying
the kernel command-line boot arguments. If your distribution uses GRUB, add
`systemd.unified_cgroup_hierarchy=1` to `GRUB_CMDLINE_LINUX` in
`/etc/default/grub`, then run `sudo update-grub` and reboot. However, the
recommended approach is to use a distribution that already enables cgroup v2 by
default.
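The GRUB edit could be scripted roughly as follows. This sketch assumes a Debian or Ubuntu style `/etc/default/grub` where `GRUB_CMDLINE_LINUX` is a single double-quoted line, and GNU `sed`; the function name is hypothetical. Review the result before you run `sudo update-grub` and reboot the node.

```shell
# Sketch: append systemd.unified_cgroup_hierarchy=1 to GRUB_CMDLINE_LINUX.
# Assumes a single line such as: GRUB_CMDLINE_LINUX="quiet splash"
# Leaves GRUB_CMDLINE_LINUX_DEFAULT untouched and keeps a .bak copy.
enable_unified_cgroup_hierarchy() {
  local grub_file="${1:-/etc/default/grub}"
  local param="systemd.unified_cgroup_hierarchy=1"
  if grep -q "^GRUB_CMDLINE_LINUX=.*${param}" "$grub_file"; then
    echo "already set in $grub_file"
    return 0
  fi
  cp "$grub_file" "${grub_file}.bak"
  # Insert the parameter just before the closing double quote.
  sed -i "s/^\(GRUB_CMDLINE_LINUX=\"[^\"]*\)\"/\1 ${param}\"/" "$grub_file"
}
```

Running it twice is safe: the second call detects the parameter and changes nothing.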
### Migrating to cgroup v2 {#migrating-cgroupv2}
### Migrating to cgroupv2
To migrate to cgroup v2, ensure that you meet the [requirements](#requirements), then upgrade
to a kernel version that enables cgroup v2 by default.
To migrate to cgroupv2, update to a newer kernel version that enables cgroupv2
by default, ensure your container runtime supports cgroupv2, and configure
kubelet and container runtime are configured to use the [systemd cgroup
driver](/docs/setup/production-environment/container-runtimes#systemd-cgroup-driver).
Kubelet will automatically detect that the OS is running on cgroupv2 and will
perform accordingly, no additional configuration is required.
The kubelet automatically detects that the OS is running on cgroup v2 and
performs accordingly with no additional configuration required.
There should not be any noticeable difference in the user experience when
switching to cgroup v2, unless users are accessing the cgroup file system
directly, either on the node or from within the containers.
Cgroup V2 uses a new API as compared to cgroup V1, so if there are any
cgroup v2 uses a different API than cgroup v1, so if there are any
applications that directly access the cgroup file system, they need to be
updated to newer versions that support cgroupv2. For example:
updated to newer versions that support cgroup v2. For example:
* Some third party monitoring and security agents may be dependent on cgroup filesystem.
Update them to the latest versions that support cgroupv2
* If you are running [cAdvisor](https://github.com/google/cadvisor) as a
daemonset for monitoring pods and containers, update it to latest version (v0.45.0)
* If you use JDK (Java workload), prefer to use JDK 11.0.16 and later or JDK 15
and later, which [fully support
cgroupv2](https://bugs.openjdk.org/browse/JDK-8230305)
* Some third-party monitoring and security agents may depend on the cgroup filesystem.
Update these agents to versions that support cgroup v2.
* If you run [cAdvisor](https://github.com/google/cadvisor) as a stand-alone
DaemonSet for monitoring pods and containers, update it to v0.43.0 or later.
* If you use the JDK, prefer JDK 11.0.16 or later, or JDK 15 or later, which [fully support cgroup v2](https://bugs.openjdk.org/browse/JDK-8230305).
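To see why applications that read the cgroup filesystem directly need updates, compare where the memory limit lives in each version. This sketch, meant to run inside a container, reads `memory.max` (cgroup v2) or `memory/memory.limit_in_bytes` (cgroup v1). The function name and the optional root-directory argument are hypothetical, added only to make the sketch testable.

```shell
# Sketch: read the memory limit of the current cgroup on either version.
# The optional argument overrides the /sys/fs/cgroup mount point.
read_memory_limit() {
  local root="${1:-/sys/fs/cgroup}"
  if [ -f "$root/memory.max" ]; then
    # cgroup v2: a byte count, or the literal "max" for no limit
    cat "$root/memory.max"
  elif [ -f "$root/memory/memory.limit_in_bytes" ]; then
    # cgroup v1: a byte count; a very large value means no limit
    cat "$root/memory/memory.limit_in_bytes"
  else
    echo "no cgroup memory limit file found under $root" >&2
    return 1
  fi
}
```

An agent that only knows the v1 path returns an error on a cgroup v2 node, which is the kind of breakage the items above describe.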
## Identify the cgroup version on Linux Nodes {#check-cgroup-version}
## Identifying cgroup version used on Linux Nodes {#check-cgroup-version}
The cgroup version is dependent on the Linux distribution being used and the
The cgroup version depends on the Linux distribution being used and the
default cgroup version configured on the OS. To check which cgroup version your
OS Distro is using, you can run the `stat -fc %T /sys/fs/cgroup/` command on
the node and check if the output is `cgroup2fs`:
distribution uses, run the `stat -fc %T /sys/fs/cgroup/` command on
the node:
```shell
# On a cgroupv2 node:
$ stat -fc %T /sys/fs/cgroup/
cgroup2fs
# On a cgroupv1 node:
$ stat -fc %T /sys/fs/cgroup/
tmpfs
stat -fc %T /sys/fs/cgroup/
```
For cgroup v2, the output is `cgroup2fs`.
For cgroup v1, the output is `tmpfs`.
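If you check many nodes, you might wrap the `stat` command in a small script. This is a sketch assuming a POSIX shell with GNU `stat`; both function names are hypothetical. A hybrid cgroup v1/v2 setup also reports `tmpfs` and is treated as v1 here.

```shell
# Sketch: map the filesystem type of the cgroup mount to a cgroup version.
cgroup_version_from_fstype() {
  case "$1" in
    cgroup2fs) echo "v2" ;;
    tmpfs)     echo "v1" ;;   # also matches hybrid v1/v2 setups
    *)         echo "unknown" ;;
  esac
}

# Run the same stat command on the node and classify its output.
cgroup_version() {
  cgroup_version_from_fstype "$(stat -fc %T /sys/fs/cgroup/ 2>/dev/null)"
}

echo "cgroup version: $(cgroup_version)"
```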
## {{% heading "whatsnext" %}}
- Learn more about [cgroups](https://man7.org/linux/man-pages/man7/cgroups.7.html)

@ -89,12 +89,11 @@ are used to constrain resources that are allocated to processes.
Both {{< glossary_tooltip text="kubelet" term_id="kubelet" >}} and the
underlying container runtime need to interface with control groups to enforce
[resource mangement for pods and
containers](/docs/concepts/configuration/manage-resources-containers/) and set
[resource management for pods and containers](/docs/concepts/configuration/manage-resources-containers/) and set
resources such as cpu/memory requests and limits. To interface with control
groups, kubelet and container runtime need to use a "cgroup driver". It's
critical that both kubelet and the container runtime cgroup driver match and
are configured the same.
groups, the kubelet and the container runtime need to use a *cgroup driver*.
It's critical that the kubelet and the container runtime use the same cgroup
driver and are configured the same.
There are two cgroup drivers available:
@ -103,15 +102,15 @@ There are two cgroup drivers available:
### cgroupfs driver {#cgroupfs-cgroup-driver}
The `cgroupfs` driver is the default cgroup driver in kubelet. When `cgroupfs`
driver is used, kubelet and the container runtime will directly interface with
The `cgroupfs` driver is the default cgroup driver in the kubelet. When the `cgroupfs`
driver is used, the kubelet and the container runtime directly interface with
the cgroup filesystem to configure cgroups.
The `cgroupfs` is **not** recommended to be used when
[systemd](https://www.freedesktop.org/wiki/Software/systemd/) is choosen as the
init system since systemd expects there to only be a single cgroup manager on
the system. Additionally, if [cgroupv2](/docs/concepts/architecture/cgroups) is
used, it's also recommended to use the `systemd` cgroup driver instead of
The `cgroupfs` driver is **not** recommended when
[systemd](https://www.freedesktop.org/wiki/Software/systemd/) is the
init system because systemd expects a single cgroup manager on
the system. Additionally, if you use [cgroup v2](/docs/concepts/architecture/cgroups),
use the `systemd` cgroup driver instead of
`cgroupfs`.
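To find out whether systemd is the init system on a node, one common check is whether the directory `/run/systemd/system` exists, which is the test that systemd's `sd_booted()` function documents. This is a sketch with a hypothetical helper name and an optional directory override for testing.

```shell
# Sketch: succeed if systemd is the init system, using the same check as
# sd_booted(): the directory /run/systemd/system exists.
# The optional argument overrides the /run prefix.
systemd_is_init() {
  local run_dir="${1:-/run}"
  [ -d "$run_dir/systemd/system" ]
}

if systemd_is_init; then
  echo "systemd is the init system: prefer the systemd cgroup driver"
else
  echo "systemd is not the init system"
fi
```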
### systemd cgroup driver {#systemd-cgroup-driver}
@ -120,39 +119,35 @@ When [systemd](https://www.freedesktop.org/wiki/Software/systemd/) is chosen as
system for a Linux distribution, the init process generates and consumes a root control group
(`cgroup`) and acts as a cgroup manager.
Systemd has a tight integration with cgroups and allocates a cgroup per systemd
unit. As a result, when using `systemd` as the init system, but `cgroupfs`
driver, there will be two different cpu managers on the system which is
undesirable.
systemd has a tight integration with cgroups and allocates a cgroup per systemd
unit. As a result, if you use `systemd` as the init system with the `cgroupfs`
driver, the system gets two different cgroup managers.
A single cgroup manager simplifies the view of what resources are being
allocated and will by default have a more consistent view of the available and
in-use resources. When there are two cgroup managers on a system, you end up
with two views of those resources. In the field, people have reported cases
where nodes that are configured to use `cgroupfs` for the kubelet and container
runtime, but `systemd` for the rest of the processes, become unstable under
resource pressure. Changing the settings such that your container runtime and
kubelet use `systemd` as the cgroup driver stabilized the system.
Two cgroup managers result in two views of the available and in-use resources in
the system. In some cases, nodes that are configured to use `cgroupfs` for the
kubelet and container runtime, but use `systemd` for the rest of the processes, become
unstable under resource pressure.
Additionally, if your OS distribution is using [cgroupv2](/docs/concepts/architecture/cgroups), it is highly
recommended to use the `systemd` cgroup driver.
The approach to mitigate this instability is to use `systemd` as the cgroup driver for
the kubelet and the container runtime when systemd is the selected init system.
To set `systemd` as the cgroup driver edit the
To set `systemd` as the cgroup driver, edit the
[`KubeletConfiguration`](/docs/tasks/administer-cluster/kubelet-config-file/)
option of `cgroupDriver` and set it to `systemd`. For example:
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
... rest of config ...
...
cgroupDriver: systemd
```
If kubelet is configured with `systemd` as cgroupDriver, the container runtime
must also be configured to use the `systemd` as the cgroup driver. If using
containerd, it can be configured to use systemd cgroup driver as described
[here](#containerd-systemd). [CRI-O](#cri-o) already defaults to systemd cgroup
driver. For other container runtimes, refer to their specific documentation.
If you configure `systemd` as the cgroup driver for the kubelet, you must also
configure `systemd` as the cgroup driver for the container runtime. Refer to
the documentation for your container runtime for instructions. For example:
* [containerd](#containerd-systemd)
* [CRI-O](#cri-o)
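After changing both configurations, a quick plain-text check can confirm that the drivers match. This sketch assumes a kubeadm-style kubelet config at `/var/lib/kubelet/config.yaml` and containerd's `/etc/containerd/config.toml` with the `runc` `SystemdCgroup` option. It does not parse YAML or TOML properly, and the function name is hypothetical. An unset `cgroupDriver` is treated as `cgroupfs` (the kubelet default), and an absent or false `SystemdCgroup` is treated as `cgroupfs`.

```shell
# Sketch: check that the kubelet and containerd use the same cgroup driver.
# Plain-text matching only; this is not a real YAML/TOML parser.
check_cgroup_drivers() {
  local kubelet_cfg="${1:-/var/lib/kubelet/config.yaml}"
  local containerd_cfg="${2:-/etc/containerd/config.toml}"
  local kubelet_driver runtime_driver

  # Top-level "cgroupDriver: <value>" in the KubeletConfiguration file;
  # fall back to cgroupfs when the field is absent.
  kubelet_driver="$(awk -F': *' '/^cgroupDriver:/ {print $2}' "$kubelet_cfg")"
  kubelet_driver="${kubelet_driver:-cgroupfs}"

  # "SystemdCgroup = true" under the runc options selects systemd.
  if grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "$containerd_cfg"; then
    runtime_driver="systemd"
  else
    runtime_driver="cgroupfs"
  fi

  echo "kubelet: ${kubelet_driver}, containerd: ${runtime_driver}"
  [ "$kubelet_driver" = "$runtime_driver" ]
}
```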
{{< caution >}}
Changing the cgroup driver of a Node that has joined a cluster is a sensitive operation.
@ -213,7 +208,7 @@ To use the `systemd` cgroup driver in `/etc/containerd/config.toml` with `runc`,
SystemdCgroup = true
```
`systemd` cgroup driver is recommended to set if using [cgroupv2](/docs/concepts/architecture/cgroups).
The `systemd` cgroup driver is recommended if you use [cgroup v2](/docs/concepts/architecture/cgroups).
{{< note >}}
If you installed containerd from a package (for example, RPM or `.deb`), you may find