Apply suggestions from code review
Co-authored-by: Shannon Kularathna <ax3shannonkularathna@gmail.com>
Signed-off-by: David Porter <david@porter.me>
pull/35180/head
parent
ecc7ed5a74
commit
9dee6a0491
|
@ -1,5 +1,5 @@
|
||||||
---
|
---
|
||||||
title: Cgroup V2
|
title: About cgroup v2
|
||||||
content_type: concept
|
content_type: concept
|
||||||
weight: 50
|
weight: 50
|
||||||
---
|
---
|
||||||
|
@ -7,126 +7,118 @@ weight: 50
|
||||||
<!-- overview -->
|
<!-- overview -->
|
||||||
|
|
||||||
On Linux, {{< glossary_tooltip text="control groups" term_id="cgroup" >}}
|
On Linux, {{< glossary_tooltip text="control groups" term_id="cgroup" >}}
|
||||||
are used to constrain resources that are allocated to processes.
|
constrain resources that are allocated to processes.
|
||||||
|
|
||||||
{{< glossary_tooltip text="kubelet" term_id="kubelet" >}} and the
|
The {{< glossary_tooltip text="kubelet" term_id="kubelet" >}} and the
|
||||||
underlying container runtime need to interface with control groups to enforce
|
underlying container runtime need to interface with cgroups to enforce
|
||||||
[resource mangement for pods and
|
[resource management for pods and containers](/docs/concepts/configuration/manage-resources-containers/) which
|
||||||
containers](/docs/concepts/configuration/manage-resources-containers/) and set
|
includes cpu/memory requests and limits for containerized workloads.
|
||||||
resources such as cpu/memory requests and limits.
|
|
||||||
|
|
||||||
There are two versions of cgroups in linux: cgroupv1 and cgroupv2. Cgroupv2 is
|
There are two versions of cgroups in Linux: cgroup v1 and cgroup v2. cgroup v2 is
|
||||||
the new generation of the cgroup API.
|
the new generation of the `cgroup` API.
|
||||||
|
|
||||||
<!-- body -->
|
<!-- body -->
|
||||||
|
|
||||||
|
|
||||||
## Cgroup version 2 {#cgroup-v2}
|
## What is cgroup v2? {#cgroup-v2}
|
||||||
{{< feature-state for_k8s_version="v1.25" state="stable" >}}
|
{{< feature-state for_k8s_version="v1.25" state="stable" >}}
|
||||||
|
|
||||||
Cgroup v2 is the next version of the cgroup Linux API. Cgroup v2 provides a
|
cgroup v2 is the next version of the Linux `cgroup` API. cgroup v2 provides a
|
||||||
unified control system, which provides enhanced resource management
|
unified control system with enhanced resource management
|
||||||
capabilities.
|
capabilities.
|
||||||
|
|
||||||
The new version offers several improvements over cgroup v1, some of these improvements are:
|
cgroup v2 offers several improvements over cgroup v1, such as the following:
|
||||||
|
|
||||||
- cleaner and easier to use API with a unified hierarchy
|
- Single unified hierarchy design in API
|
||||||
- safe sub-tree delegation to containers
|
- Safer sub-tree delegation to containers
|
||||||
- newer features like Pressure Stall Information
|
- Newer features like [Pressure Stall Information](https://www.kernel.org/doc/html/latest/accounting/psi.html)
|
||||||
- enhanced accounting and isolation across multiple resources
|
- Enhanced resource allocation management and isolation across multiple resources
|
||||||
- accounting for network memory
|
- Unified accounting for different types of memory allocations (network memory, kernel memory, etc.)
|
||||||
|
- Accounting for non-immediate resource changes such as page cache write backs
|
||||||
|
|
||||||
|
Some Kubernetes features exclusively use cgroup v2 for enhanced resource
|
||||||
Some kubernetes features exclusively rely on on cgroupv2 for enhanced resource
|
|
||||||
management and isolation. For example, the
|
management and isolation. For example, the
|
||||||
[MemoryQoS](/blog/2021/11/26/qos-memory-resources/) feature improves memory QoS
|
[MemoryQoS](/blog/2021/11/26/qos-memory-resources/) feature improves memory QoS
|
||||||
and relies on cgroupv2 primitives. New upcoming resource management
|
and relies on cgroup v2 primitives.
|
||||||
capabilities in kubelet will depend on cgroupv2 as well.
|
|
||||||
|
|
||||||
|
|
||||||
## Using cgroupv2
|
## Using cgroup v2 {#using-cgroupv2}
|
||||||
|
|
||||||
To use cgroupv2, it is recommended to use a Linux distribution which enables
|
The recommended way to use cgroup v2 is to use a Linux distribution that
|
||||||
cgroupv2 out of the box. Most new modern linux distributions have switched over
|
enables and uses cgroup v2 by default.
|
||||||
to cgroupv2 by default.
|
|
||||||
|
|
||||||
To check if your distribution is using cgroupv2, follow the steps [below](#check-cgroup-version).
|
To check if your distribution uses cgroup v2, refer to [Identify cgroup version on Linux nodes](#check-cgroup-version).
|
||||||
|
|
||||||
To use cgroupv2 the following requirements must be met:
|
### Requirements
|
||||||
|
|
||||||
* OS distribution enables cgroupv2
|
cgroup v2 has the following requirements:
|
||||||
* Linux Kernel version is >= 5.8
|
|
||||||
* Container runtime supports cgroupv2
|
|
||||||
* [containerd](https://containerd.io/) since 1.4
|
|
||||||
* [cri-o](https://cri-o.io/) since 1.20
|
|
||||||
* Kubelet and container runtime are configured to use the [systemd cgroup driver](/docs/setup/production-environment/container-runtimes#systemd-cgroup-driver)
|
|
||||||
|
|
||||||
### Linux Distribution cgroupv2 support
|
* OS distribution enables cgroup v2
|
||||||
|
* Linux Kernel version is 5.8 or later
|
||||||
|
* Container runtime supports cgroup v2. For example:
|
||||||
|
* [containerd](https://containerd.io/) v1.4 and later
|
||||||
|
* [cri-o](https://cri-o.io/) v1.20 and later
|
||||||
|
* The kubelet and the container runtime are configured to use the [systemd cgroup driver](/docs/setup/production-environment/container-runtimes#systemd-cgroup-driver)
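
The kernel requirement above can be checked on a node with a short script. This is a minimal sketch, not part of the original page: the helper name `kernel_at_least` is hypothetical, and it assumes the release string starts with `major.minor` (as `uname -r` output typically does).

```shell
# Return success when a kernel release string is at least major.minor.
# $1: release string such as the output of `uname -r`
# $2: required major version, $3: required minor version
# Assumption: the release string begins with "<major>.<minor>".
kernel_at_least() {
  ka_major="${1%%.*}"            # text before the first dot
  ka_rest="${1#*.}"              # text after the first dot
  ka_minor="${ka_rest%%[!0-9]*}" # leading digits of the remainder
  [ "$ka_major" -gt "$2" ] ||
    { [ "$ka_major" -eq "$2" ] && [ "${ka_minor:-0}" -ge "$3" ]; }
}

if kernel_at_least "$(uname -r)" 5 8; then
  echo "kernel $(uname -r) meets the 5.8 minimum"
else
  echo "kernel $(uname -r) is older than 5.8"
fi
```

The check covers only the kernel version; the container runtime version and the cgroup driver still need to be verified separately.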
|
||||||
|
|
||||||
Many Linux Distributions have already switched over to use cgroupv2 by default, for example:
|
### Linux Distribution cgroup v2 support
|
||||||
|
|
||||||
|
For a list of Linux distributions that use cgroup v2, refer to the [cgroup v2 documentation](https://github.com/opencontainers/runc/blob/main/docs/cgroup-v2.md).
|
||||||
|
|
||||||
<!-- the list should be kept in sync with https://github.com/opencontainers/runc/blob/main/docs/cgroup-v2.md -->
|
<!-- the list should be kept in sync with https://github.com/opencontainers/runc/blob/main/docs/cgroup-v2.md -->
|
||||||
* Container Optimized OS M97
|
* Container Optimized OS (since M97)
|
||||||
* Ubuntu (since 21.10, 22.04+ recommended)
|
* Ubuntu (since 21.10, 22.04+ recommended)
|
||||||
* Debian GNU/Linux (since Debian 11 buster)
|
* Debian GNU/Linux (since Debian 11 bullseye)
|
||||||
* Fedora (since 31)
|
* Fedora (since 31)
|
||||||
* Arch Linux (since April 2021)
|
* Arch Linux (since April 2021)
|
||||||
* RHEL and RHEL-like distributions (since 9)
|
* RHEL and RHEL-like distributions (since 9)
|
||||||
|
|
||||||
To check if your distribution is using cgroupv2, refer to your distribution's
|
To check if your distribution is using cgroup v2, refer to your distribution's
|
||||||
documentation or follow the steps [below](#check-cgroup-version) to verify the
|
documentation or follow the instructions in [Identify the cgroup version on Linux nodes](#check-cgroup-version).
|
||||||
configuration.
|
|
||||||
|
|
||||||
You can also enable cgroupv2 manually on your Linux distribution by modifying
|
You can also enable cgroup v2 manually on your Linux distribution by modifying
|
||||||
the kernel boot arguments in the GRUB command line, and setting
|
the kernel cmdline boot arguments. If your distribution uses GRUB,
|
||||||
`systemd.unified_cgroup_hierarchy=1`, however it's recommended to use a
|
`systemd.unified_cgroup_hierarchy=1` should be added in `GRUB_CMDLINE_LINUX`
|
||||||
distribution that already enables cgroupv2 by default.
|
under `/etc/default/grub`, followed by `sudo update-grub`. However, the
|
||||||
|
recommended approach is to use a distribution that already enables cgroup v2 by
|
||||||
|
default.
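
After a reboot, one way to confirm that the boot argument took effect is to look for it in the running kernel's command line. The sketch below is illustrative only: the helper name `has_unified_hierarchy_flag` is hypothetical, and it assumes `/proc/cmdline` is readable on the node.

```shell
# Return success when a kernel command line contains the exact argument
# systemd.unified_cgroup_hierarchy=1 as a whitespace-separated token.
# $1: the kernel command line as a single string
has_unified_hierarchy_flag() {
  case " $1 " in
    *" systemd.unified_cgroup_hierarchy=1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# /proc/cmdline holds the arguments the running kernel was booted with.
if has_unified_hierarchy_flag "$(cat /proc/cmdline 2>/dev/null)"; then
  echo "systemd.unified_cgroup_hierarchy=1 is set on the kernel command line"
else
  echo "systemd.unified_cgroup_hierarchy=1 is not set on the kernel command line"
fi
```

A missing argument does not by itself mean the node runs cgroup v1, because distributions that default to cgroup v2 do not need it.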
|
||||||
|
|
||||||
|
### Migrating to cgroup v2 {#migrating-cgroupv2}
|
||||||
|
|
||||||
### Migrating to cgroupv2
|
To migrate to cgroup v2, ensure that you meet the [requirements](#requirements), then upgrade
|
||||||
|
to a kernel version that enables cgroup v2 by default.
|
||||||
|
|
||||||
To migrate to cgroupv2, update to a newer kernel version that enables cgroupv2
|
The kubelet automatically detects that the OS is running on cgroup v2 and
|
||||||
by default, ensure your container runtime supports cgroupv2, and configure
|
performs accordingly with no additional configuration required.
|
||||||
kubelet and container runtime are configured to use the [systemd cgroup
|
|
||||||
driver](/docs/setup/production-environment/container-runtimes#systemd-cgroup-driver).
|
|
||||||
|
|
||||||
Kubelet will automatically detect that the OS is running on cgroupv2 and will
|
|
||||||
perform accordingly, no additional configuration is required.
|
|
||||||
|
|
||||||
There should not be any noticeable difference in the user experience when
|
There should not be any noticeable difference in the user experience when
|
||||||
switching to cgroup v2, unless users are accessing the cgroup file system
|
switching to cgroup v2, unless users are accessing the cgroup file system
|
||||||
directly, either on the node or from within the containers.
|
directly, either on the node or from within the containers.
|
||||||
|
|
||||||
Cgroup V2 uses a new API as compared to cgroup V1, so if there are any
|
cgroup v2 uses a different API than cgroup v1, so if there are any
|
||||||
applications that directly access the cgroup file system, they need to be
|
applications that directly access the cgroup file system, they need to be
|
||||||
updated to newer versions that support cgroupv2. For example:
|
updated to newer versions that support cgroup v2. For example:
|
||||||
|
|
||||||
* Some third party monitoring and security agents may be dependent on cgroup filesystem.
|
* Some third-party monitoring and security agents may depend on the cgroup filesystem.
|
||||||
Update them to the latest versions that support cgroupv2
|
Update these agents to versions that support cgroup v2.
|
||||||
* If you are running [cAdvisor](https://github.com/google/cadvisor) as a
|
* If you run [cAdvisor](https://github.com/google/cadvisor) as a stand-alone
|
||||||
daemonset for monitoring pods and containers, update it to latest version (v0.45.0)
|
DaemonSet for monitoring pods and containers, update it to v0.43.0 or later.
|
||||||
* If you use JDK (Java workload), prefer to use JDK 11.0.16 and later or JDK 15
|
* If you use JDK, prefer to use JDK 11.0.16 and later or JDK 15 and later, which [fully support cgroup v2](https://bugs.openjdk.org/browse/JDK-8230305).
|
||||||
and later, which [fully support
|
|
||||||
cgroupv2](https://bugs.openjdk.org/browse/JDK-8230305)
|
|
||||||
|
|
||||||
|
## Identify the cgroup version on Linux Nodes {#check-cgroup-version}
|
||||||
|
|
||||||
## Identifying cgroup version used on Linux Nodes {#check-cgroup-version}
|
The cgroup version depends on the Linux distribution being used and the
|
||||||
|
|
||||||
The cgroup version is dependent on the Linux distribution being used and the
|
|
||||||
default cgroup version configured on the OS. To check which cgroup version your
|
default cgroup version configured on the OS. To check which cgroup version your
|
||||||
OS Distro is using, you can run the `stat -fc %T /sys/fs/cgroup/` command on
|
distribution uses, run the `stat -fc %T /sys/fs/cgroup/` command on
|
||||||
the node and check if the output is `cgroup2fs`:
|
the node:
|
||||||
|
|
||||||
```shell
|
```shell
|
||||||
# On a cgroupv2 node:
|
stat -fc %T /sys/fs/cgroup/
|
||||||
$ stat -fc %T /sys/fs/cgroup/
|
|
||||||
cgroup2fs
|
|
||||||
|
|
||||||
# On a cgroupv1 node:
|
|
||||||
$ stat -fc %T /sys/fs/cgroup/
|
|
||||||
tmpfs
|
|
||||||
```
|
```
|
||||||
|
|
||||||
|
For cgroup v2, the output is `cgroup2fs`.
|
||||||
|
|
||||||
|
For cgroup v1, the output is `tmpfs`.
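
For scripting, the two outputs can be mapped to a version label. This is a minimal sketch under the assumption that only `cgroup2fs` and `tmpfs` are meaningful results; the function name `cgroup_version_from_fstype` is hypothetical.

```shell
# Map the filesystem type reported by `stat -fc %T /sys/fs/cgroup/`
# to a cgroup version label.
cgroup_version_from_fstype() {
  case "$1" in
    cgroup2fs) echo "v2" ;;
    tmpfs)     echo "v1" ;;
    *)         echo "unknown" ;;
  esac
}

# Query the node; an unreadable path yields an empty string and "unknown".
fstype="$(stat -fc %T /sys/fs/cgroup/ 2>/dev/null || true)"
echo "cgroup version: $(cgroup_version_from_fstype "$fstype")"
```

Any other filesystem type is reported as `unknown` rather than guessed.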
|
||||||
|
|
||||||
## {{% heading "whatsnext" %}}
|
## {{% heading "whatsnext" %}}
|
||||||
|
|
||||||
- Learn more about [cgroups](https://man7.org/linux/man-pages/man7/cgroups.7.html)
|
- Learn more about [cgroups](https://man7.org/linux/man-pages/man7/cgroups.7.html)
|
||||||
|
|
|
@ -89,12 +89,11 @@ are used to constrain resources that are allocated to processes.
|
||||||
|
|
||||||
Both {{< glossary_tooltip text="kubelet" term_id="kubelet" >}} and the
|
Both {{< glossary_tooltip text="kubelet" term_id="kubelet" >}} and the
|
||||||
underlying container runtime need to interface with control groups to enforce
|
underlying container runtime need to interface with control groups to enforce
|
||||||
[resource mangement for pods and
|
[resource management for pods and containers](/docs/concepts/configuration/manage-resources-containers/) and set
|
||||||
containers](/docs/concepts/configuration/manage-resources-containers/) and set
|
|
||||||
resources such as cpu/memory requests and limits. To interface with control
|
resources such as cpu/memory requests and limits. To interface with control
|
||||||
groups, kubelet and container runtime need to use a "cgroup driver". It's
|
groups, the kubelet and the container runtime need to use a *cgroup driver*.
|
||||||
critical that both kubelet and the container runtime cgroup driver match and
|
It's critical that the kubelet and the container runtime use the same cgroup
|
||||||
are configured the same.
|
driver and are configured the same.
|
||||||
|
|
||||||
There are two cgroup drivers available:
|
There are two cgroup drivers available:
|
||||||
|
|
||||||
|
@ -103,15 +102,15 @@ There are two cgroup drivers available:
|
||||||
|
|
||||||
### cgroupfs driver {#cgroupfs-cgroup-driver}
|
### cgroupfs driver {#cgroupfs-cgroup-driver}
|
||||||
|
|
||||||
The `cgroupfs` driver is the default cgroup driver in kubelet. When `cgroupfs`
|
The `cgroupfs` driver is the default cgroup driver in the kubelet. When the `cgroupfs`
|
||||||
driver is used, kubelet and the container runtime will directly interface with
|
driver is used, the kubelet and the container runtime directly interface with
|
||||||
the cgroup filesystem to configure cgroups.
|
the cgroup filesystem to configure cgroups.
|
||||||
|
|
||||||
The `cgroupfs` is **not** recommended to be used when
|
The `cgroupfs` driver is **not** recommended when
|
||||||
[systemd](https://www.freedesktop.org/wiki/Software/systemd/) is choosen as the
|
[systemd](https://www.freedesktop.org/wiki/Software/systemd/) is the
|
||||||
init system since systemd expects there to only be a single cgroup manager on
|
init system because systemd expects a single cgroup manager on
|
||||||
the system. Additionally, if [cgroupv2](/docs/concepts/architecture/cgroups) is
|
the system. Additionally, if you use [cgroup v2](/docs/concepts/architecture/cgroups),
|
||||||
used, it's also recommended to use the `systemd` cgroup driver instead of
|
use the `systemd` cgroup driver instead of
|
||||||
`cgroupfs`.
|
`cgroupfs`.
|
||||||
|
|
||||||
### systemd cgroup driver {#systemd-cgroup-driver}
|
### systemd cgroup driver {#systemd-cgroup-driver}
|
||||||
|
@ -120,39 +119,35 @@ When [systemd](https://www.freedesktop.org/wiki/Software/systemd/) is chosen as
|
||||||
system for a Linux distribution, the init process generates and consumes a root control group
|
system for a Linux distribution, the init process generates and consumes a root control group
|
||||||
(`cgroup`) and acts as a cgroup manager.
|
(`cgroup`) and acts as a cgroup manager.
|
||||||
|
|
||||||
Systemd has a tight integration with cgroups and allocates a cgroup per systemd
|
systemd has a tight integration with cgroups and allocates a cgroup per systemd
|
||||||
unit. As a result, when using `systemd` as the init system, but `cgroupfs`
|
unit. As a result, if you use `systemd` as the init system with the `cgroupfs`
|
||||||
driver, there will be two different cpu managers on the system which is
|
driver, the system gets two different cgroup managers.
|
||||||
undesirable.
|
|
||||||
|
|
||||||
A single cgroup manager simplifies the view of what resources are being
|
Two cgroup managers result in two views of the available and in-use resources in
|
||||||
allocated and will by default have a more consistent view of the available and
|
the system. In some cases, nodes that are configured to use `cgroupfs` for the
|
||||||
in-use resources. When there are two cgroup managers on a system, you end up
|
kubelet and container runtime, but use `systemd` for the rest of the processes, become
|
||||||
with two views of those resources. In the field, people have reported cases
|
unstable under resource pressure.
|
||||||
where nodes that are configured to use `cgroupfs` for the kubelet and container
|
|
||||||
runtime, but `systemd` for the rest of the processes, become unstable under
|
|
||||||
resource pressure. Changing the settings such that your container runtime and
|
|
||||||
kubelet use `systemd` as the cgroup driver stabilized the system.
|
|
||||||
|
|
||||||
Additionally, if your OS distribution is using [cgroupv2](/docs/concepts/architecture/cgroups), it is highly
|
The approach to mitigate this instability is to use `systemd` as the cgroup driver for
|
||||||
recommended to use the `systemd` cgroup driver.
|
the kubelet and the container runtime when systemd is the selected init system.
|
||||||
|
|
||||||
To set `systemd` as the cgroup driver edit the
|
To set `systemd` as the cgroup driver, edit the
|
||||||
[`KubeletConfiguration`](/docs/tasks/administer-cluster/kubelet-config-file/)
|
[`KubeletConfiguration`](/docs/tasks/administer-cluster/kubelet-config-file/)
|
||||||
option of `cgroupDriver` and set it to `systemd`. For example:
|
option of `cgroupDriver` and set it to `systemd`. For example:
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
apiVersion: kubelet.config.k8s.io/v1beta1
|
apiVersion: kubelet.config.k8s.io/v1beta1
|
||||||
kind: KubeletConfiguration
|
kind: KubeletConfiguration
|
||||||
... rest of config ...
|
...
|
||||||
cgroupDriver: systemd
|
cgroupDriver: systemd
|
||||||
```
|
```
|
||||||
|
|
||||||
If kubelet is configured with `systemd` as cgroupDriver, the container runtime
|
If you configure `systemd` as the cgroup driver for the kubelet, you must also
|
||||||
must also be configured to use the `systemd` as the cgroup driver. If using
|
configure `systemd` as the cgroup driver for the container runtime. Refer to
|
||||||
containerd, it can be configured to use systemd cgroup driver as described
|
the documentation for your container runtime for instructions. For example:
|
||||||
[here](#containerd-systemd). [CRI-O](#cri-o) already defaults to systemd cgroup
|
|
||||||
driver. For other container runtimes, refer to their specific documentation.
|
* [containerd](#containerd-systemd)
|
||||||
|
* [CRI-O](#cri-o)
|
||||||
|
|
||||||
{{< caution >}}
|
{{< caution >}}
|
||||||
Changing the cgroup driver of a Node that has joined a cluster is a sensitive operation.
|
Changing the cgroup driver of a Node that has joined a cluster is a sensitive operation.
|
||||||
|
@ -213,7 +208,7 @@ To use the `systemd` cgroup driver in `/etc/containerd/config.toml` with `runc`,
|
||||||
SystemdCgroup = true
|
SystemdCgroup = true
|
||||||
```
|
```
|
||||||
|
|
||||||
`systemd` cgroup driver is recommended to set if using [cgroupv2](/docs/concepts/architecture/cgroups).
|
The `systemd` cgroup driver is recommended if you use [cgroup v2](/docs/concepts/architecture/cgroups).
|
||||||
|
|
||||||
{{< note >}}
|
{{< note >}}
|
||||||
If you installed containerd from a package (for example, RPM or `.deb`), you may find
|
If you installed containerd from a package (for example, RPM or `.deb`), you may find
|
||||||
|
|