Apply suggestions from code review
Co-authored-by: Shannon Kularathna <ax3shannonkularathna@gmail.com>
Signed-off-by: David Porter <david@porter.me>
pull/35180/head
parent
ecc7ed5a74
commit
9dee6a0491
|
@ -1,5 +1,5 @@
|
|||
---
|
||||
title: Cgroup V2
|
||||
title: About cgroup v2
|
||||
content_type: concept
|
||||
weight: 50
|
||||
---
|
||||
|
@ -7,126 +7,118 @@ weight: 50
|
|||
<!-- overview -->
|
||||
|
||||
On Linux, {{< glossary_tooltip text="control groups" term_id="cgroup" >}}
|
||||
are used to constrain resources that are allocated to processes.
|
||||
constrain resources that are allocated to processes.
|
||||
|
||||
{{< glossary_tooltip text="kubelet" term_id="kubelet" >}} and the
|
||||
underlying container runtime need to interface with control groups to enforce
|
||||
[resource mangement for pods and
|
||||
containers](/docs/concepts/configuration/manage-resources-containers/) and set
|
||||
resources such as cpu/memory requests and limits.
|
||||
The {{< glossary_tooltip text="kubelet" term_id="kubelet" >}} and the
|
||||
underlying container runtime need to interface with cgroups to enforce
|
||||
[resource management for pods and containers](/docs/concepts/configuration/manage-resources-containers/), which
|
||||
includes cpu/memory requests and limits for containerized workloads.
|
||||
|
||||
There are two versions of cgroups in linux: cgroupv1 and cgroupv2. Cgroupv2 is
|
||||
the new generation of the cgroup API.
|
||||
There are two versions of cgroups in Linux: cgroup v1 and cgroup v2. cgroup v2 is
|
||||
the new generation of the `cgroup` API.
|
||||
|
||||
<!-- body -->
|
||||
|
||||
|
||||
## Cgroup version 2 {#cgroup-v2}
|
||||
## What is cgroup v2? {#cgroup-v2}
|
||||
{{< feature-state for_k8s_version="v1.25" state="stable" >}}
|
||||
|
||||
Cgroup v2 is the next version of the cgroup Linux API. Cgroup v2 provides a
|
||||
unified control system, which provides enhanced resource management
|
||||
cgroup v2 is the next version of the Linux `cgroup` API. cgroup v2 provides a
|
||||
unified control system with enhanced resource management
|
||||
capabilities.
|
||||
|
||||
The new version offers several improvements over cgroup v1, some of these improvements are:
|
||||
cgroup v2 offers several improvements over cgroup v1, such as the following:
|
||||
|
||||
- cleaner and easier to use API with a unified hierarchy
|
||||
- safe sub-tree delegation to containers
|
||||
- newer features like Pressure Stall Information
|
||||
- enhanced accounting and isolation across multiple resources
|
||||
- accounting for network memory
|
||||
- Single unified hierarchy design in API
|
||||
- Safer sub-tree delegation to containers
|
||||
- Newer features like [Pressure Stall Information](https://www.kernel.org/doc/html/latest/accounting/psi.html)
|
||||
- Enhanced resource allocation management and isolation across multiple resources
|
||||
- Unified accounting for different types of memory allocations (network memory, kernel memory, etc.)
|
||||
- Accounting for non-immediate resource changes such as page cache write backs
|
||||
|
||||
|
||||
Some kubernetes features exclusively rely on on cgroupv2 for enhanced resource
|
||||
Some Kubernetes features exclusively use cgroup v2 for enhanced resource
|
||||
management and isolation. For example, the
|
||||
[MemoryQoS](/blog/2021/11/26/qos-memory-resources/) feature improves memory QoS
|
||||
and relies on cgroupv2 primitives. New upcoming resource management
|
||||
capabilities in kubelet will depend on cgroupv2 as well.
|
||||
and relies on cgroup v2 primitives.
|
||||
|
||||
|
||||
## Using cgroupv2
|
||||
## Using cgroup v2 {#using-cgroupv2}
|
||||
|
||||
To use cgroupv2, it is recommended to use a Linux distribution which enables
|
||||
cgroupv2 out of the box. Most new modern linux distributions have switched over
|
||||
to cgroupv2 by default.
|
||||
The recommended way to use cgroup v2 is to use a Linux distribution that
|
||||
enables and uses cgroup v2 by default.
|
||||
|
||||
To check if your distribution is using cgroupv2, follow the steps [below](#check-cgroup-version).
|
||||
To check if your distribution uses cgroup v2, refer to [Identify cgroup version on Linux nodes](#check-cgroup-version).
|
||||
|
||||
To use cgroupv2 the following requirements must be met:
|
||||
### Requirements
|
||||
|
||||
* OS distribution enables cgroupv2
|
||||
* Linux Kernel version is >= 5.8
|
||||
* Container runtime supports cgroupv2
|
||||
* [containerd](https://containerd.io/) since 1.4
|
||||
* [cri-o](https://cri-o.io/) since 1.20
|
||||
* Kubelet and container runtime are configured to use the [systemd cgroup driver](/docs/setup/production-environment/container-runtimes#systemd-cgroup-driver)
|
||||
cgroup v2 has the following requirements:
|
||||
|
||||
### Linux Distribution cgroupv2 support
|
||||
* OS distribution enables cgroup v2
|
||||
* Linux Kernel version is 5.8 or later
|
||||
* Container runtime supports cgroup v2. For example:
|
||||
* [containerd](https://containerd.io/) v1.4 and later
|
||||
* [cri-o](https://cri-o.io/) v1.20 and later
|
||||
* The kubelet and the container runtime are configured to use the [systemd cgroup driver](/docs/setup/production-environment/container-runtimes#systemd-cgroup-driver)
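The kernel version requirement above can be checked from a shell on the node. The following is a minimal sketch, not an official tool: the function name `kernel_at_least` is hypothetical, and it assumes GNU `sort` with the `-V` (version sort) option is available.

```shell
# Hypothetical helper: succeeds if the kernel version in $1 is at least
# the minimum version in $2. Assumes GNU sort with -V (version sort).
kernel_at_least() {
  current="${1%%-*}"   # strip a suffix such as "-91-generic"
  minimum="$2"
  # The lower of the two versions sorts first.
  lowest="$(printf '%s\n%s\n' "$minimum" "$current" | sort -V | head -n 1)"
  [ "$lowest" = "$minimum" ]
}

if kernel_at_least "$(uname -r)" "5.8"; then
  echo "kernel meets the 5.8 minimum for cgroup v2"
else
  echo "kernel is older than 5.8"
fi
```

Version sort is used instead of a plain string comparison so that a version such as 5.10 is correctly treated as newer than 5.8.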
|
||||
|
||||
Many Linux Distributions have already switched over to use cgroupv2 by default, for example:
|
||||
### Linux Distribution cgroup v2 support
|
||||
|
||||
For a list of Linux distributions that use cgroup v2, refer to the [cgroup v2 documentation](https://github.com/opencontainers/runc/blob/main/docs/cgroup-v2.md).
|
||||
|
||||
<!-- the list should be kept in sync with https://github.com/opencontainers/runc/blob/main/docs/cgroup-v2.md -->
|
||||
* Container Optimized OS M97
|
||||
* Container Optimized OS (since M97)
|
||||
* Ubuntu (since 21.10, 22.04+ recommended)
|
||||
* Debian GNU/Linux (since Debian 11 buster)
|
||||
* Debian GNU/Linux (since Debian 11 bullseye)
|
||||
* Fedora (since 31)
|
||||
* Arch Linux (since April 2021)
|
||||
* RHEL and RHEL-like distributions (since 9)
|
||||
|
||||
To check if your distribution is using cgroupv2, refer to your distribution's
|
||||
documentation or follow the steps [below](#check-cgroup-version) to verify the
|
||||
configuration.
|
||||
To check if your distribution is using cgroup v2, refer to your distribution's
|
||||
documentation or follow the instructions in [Identify the cgroup version on Linux nodes](#check-cgroup-version).
|
||||
|
||||
You can also enable cgroupv2 manually on your Linux distribution by modifying
|
||||
the kernel boot arguments in the GRUB command line, and setting
|
||||
`systemd.unified_cgroup_hierarchy=1`, however it's recommended to use a
|
||||
distribution that already enables cgroupv2 by default.
|
||||
You can also enable cgroup v2 manually on your Linux distribution by modifying
|
||||
the kernel cmdline boot arguments. If your distribution uses GRUB,
|
||||
`systemd.unified_cgroup_hierarchy=1` should be added in `GRUB_CMDLINE_LINUX`
|
||||
under `/etc/default/grub`, followed by `sudo update-grub`. However, the
|
||||
recommended approach is to use a distribution that already enables cgroup v2 by
|
||||
default.
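After you add the boot argument and reboot, you might want to confirm that the running kernel actually received it. The following sketch reads `/proc/cmdline`; the function name `has_unified_cgroup_arg` is hypothetical and used only for illustration.

```shell
# Hypothetical helper: succeeds if the kernel command line passed in $1
# contains the argument systemd.unified_cgroup_hierarchy=1.
has_unified_cgroup_arg() {
  case " $1 " in
    *" systemd.unified_cgroup_hierarchy=1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

if has_unified_cgroup_arg "$(cat /proc/cmdline 2>/dev/null)"; then
  echo "systemd.unified_cgroup_hierarchy=1 is set on the kernel command line"
else
  echo "systemd.unified_cgroup_hierarchy=1 is not set explicitly"
fi
```

A missing argument does not by itself mean the node uses cgroup v1, because distributions that enable cgroup v2 by default do not need it.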
|
||||
|
||||
### Migrating to cgroup v2 {#migrating-cgroupv2}
|
||||
|
||||
### Migrating to cgroupv2
|
||||
To migrate to cgroup v2, ensure that you meet the [requirements](#requirements), then upgrade
|
||||
to a kernel version that enables cgroup v2 by default.
|
||||
|
||||
To migrate to cgroupv2, update to a newer kernel version that enables cgroupv2
|
||||
by default, ensure your container runtime supports cgroupv2, and configure
|
||||
kubelet and container runtime are configured to use the [systemd cgroup
|
||||
driver](/docs/setup/production-environment/container-runtimes#systemd-cgroup-driver).
|
||||
|
||||
Kubelet will automatically detect that the OS is running on cgroupv2 and will
|
||||
perform accordingly, no additional configuration is required.
|
||||
The kubelet automatically detects that the OS is running on cgroup v2 and
|
||||
performs accordingly with no additional configuration required.
|
||||
|
||||
There should not be any noticeable difference in the user experience when
|
||||
switching to cgroup v2, unless users are accessing the cgroup file system
|
||||
directly, either on the node or from within the containers.
|
||||
|
||||
Cgroup V2 uses a new API as compared to cgroup V1, so if there are any
|
||||
cgroup v2 uses a different API than cgroup v1, so if there are any
|
||||
applications that directly access the cgroup file system, they need to be
|
||||
updated to newer versions that support cgroupv2. For example:
|
||||
updated to newer versions that support cgroup v2. For example:
|
||||
|
||||
* Some third party monitoring and security agents may be dependent on cgroup filesystem.
|
||||
Update them to the latest versions that support cgroupv2
|
||||
* If you are running [cAdvisor](https://github.com/google/cadvisor) as a
|
||||
daemonset for monitoring pods and containers, update it to latest version (v0.45.0)
|
||||
* If you use JDK (Java workload), prefer to use JDK 11.0.16 and later or JDK 15
|
||||
and later, which [fully support
|
||||
cgroupv2](https://bugs.openjdk.org/browse/JDK-8230305)
|
||||
* Some third-party monitoring and security agents may depend on the cgroup filesystem.
|
||||
Update these agents to versions that support cgroup v2.
|
||||
* If you run [cAdvisor](https://github.com/google/cadvisor) as a stand-alone
|
||||
DaemonSet for monitoring pods and containers, update it to v0.43.0 or later.
|
||||
* If you use JDK, prefer to use JDK 11.0.16 and later or JDK 15 and later, which [fully support cgroup v2](https://bugs.openjdk.org/browse/JDK-8230305).
|
||||
|
||||
## Identify the cgroup version on Linux Nodes {#check-cgroup-version}
|
||||
|
||||
## Identifying cgroup version used on Linux Nodes {#check-cgroup-version}
|
||||
|
||||
The cgroup version is dependent on the Linux distribution being used and the
|
||||
The cgroup version depends on the Linux distribution being used and the
|
||||
default cgroup version configured on the OS. To check which cgroup version your
|
||||
OS Distro is using, you can run the `stat -fc %T /sys/fs/cgroup/` command on
|
||||
the node and check if the output is `cgroup2fs`:
|
||||
distribution uses, run the `stat -fc %T /sys/fs/cgroup/` command on
|
||||
the node:
|
||||
|
||||
```shell
|
||||
# On a cgroupv2 node:
|
||||
$ stat -fc %T /sys/fs/cgroup/
|
||||
cgroup2fs
|
||||
|
||||
# On a cgroupv1 node:
|
||||
$ stat -fc %T /sys/fs/cgroup/
|
||||
tmpfs
|
||||
stat -fc %T /sys/fs/cgroup/
|
||||
```
|
||||
|
||||
For cgroup v2, the output is `cgroup2fs`.
|
||||
|
||||
For cgroup v1, the output is `tmpfs`.
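If you need to run this check across many nodes, the mapping from filesystem type to cgroup version can be wrapped in a small script. This is a sketch only; the function name `cgroup_version_from_fstype` is hypothetical, and any output other than the two values documented above is reported as unknown.

```shell
# Hypothetical helper: maps the filesystem type reported by
# `stat -fc %T /sys/fs/cgroup/` to a cgroup version label.
cgroup_version_from_fstype() {
  case "$1" in
    cgroup2fs) echo "v2" ;;
    tmpfs)     echo "v1" ;;
    *)         echo "unknown" ;;
  esac
}

fstype="$(stat -fc %T /sys/fs/cgroup/ 2>/dev/null)"
echo "cgroup version: $(cgroup_version_from_fstype "$fstype")"
```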
|
||||
|
||||
## {{% heading "whatsnext" %}}
|
||||
|
||||
- Learn more about [cgroups](https://man7.org/linux/man-pages/man7/cgroups.7.html)
|
||||
|
|
|
@ -89,12 +89,11 @@ are used to constrain resources that are allocated to processes.
|
|||
|
||||
Both {{< glossary_tooltip text="kubelet" term_id="kubelet" >}} and the
|
||||
underlying container runtime need to interface with control groups to enforce
|
||||
[resource mangement for pods and
|
||||
containers](/docs/concepts/configuration/manage-resources-containers/) and set
|
||||
[resource management for pods and containers](/docs/concepts/configuration/manage-resources-containers/) and set
|
||||
resources such as cpu/memory requests and limits. To interface with control
|
||||
groups, kubelet and container runtime need to use a "cgroup driver". It's
|
||||
critical that both kubelet and the container runtime cgroup driver match and
|
||||
are configured the same.
|
||||
groups, the kubelet and the container runtime need to use a *cgroup driver*.
|
||||
It's critical that the kubelet and the container runtime use the same cgroup
|
||||
driver and are configured the same.
|
||||
|
||||
There are two cgroup drivers available:
|
||||
|
||||
|
@ -103,15 +102,15 @@ There are two cgroup drivers available:
|
|||
|
||||
### cgroupfs driver {#cgroupfs-cgroup-driver}
|
||||
|
||||
The `cgroupfs` driver is the default cgroup driver in kubelet. When `cgroupfs`
|
||||
driver is used, kubelet and the container runtime will directly interface with
|
||||
The `cgroupfs` driver is the default cgroup driver in the kubelet. When the `cgroupfs`
|
||||
driver is used, the kubelet and the container runtime directly interface with
|
||||
the cgroup filesystem to configure cgroups.
|
||||
|
||||
The `cgroupfs` is **not** recommended to be used when
|
||||
[systemd](https://www.freedesktop.org/wiki/Software/systemd/) is choosen as the
|
||||
init system since systemd expects there to only be a single cgroup manager on
|
||||
the system. Additionally, if [cgroupv2](/docs/concepts/architecture/cgroups) is
|
||||
used, it's also recommended to use the `systemd` cgroup driver instead of
|
||||
The `cgroupfs` driver is **not** recommended when
|
||||
[systemd](https://www.freedesktop.org/wiki/Software/systemd/) is the
|
||||
init system because systemd expects a single cgroup manager on
|
||||
the system. Additionally, if you use [cgroup v2](/docs/concepts/architecture/cgroups),
|
||||
use the `systemd` cgroup driver instead of
|
||||
`cgroupfs`.
|
||||
|
||||
### systemd cgroup driver {#systemd-cgroup-driver}
|
||||
|
@ -120,39 +119,35 @@ When [systemd](https://www.freedesktop.org/wiki/Software/systemd/) is chosen as
|
|||
system for a Linux distribution, the init process generates and consumes a root control group
|
||||
(`cgroup`) and acts as a cgroup manager.
|
||||
|
||||
Systemd has a tight integration with cgroups and allocates a cgroup per systemd
|
||||
unit. As a result, when using `systemd` as the init system, but `cgroupfs`
|
||||
driver, there will be two different cpu managers on the system which is
|
||||
undesirable.
|
||||
systemd has a tight integration with cgroups and allocates a cgroup per systemd
|
||||
unit. As a result, if you use `systemd` as the init system with the `cgroupfs`
|
||||
driver, the system gets two different cgroup managers.
|
||||
|
||||
A single cgroup manager simplifies the view of what resources are being
|
||||
allocated and will by default have a more consistent view of the available and
|
||||
in-use resources. When there are two cgroup managers on a system, you end up
|
||||
with two views of those resources. In the field, people have reported cases
|
||||
where nodes that are configured to use `cgroupfs` for the kubelet and container
|
||||
runtime, but `systemd` for the rest of the processes, become unstable under
|
||||
resource pressure. Changing the settings such that your container runtime and
|
||||
kubelet use `systemd` as the cgroup driver stabilized the system.
|
||||
Two cgroup managers result in two views of the available and in-use resources in
|
||||
the system. In some cases, nodes that are configured to use `cgroupfs` for the
|
||||
kubelet and container runtime, but use `systemd` for the rest of the processes, become
|
||||
unstable under resource pressure.
|
||||
|
||||
Additionally, if your OS distribution is using [cgroupv2](/docs/concepts/architecture/cgroups), it is highly
|
||||
recommended to use the `systemd` cgroup driver.
|
||||
The approach to mitigate this instability is to use `systemd` as the cgroup driver for
|
||||
the kubelet and the container runtime when systemd is the selected init system.
|
||||
|
||||
To set `systemd` as the cgroup driver edit the
|
||||
To set `systemd` as the cgroup driver, edit the
|
||||
[`KubeletConfiguration`](/docs/tasks/administer-cluster/kubelet-config-file/)
|
||||
option of `cgroupDriver` and set it to `systemd`. For example:
|
||||
|
||||
```yaml
|
||||
apiVersion: kubelet.config.k8s.io/v1beta1
|
||||
kind: KubeletConfiguration
|
||||
... rest of config ...
|
||||
...
|
||||
cgroupDriver: systemd
|
||||
```
|
||||
|
||||
If kubelet is configured with `systemd` as cgroupDriver, the container runtime
|
||||
must also be configured to use the `systemd` as the cgroup driver. If using
|
||||
containerd, it can be configured to use systemd cgroup driver as described
|
||||
[here](#containerd-systemd). [CRI-O](#cri-o) already defaults to systemd cgroup
|
||||
driver. For other container runtimes, refer to their specific documentation.
|
||||
If you configure `systemd` as the cgroup driver for the kubelet, you must also
|
||||
configure `systemd` as the cgroup driver for the container runtime. Refer to
|
||||
the documentation for your container runtime for instructions. For example:
|
||||
|
||||
* [containerd](#containerd-systemd)
|
||||
* [CRI-O](#cri-o)
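To confirm that both sides agree, you can inspect the two configuration files on the node. The following is a rough sketch for a node that uses containerd: the helper names are hypothetical, the kubelet path `/var/lib/kubelet/config.yaml` is an assumption (it is the usual location on kubeadm-provisioned nodes and may differ on yours), and the pattern matching is a simple `grep`, not a full YAML or TOML parser.

```shell
# Hypothetical helpers. Each succeeds if the given file sets the
# systemd cgroup driver. Simple line matching, not a real parser.
kubelet_uses_systemd() {
  grep -Eq '^cgroupDriver:[[:space:]]*systemd[[:space:]]*$' "$1" 2>/dev/null
}

containerd_uses_systemd() {
  grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "$1" 2>/dev/null
}

# Assumed default paths; override through the environment if needed.
KUBELET_CONFIG="${KUBELET_CONFIG:-/var/lib/kubelet/config.yaml}"
CONTAINERD_CONFIG="${CONTAINERD_CONFIG:-/etc/containerd/config.toml}"

if kubelet_uses_systemd "$KUBELET_CONFIG" && containerd_uses_systemd "$CONTAINERD_CONFIG"; then
  echo "kubelet and containerd are both configured for the systemd cgroup driver"
else
  echo "could not confirm that both use the systemd cgroup driver"
fi
```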
|
||||
|
||||
{{< caution >}}
|
||||
Changing the cgroup driver of a Node that has joined a cluster is a sensitive operation.
|
||||
|
@ -213,7 +208,7 @@ To use the `systemd` cgroup driver in `/etc/containerd/config.toml` with `runc`,
|
|||
SystemdCgroup = true
|
||||
```
|
||||
|
||||
`systemd` cgroup driver is recommended to set if using [cgroupv2](/docs/concepts/architecture/cgroups).
|
||||
The `systemd` cgroup driver is recommended if you use [cgroup v2](/docs/concepts/architecture/cgroups).
|
||||
|
||||
{{< note >}}
|
||||
If you installed containerd from a package (for example, RPM or `.deb`), you may find
|
||||
|
|