---
layout: blog
title: "CGroup V2 Goes GA in 1.25"
date: 2022-08-04
slug: cgroupv2-ga-1-25
---
**Authors:** David Porter (Google), Mrunal Patel (Red Hat)

Kubernetes 1.25 brings cgroup v2 to general availability (GA), enabling the kubelet to use the latest container resource management capabilities.
## What are cgroups?
One of the most important aspects of Kubernetes is [resource management](/docs/concepts/configuration/manage-resources-containers/) which is concerned with how to manage finite resources in clusters across nodes such as CPU, Memory, Disk, etc.
Anytime you're making use of resource management capabilities in Kubernetes like setting [Requests and Limits across CPU and Memory](/docs/concepts/configuration/manage-resources-containers/#requests-and-limits) for your pods, cgroups are being used.
cgroups are a low level Linux kernel capability that underpin resource management functionality like limiting CPU usage or setting memory limits.
There are two versions of cgroups in the Linux kernel: cgroup v1 and cgroup v2.
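To tie the requests and limits mentioned above back to cgroups: when the kubelet runs a pod like the one below, the container runtime enforces the resource settings through cgroup controllers on the node. The pod name and values here are illustrative, not from a real workload:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo   # hypothetical name, for illustration only
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.8
    resources:
      requests:
        cpu: 250m
        memory: 64Mi
      limits:
        cpu: 500m       # enforced via the cgroup CPU controller
        memory: 128Mi   # enforced via the cgroup memory controller
```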
## What is cgroup v2?
cgroup v2 is the next version of the Linux cgroup API. cgroup v2 provides a unified control system with enhanced resource management capabilities.
cgroup v2 has been in development in the Linux kernel since 2016 and in recent years has matured across the container ecosystem. With Kubernetes 1.25, cgroup v2 support is graduating to GA.
Many new Linux distributions have switched over to cgroup v2 by default so it's important that Kubernetes continues to work well on these new updated distros.
cgroup v2 offers several improvements over cgroup v1, such as the following:
* Single unified hierarchy design in API
* Safer sub-tree delegation to containers
* Newer features like Pressure Stall Information
* Enhanced resource allocation management and isolation across multiple resources
* Unified accounting for different types of memory allocations (network and kernel memory, etc)
* Accounting for non-immediate resource changes such as page cache write backs
Some Kubernetes features exclusively use cgroup v2 for enhanced resource management and isolation. For example, the [MemoryQoS feature](https://kubernetes.io/blog/2021/11/26/qos-memory-resources/) improves memory QoS and relies on cgroup v2 primitives to enable it. New resource management features in kubelet will also take advantage of the new cgroup v2 features moving forward.
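One of the newer capabilities listed above, Pressure Stall Information (PSI), is exposed by the kernel as plain-text files such as `/proc/pressure/memory`, or `memory.pressure` inside a cgroup. As a minimal sketch of what consuming it looks like (the helper name and sample line are ours, for illustration):

```python
# Minimal sketch: parse one line of Linux PSI output into a dict.
# The sample line below is illustrative, not captured from a real host.
def parse_psi_line(line: str) -> dict:
    """Parse e.g. 'some avg10=0.00 avg60=0.10 avg300=0.05 total=12345'."""
    kind, *fields = line.split()
    values = dict(f.split("=") for f in fields)
    return {
        "kind": kind,                      # "some" or "full"
        "avg10": float(values["avg10"]),   # % of time stalled, last 10s
        "avg60": float(values["avg60"]),   # ... last 60s
        "avg300": float(values["avg300"]), # ... last 300s
        "total": int(values["total"]),     # total stall time, microseconds
    }

sample = "some avg10=0.00 avg60=0.10 avg300=0.05 total=12345"
print(parse_psi_line(sample)["avg60"])  # → 0.1
```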
## How do I use cgroup v2?
As you update your cluster and nodes to a newer Linux distribution, you may already be using cgroup v2!
The recommended way to use cgroup v2 is to use a Linux distribution that enables and uses cgroup v2 by default, for example one of the many distributions below:
* Container Optimized OS (since M97)
* Ubuntu (since 21.10)
* Debian GNU/Linux (since Debian 11 Bullseye)
* Fedora (since 31)
* Arch Linux (since April 2021)
* RHEL and RHEL-like distributions (since 9)
To check if your distribution uses cgroup v2 by default, you can follow the instructions [here](/docs/concepts/architecture/cgroups/#check-cgroup-version) or consult your OS distribution documentation.
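As a quick sketch of that check (the helper name and `root` parameter are ours; the canonical shell equivalent is `stat -fc %T /sys/fs/cgroup/`, which prints `cgroup2fs` on a cgroup v2 host):

```python
import os

def cgroup_version(root: str = "/sys/fs/cgroup") -> str:
    """Return 'v2' if the unified cgroup v2 hierarchy is mounted at root.

    On cgroup v2, the mount point contains a cgroup.controllers file;
    on cgroup v1, it is a tmpfs of per-controller directories instead.
    The root parameter exists only to make the probe easy to test.
    """
    if os.path.exists(os.path.join(root, "cgroup.controllers")):
        return "v2"
    return "v1"
```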
If you're using a managed Kubernetes offering, consult your provider to determine how they're adopting cgroup v2.
Using cgroup v2 with Kubernetes has the following requirements:
* OS distribution enables cgroup v2 on kernel version 5.8 or later
* Container runtime supports cgroup v2. For example:
* [containerd](https://containerd.io/) v1.4 or later
* [CRI-O](https://cri-o.io/) v1.20 or later
* The kubelet and the container runtime are configured to use the [systemd cgroup driver](/docs/setup/production-environment/container-runtimes#systemd-cgroup-driver)
The kubelet and container runtime use a [cgroup driver](/docs/setup/production-environment/container-runtimes#cgroup-drivers) to set cgroup limits. When using cgroup v2, it's highly recommended that both kubelet and your container runtime (e.g. containerd or CRI-O) use [systemd cgroup driver](/docs/setup/production-environment/container-runtimes#systemd-cgroup-driver). This ensures that there is only a single cgroup manager on the system. Refer to the [systemd cgroup driver documentation](/docs/setup/production-environment/container-runtimes#systemd-cgroup-driver) about how to configure kubelet and the container runtime to use systemd as the cgroup driver.
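For example, the kubelet side of that configuration is a single field in its `KubeletConfiguration` file (a minimal fragment; the container runtime must be configured to match, e.g. `SystemdCgroup = true` in containerd's runc runtime options):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
```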
## Migrating to cgroup v2
When you run the kubelet on a Linux distribution that enables cgroup v2, the kubelet should adapt automatically, with no additional configuration required.
You should not notice any difference in the user experience when switching to cgroup v2, unless you access the cgroup file system directly, either on the node or from within containers.
It's important to note that cgroup v2 uses a different API compared to cgroup v1, so if there are any applications that read the cgroup file system directly, they should be updated to versions that support cgroup v2. For example:
* Some third-party monitoring and security agents may depend on the cgroup filesystem. Update these agents to versions that support cgroup v2.
* If you run [cAdvisor](https://github.com/google/cadvisor) as a stand-alone DaemonSet for monitoring pods and containers, update it to v0.43.0 or later.
* If you deploy Java applications with the JDK, prefer to use JDK 11.0.16 and later or JDK 15 and later, which [fully support cgroup v2](https://bugs.openjdk.org/browse/JDK-8230305).
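As a concrete illustration of that API difference, the memory limit for a cgroup lives at a different path, and has a different "unlimited" sentinel, in each version. This is a minimal sketch with a hypothetical helper (real agents like cAdvisor handle many more cases):

```python
import os
from typing import Optional

def container_memory_limit(root: str = "/sys/fs/cgroup") -> Optional[int]:
    """Return the cgroup's memory limit in bytes, or None if unlimited.

    cgroup v2: <root>/memory.max holds the limit, or the string "max".
    cgroup v1: <root>/memory/memory.limit_in_bytes holds a very large
               number when no limit is set.
    """
    v2_path = os.path.join(root, "memory.max")                       # cgroup v2
    v1_path = os.path.join(root, "memory", "memory.limit_in_bytes")  # cgroup v1
    path = v2_path if os.path.exists(v2_path) else v1_path
    try:
        raw = open(path).read().strip()
    except OSError:
        return None
    if raw == "max":        # cgroup v2 spells "no limit" as the string "max"
        return None
    limit = int(raw)
    if limit >= 2**62:      # cgroup v1 reports a huge number when unlimited
        return None
    return limit
```

Keeping the cgroup root injectable, as above, also makes this kind of code straightforward to test on either cgroup version.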
## How can I learn more?
* Read the [Kubernetes cgroup v2 documentation](/docs/concepts/architecture/cgroups)
* Read the enhancement proposal, [KEP 2254](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2254-cgroup-v2/README.md)
* Learn more about [cgroups](https://man7.org/linux/man-pages/man7/cgroups.7.html) in the Linux man pages and [cgroup v2](https://docs.kernel.org/admin-guide/cgroup-v2.html) in the Linux kernel documentation
## How do I get involved?
Your feedback is always welcome! SIG Node meets regularly and can be reached via [Slack](https://slack.k8s.io/) (channel `#sig-node`), or the SIG's [mailing list](https://github.com/kubernetes/community/tree/master/sig-node#contact).
We would love to work on new resource management capabilities on top of cgroup v2.
cgroup v2 has had a long journey and is a great example of open source community collaboration across the industry, since it required work across the stack, from the Linux kernel to systemd to container runtimes to Kubernetes.
We would like to thank [Giuseppe Scrivano](https://github.com/giuseppe), who initiated cgroup v2 support in Kubernetes, and the SIG Node community for its reviews and leadership, including chairs [Dawn Chen](https://github.com/dchen1107) and [Derek Carr](https://github.com/derekwaynecarr).
Additionally, we would like to thank the maintainers of container runtimes like Docker, containerd, and CRI-O, as well as the maintainers of low-level components such as [cAdvisor](https://github.com/google/cadvisor) and [runc / libcontainer](https://github.com/opencontainers/runc), which underpin many container runtimes. And of course, none of it would have been possible without the support of systemd and upstream Linux kernel maintainers. It's a team effort!