Merge pull request #45178 from kinvolk/rata/userns-1.30
User namespaces doc changes for 1.30
commit
c7cd6c5644
|
@ -7,7 +7,7 @@ min-kubernetes-server-version: v1.25
|
|||
---
|
||||
|
||||
<!-- overview -->
|
||||
{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
|
||||
{{< feature-state for_k8s_version="v1.30" state="beta" >}}
|
||||
|
||||
This page explains how user namespaces are used in Kubernetes pods. A user
|
||||
namespace isolates the user running inside the container from the one
|
||||
|
@ -46,7 +46,26 @@ tmpfs, Secrets use a tmpfs, etc.)
|
|||
Some popular filesystems that support idmap mounts in Linux 6.3 are: btrfs,
|
||||
ext4, xfs, fat, tmpfs, overlayfs.
|
||||
|
||||
In addition, support is needed in the
|
||||
In addition, the container runtime and its underlying OCI runtime must support
|
||||
user namespaces. The following OCI runtimes offer support:
|
||||
|
||||
* [crun](https://github.com/containers/crun) version 1.9 or greater (version 1.13 or later is recommended).
|
||||
|
||||
<!-- ideally, update this if a newer minor release of runc comes out, whether or not it includes the idmap support -->
|
||||
{{< note >}}
|
||||
Many OCI runtimes do not include the support needed for using user namespaces in
|
||||
Linux pods. If you use a managed Kubernetes, or have downloaded it from packages
|
||||
and set it up, it's likely that nodes in your cluster use a runtime that doesn't
|
||||
include this support. For example, the most widely used OCI runtime is `runc`,
|
||||
and version `1.1.z` of runc doesn't support all the features needed by the
|
||||
Kubernetes implementation of user namespaces.
|
||||
|
||||
If there is a newer release of runc than 1.1 available for use, check its
|
||||
documentation and release notes for compatibility (look for idmap mounts support
|
||||
in particular, because that is the missing feature).
|
||||
{{< /note >}}
|
||||
|
||||
To use user namespaces with Kubernetes pods, you also need a CRI
{{< glossary_tooltip text="container runtime" term_id="container-runtime" >}}
that supports this feature:
|
||||
|
||||
|
@ -137,20 +156,67 @@ use, see `man 7 user_namespaces`.
|
|||
|
||||
## Set up a node to support user namespaces
|
||||
|
||||
It is recommended that the host's files and host's processes use UIDs/GIDs in
|
||||
the range of 0-65535.
|
||||
By default, the kubelet assigns pods UIDs/GIDs above the range 0-65535, based on
|
||||
the assumption that the host's files and processes use UIDs/GIDs within this
|
||||
range, which is standard for most Linux distributions. This approach prevents
|
||||
any overlap between the UIDs/GIDs of the host and those of the pods.
|
||||
|
||||
The kubelet will assign UIDs/GIDs higher than that to pods. Therefore, to
|
||||
guarantee as much isolation as possible, the UIDs/GIDs used by the host's files
|
||||
and host's processes should be in the range 0-65535.
|
||||
Avoiding the overlap is important to mitigate the impact of vulnerabilities such
|
||||
as [CVE-2021-25741][CVE-2021-25741], where a pod can potentially read arbitrary
|
||||
files in the host. If the UIDs/GIDs of the pod and the host don't overlap, what
a pod can do is limited: the pod UID/GID won't match the host's
file owner/group.
|
||||
|
||||
Note that this recommendation is important to mitigate the impact of CVEs like
|
||||
[CVE-2021-25741][CVE-2021-25741], where a pod can potentially read arbitrary
|
||||
files in the hosts. If the UIDs/GIDs of the pod and the host don't overlap, it
|
||||
is limited what a pod would be able to do: the pod UID/GID won't match the
|
||||
host's file owner/group.
|
||||
The kubelet can use a custom range for user IDs and group IDs for pods. To
|
||||
configure a custom range, the node needs to have:
|
||||
|
||||
* A user `kubelet` in the system (you cannot use any other username here)
|
||||
* The binary `getsubids` installed (part of [shadow-utils][shadow-utils]) and
|
||||
in the `PATH` for the kubelet binary.
|
||||
* A configuration of subordinate UIDs/GIDs for the `kubelet` user (see
|
||||
[`man 5 subuid`](https://man7.org/linux/man-pages/man5/subuid.5.html) and
|
||||
[`man 5 subgid`](https://man7.org/linux/man-pages/man5/subgid.5.html)).
|
||||
|
||||
This setting only gathers the UID/GID range configuration and does not change
|
||||
the user executing the `kubelet`.
|
||||
|
||||
You must follow some constraints for the subordinate ID range that you assign
|
||||
to the `kubelet` user:
|
||||
|
||||
* The subordinate user ID that starts the UID range for Pods **must** be a
multiple of 65536 and must also be greater than or equal to 65536. In other
words, you cannot use any ID from the range 0-65535 for Pods; the kubelet
imposes this restriction to make it difficult to create an accidentally insecure
configuration.
|
||||
|
||||
* The subordinate ID count must be a multiple of 65536.
|
||||
|
||||
* The subordinate ID count must be at least `65536 x <maxPods>` where `<maxPods>`
|
||||
is the maximum number of pods that can run on the node.
|
||||
|
||||
* You must assign the same range for both user IDs and for group IDs. It doesn't
|
||||
matter if other users have user ID ranges that don't align with the group ID
|
||||
ranges.
|
||||
|
||||
* None of the assigned ranges should overlap with any other assignment.
|
||||
|
||||
* The subordinate configuration must be only one line. In other words, you can't
|
||||
have multiple ranges.
|
||||
|
||||
For example, you could define `/etc/subuid` and `/etc/subgid` to both have
|
||||
these entries for the `kubelet` user:
|
||||
|
||||
```
|
||||
# The format is
|
||||
# name:firstID:count of IDs
|
||||
# where
|
||||
# - firstID is 65536 (the minimum value possible)
|
||||
# - count of IDs is 110 (default limit for number of pods) * 65536
|
||||
kubelet:65536:7208960
|
||||
```
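The constraints above can be checked mechanically. The following is a minimal sketch, not part of the kubelet: the function name `check_kubelet_subid_line` and the `max_pods` parameter are hypothetical, and the default of 110 is taken from the comment in the example entry. It checks only the per-line rules (start value, multiples of 65536, minimum count); it does not verify that `/etc/subuid` and `/etc/subgid` match or that ranges of different users don't overlap.

```python
ID_BLOCK = 65536  # number of IDs per pod, per the constraints listed above


def check_kubelet_subid_line(line: str, max_pods: int = 110) -> list[str]:
    """Return the constraint violations found in one subuid/subgid line.

    Hypothetical helper for illustration; expects "name:firstID:count".
    """
    name, first_id_text, count_text = line.strip().split(":")
    first_id, count = int(first_id_text), int(count_text)
    problems = []
    if name != "kubelet":
        problems.append("entry must belong to the kubelet user")
    if first_id < ID_BLOCK or first_id % ID_BLOCK != 0:
        problems.append("first ID must be a multiple of 65536 and >= 65536")
    if count % ID_BLOCK != 0:
        problems.append("count must be a multiple of 65536")
    if count < ID_BLOCK * max_pods:
        problems.append("count must be at least 65536 * max_pods")
    return problems


print(check_kubelet_subid_line("kubelet:65536:7208960"))  # []
print(110 * ID_BLOCK)  # 7208960
```

An empty list means the line satisfies the per-line rules; for a node with a different pod limit, pass that value as `max_pods`.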
|
||||
|
||||
[CVE-2021-25741]: https://github.com/kubernetes/kubernetes/issues/104980
|
||||
[shadow-utils]: https://github.com/shadow-maint/shadow
|
||||
|
||||
## Integration with Pod security admission checks
|
||||
|
||||
|
|
|
@ -6,8 +6,12 @@ _build:
|
|||
render: false
|
||||
|
||||
stages:
|
||||
- stage: alpha
|
||||
- stage: alpha
|
||||
defaultValue: false
|
||||
fromVersion: "1.28"
|
||||
toVersion: "1.29"
|
||||
- stage: beta
|
||||
defaultValue: false
|
||||
fromVersion: "1.30"
|
||||
---
|
||||
Enable user namespace support for Pods.
|
||||
|
|
|
@ -7,7 +7,7 @@ min-kubernetes-server-version: v1.25
|
|||
---
|
||||
|
||||
<!-- overview -->
|
||||
{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
|
||||
{{< feature-state for_k8s_version="v1.30" state="beta" >}}
|
||||
|
||||
This page shows how to configure a user namespace for pods. This allows you to
|
||||
isolate the user running inside the container from the one in the host.
|
||||
|
@ -57,10 +57,6 @@ If you have a mixture of nodes and only some of the nodes provide user namespace
|
|||
Pods, you also need to ensure that the user namespace Pods are
|
||||
[scheduled](/docs/concepts/scheduling-eviction/assign-pod-node/) to suitable nodes.
|
||||
|
||||
Please note that **if your container runtime doesn't support user namespaces, the
|
||||
`hostUsers` field in the pod spec will be silently ignored and the pod will be
|
||||
created without user namespaces.**
|
||||
|
||||
<!-- steps -->
|
||||
|
||||
## Run a Pod that uses a user namespace {#create-pod}
|
||||
|
@ -82,27 +78,42 @@ to `false`. For example:
|
|||
kubectl attach -it userns bash
|
||||
```
|
||||
|
||||
And run the command. The output is similar to this:
|
||||
Run this command:
|
||||
|
||||
```none
|
||||
```shell
|
||||
readlink /proc/self/ns/user
|
||||
user:[4026531837]
|
||||
cat /proc/self/uid_map
|
||||
0 0 4294967295
|
||||
```
|
||||
|
||||
Then, open a shell in the host and run the same command.
|
||||
The output is similar to:
|
||||
|
||||
The output must be different. This means the host and the pod are using a
|
||||
different user namespace. When user namespaces are not enabled, the host and the
|
||||
pod use the same user namespace.
|
||||
```shell
|
||||
user:[4026531837]
|
||||
```
|
||||
|
||||
Also run:
|
||||
|
||||
```shell
|
||||
cat /proc/self/uid_map
|
||||
```
|
||||
|
||||
The output is similar to:
|
||||
```shell
|
||||
0 833617920 65536
|
||||
```
|
||||
|
||||
Then, open a shell in the host and run the same commands.
|
||||
|
||||
The `readlink` command shows the user namespace the process is running in. It
|
||||
should be different when it is run on the host and inside the container.
|
||||
|
||||
The last number of the `uid_map` file inside the container must be 65536; on the
host it must be a bigger number.
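
The two comparisons described above can also be scripted. This is a rough sketch under stated assumptions: `parse_ns_link` and `uid_map_lengths` are hypothetical helper names, and the sample strings are the example values shown in this diff, with the choice of which one stands for the pod and which for the host made only for illustration. On a real node you would feed in the actual output of `readlink /proc/self/ns/user` and `cat /proc/self/uid_map`.

```python
import re


def parse_ns_link(link: str) -> int:
    """Extract the namespace inode from text such as 'user:[4026531837]'."""
    match = re.fullmatch(r"user:\[(\d+)\]", link.strip())
    if match is None:
        raise ValueError(f"unexpected namespace link: {link!r}")
    return int(match.group(1))


def uid_map_lengths(uid_map: str) -> list[int]:
    """Return the length (third column) of every mapping line in a uid_map."""
    return [int(line.split()[2]) for line in uid_map.splitlines() if line.strip()]


# Sample values copied from the example output in this page (assumed roles).
pod_ns, host_ns = "user:[4026534732]", "user:[4026531837]"
print(parse_ns_link(pod_ns) != parse_ns_link(host_ns))  # True
print(uid_map_lengths("0 833617920 65536"))  # [65536]
print(uid_map_lengths("0 0 4294967295"))  # [4294967295]
```

Different inode numbers indicate different user namespaces, and a mapping length of 65536 matches what the text expects inside the container.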
|
||||
|
||||
If you are running the kubelet inside a user namespace, you need to compare the
|
||||
output from running the command in the pod to the output of running in the host:
|
||||
|
||||
```none
|
||||
```shell
|
||||
readlink /proc/$pid/ns/user
|
||||
user:[4026534732]
|
||||
```
|
||||
|
||||
replacing `$pid` with the kubelet PID.
|
||||
|
|