Merge pull request #50329 from my-git9/np-24976

[zh-cn]sync assign-pod-node topology-spread-constraints user-namespaces
Kubernetes Prow Robot 2025-04-06 22:38:42 -07:00 committed by GitHub
commit 49fa5babc6
3 changed files with 69 additions and 41 deletions


@ -376,7 +376,7 @@ in the [scheduler configuration](/docs/reference/scheduling/config/). For exampl
adding an `addedAffinity` to the `args` field. For example:
```yaml
-apiVersion: kubescheduler.config.k8s.io/v1beta3
+apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
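  # A hedged sketch of how this profile might continue; the profile name,
  # label key, and label value below are illustrative and not part of the
  # original example.
  - schedulerName: foo-scheduler
    pluginConfig:
      - name: NodeAffinity
        args:
          addedAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: scheduler-profile
                  operator: In
                  values:
                  - foo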
@ -730,9 +730,31 @@ to the same revision as the incoming Pod, so that a rolling upgrade won't break
ensuring that a rolling upgrade won't break affinity.
<!--
# Only Pods from a given rollout are taken into consideration when calculating pod affinity.
# If you update the Deployment, the replacement Pods follow their own affinity rules
# (if there are any defined in the new Pod template)
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: application-server
...
spec:
  template:
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - database
            topologyKey: topology.kubernetes.io/zone
            # Only Pods from a given rollout are taken into consideration when calculating pod affinity.
            # If you update the Deployment, the replacement Pods follow their own affinity rules
            # (if there are any defined in the new Pod template)
            matchLabelKeys:
            - pod-template-hash
```
-->
```yaml
apiVersion: apps/v1


@ -26,7 +26,8 @@ or configure topology spread constraints for individual workloads.
such as regions, zones, nodes, and other user-defined topology domains.
This can help to achieve high availability as well as efficient resource utilization.
You can set [cluster-level constraints](#cluster-level-default-constraints) as a default,
or configure topology spread constraints for individual workloads.
<!-- body -->
@ -62,12 +63,6 @@ are split across three different datacenters (or infrastructure zones). Now you
have less concern about a single node failure, but you notice that latency is
higher than you'd like, and you are paying for network costs associated with
sending network traffic between the different zones.
-->
As you scale up and run more Pods, a different consideration becomes important.
Imagine that you have 3 nodes running 5 Pods each. The nodes have enough capacity to run that many replicas;
@ -75,6 +70,13 @@ Pod topology spread constraints offer you a declarative way to configure that.
Now you might have less concern about a single node failure, but you notice that latency is higher than you'd like,
and you are paying for the network costs associated with sending traffic between the different zones.
<!--
You decide that under normal operation you'd prefer to have a similar number of replicas
[scheduled](/docs/concepts/scheduling-eviction/) into each infrastructure zone,
and you'd like the cluster to self-heal in the case that there is a problem.
Pod topology spread constraints offer you a declarative way to configure that.
-->
You decide that under normal operation you'd prefer to have a similar number of replicas
[scheduled](/zh-cn/docs/concepts/scheduling-eviction/) into each infrastructure zone,
and you'd like the cluster to self-heal in the case that there is a problem.
@ -221,7 +223,13 @@ your cluster. Those fields are:
will try to put a balanced number of pods into each domain.
Also, we define an eligible domain as a domain whose nodes meet the requirements of
nodeAffinityPolicy and nodeTaintsPolicy.
-->
- **topologyKey** is the key of [node labels](#node-labels). Nodes that have a label with this key
  and identical values are considered to be in the same topology.
  We call each instance of a topology (in other words, a <key, value> pair) a domain. The scheduler
  will try to put a balanced number of Pods into each domain.
  Also, we define an eligible domain as a domain whose nodes meet the requirements of
  nodeAffinityPolicy and nodeTaintsPolicy.
<!--
- **whenUnsatisfiable** indicates how to deal with a Pod if it doesn't satisfy the spread constraint:
- `DoNotSchedule` (default) tells the scheduler not to schedule it.
- `ScheduleAnyway` tells the scheduler to still schedule it while prioritizing nodes that minimize the skew.
@ -232,11 +240,6 @@ your cluster. Those fields are:
See [Label Selectors](/docs/concepts/overview/working-with-objects/labels/#label-selectors)
for more details.
-->
- **whenUnsatisfiable** indicates how to deal with a Pod if it doesn't satisfy the spread constraint:
  - `DoNotSchedule` (default) tells the scheduler not to schedule it.
  - `ScheduleAnyway` tells the scheduler to still schedule it while prioritizing nodes that minimize the skew.
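To make the field descriptions above concrete, here is a minimal sketch of a Pod that spreads its
replicas across zones. The Pod name, the `app: web` label, and the container image are illustrative
placeholders, not part of the original text:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-web              # illustrative name
  labels:
    app: web                     # illustrative label, matched by the labelSelector below
spec:
  topologySpreadConstraints:
  - maxSkew: 1                                 # allow at most one Pod of difference between domains
    topologyKey: topology.kubernetes.io/zone   # every distinct zone value forms one domain
    whenUnsatisfiable: DoNotSchedule           # keep the Pod pending rather than violate the constraint
    labelSelector:
      matchLabels:
        app: web
  containers:
  - name: web
    image: registry.k8s.io/pause:3.8
```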
@ -434,12 +437,6 @@ Usually, if you are using a workload controller such as a Deployment, the pod te
takes care of this for you. If you mix different spread constraints then Kubernetes
follows the API definition of the field; however, the behavior is more likely to become
confusing and troubleshooting is less straightforward.
-->
## Consistency {#Consistency}
@ -449,6 +446,13 @@ your cluster supports this.
If you mix different spread constraints then Kubernetes follows the API definition of the field;
however, the behavior is more likely to become confusing and troubleshooting is less straightforward.
<!--
You need a mechanism to ensure that all the nodes in a topology domain (such as a
cloud provider region) are labelled consistently.
To avoid you needing to manually label nodes, most clusters automatically
populate well-known labels such as `kubernetes.io/hostname`. Check whether
your cluster supports this.
-->
You need a mechanism to ensure that all the nodes in a topology domain (such as a cloud provider region)
are labelled consistently. To avoid you needing to manually label nodes, most clusters automatically
populate well-known labels such as `kubernetes.io/hostname`. Check whether your cluster supports this.
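For reference, the well-known labels on a node in such a cluster might look like the excerpt below;
the node name and the zone and region values are hypothetical:

```yaml
apiVersion: v1
kind: Node
metadata:
  name: worker-1                              # hypothetical node name
  labels:
    kubernetes.io/hostname: worker-1          # usually populated automatically by the kubelet
    topology.kubernetes.io/zone: zone-a       # hypothetical zone, typically set by the cloud provider integration
    topology.kubernetes.io/region: region-1   # hypothetical region
```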
@ -822,7 +826,7 @@ An example configuration might look like follows:
An example configuration might look like this:
```yaml
-apiVersion: kubescheduler.config.k8s.io/v1beta3
+apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
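  # A hedged sketch of how this profile might continue; the constraint shown
  # below is illustrative and not part of the original example.
  - schedulerName: default-scheduler
    pluginConfig:
      - name: PodTopologySpread
        args:
          defaultConstraints:
            - maxSkew: 1
              topologyKey: topology.kubernetes.io/zone
              whenUnsatisfiable: ScheduleAnyway
          defaultingType: List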
@ -894,7 +898,7 @@ empty `defaultConstraints` in the `PodTopologySpread` plugin configuration:
and leaving `defaultConstraints` empty in the `PodTopologySpread` plugin configuration to disable the default Pod spread constraints:
```yaml
-apiVersion: kubescheduler.config.k8s.io/v1beta3
+apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:


@ -27,7 +27,7 @@ namespace.
This page explains how user namespaces are used in Kubernetes Pods.
A user namespace isolates the user running inside the container from the one on the host.
A process running as root in a container can run as a different (non-root) user on the host;
in other words, the process has full privileges for operations inside the user namespace,
but is unprivileged for operations outside the namespace.
@ -39,13 +39,14 @@ exploitable when user namespaces is active. It is expected user namespace will
mitigate some future vulnerabilities too.
-->
You can use this feature to reduce the damage a compromised container can do to the host
or to other Pods in the same node.
There are [several security vulnerabilities][KEP-vulns] rated either **HIGH** or **CRITICAL**
that are not exploitable when user namespaces are active.
It is expected that user namespaces will mitigate some future vulnerabilities too.
[KEP-vulns]: https://github.com/kubernetes/enhancements/tree/217d790720c5aef09b8bd4d6ca96284a0affe6c2/keps/sig-node/127-user-namespaces#motivation
<!-- body -->
## {{% heading "prerequisites" %}}
{{% thirdparty-content %}}
@ -57,14 +58,6 @@ the filesystems used. This means:
* On the node, the filesystem you use for `/var/lib/kubelet/pods/`, or the
custom directory you configure for this, needs idmap mount support.
* All the filesystems used in the pod's volumes must support idmap mounts.
-->
This is a Linux-only feature, and support is needed in Linux for idmap mounts on the filesystems used.
This means:
@ -73,6 +66,15 @@ ext4, xfs, fat, tmpfs, overlayfs.
  needs idmap mount support.
* All the filesystems used in the Pod's volumes must support idmap mounts.
<!--
In practice this means you need at least Linux 6.3, as tmpfs started supporting
idmap mounts in that version. This is usually needed as several Kubernetes
features use tmpfs (the service account token that is mounted by default uses a
tmpfs, Secrets use a tmpfs, etc.)
Some popular filesystems that support idmap mounts in Linux 6.3 are: btrfs,
ext4, xfs, fat, tmpfs, overlayfs.
-->
In practice this means you need at least Linux 6.3, as tmpfs started supporting idmap mounts in that version.
This is usually needed as several Kubernetes features use tmpfs
(the service account token that is mounted by default uses a tmpfs, Secrets use a tmpfs, and so on).
@ -116,14 +118,14 @@ To use user namespaces with Kubernetes, you also need to use a CRI
In addition, support is needed in the {{< glossary_tooltip text="container runtime" term_id="container-runtime" >}}
to use this feature with Kubernetes Pods:
* containerd: version 2.0 (and later) supports user namespaces for containers.
* CRI-O: version 1.25 (and later) supports configuring user namespaces for containers.
<!--
You can see the status of user namespaces support in cri-dockerd tracked in an [issue][CRI-dockerd-issue]
on GitHub.
-->
You can see the status of user namespaces support in cri-dockerd tracked in an
[issue][CRI-dockerd-issue] on GitHub.
<!--
@ -195,7 +197,7 @@ if user namespaces is activated.
(usually 65534, configured in `/proc/sys/kernel/overflowuid` and `/proc/sys/kernel/overflowgid`).
However, it is not possible to modify those files, even when running as the 65534 user/group.
Most applications that need to run as root but don't access other host namespaces or resources
should continue to run normally, without any changes needed, when user namespaces are activated.
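For orientation, a Pod opts into a user namespace through the `hostUsers` field of its spec; a minimal
sketch follows, with a hypothetical Pod name and image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: userns-demo            # hypothetical name
spec:
  hostUsers: false             # false means the Pod gets its own user namespace
  containers:
  - name: shell
    image: debian              # hypothetical image
    command: ["sleep", "infinity"]
```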
<!--
@ -236,8 +238,8 @@ This abstraction limits what can happen, for example, if the container manages
to escape to the host. Given that the container is running as a non-privileged
user on the host, it is limited what it can do to the host.
-->
This means that containers can run as root and be mapped to a non-root user on the host.
Inside the container, the process will think it is running as root (and therefore tools like
`apt`, `yum`, etc. work fine), while in reality the process doesn't have privileges on the host.
You can verify this, for example, by checking which user the container's process is running as
when you execute `ps aux` from the host: the user `ps` shows is not the same as the user you see
if you run `id` inside the container.
@ -274,7 +276,7 @@ this is true when we use user namespaces.
If you want to know more details about what changes when user namespaces are in
use, see `man 7 user_namespaces`.
-->
Without using a user namespace, a container running as root has, in the case of a container breakout, root privileges on the node.
And if some capability was granted to the container, that capability is valid on the host too.
None of this is true when we use user namespaces.
@ -435,7 +437,7 @@ privileged user on the host. Here's the list of fields that are **not** checks f
circumstances:
-->
If you enable the associated feature gate and create a Pod that uses user namespaces, the following fields won't be constrained
even in contexts that enforce the **Baseline** or **Restricted** Pod Security Standards. This behavior does not present a
security concern because `root` inside a Pod with user namespaces actually refers to the user inside the container,
which is never mapped to a privileged user on the host.
Here's the list of Pod fields that are **not** checked in these circumstances: