[zh] Sync a task: reserve-compute-resources.md

pull/46670/head
windsonsea 2024-06-05 09:41:53 +08:00
parent 13d8fb8af6
commit 39bd5c2ffd
1 changed file with 103 additions and 103 deletions


@@ -1,7 +1,6 @@
 ---
 title: 为系统守护进程预留计算资源
 content_type: task
-min-kubernetes-server-version: 1.8
 weight: 290
 ---
 <!--
@@ -11,7 +10,6 @@ reviewers:
 - dashpole
 title: Reserve Compute Resources for System Daemons
 content_type: task
-min-kubernetes-server-version: 1.8
 weight: 290
 -->
@@ -38,14 +36,14 @@ Kubernetes 推荐集群管理员按照每个节点上的工作负载密度配置
 ## {{% heading "prerequisites" %}}
-{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}
+{{< include "task-tutorial-prereqs.md" >}}
 <!--
-Your Kubernetes server must be at or later than version 1.17 to use
-the kubelet command line option `--reserved-cpus` to set an
-[explicitly reserved CPU list](#explicitly-reserved-cpu-list).
+You can configure the kubelet [configuration settings](/docs/reference/config-api/kubelet-config.v1beta1/)
+below using the [kubelet configuration file](/docs/tasks/administer-cluster/kubelet-config-file/).
 -->
-你的 kubernetes 服务器版本必须至少是 1.17 版本,才能使用 kubelet
-命令行选项 `--reserved-cpus` 设置[显式预留 CPU 列表](#explicitly-reserved-cpu-list)。
+你可以使用 [kubelet 配置文件](/zh-cn/docs/tasks/administer-cluster/kubelet-config-file/)来配置以下
+kubelet [设置](/zh-cn/docs/reference/config-api/kubelet-config.v1beta1/)。
 <!-- steps -->
@@ -80,22 +78,21 @@ Kubernetes 节点上的 'Allocatable' 被定义为 Pod 可用计算资源量。
 ### Enabling QoS and Pod level cgroups
 To properly enforce node allocatable constraints on the node, you must
-enable the new cgroup hierarchy via the `--cgroups-per-qos` flag. This flag is
+enable the new cgroup hierarchy via the `cgroupsPerQOS` setting. This setting is
 enabled by default. When enabled, the `kubelet` will parent all end-user pods
 under a cgroup hierarchy managed by the `kubelet`.
 -->
 ### 启用 QoS 和 Pod 级别的 cgroups {#enabling-qos-and-pod-level-cgroups}
-为了恰当地在节点范围实施节点可分配约束,你必须通过 `--cgroups-per-qos`
-标志启用新的 cgroup 层次结构。这个标志是默认启用的。
+为了恰当地在节点范围实施节点可分配约束,你必须通过 `cgroupsPerQOS`
+设置启用新的 cgroup 层次结构。这个设置是默认启用的。
 启用后,`kubelet` 将在其管理的 cgroup 层次结构中创建所有终端用户的 Pod。
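As an editor's illustrative sketch (not part of the synced page), the `cgroupsPerQOS` setting above would appear in a kubelet configuration file roughly as follows; the `apiVersion`/`kind` preamble is the standard KubeletConfiguration v1beta1 header:

```yaml
# KubeletConfiguration fragment (sketch).
# cgroupsPerQOS already defaults to true; it is shown here only for clarity.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupsPerQOS: true
```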
 <!--
 ### Configuring a cgroup driver
 The `kubelet` supports manipulation of the cgroup hierarchy on
-the host using a cgroup driver. The driver is configured via the
-`--cgroup-driver` flag.
+the host using a cgroup driver. The driver is configured via the `cgroupDriver` setting.
 The supported values are the following:
@@ -113,7 +110,7 @@ be configured to use the `systemd` cgroup driver.
 ### 配置 cgroup 驱动 {#configuring-a-cgroup-driver}
 `kubelet` 支持在主机上使用 cgroup 驱动操作 cgroup 层次结构。
-该驱动通过 `--cgroup-driver` 标志进行配置。
+该驱动通过 `cgroupDriver` 设置进行配置。
 支持的参数值如下:
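(The value list itself falls outside this diff's hunks.) As a hedged sketch, selecting the `systemd` driver in the kubelet configuration file would look like this; the driver must match the cgroup manager your container runtime uses:

```yaml
# KubeletConfiguration fragment (sketch).
# Supported values are cgroupfs and systemd; systemd is shown as an example.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
```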
@@ -129,13 +126,13 @@ be configured to use the `systemd` cgroup driver.
 <!--
 ### Kube Reserved
-- **Kubelet Flag**: `--kube-reserved=[cpu=100m][,][memory=100Mi][,][ephemeral-storage=1Gi][,][pid=1000]`
-- **Kubelet Flag**: `--kube-reserved-cgroup=`
+- **KubeletConfiguration Setting**: `kubeReserved: {}`. Example value `{cpu: 100m, memory: 100Mi, ephemeral-storage: 1Gi, pid: 1000}`
+- **KubeletConfiguration Setting**: `kubeReservedCgroup: ""`
-`kube-reserved` is meant to capture resource reservation for kubernetes system
-daemons like the `kubelet`, `container runtime`, `node problem detector`, etc.
+`kubeReserved` is meant to capture resource reservation for kubernetes system
+daemons like the `kubelet`, `container runtime`, etc.
 It is not meant to reserve resources for system daemons that are run as pods.
-`kube-reserved` is typically a function of `pod density` on the nodes.
+`kubeReserved` is typically a function of `pod density` on the nodes.
 In addition to `cpu`, `memory`, and `ephemeral-storage`, `pid` may be
 specified to reserve the specified number of process IDs for
@@ -143,21 +140,22 @@ kubernetes system daemons.
 -->
 ### Kube 预留值 {#kube-reserved}
-- **Kubelet 标志**:`--kube-reserved=[cpu=100m][,][memory=100Mi][,][ephemeral-storage=1Gi][,][pid=1000]`
-- **Kubelet 标志**:`--kube-reserved-cgroup=`
+- **KubeletConfiguration 设置**:`kubeReserved: {}`。
+  示例值 `{cpu: 100m, memory: 100Mi, ephemeral-storage: 1Gi, pid: 1000}`
+- **KubeletConfiguration 设置**:`kubeReservedCgroup: ""`
-`kube-reserved` 用来给诸如 `kubelet`、容器运行时、节点问题监测器等
+`kubeReserved` 用来给诸如 `kubelet`、容器运行时等
 Kubernetes 系统守护进程记述其资源预留值。
-该配置并非用来给以 Pod 形式运行的系统守护进程预留资源。`kube-reserved`
+该配置并非用来给以 Pod 形式运行的系统守护进程预留资源。`kubeReserved`
 通常是节点上 `Pod 密度` 的函数。
 除了 `cpu`、`内存` 和 `ephemeral-storage` 之外,`pid` 可用来指定为
 Kubernetes 系统守护进程预留指定数量的进程 ID。
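As an illustrative sketch (not part of the synced page), the `kubeReserved` example value above maps onto the kubelet configuration file like this; the amounts are the page's example numbers, not recommendations:

```yaml
# KubeletConfiguration fragment (sketch): reserve resources for
# Kubernetes system daemons (kubelet, container runtime, ...).
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
kubeReserved:
  cpu: 100m
  memory: 100Mi
  ephemeral-storage: 1Gi
  pid: "1000"
```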
 <!--
-To optionally enforce `kube-reserved` on kubernetes system daemons, specify the parent
-control group for kube daemons as the value for `--kube-reserved-cgroup` kubelet
-flag.
+To optionally enforce `kubeReserved` on kubernetes system daemons, specify the parent
+control group for kube daemons as the value for the `kubeReservedCgroup` setting,
+and [add `kube-reserved` to `enforceNodeAllocatable`](#enforcing-node-allocatable).
 It is recommended that the kubernetes system daemons are placed under a top
 level control group (`runtime.slice` on systemd machines for example). Each
@@ -165,41 +163,42 @@ system daemon should ideally run within its own child control group. Refer to
 [the design proposal](https://git.k8s.io/design-proposals-archive/node/node-allocatable.md#recommended-cgroups-setup)
 for more details on recommended control group hierarchy.
 -->
-要选择性地对 Kubernetes 系统守护进程上执行 `kube-reserved` 保护,需要把 kubelet 的
-`--kube-reserved-cgroup` 标志的值设置为 kube 守护进程的父控制组。
+要选择性地在 Kubernetes 系统守护进程上执行 `kubeReserved` 保护,需要把 kubelet 的
+`kubeReservedCgroup` 设置的值设为 kube 守护进程的父控制组,
+并[将 `kube-reserved` 添加到 `enforceNodeAllocatable`](#enforcing-node-allocatable)。
-推荐将 Kubernetes 系统守护进程放置于顶级控制组之下(例如 systemd 机器上的
-`runtime.slice`)。
+推荐将 Kubernetes 系统守护进程放置于顶级控制组之下(例如 systemd 机器上的 `runtime.slice`)。
 理想情况下每个系统守护进程都应该在其自己的子控制组中运行。
 请参考[这个设计方案](https://git.k8s.io/design-proposals-archive/node/node-allocatable.md#recommended-cgroups-setup)
 进一步了解关于推荐控制组层次结构的细节。
 <!--
-Note that Kubelet **does not** create `--kube-reserved-cgroup` if it doesn't
+Note that Kubelet **does not** create `kubeReservedCgroup` if it doesn't
 exist. The kubelet will fail to start if an invalid cgroup is specified. With `systemd`
 cgroup driver, you should follow a specific pattern for the name of the cgroup you
-define: the name should be the value you set for `--kube-reserved-cgroup`,
+define: the name should be the value you set for `kubeReservedCgroup`,
 with `.slice` appended.
 -->
-请注意,如果 `--kube-reserved-cgroup` 不存在,Kubelet 将 **不会** 创建它。
+请注意,如果 `kubeReservedCgroup` 不存在,Kubelet 将 **不会** 创建它。
 如果指定了一个无效的 cgroup,Kubelet 将会无法启动。就 `systemd` cgroup 驱动而言,
 你要为所定义的 cgroup 设置名称时要遵循特定的模式:
-所设置的名字应该是你为 `--kube-reserved-cgroup` 所给的参数值加上 `.slice` 后缀。
+所设置的名字应该是你为 `kubeReservedCgroup` 所给的参数值加上 `.slice` 后缀。
 <!--
 ### System Reserved
-- **Kubelet Flag**: `--system-reserved=[cpu=100m][,][memory=100Mi][,][ephemeral-storage=1Gi][,][pid=1000]`
-- **Kubelet Flag**: `--system-reserved-cgroup=`
+- **KubeletConfiguration Setting**: `systemReserved: {}`. Example value `{cpu: 100m, memory: 100Mi, ephemeral-storage: 1Gi, pid: 1000}`
+- **KubeletConfiguration Setting**: `systemReservedCgroup: ""`
 -->
 ### 系统预留值 {#system-reserved}
-- **Kubelet 标志**:`--system-reserved=[cpu=100m][,][memory=100Mi][,][ephemeral-storage=1Gi][,][pid=1000]`
-- **Kubelet 标志**:`--system-reserved-cgroup=`
+- **KubeletConfiguration 设置**:`systemReserved: {}`。
+  示例值 `{cpu: 100m, memory: 100Mi, ephemeral-storage: 1Gi, pid: 1000}`
+- **KubeletConfiguration 设置**:`systemReservedCgroup: ""`
 <!--
-`system-reserved` is meant to capture resource reservation for OS system daemons
-like `sshd`, `udev`, etc. `system-reserved` should reserve `memory` for the
+`systemReserved` is meant to capture resource reservation for OS system daemons
+like `sshd`, `udev`, etc. `systemReserved` should reserve `memory` for the
 `kernel` too since `kernel` memory is not accounted to pods in Kubernetes at this time.
 Reserving resources for user login sessions is also recommended (`user.slice` in
 systemd world).
@@ -208,8 +207,8 @@ In addition to `cpu`, `memory`, and `ephemeral-storage`, `pid` may be
 specified to reserve the specified number of process IDs for OS system
 daemons.
 -->
-`system-reserved` 用于为诸如 `sshd`、`udev` 等系统守护进程记述其资源预留值。
-`system-reserved` 也应该为 `kernel` 预留 `内存`,因为目前 `kernel`
+`systemReserved` 用于为诸如 `sshd`、`udev` 等系统守护进程记述其资源预留值。
+`systemReserved` 也应该为 `kernel` 预留 `内存`,因为目前 `kernel`
 使用的内存并不记在 Kubernetes 的 Pod 上。
 同时还推荐为用户登录会话预留资源(systemd 体系中的 `user.slice`)。
@@ -217,30 +216,31 @@ daemons.
 Kubernetes 系统守护进程预留指定数量的进程 ID。
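As an illustrative sketch (not part of the synced page), the `systemReserved` example value maps onto the kubelet configuration file the same way as `kubeReserved`:

```yaml
# KubeletConfiguration fragment (sketch): reserve resources for OS system
# daemons such as sshd and udev, plus kernel memory.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
systemReserved:
  cpu: 100m
  memory: 100Mi
  ephemeral-storage: 1Gi
  pid: "1000"
```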
 <!--
-To optionally enforce `system-reserved` on system daemons, specify the parent
-control group for OS system daemons as the value for `--system-reserved-cgroup`
-kubelet flag.
+To optionally enforce `systemReserved` on system daemons, specify the parent
+control group for OS system daemons as the value for the `systemReservedCgroup` setting,
+and [add `system-reserved` to `enforceNodeAllocatable`](#enforcing-node-allocatable).
 It is recommended that the OS system daemons are placed under a top level
 control group (`system.slice` on systemd machines for example).
 -->
-要想为系统守护进程上可选地实施 `system-reserved` 约束,请指定 kubelet 的
-`--system-reserved-cgroup` 标志值为 OS 系统守护进程的父级控制组。
+要想在系统守护进程上可选地实施 `systemReserved` 约束,请将 kubelet 的
+`systemReservedCgroup` 设置值设为 OS 系统守护进程的父级控制组,
+并[将 `system-reserved` 添加到 `enforceNodeAllocatable`](#enforcing-node-allocatable)。
 推荐将 OS 系统守护进程放在一个顶级控制组之下(例如 systemd 机器上的
 `system.slice`)。
 <!--
-Note that `kubelet` **does not** create `--system-reserved-cgroup` if it doesn't
+Note that `kubelet` **does not** create `systemReservedCgroup` if it doesn't
 exist. `kubelet` will fail if an invalid cgroup is specified. With `systemd`
 cgroup driver, you should follow a specific pattern for the name of the cgroup you
-define: the name should be the value you set for `--system-reserved-cgroup`,
+define: the name should be the value you set for `systemReservedCgroup`,
 with `.slice` appended.
 -->
-请注意,如果 `--system-reserved-cgroup` 不存在,`kubelet` **不会** 创建它。
+请注意,如果 `systemReservedCgroup` 不存在,`kubelet` **不会** 创建它。
 如果指定了无效的 cgroup,`kubelet` 将会失败。就 `systemd` cgroup 驱动而言,
 你在指定 cgroup 名字时要遵循特定的模式:
-该名字应该是你为 `--system-reserved-cgroup` 参数所设置的值加上 `.slice` 后缀。
+该名字应该是你为 `systemReservedCgroup` 参数所设置的值加上 `.slice` 后缀。
 <!--
 ### Explicitly Reserved CPU List
@@ -250,26 +250,24 @@ with `.slice` appended.
 {{< feature-state for_k8s_version="v1.17" state="stable" >}}
 <!--
-**Kubelet Flag**: `--reserved-cpus=0-3`
-**KubeletConfiguration Flag**: `reservedSystemCPUs: 0-3`
+**KubeletConfiguration Setting**: `reservedSystemCPUs:`. Example value `0-3`
 -->
-**Kubelet 标志**: `--reserved-cpus=0-3`
-**KubeletConfiguration 标志**:`reservedSystemCPUs: 0-3`
+**KubeletConfiguration 设置**:`reservedSystemCPUs:`。示例值 `0-3`
 <!--
-`reserved-cpus` is meant to define an explicit CPU set for OS system daemons and
-kubernetes system daemons. `reserved-cpus` is for systems that do not intend to
+`reservedSystemCPUs` is meant to define an explicit CPU set for OS system daemons and
+kubernetes system daemons. `reservedSystemCPUs` is for systems that do not intend to
 define separate top level cgroups for OS system daemons and kubernetes system daemons
 with regard to cpuset resource.
-If the Kubelet **does not** have `--system-reserved-cgroup` and `--kube-reserved-cgroup`,
-the explicit cpuset provided by `reserved-cpus` will take precedence over the CPUs
-defined by `--kube-reserved` and `--system-reserved` options.
+If the Kubelet **does not** have `kubeReservedCgroup` and `systemReservedCgroup`,
+the explicit cpuset provided by `reservedSystemCPUs` will take precedence over the CPUs
+defined by the `kubeReserved` and `systemReserved` options.
 -->
-`reserved-cpus` 旨在为操作系统守护程序和 Kubernetes 系统守护程序预留一组明确指定编号的 CPU。
-`reserved-cpus` 适用于不打算针对 cpuset 资源为操作系统守护程序和 Kubernetes
+`reservedSystemCPUs` 旨在为操作系统守护程序和 Kubernetes 系统守护程序预留一组明确指定编号的 CPU。
+`reservedSystemCPUs` 适用于不打算针对 cpuset 资源为操作系统守护程序和 Kubernetes
 系统守护程序定义独立的顶级 cgroups 的系统。
-如果 Kubelet **没有** 指定参数 `--system-reserved-cgroup` 和 `--kube-reserved-cgroup`,
-`reserved-cpus` 的设置将优先于 `--kube-reserved` 和 `--system-reserved` 选项。
+如果 Kubelet **没有** 指定参数 `kubeReservedCgroup` 和 `systemReservedCgroup`,
+`reservedSystemCPUs` 的设置将优先于 `kubeReserved` 和 `systemReserved` 选项。
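As an illustrative sketch (not part of the synced page), pinning system daemons to an explicit CPU set in the kubelet configuration file would look like this; the CPU IDs are the page's example value:

```yaml
# KubeletConfiguration fragment (sketch): dedicate CPUs 0-3 to OS and
# Kubernetes system daemons, leaving the remaining CPUs for pods.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
reservedSystemCPUs: "0-3"
```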
 <!--
 This option is specifically designed for Telco/NFV use cases where uncontrolled
@@ -291,7 +289,7 @@ cpuset 上,应使用 Kubernetes 之外的其他机制。
 <!--
 ### Eviction Thresholds
-**Kubelet Flag**: `--eviction-hard=[memory.available<500Mi]`
+**KubeletConfiguration Setting**: `evictionHard: {memory.available: "100Mi", nodefs.available: "10%", nodefs.inodesFree: "5%", imagefs.available: "15%"}`. Example value: `{memory.available: "<500Mi"}`
 Memory pressure at the node level leads to System OOMs which affects the entire
 node and all pods running on it. Nodes can go offline temporarily until memory
@@ -299,7 +297,7 @@ has been reclaimed. To avoid (or reduce the probability of) system OOMs kubelet
 provides [out of resource](/docs/concepts/scheduling-eviction/node-pressure-eviction/)
 management. Evictions are
 supported for `memory` and `ephemeral-storage` only. By reserving some memory via
-`--eviction-hard` flag, the `kubelet` attempts to evict pods whenever memory
+the `evictionHard` setting, the `kubelet` attempts to evict pods whenever memory
 availability on the node drops below the reserved value. Hypothetically, if
 system daemons did not exist on a node, pods cannot use more than `capacity -
 eviction-hard`. For this reason, resources reserved for evictions are not
@@ -307,13 +305,15 @@ available for pods.
 -->
 ### 驱逐阈值 {#eviction-thresholds}
-**Kubelet 标志**:`--eviction-hard=[memory.available<500Mi]`
+**KubeletConfiguration 设置**:
+`evictionHard: {memory.available: "100Mi", nodefs.available: "10%", nodefs.inodesFree: "5%", imagefs.available: "15%"}`。
+示例值:`{memory.available: "<500Mi"}`
 节点级别的内存压力将导致系统内存不足,这将影响到整个节点及其上运行的所有 Pod。
 节点可以暂时离线直到内存已经回收为止。为了防止系统内存不足(或减少系统内存不足的可能性),
 kubelet 提供了[资源不足](/zh-cn/docs/concepts/scheduling-eviction/node-pressure-eviction/)管理。
 驱逐操作只支持 `memory` 和 `ephemeral-storage`。
-通过 `--eviction-hard` 标志预留一些内存后,当节点上的可用内存降至预留值以下时,
+通过 `evictionHard` 设置预留一些内存后,当节点上的可用内存降至预留值以下时,
 `kubelet` 将尝试驱逐 Pod。
 如果节点上不存在系统守护进程,Pod 将不能使用超过 `capacity-eviction-hard` 所指定的资源量。
 因此,为驱逐而预留的资源对 Pod 是不可用的。
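As an illustrative sketch (not part of the synced page), a hard eviction threshold in the kubelet configuration file is written as a plain quantity or percentage; unlike the old `--eviction-hard=memory.available<500Mi` flag syntax, the map form drops the `<` (the threshold always means "evict when the signal falls below this value"):

```yaml
# KubeletConfiguration fragment (sketch): evict pods once available
# node memory drops below 500Mi.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "500Mi"
```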
@@ -321,13 +321,14 @@ kubelet 提供了[资源不足](/zh-cn/docs/concepts/scheduling-eviction/node-pr
 <!--
 ### Enforcing Node Allocatable
-**Kubelet Flag**: `--enforce-node-allocatable=pods[,][system-reserved][,][kube-reserved]`
+**KubeletConfiguration Setting**: `enforceNodeAllocatable: [pods]`. Example value: `[pods,system-reserved,kube-reserved]`
 The scheduler treats 'Allocatable' as the available `capacity` for pods.
 -->
 ### 实施节点可分配约束 {#enforcing-node-allocatable}
-**Kubelet 标志**:`--enforce-node-allocatable=pods[,][system-reserved][,][kube-reserved]`
+**KubeletConfiguration 设置**:`enforceNodeAllocatable: [pods]`。
+示例值:`[pods,system-reserved,kube-reserved]`
 调度器将 'Allocatable' 视为 Pod 可用的 `capacity`(资源容量)。
@@ -337,35 +338,35 @@ by evicting pods whenever the overall usage across all pods exceeds
 'Allocatable'. More details on eviction policy can be found
 on the [node pressure eviction](/docs/concepts/scheduling-eviction/node-pressure-eviction/)
 page. This enforcement is controlled by
-specifying `pods` value to the kubelet flag `--enforce-node-allocatable`.
+specifying `pods` value to the KubeletConfiguration setting `enforceNodeAllocatable`.
 -->
 `kubelet` 默认对 Pod 执行 'Allocatable' 约束。
 无论何时,如果所有 Pod 的总用量超过了 'Allocatable',驱逐 Pod 的措施将被执行。
 有关驱逐策略的更多细节可以在[节点压力驱逐](/zh-cn/docs/concepts/scheduling-eviction/node-pressure-eviction/)页找到。
-通过设置 kubelet `--enforce-node-allocatable` 标志值为 `pods` 控制这个措施。
+通过将 KubeletConfiguration 的 `enforceNodeAllocatable` 设置为 `pods` 来控制这个措施。
 <!--
-Optionally, `kubelet` can be made to enforce `kube-reserved` and
-`system-reserved` by specifying `kube-reserved` & `system-reserved` values in
-the same flag. Note that to enforce `kube-reserved` or `system-reserved`,
-`--kube-reserved-cgroup` or `--system-reserved-cgroup` needs to be specified
+Optionally, `kubelet` can be made to enforce `kubeReserved` and
+`systemReserved` by specifying `kube-reserved` & `system-reserved` values in
+the same setting. Note that to enforce `kubeReserved` or `systemReserved`,
+`kubeReservedCgroup` or `systemReservedCgroup` needs to be specified
 respectively.
 -->
-可选地,通过在同一标志中同时指定 `kube-reserved` 和 `system-reserved` 值,
-可以使 `kubelet` 强制实施 `kube-reserved` 和 `system-reserved` 约束。
-请注意,要想执行 `kube-reserved` 或者 `system-reserved` 约束,
-需要对应设置 `--kube-reserved-cgroup` 或者 `--system-reserved-cgroup`。
+可选地,通过在同一设置中同时指定 `kube-reserved` 和 `system-reserved` 值,
+可以使 `kubelet` 强制实施 `kubeReserved` 和 `systemReserved` 约束。
+请注意,要想执行 `kubeReserved` 或者 `systemReserved` 约束,
+需要对应设置 `kubeReservedCgroup` 或者 `systemReservedCgroup`。
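As an illustrative sketch (not part of the synced page), enforcing Allocatable on pods as well as on both reserved values could be configured as below; the cgroup names are assumptions for illustration, and those cgroups must already exist on the node:

```yaml
# KubeletConfiguration fragment (sketch). Enforcing kube-reserved /
# system-reserved requires the matching *ReservedCgroup settings.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
enforceNodeAllocatable:
- pods
- kube-reserved
- system-reserved
kubeReservedCgroup: /runtime.slice    # illustrative; must pre-exist
systemReservedCgroup: /system.slice   # illustrative; must pre-exist
```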
 <!--
 ## General Guidelines
 System daemons are expected to be treated similar to
 [Guaranteed pods](/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed).
 System daemons can burst within their bounding control groups and this behavior needs
 to be managed as part of kubernetes deployments. For example, `kubelet` should
-have its own control group and share `kube-reserved` resources with the
+have its own control group and share `kubeReserved` resources with the
 container runtime. However, Kubelet cannot burst and use up all available Node
-resources if `kube-reserved` is enforced.
+resources if `kubeReserved` is enforced.
 -->
 ## 一般原则 {#general-guidelines}
## 一般原则 {#general-guidelines} ## 一般原则 {#general-guidelines}
@@ -373,32 +374,32 @@ resources if `kube-reserved` is enforced.
 [Guaranteed 的 Pod](/zh-cn/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed)
 一样对待。
 系统守护进程可以在与其对应的控制组中出现突发资源用量,这一行为要作为 Kubernetes 部署的一部分进行管理。
-例如,`kubelet` 应该有它自己的控制组并和容器运行时共享 `kube-reserved` 资源。
-不过,如果执行了 `kube-reserved` 约束,则 kubelet 不可出现突发负载并用光节点的所有可用资源。
+例如,`kubelet` 应该有它自己的控制组并和容器运行时共享 `kubeReserved` 资源。
+不过,如果执行了 `kubeReserved` 约束,则 kubelet 不可出现突发负载并用光节点的所有可用资源。
 <!--
-Be extra careful while enforcing `system-reserved` reservation since it can lead
+Be extra careful while enforcing `systemReserved` reservation since it can lead
 to critical system services being CPU starved, OOM killed, or unable
 to fork on the node. The
-recommendation is to enforce `system-reserved` only if a user has profiled their
+recommendation is to enforce `systemReserved` only if a user has profiled their
 nodes exhaustively to come up with precise estimates and is confident in their
 ability to recover if any process in that group is oom-killed.
 -->
-在执行 `system-reserved` 预留策略时请加倍小心,因为它可能导致节点上的关键系统服务出现 CPU 资源短缺、
+在执行 `systemReserved` 预留策略时请加倍小心,因为它可能导致节点上的关键系统服务出现 CPU 资源短缺、
 因为内存不足而被终止或者无法在节点上创建进程。
 建议只有当用户详尽地描述了他们的节点以得出精确的估计值,
 并且对该组中进程因内存不足而被杀死时,有足够的信心将其恢复时,
-才可以强制执行 `system-reserved` 策略。
+才可以强制执行 `systemReserved` 策略。
 <!--
 * To begin with enforce 'Allocatable' on `pods`.
 * Once adequate monitoring and alerting is in place to track kube system
-  daemons, attempt to enforce `kube-reserved` based on usage heuristics.
-* If absolutely necessary, enforce `system-reserved` over time.
+  daemons, attempt to enforce `kubeReserved` based on usage heuristics.
+* If absolutely necessary, enforce `systemReserved` over time.
 -->
 * 作为起步,可以先针对 `pods` 上执行 'Allocatable' 约束。
-* 一旦用于追踪系统守护进程的监控和告警的机制到位,可尝试基于用量估计的方式执行 `kube-reserved` 策略。
-* 随着时间推进,在绝对必要的时候可以执行 `system-reserved` 策略。
+* 一旦用于追踪系统守护进程的监控和告警的机制到位,可尝试基于用量估计的方式执行 `kubeReserved` 策略。
+* 随着时间推进,在绝对必要的时候可以执行 `systemReserved` 策略。
 <!--
 The resource requirements of kube system daemons may grow over time as more and
@@ -418,18 +419,18 @@ So expect a drop in `Allocatable` capacity in future releases.
 Here is an example to illustrate Node Allocatable computation:
 * Node has `32Gi` of `memory`, `16 CPUs` and `100Gi` of `Storage`
-* `--kube-reserved` is set to `cpu=1,memory=2Gi,ephemeral-storage=1Gi`
-* `--system-reserved` is set to `cpu=500m,memory=1Gi,ephemeral-storage=1Gi`
-* `--eviction-hard` is set to `memory.available<500Mi,nodefs.available<10%`
+* `kubeReserved` is set to `{cpu: 1000m, memory: 2Gi, ephemeral-storage: 1Gi}`
+* `systemReserved` is set to `{cpu: 500m, memory: 1Gi, ephemeral-storage: 1Gi}`
+* `evictionHard` is set to `{memory.available: "<500Mi", nodefs.available: "<10%"}`
 -->
 ## 示例场景 {#example-scenario}
 这是一个用于说明节点可分配(Node Allocatable)计算方式的示例:
 * 节点拥有 `32Gi` `memory`、`16 CPU` 和 `100Gi` `Storage` 资源
-* `--kube-reserved` 被设置为 `cpu=1,memory=2Gi,ephemeral-storage=1Gi`
-* `--system-reserved` 被设置为 `cpu=500m,memory=1Gi,ephemeral-storage=1Gi`
-* `--eviction-hard` 被设置为 `memory.available<500Mi,nodefs.available<10%`
+* `kubeReserved` 被设置为 `{cpu: 1000m, memory: 2Gi, ephemeral-storage: 1Gi}`
+* `systemReserved` 被设置为 `{cpu: 500m, memory: 1Gi, ephemeral-storage: 1Gi}`
+* `evictionHard` 被设置为 `{memory.available: "<500Mi", nodefs.available: "<10%"}`
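The scenario above can be sketched as one kubelet configuration fragment, with the resulting Allocatable worked out per resource (capacity minus kubeReserved, systemReserved, and the hard eviction threshold):

```yaml
# KubeletConfiguration fragment (sketch) for the example scenario.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
kubeReserved:
  cpu: 1000m
  memory: 2Gi
  ephemeral-storage: 1Gi
systemReserved:
  cpu: 500m
  memory: 1Gi
  ephemeral-storage: 1Gi
evictionHard:
  memory.available: "500Mi"
  nodefs.available: "10%"
# Resulting Allocatable on a 16 CPU / 32Gi / 100Gi node:
#   cpu:               16    - 1   - 0.5           = 14.5 CPU
#   memory:            32Gi  - 2Gi - 1Gi   - 500Mi = 28.5Gi
#   ephemeral-storage: 100Gi - 1Gi - 1Gi   - 10Gi  = 88Gi
```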
 <!--
 Under this scenario, 'Allocatable' will be 14.5 CPUs, 28.5Gi of memory and
@@ -446,10 +447,9 @@ much CPU as they can, pods together cannot consume more than 14.5 CPUs.
 如果节点上的所有进程都尽可能多地使用 CPU,则 Pod 加起来不能使用超过 14.5 CPUs 的资源。
 <!--
-If `kube-reserved` and/or `system-reserved` is not enforced and system daemons
+If `kubeReserved` and/or `systemReserved` is not enforced and system daemons
 exceed their reservation, `kubelet` evicts pods whenever the overall node memory
 usage is higher than 31.5Gi or `storage` is greater than 90Gi.
 -->
-当没有执行 `kube-reserved` 和/或 `system-reserved` 策略且系统守护进程使用量超过其预留时,
+当没有执行 `kubeReserved` 和/或 `systemReserved` 策略且系统守护进程使用量超过其预留时,
 如果节点内存用量高于 31.5Gi 或 `storage` 大于 90Gi,kubelet 将会驱逐 Pod。