[zh] Cleanup node-pressure-eviction.md

pull/39267/head
zhuzhenghao 2023-02-05 14:11:42 +08:00
parent 4016a7c322
commit b902cff2b1
1 changed file with 123 additions and 121 deletions


{{<glossary_definition term_id="node-pressure-eviction" length="short">}}<br />
<!--
The {{<glossary_tooltip term_id="kubelet" text="kubelet">}} monitors resources
like memory, disk space, and filesystem inodes on your cluster's nodes.
When one or more of these resources reach specific consumption levels, the
kubelet can proactively fail one or more pods on the node to reclaim resources
and prevent starvation.
During a node-pressure eviction, the kubelet sets the `PodPhase` for the
selected pods to `Failed`. This terminates the pods.
Node-pressure eviction is not the same as
[API-initiated eviction](/docs/concepts/scheduling-eviction/api-eviction/).
-->
{{<glossary_tooltip term_id="kubelet" text="kubelet">}}
监控集群节点的内存、磁盘空间和文件系统的 inode 等资源。
当这些资源中的一个或者多个达到特定的消耗水平,
kubelet 可以主动地使节点上一个或者多个 Pod 失效,以回收资源,防止饥饿。

在节点压力驱逐期间kubelet 将所选 Pod 的 `PodPhase` 设置为 `Failed`,从而终止这些 Pod。
节点压力驱逐不同于 [API 发起的驱逐](/zh-cn/docs/concepts/scheduling-eviction/api-eviction/)。
<!--
The kubelet does not respect your configured `PodDisruptionBudget` or the pod's
`terminationGracePeriodSeconds`. If you use [soft eviction thresholds](#soft-eviction-thresholds),
the kubelet respects your configured `eviction-max-pod-grace-period`. If you use
[hard eviction thresholds](#hard-eviction-thresholds), it uses a `0s` grace period for termination.

If the pods are managed by a {{< glossary_tooltip text="workload" term_id="workload" >}}
resource (such as {{< glossary_tooltip text="StatefulSet" term_id="statefulset" >}}
or {{< glossary_tooltip text="Deployment" term_id="deployment" >}}) that
replaces failed pods, the control plane or `kube-controller-manager` creates new
pods in place of the evicted pods.
-->
kubelet 并不理会你配置的 `PodDisruptionBudget` 或者是 Pod 的 `terminationGracePeriodSeconds`。
如果你使用软驱逐条件kubelet 会考虑你所配置的 `eviction-max-pod-grace-period`。
如果你使用硬驱逐条件kubelet 会使用 `0s` 宽限期立即终止 Pod。

如果 Pod 由会替换失败 Pod 的{{< glossary_tooltip text="工作负载" term_id="workload" >}}资源
(例如 {{< glossary_tooltip text="StatefulSet" term_id="statefulset" >}}
或者 {{< glossary_tooltip text="Deployment" term_id="deployment" >}})管理,
则控制平面或 `kube-controller-manager` 会创建新的 Pod 来代替被驱逐的 Pod。

{{<note>}}
kubelet 在终止最终用户 Pod 之前会尝试[回收节点级资源](#reclaim-node-resources)。
例如,它会在磁盘资源不足时删除未使用的容器镜像。
{{</note>}}
<!--
The kubelet uses various parameters to make eviction decisions, like the following:
- Eviction signals
- Eviction thresholds
- Monitoring intervals
-->
kubelet 使用各种参数来做出驱逐决定,如下所示:
- 驱逐信号
- 驱逐条件
- 监控间隔
<!--
### Eviction signals {#eviction-signals}
Eviction signals are the current state of a particular resource at a specific
point in time. Kubelet uses eviction signals to make eviction decisions by
comparing the signals to eviction thresholds, which are the minimum amount of
the resource that should be available on the node.
Kubelet uses the following eviction signals:
-->
kubelet 使用以下驱逐信号:

| 驱逐信号 | 描述 |
|----------------------|----------------------------------------------------------------------------------------|
| `memory.available`   | `memory.available` := `node.status.capacity[memory]` - `node.stats.memory.workingSet` |
| `nodefs.available`   | `nodefs.available` := `node.stats.fs.available` |
| `nodefs.inodesFree`  | `nodefs.inodesFree` := `node.stats.fs.inodesFree` |
| `imagefs.available`  | `imagefs.available` := `node.stats.runtime.imagefs.available` |
| `imagefs.inodesFree` | `imagefs.inodesFree` := `node.stats.runtime.imagefs.inodesFree` |
| `pid.available` | `pid.available` := `node.stats.rlimit.maxpid` - `node.stats.rlimit.curproc` |
<!--
In this table, the `Description` column shows how kubelet gets the value of the
signal. Each signal supports either a percentage or a literal value. Kubelet
signal. Each signal supports either a percentage or a literal value. Kubelet
calculates the percentage value relative to the total capacity associated with
the signal.
the signal.
-->
在上表中,`描述`列显示了 kubelet 如何获取信号的值。每个信号支持百分比值或者是字面值。
kubelet 在计算百分比值时,是相对于与该信号相关的总容量而言的。
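<!--
As an illustration only (these values are placeholders, not recommendations),
the two forms can be mixed in a kubelet configuration file:
-->
仅作示意(以下数值只是占位示例,并非推荐值),两种形式可以在 kubelet 配置文件中混合使用:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "500Mi"  # 字面值
  nodefs.available: "10%"    # 相对于该信号对应总容量的百分比
```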
<!--
The value for `memory.available` is derived from the cgroupfs instead of tools
like `free -m`. This is important because `free -m` does not work in a
container, and if users use the [node allocatable](/docs/tasks/administer-cluster/reserve-compute-resources/#node-allocatable)
feature, out of resource decisions
are made local to the end user Pod part of the cgroup hierarchy as well as the
root node. This [script](/examples/admin/resource/memory-available.sh)
reproduces the same set of steps that the kubelet performs to calculate
`memory.available`. The kubelet excludes inactive_file (i.e. # of bytes of
file-backed memory on inactive LRU list) from its calculation as it assumes that
memory is reclaimable under pressure.
-->
`memory.available` 的值来自 cgroupfs而不是像 `free -m` 这样的工具。
这很重要,因为 `free -m` 在容器中不起作用,而且如果用户使用
[节点可分配资源](/zh-cn/docs/tasks/administer-cluster/reserve-compute-resources/#node-allocatable)特性,
资源不足的判定将同时相对于 cgroup 层次结构中的最终用户 Pod 部分和根节点做出。
[这个脚本](/examples/admin/resource/memory-available.sh)重现了
kubelet 为计算 `memory.available` 而执行的相同步骤。
kubelet 在其计算中排除了 inactive_file即非活动 LRU 列表上基于文件的内存字节数),
因为它假定这部分内存在压力下是可回收的。
The kubelet supports the following filesystem partitions:
1. `nodefs`: The node's main filesystem, used for local disk volumes, emptyDir,
log storage, and more. For example, `nodefs` contains `/var/lib/kubelet/`.
1. `imagefs`: An optional filesystem that container runtimes use to store container
images and container writable layers.
Kubelet auto-discovers these filesystems and ignores other filesystems. Kubelet
does not support other configurations.
-->
kubelet 支持以下文件系统分区:
1. `nodefs`:节点的主要文件系统,用于本地磁盘卷、emptyDir、日志存储等。
   例如,`nodefs` 包含 `/var/lib/kubelet/`。
1. `imagefs`:可选文件系统,供容器运行时存储容器镜像和容器可写层。

kubelet 会自动发现这些文件系统并忽略其他文件系统。kubelet 不支持其他配置。

<!--
Some kubelet garbage collection features are deprecated in favor of eviction:
-->
一些 kubelet 垃圾收集功能已被弃用,以鼓励使用驱逐机制:

| 现有标志 | 新的标志 | 原因 |
| -------- | -------- | ---- |
| `--image-gc-high-threshold` | `--eviction-hard` 或 `--eviction-soft` | 现有的驱逐信号可以触发镜像垃圾收集 |
| `--image-gc-low-threshold` | `--eviction-minimum-reclaim` | 驱逐回收具有相同的行为 |
| `--maximum-dead-containers` | - | 一旦旧的日志存储在容器的上下文之外就会被弃用 |
| `--maximum-dead-containers-per-container` | - | 一旦旧的日志存储在容器的上下文之外就会被弃用 |
| `--minimum-container-ttl-duration` | - | 一旦旧的日志存储在容器的上下文之外就会被弃用 |
<!--
### Eviction thresholds
You can specify custom eviction thresholds for the kubelet to use when it makes
eviction decisions.

Eviction thresholds have the form `[eviction-signal][operator][quantity]`, where:
- `eviction-signal` is the [eviction signal](#eviction-signals) to use.
- `operator` is the [relational operator](https://en.wikipedia.org/wiki/Relational_operator#Standard_relational_operators)
you want, such as `<` (less than).
- `quantity` is the eviction threshold amount, such as `1Gi`. The value of `quantity`
must match the quantity representation used by Kubernetes. You can use either
literal values or percentages (`%`).
-->
驱逐条件的形式为 `[驱逐信号][运算符][数量]`,其中:

- `eviction-signal` 是要使用的[驱逐信号](#eviction-signals)。
- `operator` 是你所希望的[关系运算符](https://en.wikipedia.org/wiki/Relational_operator#Standard_relational_operators),例如 `<`(小于)。
- `quantity` 是驱逐条件数量,例如 `1Gi`。`quantity` 的值必须与 Kubernetes 使用的数量表示相匹配。
  你可以使用字面值或百分比(`%`)。

<!--
For example, if a node has `10Gi` of total memory and you want to trigger eviction if
the available memory falls below `1Gi`, you can define the eviction threshold as
either `memory.available<10%` or `memory.available<1Gi`. You cannot use both.
You can configure soft and hard eviction thresholds.
-->
例如,如果一个节点的总内存为 10Gi 并且你希望在可用内存低于 1Gi 时触发驱逐,
则可以将驱逐条件定义为 `memory.available<10%``memory.available<1Gi`,但不能同时使用两者。
你可以配置软和硬驱逐条件。
<!--
#### Soft eviction thresholds {#soft-eviction-thresholds}
A soft eviction threshold pairs an eviction threshold with a required
administrator-specified grace period. The kubelet does not evict pods until the
grace period is exceeded. The kubelet returns an error on startup if there is no
specified grace period.
-->
#### 软驱逐条件 {#soft-eviction-thresholds}
软驱逐条件将驱逐条件与管理员所必须指定的宽限期配对。
在超过宽限期之前kubelet 不会驱逐 Pod。
如果没有指定宽限期kubelet 会在启动时返回错误。
<!--
You can specify both a soft eviction threshold grace period and a maximum
allowed pod termination grace period for kubelet to use during evictions. If you
specify a maximum allowed grace period and the soft eviction threshold is met,
the kubelet uses the lesser of the two grace periods. If you do not specify a
maximum allowed grace period, the kubelet kills evicted pods immediately without
graceful termination.
你可以为 kubelet 指定在驱逐期间使用的软驱逐条件宽限期和最大允许的 Pod 终止宽限期。
如果你指定了宽限期的上限并且 Pod 满足软驱逐条件,则 kubelet 将使用两个宽限期中的较小者。
如果你没有指定宽限期上限kubelet 会立即杀死被驱逐的 Pod不允许其体面终止。
<!--
You can use the following flags to configure soft eviction thresholds:
- `eviction-soft`: A set of eviction thresholds like `memory.available<1.5Gi`
that can trigger pod eviction if held over the specified grace period.
- `eviction-soft-grace-period`: A set of eviction grace periods like `memory.available=1m30s`
that define how long a soft eviction threshold must hold before triggering a Pod eviction.
- `eviction-max-pod-grace-period`: The maximum allowed grace period (in seconds)
to use when terminating pods in response to a soft eviction threshold being met.
-->
你可以使用以下标志来配置软驱逐条件:
* `eviction-soft`:一组驱逐条件,如 `memory.available<1.5Gi`,
  如果驱逐条件持续时长超过指定的宽限期,可以触发 Pod 驱逐。
* `eviction-soft-grace-period`:一组驱逐宽限期,如
`memory.available=1m30s`,定义软驱逐条件在触发 Pod 驱逐之前必须保持多长时间。
* `eviction-max-pod-grace-period`:在满足软驱逐条件而终止 Pod 时使用的最大允许宽限期(以秒为单位)。
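<!--
As a sketch only, the flag-based settings above can also be expressed in a
kubelet configuration file; the quantities and durations below are illustrative:
-->
仅作草图示意,上述基于标志的设置也可以在 kubelet 配置文件中表达;以下数量和时长仅用于说明:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionSoft:
  memory.available: "1.5Gi"     # 软驱逐条件(示例值)
evictionSoftGracePeriod:
  memory.available: "1m30s"     # 条件须持续的时长(示例值)
evictionMaxPodGracePeriod: 60   # 终止 Pod 时允许的最大宽限期(秒,示例值)
```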
<!--
#### Hard eviction thresholds {#hard-eviction-thresholds}
A hard eviction threshold has no grace period. When a hard eviction threshold is
met, the kubelet kills pods immediately without graceful termination to reclaim
the starved resource.
You can use the `eviction-hard` flag to configure a set of hard eviction
thresholds like `memory.available<1Gi`.
-->
#### 硬驱逐条件 {#hard-eviction-thresholds}
硬驱逐条件没有宽限期。当达到硬驱逐条件时,
kubelet 会立即杀死 Pod而不会体面终止以回收紧缺的资源。
你可以使用 `eviction-hard` 标志来配置一组硬驱逐条件,
例如 `memory.available<1Gi`
<!--
The kubelet has the following default hard eviction thresholds:
- `memory.available<100Mi`
- `nodefs.available<10%`
- `imagefs.available<15%`
- `nodefs.inodesFree<5%` (Linux nodes)
-->
kubelet 具有以下默认硬驱逐条件:
- `memory.available<100Mi`
- `nodefs.available<10%`
- `imagefs.available<15%`
- `nodefs.inodesFree<5%`Linux 节点)
<!--
These default values of hard eviction thresholds will only be set if none
of the parameters is changed. If you changed the value of any parameter,
then the values of other parameters will not be inherited as the default
values and will be set to zero. In order to provide custom values, you
should provide all the thresholds respectively.
-->
只有在没有更改任何参数的情况下,硬驱逐阈值才会被设置成这些默认值。
如果你更改了任何参数的值,则其他参数的取值不会继承其默认值设置,而将被设置为零。
为了提供自定义值,你应该分别设置所有阈值。
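<!--
For example, a sketch that customizes only `memory.available` while restating
the other defaults; omitting the other three entries would set them to zero:
-->
例如,下面的示意仅定制 `memory.available`,同时重申其他默认值;若省略其余三项,它们将被置为零:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "200Mi"  # 定制值(示例)
  nodefs.available: "10%"    # 以下三项重申默认值
  imagefs.available: "15%"
  nodefs.inodesFree: "5%"
```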
<!--
### Eviction monitoring interval
The kubelet evaluates eviction thresholds based on its configured `housekeeping-interval`
which defaults to `10s`.
-->
### 驱逐监测间隔

kubelet 根据其配置的 `housekeeping-interval`(默认为 `10s`)评估驱逐条件。

<!--
### Node conditions {#node-conditions}

The kubelet reports node conditions to reflect that the node is under pressure
because hard or soft eviction threshold is met, independent of configured grace
periods.
-->
### 节点条件 {#node-conditions}
由于满足硬性或软性驱逐条件kubelet 会报告节点状况,以反映节点正处于压力之下,
这与所配置的宽限期无关。
<!--
The kubelet maps eviction signals to node conditions as follows:
| Node Condition | Eviction Signal | Description |
|-------------------|---------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------|
| `MemoryPressure`  | `memory.available`                                                                      | Available memory on the node has satisfied an eviction threshold |
| `DiskPressure` | `nodefs.available`, `nodefs.inodesFree`, `imagefs.available`, or `imagefs.inodesFree` | Available disk space and inodes on either the node's root filesystem or image filesystem has satisfied an eviction threshold |
| `PIDPressure` | `pid.available` | Available processes identifiers on the (Linux) node has fallen below an eviction threshold |
The kubelet updates the node conditions based on the configured
`--node-status-update-frequency`, which defaults to `10s`.
-->
kubelet 根据下表将驱逐信号映射为节点状况:
| 节点条件 | 驱逐信号 | 描述 |
|------------------|---------------------------------------------------------------------------------------|--------------------------------------------------------------|
| `MemoryPressure` | `memory.available` | 节点上的可用内存已满足驱逐条件 |
| `DiskPressure` | `nodefs.available`、`nodefs.inodesFree`、`imagefs.available` 或 `imagefs.inodesFree` | 节点的根文件系统或镜像文件系统上的可用磁盘空间和 inode 已满足驱逐条件 |
| `PIDPressure` | `pid.available` | Linux 节点上的可用进程标识符已低于驱逐条件 |
kubelet 根据配置的 `--node-status-update-frequency` 更新节点条件,默认为 `10s`
<!--
#### Node condition oscillation
In some cases, nodes oscillate above and below soft eviction thresholds without
holding for the defined grace periods. This causes the reported node conditions
to constantly switch between `true` and `false`, leading to poor eviction decisions.

To protect against oscillation, you can use the `eviction-pressure-transition-period`
flag, which controls how long the kubelet must wait before transitioning a node
condition to a different state. The transition period has a default value of `5m`.
-->
#### 节点条件波动

在某些情况下,节点在软驱逐条件上下振荡,而不会保持设定的宽限期。
这会导致报告的节点条件在 `true``false` 之间不断切换,从而导致错误的驱逐决策。

为了防止这种振荡,你可以使用 `eviction-pressure-transition-period` 标志。
该标志控制 kubelet 在将节点条件转换为不同状态之前必须等待的时间。
过渡期的默认值为 `5m`
<!--
### Reclaiming node level resources {#reclaim-node-resources}
The kubelet tries to reclaim node-level resources before it evicts end-user pods.
When a `DiskPressure` node condition is reported, the kubelet reclaims node-level
resources based on the filesystems on the node.
-->
### 回收节点级资源 {#reclaim-node-resources}
kubelet 在驱逐最终用户 Pod 之前会先尝试回收节点级资源。
当报告 `DiskPressure` 节点状况时kubelet 会根据节点上的文件系统回收节点级资源。

<!--
#### With `imagefs`

If the node has a dedicated `imagefs` filesystem for container runtimes to use,
the kubelet does the following:
- If the `nodefs` filesystem meets the eviction thresholds, the kubelet garbage collects
dead pods and containers.
- If the `imagefs` filesystem meets the eviction thresholds, the kubelet
deletes all unused images.
-->
#### 有 `imagefs`
如果节点有一个专用的 `imagefs` 文件系统供容器运行时使用kubelet 会执行以下操作:
- 如果 `nodefs` 文件系统满足驱逐条件kubelet 垃圾收集死亡 Pod 和容器。
- 如果 `imagefs` 文件系统满足驱逐条件kubelet 将删除所有未使用的镜像。
<!--
#### Without `imagefs`
If the node only has a `nodefs` filesystem that meets eviction thresholds,
the kubelet frees up disk space in the following order:

1. Garbage collect dead pods and containers
1. Delete unused images
-->
#### 没有 `imagefs`

如果节点只有一个满足驱逐条件的 `nodefs` 文件系统,
kubelet 按以下顺序释放磁盘空间:
1. 对死亡的 Pod 和容器进行垃圾收集
1. 删除未使用的镜像
<!--
### Pod selection for kubelet eviction
If the kubelet's attempts to reclaim node-level resources don't bring the eviction
signal below the threshold, the kubelet begins to evict end-user pods.
The kubelet uses the following parameters to determine the pod eviction order:
1. Whether the pod's resource usage exceeds requests
1. [Pod Priority](/docs/concepts/scheduling-eviction/pod-priority-preemption/)
1. The pod's resource usage relative to requests
-->
kubelet 使用以下参数来确定 Pod 驱逐顺序:

1. Pod 的资源使用是否超过其请求
1. [Pod 优先级](/zh-cn/docs/concepts/scheduling-eviction/pod-priority-preemption/)
1. Pod 相对于请求的资源使用情况
<!--
As a result, kubelet ranks and evicts pods in the following order:
1. `BestEffort` or `Burstable` pods where the usage exceeds requests. These pods
are evicted based on their Priority and then by how much their usage level
exceeds the request.
1. `Guaranteed` pods and `Burstable` pods where the usage is less than requests
are evicted last, based on their Priority.
-->
因此kubelet 按以下顺序对 Pod 排序并驱逐:

1. 用量超过其请求的 `BestEffort``Burstable` Pod。
   这些 Pod 基于它们的优先级以及用量超出请求的程度被驱逐。
1. 用量低于请求的 `Guaranteed``Burstable` Pod 最后被驱逐,驱逐顺序基于其优先级。
{{<note>}}
<!--
The kubelet does not use the pod's QoS class to determine the eviction order.
You can use the QoS class to estimate the most likely pod eviction order when
reclaiming resources like memory. QoS does not apply to EphemeralStorage requests,
so the above scenario will not apply if the node is, for example, under `DiskPressure`.
-->
kubelet 并不使用 Pod 的 QoS 类来确定驱逐顺序。
在回收内存等资源时,你可以使用 QoS 类来估计最可能的 Pod 驱逐顺序。
QoS 不适用于临时存储EphemeralStorage请求
因此,如果节点在 `DiskPressure` 下,则上述场景将不适用。
{{</note>}}
<!--
`Guaranteed` pods are guaranteed only when requests and limits are specified for
all the containers and they are equal. These pods will never be evicted because
of another pod's resource consumption. If a system daemon (such as `kubelet`
and `journald`) is consuming more resources than were reserved via
`system-reserved` or `kube-reserved` allocations, and the node only has
`Guaranteed` or `Burstable` pods using less resources than requests left on it,
then the kubelet must choose to evict one of these pods to preserve node stability
and to limit the impact of resource starvation on other pods. In this case, it
will choose to evict pods of lowest Priority first.
-->
仅当 `Guaranteed` Pod 中所有容器都被指定了请求和限制并且两者相等时,这些 Pod 才保证不被驱逐。
这类 Pod 永远不会因为另一个 Pod 的资源消耗而被驱逐。
如果系统守护进程(例如 `kubelet``journald`
消耗的资源比通过 `system-reserved``kube-reserved` 分配保留的资源多,
并且节点上只剩下资源用量低于请求值的 `Guaranteed``Burstable` Pod
那么 kubelet 必须选择驱逐这些 Pod 中的一个以保持节点稳定性并减少资源匮乏对其他 Pod 的影响。
在这种情况下,它会选择首先驱逐最低优先级的 Pod。
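<!--
To reduce the chance of that situation, you can reserve resources for system
daemons; a hedged sketch, with illustrative sizes that you would tune per node:
-->
为了降低出现这种情况的可能性,你可以为系统守护进程预留资源;以下为示意性配置,各项规格需按节点实际情况调整:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
systemReserved:   # 为操作系统守护进程预留(示例值)
  cpu: "500m"
  memory: "1Gi"
kubeReserved:     # 为 Kubernetes 系统守护进程预留(示例值)
  cpu: "500m"
  memory: "1Gi"
```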
<!--
When the kubelet evicts pods in response to `inode` or `PID` starvation, it uses
the Priority to determine the eviction order, because `inodes` and `PIDs` have no
requests.
The kubelet sorts pods differently based on whether the node has a dedicated
`imagefs` filesystem:
-->
当 kubelet 因 inode 或 PID 不足而驱逐 Pod 时,
它使用优先级来确定驱逐顺序,因为 inode 和 PID 没有对应的请求。
kubelet 根据节点是否具有专用的 `imagefs` 文件系统对 Pod 进行不同的排序:
<!--
#### With `imagefs`
If `nodefs` is triggering evictions, the kubelet sorts pods based on `nodefs`
usage (`local volumes + logs of all containers`).

If `imagefs` is triggering evictions, the kubelet sorts pods based on the
writable layer usage of all containers.

#### Without `imagefs`

If `nodefs` is triggering evictions, the kubelet sorts pods based on their total
disk usage (`local volumes + logs & writable layer of all containers`)
-->
#### 有 `imagefs`
如果 `nodefs` 触发驱逐,
kubelet 会根据 `nodefs` 使用情况(`本地卷 + 所有容器的日志`)对 Pod 进行排序。

如果 `imagefs` 触发驱逐kubelet 会根据所有容器的可写层使用情况对 Pod 进行排序。

#### 没有 `imagefs`

如果 `nodefs` 触发驱逐,
kubelet 会根据磁盘总用量(`本地卷 + 日志和所有容器的可写层`)对 Pod 进行排序。
<!--
### Minimum eviction reclaim
In some cases, pod eviction only reclaims a small amount of the starved resource.
This can lead to the kubelet repeatedly hitting the configured eviction thresholds
and triggering multiple evictions.
-->
### 最小驱逐回收 {#minimum-eviction-reclaim}
在某些情况下,驱逐 Pod 只会回收少量的紧俏资源。
这可能导致 kubelet 反复达到配置的驱逐条件并触发多次驱逐。
<!--
You can use the `--eviction-minimum-reclaim` flag or a [kubelet config file](/docs/tasks/administer-cluster/kubelet-config-file/)
to configure a minimum reclaim amount for each resource. When the kubelet notices
that a resource is starved, it continues to reclaim that resource until it
reclaims the quantity you specify.
For example, the following configuration sets minimum reclaim amounts:
-->
你可以使用 `--eviction-minimum-reclaim` 标志或
[kubelet 配置文件](/zh-cn/docs/tasks/administer-cluster/kubelet-config-file/)
来为每个资源配置最小回收量。
当 kubelet 注意到某个资源耗尽时,它会继续回收该资源,直到回收到你所指定的数量为止。

例如,以下配置设置最小回收量:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "500Mi"
  nodefs.available: "1Gi"
  imagefs.available: "100Gi"
evictionMinimumReclaim:
  memory.available: "0Mi"
  nodefs.available: "500Mi"
  imagefs.available: "2Gi"
```
<!--
In this example, if the `nodefs.available` signal meets the eviction threshold,
the kubelet reclaims the resource until the signal reaches the threshold of `1Gi`,
and then continues to reclaim the minimum amount of `500Mi` it until the signal
reaches `1.5Gi`.
Similarly, the kubelet reclaims the `imagefs` resource until the `imagefs.available`
signal reaches `102Gi`.
The default `eviction-minimum-reclaim` is `0` for all resources.
-->
在这个例子中,如果 `nodefs.available` 信号满足驱逐条件,
kubelet 会回收资源,直到信号达到 `1Gi` 的条件,
然后继续回收至少 `500Mi` 的量,直到信号达到 `1.5Gi`

类似地kubelet 会回收 `imagefs` 资源,直到 `imagefs.available` 信号达到 `102Gi`
对于所有资源,默认的 `eviction-minimum-reclaim``0`
<!--
### Node out of memory behavior
If the node experiences an out of memory (OOM) event prior to the kubelet
kubelet 还会将具有 `system-node-critical` {{<glossary_tooltip text="优先级" term_id="pod-priority">}}
的 Pod 中的容器 `oom_score_adj` 值设为 `-997`
{{</note>}}
<!--
If the kubelet can't reclaim memory before a node experiences OOM, the
`oom_killer` calculates an `oom_score` based on the percentage of memory it's
using on the node, and then adds the `oom_score_adj` to get an effective `oom_score`
for each container. It then kills the container with the highest score.

This means that containers in low QoS pods that consume a large amount of memory
relative to their scheduling requests are killed first.
Unlike pod eviction, if a container is OOM killed, the `kubelet` can restart it
based on its `RestartPolicy`.
-->
如果 kubelet 在节点遇到 OOM 之前无法回收内存,
`oom_killer` 会基于容器在节点上使用的内存百分比计算 `oom_score`
再加上 `oom_score_adj`,得到每个容器的有效 `oom_score`
然后杀死得分最高的容器。

这意味着低 QoS Pod 中相对于其调度请求消耗内存较多的容器,会首先被杀死。
与 Pod 驱逐不同,如果容器被 OOM 杀死,
`kubelet` 可以根据其 `RestartPolicy` 重新启动它。
<!--
### Best practices {#node-pressure-eviction-good-practices}
The following sections describe best practices for eviction configuration.
-->
### 最佳实践 {#node-pressure-eviction-good-practices}

以下部分描述了驱逐配置的最佳实践。
<!--
#### Schedulable resources and eviction policies
When you configure the kubelet with an eviction policy, you should make sure that
the scheduler will not schedule pods if they will trigger eviction because they
immediately induce memory pressure.
-->
#### 可调度的资源和驱逐策略

当你为 kubelet 配置驱逐策略时,
你应该确保调度程序不会在 Pod 触发驱逐时对其进行调度,因为这类 Pod 会立即引起内存压力。
<!--
Consider the following scenario:
- Node memory capacity: `10Gi`
- Operator wants to reserve 10% of memory capacity for system daemons (kernel, `kubelet`, etc.)
- Operator wants to evict Pods at 95% memory utilization to reduce incidence of system OOM.
-->
考虑以下场景:
* 节点内存容量:`10Gi`
* 操作员希望为系统守护进程(内核、`kubelet` 等)保留 10% 的内存容量
* 操作员希望在节点内存利用率达到 95% 以上时驱逐 Pod以减少系统 OOM 的概率。
<!--
For this to work, the kubelet is launched as follows:
-->
为此kubelet 启动设置如下:
```
--eviction-hard=memory.available<500Mi
--system-reserved=memory=1.5Gi
```
<!--
In this configuration, the `--system-reserved` flag reserves `1.5Gi` of memory
for the system, which is `10% of the total memory + the eviction threshold amount`.
The node can reach the eviction threshold if a pod is using more than its request,
or if the system is using more than `1Gi` of memory, which makes the `memory.available`
signal fall below `500Mi` and triggers the threshold.
-->
在此配置中,`--system-reserved` 标志为系统预留了 `1.5Gi` 的内存,
`总内存的 10% + 驱逐条件量`
如果 Pod 使用的内存超过其请求值或者系统使用的内存超过 `1Gi`
则节点可以达到驱逐条件,这使得 `memory.available` 信号低于 `500Mi` 并触发条件。
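<!--
A sketch of the same setup expressed in a kubelet configuration file rather
than flags:
-->
同样的设置用 kubelet 配置文件(而非命令行标志)表达的示意如下:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "500Mi"  # 驱逐条件量
systemReserved:
  memory: "1.5Gi"            # 总内存的 10% + 驱逐条件量
```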
<!--
#### DaemonSet
Pod Priority is a major factor in making eviction decisions. If you do not want
the kubelet to evict pods that belong to a `DaemonSet`, give those pods a high
enough `priorityClass` in the pod spec. You can also use a lower `priorityClass`
or the default to only allow `DaemonSet` pods to run when there are enough
resources.
-->
#### DaemonSet

Pod 优先级是做出驱逐决定的主要因素。
如果你不希望 kubelet 驱逐属于 `DaemonSet` 的 Pod
请在 Pod 规约中为这些 Pod 提供足够高的 `priorityClass`
你还可以使用优先级较低的 `priorityClass` 或默认配置,
仅在有足够资源时才运行 `DaemonSet` Pod。
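<!--
A sketch of giving `DaemonSet` pods a high enough priority; the class name and
value below are made up for illustration:
-->
为 `DaemonSet` Pod 赋予足够高优先级的示意;以下类名和数值均为虚构,仅用于说明:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: daemonset-high  # 虚构的名称
value: 1000000          # 示例取值,高于普通工作负载
globalDefault: false
description: "用于不希望被 kubelet 驱逐的 DaemonSet Pod"
```

随后在 DaemonSet 的 Pod 模板中通过 `priorityClassName: daemonset-high` 引用它。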
<!--
### Known issues
The following sections describe known issues related to out of resource handling.
-->
### 已知问题

以下部分描述了与资源不足处理相关的已知问题。
<!--
#### kubelet may not observe memory pressure right away
By default, the kubelet polls `cAdvisor` to collect memory usage stats at a
regular interval. If memory usage increases within that window rapidly, the
kubelet may not observe `MemoryPressure` fast enough, and the `OOMKiller`
will still be invoked.
-->
#### kubelet 可能不会立即观察到内存压力
默认情况下kubelet 轮询 `cAdvisor`,以固定的时间间隔收集内存使用情况统计信息。
如果该轮询时间窗口内内存使用量迅速增加kubelet 可能无法足够快地观察到 `MemoryPressure`
但是 `OOMKiller` 仍将被调用。
<!--
You can use the `--kernel-memcg-notification` flag to enable the `memcg`
notification API on the kubelet to get notified immediately when a threshold
is crossed.
If you are not trying to achieve extreme utilization, but a sensible measure of
overcommit, a viable workaround for this issue is to use the `--kube-reserved`
and `--system-reserved` flags to allocate memory for the system.
-->
你可以使用 `--kernel-memcg-notification`
标志在 kubelet 上启用 `memcg` 通知 API以便在超过条件时立即收到通知。
如果你不是追求极端利用率,而是要采取合理的过量使用措施,
则解决此问题的可行方法是使用 `--kube-reserved``--system-reserved` 标志为系统分配内存。
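<!--
A sketch combining both mitigations in one kubelet configuration; whether they
suit your cluster depends on your workloads:
-->
在一份 kubelet 配置中结合这两种缓解措施的示意;是否适用取决于你的工作负载:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
kernelMemcgNotification: true  # 对应 --kernel-memcg-notification
kubeReserved:
  memory: "1Gi"                # 示例值
systemReserved:
  memory: "1Gi"                # 示例值
```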
<!--
#### active_file memory is not considered as available memory
On Linux, the kernel tracks the number of bytes of file-backed memory on active
LRU list as the `active_file` statistic. The kubelet treats `active_file` memory
areas as not reclaimable. For workloads that make intensive use of block-backed
local storage, including ephemeral local storage, kernel-level caches of file
and block data means that many recently accessed cache pages are likely to be
counted as `active_file`. If enough of these kernel block buffers are on the
active LRU list, the kubelet is liable to observe this as high resource use and
taint the node as experiencing memory pressure - triggering pod eviction.
-->
#### active_file 内存未被视为可用内存
在 Linux 上,内核将活动 LRU 列表上基于文件的内存字节数作为 `active_file` 统计信息进行跟踪。
kubelet 将 `active_file` 内存区域视为不可回收。
对于大量使用块设备形式的本地存储(包括临时本地存储)的工作负载,
文件和块数据的内核级缓存意味着许多最近访问的缓存页面很可能被计为 `active_file`
如果活动 LRU 列表上有足够多这类内核块缓冲区,
kubelet 很容易将其视为资源用量过量并为节点设置内存压力污点,从而触发 Pod 驱逐。
<!--
For more details, see [https://github.com/kubernetes/kubernetes/issues/43916](https://github.com/kubernetes/kubernetes/issues/43916)
You can work around that behavior by setting the memory limit and memory request
the same for containers likely to perform intensive I/O activity. You will need
to estimate or measure an optimal memory limit value for that container.
-->
更多细节请参见 [https://github.com/kubernetes/kubernetes/issues/43916](https://github.com/kubernetes/kubernetes/issues/43916)。
你可以通过为可能执行 I/O 密集型活动的容器设置相同的内存请求和限制来规避该行为。
你需要估计或测量该容器的最佳内存限制值。
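<!--
A hedged sketch of that workaround: memory requests equal to limits for an
I/O-heavy container; the names and sizes are placeholders:
-->
该规避方法的示意:为 I/O 密集型容器设置相等的内存请求和限制;其中名称与规格均为占位值:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: io-intensive-app        # 占位名称
spec:
  containers:
  - name: app
    image: example.com/app:1.0  # 占位镜像
    resources:
      requests:
        memory: "2Gi"           # 请求与限制相等,
      limits:                   # 具体数值需按实际测量结果调整
        memory: "2Gi"
```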
## {{% heading "whatsnext" %}}
<!--
- Learn about [API-initiated Eviction](/docs/concepts/scheduling-eviction/api-eviction/)
- Learn about [Pod Priority and Preemption](/docs/concepts/scheduling-eviction/pod-priority-preemption/)
- Learn about [PodDisruptionBudgets](/docs/tasks/run-application/configure-pdb/)
- Learn about [Quality of Service](/docs/tasks/configure-pod-container/quality-service-pod/) (QoS)
- Check out the [Eviction API](/docs/reference/generated/kubernetes-api/{{<param "version">}}/#create-eviction-pod-v1-core)
-->
* 了解 [API 发起的驱逐](/zh-cn/docs/concepts/scheduling-eviction/api-eviction/)
* 了解 [Pod 优先级和抢占](/zh-cn/docs/concepts/scheduling-eviction/pod-priority-preemption/)