Merge pull request #46410 from windsonsea/flowc

[zh] Sync cluster-administration/flow-control.md
pull/46417/head
Kubernetes Prow Robot 2024-05-17 00:11:35 -07:00 committed by GitHub
commit 909af47790
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
1 changed files with 75 additions and 55 deletions

View File

@ -36,7 +36,7 @@ The API Priority and Fairness feature (APF) is an alternative that improves upon
aforementioned max-inflight limitations. APF classifies
and isolates requests in a more fine-grained way. It also introduces
a limited amount of queuing, so that no requests are rejected in cases
of very brief bursts. Requests are dispatched from queues using a
of very brief bursts. Requests are dispatched from queues using a
fair queuing technique so that, for example, a poorly-behaved
{{< glossary_tooltip text="controller" term_id="controller" >}} need not
starve others (even at the same priority level).
@ -81,16 +81,17 @@ APF 适用于 **watch** 请求。当 APF 被禁用时,**watch** 请求不受 `
<!--
The API Priority and Fairness feature is controlled by a command-line flag
and is enabled by default. See
and is enabled by default. See
[Options](/docs/reference/command-line-tools-reference/kube-apiserver/#options)
for a general explanation of the available kube-apiserver command-line
options and how to enable and disable them. The name of the
command-line option for APF is "--enable-priority-and-fairness". This feature
options and how to enable and disable them. The name of the
command-line option for APF is "--enable-priority-and-fairness". This feature
also involves an {{<glossary_tooltip term_id="api-group" text="API Group" >}}
with: (a) a stable `v1` version, introduced in 1.29, and
enabled by default (b) a `v1beta3` version, enabled by default, and
deprecated in v1.29. You can
deprecated in v1.29. You can
disable the API group beta version `v1beta3` by adding the
following command-line flags to your `kube-apiserver` invocation:
-->
API 优先级与公平性APF特性由命令行标志控制默认情况下启用。
有关可用 kube-apiserver 命令行参数以及如何启用和禁用的说明,
@ -101,6 +102,13 @@ APF 的命令行参数是 "--enable-priority-and-fairness"。
(b) `v1beta3` 版本,默认被启用,在 1.29 中被弃用。
你可以通过添加以下内容来禁用 Beta 版的 `v1beta3` API 组:
<!--
```shell
kube-apiserver \
--runtime-config=flowcontrol.apiserver.k8s.io/v1beta3=false \
# …and other flags as usual
```
-->
```shell
kube-apiserver \
--runtime-config=flowcontrol.apiserver.k8s.io/v1beta3=false \
@ -164,7 +172,7 @@ from succeeding.
<!--
The concurrency limits of the priority levels are periodically
adjusted, allowing under-utilized priority levels to temporarily lend
concurrency to heavily-utilized levels. These limits are based on
concurrency to heavily-utilized levels. These limits are based on
nominal limits and bounds on how much concurrency a priority level may
lend and how much it may borrow, all derived from the configuration
objects mentioned below.
@ -184,10 +192,10 @@ word "seat" is used to mean one unit of concurrency, inspired by the
way each passenger on a train or aircraft takes up one of the fixed
supply of seats.
But some requests take up more than one seat. Some of these are **list**
But some requests take up more than one seat. Some of these are **list**
requests that the server estimates will return a large number of
objects. These have been found to put an exceptionally heavy burden
on the server. For this reason, the server estimates the number of objects
objects. These have been found to put an exceptionally heavy burden
on the server. For this reason, the server estimates the number of objects
that will be returned and considers the request to take a number of seats
that is proportional to that estimated number.
-->
@ -206,19 +214,19 @@ that is proportional to that estimated number.
### Execution time tweaks for watch requests
API Priority and Fairness manages **watch** requests, but this involves a
couple more excursions from the baseline behavior. The first concerns
how long a **watch** request is considered to occupy its seat. Depending
on request parameters, the response to a **watch** request may or may not
begin with **create** notifications for all the relevant pre-existing
objects. API Priority and Fairness considers a **watch** request to be
couple more excursions from the baseline behavior. The first concerns
how long a **watch** request is considered to occupy its seat. Depending
on request parameters, the response to a **watch** request may or may not
begin with **create** notifications for all the relevant pre-existing
objects. API Priority and Fairness considers a **watch** request to be
done with its seat once that initial burst of notifications, if any,
is over.
The normal notifications are sent in a concurrent burst to all
relevant **watch** response streams whenever the server is notified of an
object create/update/delete. To account for this work, API Priority
relevant **watch** response streams whenever the server is notified of an
object create/update/delete. To account for this work, API Priority
and Fairness considers every write request to spend some additional
time occupying seats after the actual writing is done. The server
time occupying seats after the actual writing is done. The server
estimates the number of notifications to be sent and adjusts the write
request's number of seats and seat occupancy time to include this
extra work.
@ -266,7 +274,7 @@ many instances should authenticate with distinct usernames
<!--
After classifying a request into a flow, the API Priority and Fairness
feature then may assign the request to a queue. This assignment uses
feature then may assign the request to a queue. This assignment uses
a technique known as {{< glossary_tooltip term_id="shuffle-sharding"
text="shuffle sharding" >}}, which makes relatively efficient use of
queues to insulate low-intensity flows from high-intensity flows.
@ -353,7 +361,7 @@ API 服务器的总并发量限制通过这些份额按例分配到现有 Priori
<!--
In the versions before `v1beta3` the relevant
PriorityLevelConfiguration field is named "assured concurrency shares"
rather than "nominal concurrency shares". Also, in Kubernetes release
rather than "nominal concurrency shares". Also, in Kubernetes release
1.25 and earlier there were no periodic adjustments: the
nominal/assured limits were always applied without adjustment.
-->
@ -365,12 +373,12 @@ nominal/assured limits were always applied without adjustment.
<!--
The bounds on how much concurrency a priority level may lend and how
much it may borrow are expressed in the PriorityLevelConfiguration as
percentages of the level's nominal limit. These are resolved to
percentages of the level's nominal limit. These are resolved to
absolute numbers of seats by multiplying with the nominal limit /
100.0 and rounding. The dynamically adjusted concurrency limit of a
100.0 and rounding. The dynamically adjusted concurrency limit of a
priority level is constrained to lie between (a) a lower bound of its
nominal limit minus its lendable seats and (b) an upper bound of its
nominal limit plus the seats it may borrow. At each adjustment the
nominal limit plus the seats it may borrow. At each adjustment the
dynamic limits are derived by each priority level reclaiming any lent
seats for which demand recently appeared and then jointly fairly
responding to the recent seat demand on the priority levels, within
@ -473,7 +481,7 @@ https://play.golang.org/p/Gi0PLgVHiUg , which computes this table.
{{< table caption = "混分切片配置示例" >}}
<!-- HandSize | Queues | 1 elephant | 4 elephants | 16 elephants -->
随机分片 | 队列数 | 1 个大象 | 4 个大象 | 16 个大象
随机分片 | 队列数 | 1 头大象 | 4 头大象 | 16 头大象
|----------|-----------|------------|----------------|--------------------|
| 12 | 32 | 4.428838398950118e-09 | 0.11431348830099144 | 0.9935089607656024 |
| 10 | 32 | 1.550093439632541e-08 | 0.0626479840223545 | 0.9753101519027554 |
@ -512,7 +520,7 @@ with the highest `matchingPrecedence`. If multiple FlowSchemas with equal
smaller `name` will win, but it's better not to rely on this, and instead to
ensure that no two FlowSchemas have the same `matchingPrecedence`.
-->
对一个请求来说,只有首个匹配的 FlowSchema 才有意义。
对一个请求来说,只有首个匹配的 FlowSchema 才有意义。
如果一个入站请求与多个 FlowSchema 匹配,则将基于逻辑上最高优先级 `matchingPrecedence` 的请求进行筛选。
如果一个请求匹配多个 FlowSchema 且 `matchingPrecedence` 的值相同,则按 `name` 的字典序选择最小,
但是最好不要依赖它,而是确保不存在两个 FlowSchema 具有相同的 `matchingPrecedence` 值。
@ -570,9 +578,9 @@ mandatory and suggested.
### Mandatory Configuration Objects
The four mandatory configuration objects reflect fixed built-in
guardrail behavior. This is behavior that the servers have before
guardrail behavior. This is behavior that the servers have before
those objects exist, and when those objects exist their specs reflect
this behavior. The four mandatory objects are as follows.
this behavior. The four mandatory objects are as follows.
-->
### 强制的配置对象 {#mandatory-configuration-objects}
@ -613,8 +621,8 @@ this behavior. The four mandatory objects are as follows.
### Suggested Configuration Objects
The suggested FlowSchemas and PriorityLevelConfigurations constitute a
reasonable default configuration. You can modify these and/or create
additional configuration objects if you want. If your cluster is
reasonable default configuration. You can modify these and/or create
additional configuration objects if you want. If your cluster is
likely to experience heavy load then you should consider what
configuration will work best.
@ -660,9 +668,11 @@ The suggested configuration groups requests into six priority levels:
<!--
* The `workload-high` priority level is for other requests from built-in
controllers.
* The `workload-low` priority level is for requests from any other service
account, which will typically include all requests from controllers running in
Pods.
* The `global-default` priority level handles all other traffic, e.g.
interactive `kubectl` commands run by nonprivileged users.
-->
@ -712,10 +722,10 @@ inconsistent with the server's guardrail behavior.
<!--
Maintenance of suggested configuration objects is designed to allow
their specs to be overridden. Deletion, on the other hand, is not
respected: maintenance will restore the object. If you do not want a
their specs to be overridden. Deletion, on the other hand, is not
respected: maintenance will restore the object. If you do not want a
suggested configuration object then you need to keep it around but set
its spec to have minimal consequences. Maintenance of suggested
its spec to have minimal consequences. Maintenance of suggested
objects is also designed to support automatic migration when a new
version of the `kube-apiserver` is rolled out, albeit potentially with
thrashing while there is a mixed population of servers.
@ -729,9 +739,9 @@ thrashing while there is a mixed population of servers.
<!--
Maintenance of a suggested configuration object consists of creating
it --- with the server's suggested spec --- if the object does not
exist. OTOH, if the object already exists, maintenance behavior
exist. OTOH, if the object already exists, maintenance behavior
depends on whether the `kube-apiservers` or the users control the
object. In the former case, the server ensures that the object's spec
object. In the former case, the server ensures that the object's spec
is what the server suggests; in the latter case, the spec is left
alone.
-->
@ -743,16 +753,16 @@ alone.
<!--
The question of who controls the object is answered by first looking
for an annotation with key `apf.kubernetes.io/autoupdate-spec`. If
for an annotation with key `apf.kubernetes.io/autoupdate-spec`. If
there is such an annotation and its value is `true` then the
kube-apiservers control the object. If there is such an annotation
and its value is `false` then the users control the object. If
kube-apiservers control the object. If there is such an annotation
and its value is `false` then the users control the object. If
neither of those conditions holds then the `metadata.generation` of the
object is consulted. If that is 1 then the kube-apiservers control
the object. Otherwise the users control the object. These rules were
object is consulted. If that is 1 then the kube-apiservers control
the object. Otherwise the users control the object. These rules were
introduced in release 1.22 and their consideration of
`metadata.generation` is for the sake of migration from the simpler
earlier behavior. Users who wish to control a suggested configuration
earlier behavior. Users who wish to control a suggested configuration
object should set its `apf.kubernetes.io/autoupdate-spec` annotation
to `false`.
-->
@ -786,7 +796,7 @@ nor suggested but are annotated
The suggested configuration gives no special treatment to the health
check requests on kube-apiservers from their local kubelets --- which
tend to use the secured port but supply no credentials. With the
tend to use the secured port but supply no credentials. With the
suggested config, these requests get assigned to the `global-default`
FlowSchema and the corresponding `global-default` priority level,
where other traffic can crowd them out.
@ -808,7 +818,7 @@ requests from rate limiting.
<!--
Making this change also allows any hostile party to then send
health-check requests that match this FlowSchema, at any volume they
like. If you have a web traffic filter or similar external security
like. If you have a web traffic filter or similar external security
mechanism to protect your cluster's API server from general internet
traffic, you can configure rules to block any health check requests
that originate from outside your cluster.
@ -861,7 +871,7 @@ poorly-behaved workloads that may be harming system health.
(cumulative since server start) of requests that were rejected,
broken down by the labels `flow_schema` (indicating the one that
matched the request), `priority_level` (indicating the one to which
the request was assigned), and `reason`. The `reason` label will be
the request was assigned), and `reason`. The `reason` label will be
one of the following values:
-->
* `apiserver_flowcontrol_rejected_requests_total` 是一个计数器向量,
@ -939,6 +949,16 @@ poorly-behaved workloads that may be harming system health.
因此你可以将一个优先级的所有 FlowSchema 的直方图相加,以得到分配给该优先级的请求的有效直方图。
{{< /note >}}
<!--
* `apiserver_flowcontrol_nominal_limit_seats` is a gauge vector
holding each priority level's nominal concurrency limit, computed
from the API server's total concurrency limit and the priority
level's configured nominal concurrency shares.
-->
* `apiserver_flowcontrol_nominal_limit_seats` 是一个测量向量,
记录了每个优先级的额定并发限制。
此值是根据 API 服务器的总并发限制和优先级的配置额定并发份额计算得出的。
<!--
#### Maturity level ALPHA
-->
@ -949,7 +969,7 @@ poorly-behaved workloads that may be harming system health.
high water marks of the number of queued requests, grouped by a
label named `request_kind` whose value is `mutating` or `readOnly`.
These high water marks describe the largest number seen in the one
second window most recently completed. These complement the older
second window most recently completed. These complement the older
`apiserver_current_inflight_requests` gauge vector that holds the
last window's high water mark of number of requests actively being
served.
@ -976,7 +996,7 @@ poorly-behaved workloads that may be harming system health.
nanosecond, of the number of requests broken down by the labels
`phase` (which takes on the values `waiting` and `executing`) and
`request_kind` (which takes on the values `mutating` and
`readOnly`). Each observed value is a ratio, between 0 and 1, of
`readOnly`). Each observed value is a ratio, between 0 and 1, of
the number of requests divided by the corresponding limit on the
number of requests (queue volume limit for waiting and concurrency
limit for executing).
@ -1000,7 +1020,7 @@ poorly-behaved workloads that may be harming system health.
histogram vector of observations, made at the end of each
nanosecond, of the number of requests broken down by the labels
`phase` (which takes on the values `waiting` and `executing`) and
`priority_level`. Each observed value is a ratio, between 0 and 1,
`priority_level`. Each observed value is a ratio, between 0 and 1,
of a number of requests divided by the corresponding limit on the
number of requests (queue volume limit for waiting and concurrency
limit for executing).
@ -1015,13 +1035,13 @@ poorly-behaved workloads that may be harming system health.
* `apiserver_flowcontrol_priority_level_seat_utilization` is a
histogram vector of observations, made at the end of each
nanosecond, of the utilization of a priority level's concurrency
limit, broken down by `priority_level`. This utilization is the
fraction (number of seats occupied) / (concurrency limit). This
limit, broken down by `priority_level`. This utilization is the
fraction (number of seats occupied) / (concurrency limit). This
metric considers all stages of execution (both normal and the extra
delay at the end of a write to cover for the corresponding
notification work) of all requests except WATCHes; for those it
considers only the initial stage that delivers notifications of
pre-existing objects. Each histogram in the vector is also labeled
pre-existing objects. Each histogram in the vector is also labeled
with `phase: executing` (there is no seat limit for the waiting
phase).
-->
@ -1062,9 +1082,9 @@ poorly-behaved workloads that may be harming system health.
<!--
* `apiserver_flowcontrol_request_concurrency_limit` is the same as
`apiserver_flowcontrol_nominal_limit_seats`. Before the
introduction of concurrency borrowing between priority levels, this
was always equal to `apiserver_flowcontrol_current_limit_seats`
`apiserver_flowcontrol_nominal_limit_seats`. Before the
introduction of concurrency borrowing between priority levels,
this was always equal to `apiserver_flowcontrol_current_limit_seats`
(which did not exist as a distinct metric).
-->
* `apiserver_flowcontrol_request_concurrency_limit`
@ -1087,8 +1107,8 @@ poorly-behaved workloads that may be harming system health.
<!--
* `apiserver_flowcontrol_demand_seats` is a histogram vector counting
observations, at the end of every nanosecond, of each priority
level's ratio of (seat demand) / (nominal concurrency limit). A
priority level's seat demand is the sum, over both queued requests
level's ratio of (seat demand) / (nominal concurrency limit).
A priority level's seat demand is the sum, over both queued requests
and those in the initial phase of execution, of the maximum of the
number of seats occupied in the request's initial and final
execution phases.
@ -1418,9 +1438,9 @@ FlowSchema 将这些列表调用与其他请求隔离开来。
<!--
- You can visit flow control [reference doc](/docs/reference/debug-cluster/flow-control/) to learn more about troubleshooting.
- For background information on design details for API priority and fairness, see
the [enhancement proposal](https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/1040-priority-and-fairness).
the [enhancement proposal](https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/1040-priority-and-fairness).
- You can make suggestions and feature requests via [SIG API Machinery](https://github.com/kubernetes/community/tree/master/sig-api-machinery)
or the feature's [slack channel](https://kubernetes.slack.com/messages/api-priority-and-fairness).
or the feature's [slack channel](https://kubernetes.slack.com/messages/api-priority-and-fairness).
-->
- 你可以查阅流控[参考文档](/zh-cn/docs/reference/debug-cluster/flow-control/)了解有关故障排查的更多信息。
- 有关 API 优先级和公平性的设计细节的背景信息,