Merge pull request #46410 from windsonsea/flowc

[zh] Sync cluster-administration/flow-control.md (pull/46417/head)

commit 909af47790

@@ -36,7 +36,7 @@ The API Priority and Fairness feature (APF) is an alternative that improves upon

aforementioned max-inflight limitations. APF classifies
and isolates requests in a more fine-grained way. It also introduces
a limited amount of queuing, so that no requests are rejected in cases
of very brief bursts. Requests are dispatched from queues using a
fair queuing technique so that, for example, a poorly-behaved
{{< glossary_tooltip text="controller" term_id="controller" >}} need not
starve others (even at the same priority level).
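The dispatch behavior described above can be pictured as a scan over per-flow queues, so that a busy flow cannot starve a light one. This is a minimal sketch, not the server's actual algorithm (which uses a weighted fair-queuing variant with seat accounting); the flow names and seat counts are made up:

```python
from collections import deque

def dispatch_fairly(queues, seats_available):
    """Round-robin over non-empty per-flow queues (simplified sketch
    of fair queuing): each pass serves at most one request per flow."""
    dispatched = []
    while seats_available > 0 and any(queues.values()):
        for flow, q in queues.items():
            if seats_available == 0:
                break
            if q:
                dispatched.append((flow, q.popleft()))
                seats_available -= 1
    return dispatched

queues = {
    "noisy-controller": deque(range(8)),  # a poorly-behaved flow
    "kubectl-user": deque(["get pods"]),  # a light interactive flow
}
# With only 4 seats, the light flow is still served.
print(dispatch_fairly(queues, 4))
```

Even though the noisy flow has eight queued requests, the light flow's single request is dispatched on the first pass.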

@@ -81,16 +81,17 @@ APF applies to **watch** requests. When APF is disabled, **watch** requests are not subject to the `

<!--
The API Priority and Fairness feature is controlled by a command-line flag
and is enabled by default. See
[Options](/docs/reference/command-line-tools-reference/kube-apiserver/#options)
for a general explanation of the available kube-apiserver command-line
options and how to enable and disable them. The name of the
command-line option for APF is "--enable-priority-and-fairness". This feature
also involves an {{<glossary_tooltip term_id="api-group" text="API Group" >}}
with: (a) a stable `v1` version, introduced in 1.29, and
enabled by default (b) a `v1beta3` version, enabled by default, and
deprecated in v1.29. You can
disable the API group beta version `v1beta3` by adding the
following command-line flags to your `kube-apiserver` invocation:
-->
The API Priority and Fairness (APF) feature is controlled by a command-line flag and is enabled by default.
For an explanation of the available kube-apiserver command-line options and how to enable and disable them,

@@ -101,6 +102,13 @@ The command-line option for APF is "--enable-priority-and-fairness".

(b) a `v1beta3` version, enabled by default and deprecated in v1.29.
You can disable the beta `v1beta3` API group by adding the following:

<!--
```shell
kube-apiserver \
--runtime-config=flowcontrol.apiserver.k8s.io/v1beta3=false \
 # …and other flags as usual
```
-->
```shell
kube-apiserver \
--runtime-config=flowcontrol.apiserver.k8s.io/v1beta3=false \
 # …and other flags as usual
```

@@ -164,7 +172,7 @@ from succeeding.

<!--
The concurrency limits of the priority levels are periodically
adjusted, allowing under-utilized priority levels to temporarily lend
concurrency to heavily-utilized levels. These limits are based on
nominal limits and bounds on how much concurrency a priority level may
lend and how much it may borrow, all derived from the configuration
objects mentioned below.

@@ -184,10 +192,10 @@ word "seat" is used to mean one unit of concurrency, inspired by the

way each passenger on a train or aircraft takes up one of the fixed
supply of seats.

But some requests take up more than one seat. Some of these are **list**
requests that the server estimates will return a large number of
objects. These have been found to put an exceptionally heavy burden
on the server. For this reason, the server estimates the number of objects
that will be returned and considers the request to take a number of seats
that is proportional to that estimated number.
-->
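The proportional-seat idea can be sketched as follows. This is only an illustration: the constants (objects per seat, maximum seats) are made up, not the server's actual values:

```python
def estimated_seats(estimated_objects, objects_per_seat=100, max_seats=10):
    """Hypothetical sketch: a **list** request takes a number of seats
    proportional to the estimated number of objects it will return,
    clamped to at least 1 and at most max_seats. Constants are
    illustrative only."""
    seats = -(-estimated_objects // objects_per_seat)  # ceiling division
    return max(1, min(max_seats, seats))

print(estimated_seats(50))    # small list -> 1 seat
print(estimated_seats(2500))  # large list -> clamped to 10 seats
```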

@@ -206,19 +214,19 @@ that is proportional to that estimated number.

### Execution time tweaks for watch requests

API Priority and Fairness manages **watch** requests, but this involves a
couple more excursions from the baseline behavior. The first concerns
how long a **watch** request is considered to occupy its seat. Depending
on request parameters, the response to a **watch** request may or may not
begin with **create** notifications for all the relevant pre-existing
objects. API Priority and Fairness considers a **watch** request to be
done with its seat once that initial burst of notifications, if any,
is over.

The normal notifications are sent in a concurrent burst to all
relevant **watch** response streams whenever the server is notified of an
object create/update/delete. To account for this work, API Priority
and Fairness considers every write request to spend some additional
time occupying seats after the actual writing is done. The server
estimates the number of notifications to be sent and adjusts the write
request's number of seats and seat occupancy time to include this
extra work.
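One way to picture this accounting: a write's seat-occupancy time is extended by a term that grows with the number of notifications to be fanned out. The function and all constants below are made up for illustration; they are not the server's actual cost model:

```python
def write_request_cost(seats, execution_ms, watchers, per_notification_ms=1):
    """Illustrative sketch only: a write occupies its seats for its own
    execution plus extra time covering the burst of watch notifications
    it triggers. Numbers are hypothetical, not the real cost model."""
    return seats, execution_ms + watchers * per_notification_ms

# A 50 ms write with 200 relevant watchers is charged as if it ran longer.
print(write_request_cost(seats=1, execution_ms=50, watchers=200))
```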

@@ -266,7 +274,7 @@ many instances should authenticate with distinct usernames

<!--
After classifying a request into a flow, the API Priority and Fairness
feature then may assign the request to a queue. This assignment uses
a technique known as {{< glossary_tooltip term_id="shuffle-sharding"
text="shuffle sharding" >}}, which makes relatively efficient use of
queues to insulate low-intensity flows from high-intensity flows.
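A sketch of the shuffle-sharding idea: each flow is deterministically dealt a small "hand" of queues from its identifier's hash, and a request goes to the shortest queue in its flow's hand. The helper names are hypothetical and the dealing here is simplified relative to the real dealer, which consumes the hash more carefully:

```python
import hashlib

def hand_for_flow(flow_id, num_queues=32, hand_size=8):
    """Deterministically deal a hand of distinct queue indices to a
    flow (simplified sketch of shuffle sharding's dealer)."""
    digest = hashlib.sha256(flow_id.encode()).digest()
    remaining = list(range(num_queues))
    hand = []
    for i in range(hand_size):
        pick = digest[i] % len(remaining)
        hand.append(remaining.pop(pick))
    return hand

def assign_queue(flow_id, queue_lengths):
    """Enqueue on the shortest queue in the flow's hand: a
    high-intensity flow fills its own hand before crowding the few
    queues it shares with any particular low-intensity flow."""
    hand = hand_for_flow(flow_id, num_queues=len(queue_lengths))
    return min(hand, key=lambda q: queue_lengths[q])

lengths = [0] * 32
q = assign_queue("user:elephant", lengths)
print(q in hand_for_flow("user:elephant"))
```

Because two different flows rarely share many queues, an "elephant" flow can only collide with a "mouse" flow on the small intersection of their hands.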

@@ -353,7 +361,7 @@ The API server's total concurrency limit is apportioned by these shares among the existing Priori

<!--
In the versions before `v1beta3` the relevant
PriorityLevelConfiguration field is named "assured concurrency shares"
rather than "nominal concurrency shares". Also, in Kubernetes release
1.25 and earlier there were no periodic adjustments: the
nominal/assured limits were always applied without adjustment.
-->

@@ -365,12 +373,12 @@ nominal/assured limits were always applied without adjustment.

<!--
The bounds on how much concurrency a priority level may lend and how
much it may borrow are expressed in the PriorityLevelConfiguration as
percentages of the level's nominal limit. These are resolved to
absolute numbers of seats by multiplying with the nominal limit /
100.0 and rounding. The dynamically adjusted concurrency limit of a
priority level is constrained to lie between (a) a lower bound of its
nominal limit minus its lendable seats and (b) an upper bound of its
nominal limit plus the seats it may borrow. At each adjustment the
dynamic limits are derived by each priority level reclaiming any lent
seats for which demand recently appeared and then jointly fairly
responding to the recent seat demand on the priority levels, within
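The resolution of the percentage fields into absolute seat bounds can be sketched directly from that description. The parameter names mirror the PriorityLevelConfiguration fields `lendablePercent` and `borrowingLimitPercent`:

```python
def concurrency_bounds(nominal, lendable_percent, borrowing_limit_percent):
    """Resolve percentage fields to absolute seat counts and return the
    (lower, upper) bounds between which the dynamically adjusted
    concurrency limit may move: nominal minus lendable seats, and
    nominal plus borrowable seats."""
    lendable = round(nominal * lendable_percent / 100.0)
    borrowable = round(nominal * borrowing_limit_percent / 100.0)
    return nominal - lendable, nominal + borrowable

# A level with nominal limit 30 that may lend half its seats and
# borrow up to 100% of its nominal limit:
lo, hi = concurrency_bounds(nominal=30, lendable_percent=50,
                            borrowing_limit_percent=100)
print(lo, hi)  # -> 15 60
```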

@@ -473,7 +481,7 @@ https://play.golang.org/p/Gi0PLgVHiUg , which computes this table.

{{< table caption = "Example shuffle-sharding configuration" >}}
<!-- HandSize | Queues | 1 elephant | 4 elephants | 16 elephants -->
HandSize | Queues | 1 elephant | 4 elephants | 16 elephants
|----------|-----------|------------|----------------|--------------------|
| 12 | 32 | 4.428838398950118e-09 | 0.11431348830099144 | 0.9935089607656024 |
| 10 | 32 | 1.550093439632541e-08 | 0.0626479840223545 | 0.9753101519027554 |

@@ -512,7 +520,7 @@ with the highest `matchingPrecedence`. If multiple FlowSchemas with equal

smaller `name` will win, but it's better not to rely on this, and instead to
ensure that no two FlowSchemas have the same `matchingPrecedence`.
-->
Only the first matching FlowSchema matters for a given request.
If an incoming request matches more than one FlowSchema, the one with the
logically highest `matchingPrecedence` is chosen.
If a request matches multiple FlowSchemas with equal `matchingPrecedence`, the one with the
lexicographically smallest `name` wins, but it is better not to rely on this and instead
to ensure that no two FlowSchemas have the same `matchingPrecedence`.
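The selection rule above can be sketched as a single ordered comparison: numerically lowest `matchingPrecedence` is logically highest, with ties broken by name. The schema names and precedence values below are illustrative:

```python
def pick_flow_schema(matching_schemas):
    """Among FlowSchemas that match a request, pick the logically
    highest-precedence one: numerically lowest matchingPrecedence,
    ties broken by lexicographically smallest name."""
    return min(matching_schemas,
               key=lambda fs: (fs["matchingPrecedence"], fs["name"]))

schemas = [
    {"name": "service-accounts", "matchingPrecedence": 9000},
    {"name": "workload-leader-election", "matchingPrecedence": 200},
    {"name": "system-leader-election", "matchingPrecedence": 100},
]
print(pick_flow_schema(schemas)["name"])  # -> system-leader-election
```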

@@ -570,9 +578,9 @@ mandatory and suggested.

### Mandatory Configuration Objects

The four mandatory configuration objects reflect fixed built-in
guardrail behavior. This is behavior that the servers have before
those objects exist, and when those objects exist their specs reflect
this behavior. The four mandatory objects are as follows.
-->
### Mandatory Configuration Objects {#mandatory-configuration-objects}

@@ -613,8 +621,8 @@ this behavior. The four mandatory objects are as follows.

### Suggested Configuration Objects

The suggested FlowSchemas and PriorityLevelConfigurations constitute a
reasonable default configuration. You can modify these and/or create
additional configuration objects if you want. If your cluster is
likely to experience heavy load then you should consider what
configuration will work best.

@@ -660,9 +668,11 @@ The suggested configuration groups requests into six priority levels:

<!--
* The `workload-high` priority level is for other requests from built-in
  controllers.

* The `workload-low` priority level is for requests from any other service
  account, which will typically include all requests from controllers running in
  Pods.

* The `global-default` priority level handles all other traffic, e.g.
  interactive `kubectl` commands run by nonprivileged users.
-->

@@ -712,10 +722,10 @@ inconsistent with the server's guardrail behavior.

<!--
Maintenance of suggested configuration objects is designed to allow
their specs to be overridden. Deletion, on the other hand, is not
respected: maintenance will restore the object. If you do not want a
suggested configuration object then you need to keep it around but set
its spec to have minimal consequences. Maintenance of suggested
objects is also designed to support automatic migration when a new
version of the `kube-apiserver` is rolled out, albeit potentially with
thrashing while there is a mixed population of servers.

@@ -729,9 +739,9 @@ thrashing while there is a mixed population of servers.

<!--
Maintenance of a suggested configuration object consists of creating
it --- with the server's suggested spec --- if the object does not
exist. OTOH, if the object already exists, maintenance behavior
depends on whether the `kube-apiservers` or the users control the
object. In the former case, the server ensures that the object's spec
is what the server suggests; in the latter case, the spec is left
alone.
-->

@@ -743,16 +753,16 @@ alone.

<!--
The question of who controls the object is answered by first looking
for an annotation with key `apf.kubernetes.io/autoupdate-spec`. If
there is such an annotation and its value is `true` then the
kube-apiservers control the object. If there is such an annotation
and its value is `false` then the users control the object. If
neither of those conditions holds then the `metadata.generation` of the
object is consulted. If that is 1 then the kube-apiservers control
the object. Otherwise the users control the object. These rules were
introduced in release 1.22 and their consideration of
`metadata.generation` is for the sake of migration from the simpler
earlier behavior. Users who wish to control a suggested configuration
object should set its `apf.kubernetes.io/autoupdate-spec` annotation
to `false`.
-->
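The control-ownership decision above is mechanical enough to sketch directly; the function name is hypothetical but the rules are exactly those in the text:

```python
def server_controls(obj):
    """Decide whether the kube-apiservers control a suggested
    configuration object: the apf.kubernetes.io/autoupdate-spec
    annotation wins if present ("true"/"false"); otherwise
    metadata.generation == 1 means the servers control it."""
    metadata = obj.get("metadata", {})
    value = metadata.get("annotations", {}).get("apf.kubernetes.io/autoupdate-spec")
    if value == "true":
        return True
    if value == "false":
        return False
    return metadata.get("generation") == 1

# Never touched by a user since creation: server-controlled.
print(server_controls({"metadata": {"generation": 1, "annotations": {}}}))
# Explicitly opted out: user-controlled even at generation 1.
print(server_controls({"metadata": {"generation": 1,
    "annotations": {"apf.kubernetes.io/autoupdate-spec": "false"}}}))
```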

@@ -786,7 +796,7 @@ nor suggested but are annotated

The suggested configuration gives no special treatment to the health
check requests on kube-apiservers from their local kubelets --- which
tend to use the secured port but supply no credentials. With the
suggested config, these requests get assigned to the `global-default`
FlowSchema and the corresponding `global-default` priority level,
where other traffic can crowd them out.

@@ -808,7 +818,7 @@ requests from rate limiting.

<!--
Making this change also allows any hostile party to then send
health-check requests that match this FlowSchema, at any volume they
like. If you have a web traffic filter or similar external security
mechanism to protect your cluster's API server from general internet
traffic, you can configure rules to block any health check requests
that originate from outside your cluster.

@@ -861,7 +871,7 @@ poorly-behaved workloads that may be harming system health.

(cumulative since server start) of requests that were rejected,
broken down by the labels `flow_schema` (indicating the one that
matched the request), `priority_level` (indicating the one to which
the request was assigned), and `reason`. The `reason` label will be
one of the following values:
-->
* `apiserver_flowcontrol_rejected_requests_total` is a counter vector,

@@ -939,6 +949,16 @@ poorly-behaved workloads that may be harming system health.

You can therefore add up the histograms for all the FlowSchemas of one priority level
to get the effective histogram for requests assigned to that priority level.
{{< /note >}}

<!--
* `apiserver_flowcontrol_nominal_limit_seats` is a gauge vector
  holding each priority level's nominal concurrency limit, computed
  from the API server's total concurrency limit and the priority
  level's configured nominal concurrency shares.
-->
* `apiserver_flowcontrol_nominal_limit_seats` is a gauge vector
  holding each priority level's nominal concurrency limit, computed
  from the API server's total concurrency limit and the priority
  level's configured nominal concurrency shares.
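The derivation of nominal limits from shares can be sketched as proportional apportionment of the server's total concurrency limit. This is a simplification: the exact rounding the server applies may differ, and the level names and share values below are illustrative:

```python
import math

def nominal_limits(server_cl, shares):
    """Sketch: each priority level's nominal concurrency limit is its
    proportional share of the server's total concurrency limit
    (rounded up here, so the sum may slightly exceed server_cl)."""
    total = sum(shares.values())
    return {name: math.ceil(server_cl * ncs / total)
            for name, ncs in shares.items()}

print(nominal_limits(600, {"workload-high": 40,
                           "workload-low": 100,
                           "global-default": 20}))
```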

<!--
#### Maturity level ALPHA
-->
@@ -949,7 +969,7 @@ poorly-behaved workloads that may be harming system health.

high water marks of the number of queued requests, grouped by a
label named `request_kind` whose value is `mutating` or `readOnly`.
These high water marks describe the largest number seen in the one
second window most recently completed. These complement the older
`apiserver_current_inflight_requests` gauge vector that holds the
last window's high water mark of number of requests actively being
served.

@@ -976,7 +996,7 @@ poorly-behaved workloads that may be harming system health.

nanosecond, of the number of requests broken down by the labels
`phase` (which takes on the values `waiting` and `executing`) and
`request_kind` (which takes on the values `mutating` and
`readOnly`). Each observed value is a ratio, between 0 and 1, of
the number of requests divided by the corresponding limit on the
number of requests (queue volume limit for waiting and concurrency
limit for executing).

@@ -1000,7 +1020,7 @@ poorly-behaved workloads that may be harming system health.

histogram vector of observations, made at the end of each
nanosecond, of the number of requests broken down by the labels
`phase` (which takes on the values `waiting` and `executing`) and
`priority_level`. Each observed value is a ratio, between 0 and 1,
of a number of requests divided by the corresponding limit on the
number of requests (queue volume limit for waiting and concurrency
limit for executing).

@@ -1015,13 +1035,13 @@ poorly-behaved workloads that may be harming system health.

* `apiserver_flowcontrol_priority_level_seat_utilization` is a
  histogram vector of observations, made at the end of each
  nanosecond, of the utilization of a priority level's concurrency
  limit, broken down by `priority_level`. This utilization is the
  fraction (number of seats occupied) / (concurrency limit). This
  metric considers all stages of execution (both normal and the extra
  delay at the end of a write to cover for the corresponding
  notification work) of all requests except WATCHes; for those it
  considers only the initial stage that delivers notifications of
  pre-existing objects. Each histogram in the vector is also labeled
  with `phase: executing` (there is no seat limit for the waiting
  phase).
-->

@@ -1062,9 +1082,9 @@ poorly-behaved workloads that may be harming system health.

<!--
* `apiserver_flowcontrol_request_concurrency_limit` is the same as
  `apiserver_flowcontrol_nominal_limit_seats`. Before the
  introduction of concurrency borrowing between priority levels,
  this was always equal to `apiserver_flowcontrol_current_limit_seats`
  (which did not exist as a distinct metric).
-->
* `apiserver_flowcontrol_request_concurrency_limit` is the same as

@@ -1087,8 +1107,8 @@ poorly-behaved workloads that may be harming system health.

<!--
* `apiserver_flowcontrol_demand_seats` is a histogram vector counting
  observations, at the end of every nanosecond, of each priority
  level's ratio of (seat demand) / (nominal concurrency limit).
  A priority level's seat demand is the sum, over both queued requests
  and those in the initial phase of execution, of the maximum of the
  number of seats occupied in the request's initial and final
  execution phases.

@@ -1418,9 +1438,9 @@ FlowSchema isolates these list calls from other requests.

<!--
- You can visit flow control [reference doc](/docs/reference/debug-cluster/flow-control/) to learn more about troubleshooting.
- For background information on design details for API priority and fairness, see
  the [enhancement proposal](https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/1040-priority-and-fairness).
- You can make suggestions and feature requests via [SIG API Machinery](https://github.com/kubernetes/community/tree/master/sig-api-machinery)
  or the feature's [slack channel](https://kubernetes.slack.com/messages/api-priority-and-fairness).
-->
- You can consult the flow control [reference doc](/zh-cn/docs/reference/debug-cluster/flow-control/) to learn more about troubleshooting.
- For background information on the design details of API Priority and Fairness,