Clean up flow-control.md

pull/42873/head
Michael 2023-09-04 20:59:54 +08:00
parent 64b2336468
commit b8cc58a7a3
1 changed files with 74 additions and 73 deletions

@ -792,14 +792,15 @@ your requests.
To detect whether requests are being rejected due to APF, check the following
metrics:

- apiserver_flowcontrol_rejected_requests_total: the total number of requests
  rejected per FlowSchema and PriorityLevelConfiguration.
- apiserver_flowcontrol_current_inqueue_requests: the current number of requests
  queued per FlowSchema and PriorityLevelConfiguration.
- apiserver_flowcontrol_request_wait_duration_seconds: the latency added to
  requests waiting in queues.
- apiserver_flowcontrol_priority_level_seat_utilization: the seat utilization
  per PriorityLevelConfiguration.
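
As a rough illustration of how the first metric might be inspected, the sketch below sums `apiserver_flowcontrol_rejected_requests_total` per FlowSchema and PriorityLevelConfiguration from Prometheus text-format output. The sample text and its values are made up for illustration; the function name `rejected_by_flow` is hypothetical, and the label names (`flow_schema`, `priority_level`, `reason`) are assumed to match what your API server exposes.

```python
import re
from collections import defaultdict

# Hypothetical sample of /metrics output; the numbers are invented for illustration.
SAMPLE_METRICS = '''\
# TYPE apiserver_flowcontrol_rejected_requests_total counter
apiserver_flowcontrol_rejected_requests_total{flow_schema="service-accounts",priority_level="workload-low",reason="queue-full"} 12
apiserver_flowcontrol_rejected_requests_total{flow_schema="service-accounts",priority_level="workload-low",reason="time-out"} 3
apiserver_flowcontrol_rejected_requests_total{flow_schema="global-default",priority_level="global-default",reason="queue-full"} 0
'''

METRIC = "apiserver_flowcontrol_rejected_requests_total"
# Matches lines of the form: name{label="value",...} number
LINE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def rejected_by_flow(metrics_text):
    """Sum rejections per (flow_schema, priority_level), across all reasons."""
    totals = defaultdict(float)
    for line in metrics_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match or match.group("name") != METRIC:
            continue  # skips comments and unrelated metrics
        labels = dict(LABEL_RE.findall(match.group("labels")))
        key = (labels.get("flow_schema"), labels.get("priority_level"))
        totals[key] += float(match.group("value"))
    return dict(totals)


totals = rejected_by_flow(SAMPLE_METRICS)
```

A non-zero total that keeps growing for a given pair suggests that requests in that flow are being rejected by APF.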
### Workload modifications {#good-practice-workload-modifications}
@ -807,21 +808,21 @@ To prevent requests from queuing and adding latency or being dropped due to APF,
you can optimize your requests by:

- Reducing the rate at which requests are executed. Fewer requests over a fixed
  period will result in fewer seats being needed at a given time.
- Avoiding issuing a large number of expensive requests concurrently. Requests can
  be optimized to use fewer seats or have lower latency so that these requests
  hold those seats for a shorter duration. List requests can occupy more than 1
  seat depending on the number of objects fetched during the request. Restricting
  the number of objects retrieved in a list request, for example by using
  pagination, will use fewer total seats over a shorter period. Furthermore,
  replacing list requests with watch requests will require lower total concurrency
  shares, as a watch request only occupies 1 seat during its initial burst of
  notifications. If using streaming lists in versions 1.27 and later, a watch
  request will occupy the same number of seats as a list request for its initial
  burst of notifications because the entire state of the collection has to be
  streamed. Note that in both cases, a watch request will not hold any seats after
  this initial phase.
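
The pagination advice above can be sketched as a loop that requests bounded pages instead of one large list. The Kubernetes list API accepts `limit` and `continue` query parameters; everything else here is an assumption for illustration: `list_page` is a hypothetical callable standing in for a real client call, and `make_fake_server` only simulates chunked responses.

```python
def list_all(list_page, limit=500):
    """Yield every item by requesting pages of at most `limit` objects.

    `list_page` is a hypothetical callable that returns (items, continue_token),
    mirroring the `limit` / `continue` parameters of the list API.
    """
    token = None
    while True:
        items, token = list_page(limit=limit, continue_token=token)
        yield from items
        if not token:  # no continue token means this was the last page
            break


def make_fake_server(objects):
    """Stand-in for an API server, used only to exercise the loop."""
    calls = []

    def list_page(limit, continue_token=None):
        start = int(continue_token or 0)
        end = start + limit
        calls.append((start, end))
        next_token = str(end) if end < len(objects) else None
        return objects[start:end], next_token

    return list_page, calls


list_page, calls = make_fake_server(list(range(1050)))
result = list(list_all(list_page, limit=500))
```

Each page is a separate, smaller request, so no single request has to fetch the whole collection at once.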
Keep in mind that queuing or rejected requests from APF could be induced by
either an increase in the number of requests or an increase in latency for
@ -840,33 +841,34 @@ objects or create new objects of these types to better accommodate your
workload.

APF settings can be modified to:

- Give more seats to high priority requests.
- Isolate non-essential or expensive requests that would starve a concurrency
  level if it was shared with other flows.

#### Give more seats to high priority requests
1. If possible, the number of seats available across all priority levels for a
   particular `kube-apiserver` can be increased by increasing the values for the
   `max-requests-inflight` and `max-mutating-requests-inflight` flags. Alternatively,
   horizontally scaling the number of `kube-apiserver` instances will increase the
   total concurrency per priority level across the cluster, assuming there is
   sufficient load balancing of requests.
1. You can create a new FlowSchema which references a PriorityLevelConfiguration
   with a larger concurrency level. This new PriorityLevelConfiguration could be an
   existing level or a new level with its own set of nominal concurrency shares.
   For example, a new FlowSchema could be introduced to change the
   PriorityLevelConfiguration for your requests from global-default to workload-low
   to increase the number of seats available to your user. Creating a new
   PriorityLevelConfiguration will reduce the number of seats designated for
   existing levels. Recall that editing a default FlowSchema or
   PriorityLevelConfiguration will require setting the
   `apf.kubernetes.io/autoupdate-spec` annotation to false.
1. You can also increase the NominalConcurrencyShares for the
   PriorityLevelConfiguration which is serving your high priority requests.
   Alternatively, for versions 1.26 and later, you can increase the LendablePercent
   for competing priority levels so that the given priority level has a larger pool
   of seats it can borrow.
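
To see why creating a new PriorityLevelConfiguration reduces the seats of existing levels, the following sketch divides a server's total concurrency among levels in proportion to their nominal concurrency shares, rounding up. It assumes the proportional formula `ceil(server_limit * shares / sum_of_shares)`; the function name `nominal_seats`, the level name `batch`, and all numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def nominal_seats(server_concurrency_limit, shares):
    """Split the server's seats across priority levels in proportion to shares."""
    total_shares = sum(shares.values())
    return {
        name: math.ceil(server_concurrency_limit * value / total_shares)
        for name, value in shares.items()
    }


# Hypothetical server limit: max-requests-inflight + max-mutating-requests-inflight.
server_limit = 400 + 200

# Hypothetical share values for three existing levels.
before = nominal_seats(
    server_limit,
    {"workload-high": 40, "workload-low": 100, "global-default": 20},
)

# Adding a new level (hypothetical name and shares) dilutes every existing level.
after = nominal_seats(
    server_limit,
    {"workload-high": 40, "workload-low": 100, "global-default": 20, "batch": 40},
)
```

With these numbers, `workload-low` drops from 375 to 300 seats once the new level is added, even though its own shares did not change.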
#### Isolate non-essential requests from starving other flows
@ -886,17 +888,16 @@ Example FlowSchema object to isolate list event requests:
{{% code file="priority-and-fairness/list-events-default-service-account.yaml" %}}

- This FlowSchema captures all list event calls made by the default service
  account in the default namespace. The matching precedence 8000 is lower than the
  value of 9000 used by the existing service-accounts FlowSchema so these list
  event calls will match list-events-default-service-account rather than
  service-accounts.
- The catch-all PriorityLevelConfiguration is used to isolate these requests.
  The catch-all priority level has a very small concurrency share and does not
  queue requests.
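
The precedence rule in the first bullet can be sketched as follows: among the FlowSchemas whose rules match a request, the one with the numerically lowest matching precedence is chosen. The function name `pick_flow_schema` is hypothetical, and breaking ties by name is an assumption of this sketch, not something stated above.

```python
def pick_flow_schema(matching):
    """Return the name of the winning FlowSchema.

    `matching` is a list of (name, matching_precedence) pairs for FlowSchemas
    that already match the request. Lower precedence wins; ties are broken by
    name here, which is an assumption of this sketch.
    """
    name, _ = min(matching, key=lambda pair: (pair[1], pair[0]))
    return name


# Both FlowSchemas match a list-events call from the default service account.
winner = pick_flow_schema([
    ("service-accounts", 9000),
    ("list-events-default-service-account", 8000),
])
```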
## {{% heading "whatsnext" %}}

For background information on design details for API priority and fairness, see
the [enhancement proposal](https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/1040-priority-and-fairness).
You can make suggestions and feature requests via [SIG API Machinery](https://github.com/kubernetes/community/tree/master/sig-api-machinery).