diff --git a/content/en/docs/concepts/cluster-administration/flow-control.md b/content/en/docs/concepts/cluster-administration/flow-control.md
index 9d7ed4d95e..7618a9b497 100644
--- a/content/en/docs/concepts/cluster-administration/flow-control.md
+++ b/content/en/docs/concepts/cluster-administration/flow-control.md
@@ -782,6 +782,118 @@ APF adds the following two headers to each HTTP response message.
 - `X-Kubernetes-PF-PriorityLevel-UID` holds the UID of the PriorityLevelConfiguration object associated with that FlowSchema.
+## Good practices for using API Priority and Fairness
+
+When a given priority level exceeds its permitted concurrency, requests can
+experience increased latency or be dropped with an HTTP 429 (Too Many Requests)
+error. To prevent these side effects of APF, you can modify your workload or
+tweak your APF settings to ensure there are sufficient seats available to serve
+your requests.
+
+To detect whether requests are being queued or rejected due to APF, check the
+following metrics:
+- `apiserver_flowcontrol_rejected_requests_total`: the total number of requests
+  rejected per FlowSchema and PriorityLevelConfiguration.
+- `apiserver_flowcontrol_current_inqueue_requests`: the current number of requests
+  queued per FlowSchema and PriorityLevelConfiguration.
+- `apiserver_flowcontrol_request_wait_duration_seconds`: the latency added to
+  requests waiting in queues.
+- `apiserver_flowcontrol_priority_level_seat_utilization`: the seat utilization
+  per PriorityLevelConfiguration.
+
+### Workload modifications {#good-practice-workload-modifications}
+
+To prevent requests from queuing and adding latency or being dropped due to APF,
+you can optimize your requests by:
+
+- Reducing the rate at which requests are executed. Fewer requests over a
+fixed period result in fewer seats being needed at a
+given time.
+- Avoiding large numbers of expensive concurrent requests.
+Requests can be optimized to use fewer seats or have lower latency so that these requests
+hold those seats for a shorter duration. List requests can occupy more than 1
+seat depending on the number of objects fetched during the request. Restricting
+the number of objects retrieved in a list request, for example by using
+pagination, uses fewer total seats over a shorter period. Furthermore,
+replacing list requests with watch requests requires lower total concurrency
+shares, because a watch request occupies only 1 seat during its initial burst of
+notifications. With streaming lists in versions 1.27 and later, a watch
+request occupies the same number of seats as a list request during its initial
+burst of notifications because the entire state of the collection has to be
+streamed. Note that in both cases, a watch request does not hold any seats after
+this initial phase.
+
+Keep in mind that queued or rejected requests from APF could be caused by
+either an increase in the number of requests or an increase in latency for
+existing requests. For example, if requests that normally take 1s to execute
+start taking 60s, APF might start rejecting requests because
+requests are occupying seats for a longer duration than normal due to this
+increase in latency. If APF starts rejecting requests across multiple priority
+levels without a significant change in workload, there might be an
+underlying issue with control plane performance rather than the workload or APF
+settings.
+
+### Priority and fairness settings {#good-practice-apf-settings}
+
+You can also modify the default FlowSchema and PriorityLevelConfiguration
+objects or create new objects of these types to better accommodate your
+workload.
+
+APF settings can be modified to:
+- Give more seats to high priority requests.
+- Isolate non-essential or expensive requests that would starve a concurrency
+level if it were shared with other flows.
+
+#### Give more seats to high priority requests
+
+1. If possible, the number of seats available across all priority levels for a
+particular `kube-apiserver` can be increased by raising the values of the
+`--max-requests-inflight` and `--max-mutating-requests-inflight` flags. Alternatively,
+horizontally scaling the number of `kube-apiserver` instances will increase the
+total concurrency per priority level across the cluster, assuming there is
+sufficient load balancing of requests.
+2. You can create a new FlowSchema that references a PriorityLevelConfiguration
+with a larger concurrency level. The referenced PriorityLevelConfiguration could be an
+existing level or a new level with its own set of nominal concurrency shares.
+For example, a new FlowSchema could be introduced to change the
+PriorityLevelConfiguration for your requests from `global-default` to `workload-low`
+to increase the number of seats available to your user. Creating a new
+PriorityLevelConfiguration will reduce the number of seats designated for
+existing levels. Recall that editing a default FlowSchema or
+PriorityLevelConfiguration will require setting the
+`apf.kubernetes.io/autoupdate-spec` annotation to `false`.
+3. You can also increase the NominalConcurrencyShares for the
+PriorityLevelConfiguration that serves your high priority requests.
+Alternatively, for versions 1.26 and later, you can increase the LendablePercent
+for competing priority levels so that the given priority level has a larger pool
+of seats it can borrow.
+
+#### Isolate non-essential requests from starving other flows
+
+For request isolation, you can create a FlowSchema whose subject matches the
+user making these requests or create a FlowSchema that matches what the request
+is (corresponding to the `resourceRules`). Next, you can map this FlowSchema to a
+PriorityLevelConfiguration with a low share of seats.
+
+For example, suppose list event requests from Pods running in the `default` namespace
+use 10 seats each and run for 1 minute. To prevent these expensive
+requests from impacting requests from other Pods using the existing `service-accounts`
+FlowSchema, you can apply the following FlowSchema to isolate these list calls
+from other requests.
+
+Example FlowSchema object to isolate list event requests:
+
+{{% code file="priority-and-fairness/list-events-default-service-account.yaml" %}}
+
+- This FlowSchema captures all list event calls made by the default service
+account in the default namespace. The matching precedence 8000 is lower than the
+value of 9000 used by the existing `service-accounts` FlowSchema, and lower values
+are matched first, so these list event calls will match
+`list-events-default-service-account` rather than `service-accounts`.
+- The `catch-all` PriorityLevelConfiguration is used to isolate these requests.
+The `catch-all` priority level has a very small concurrency share and does not
+queue requests.
+
 ## {{% heading "whatsnext" %}}
diff --git a/content/en/examples/priority-and-fairness/list-events-default-service-account.yaml b/content/en/examples/priority-and-fairness/list-events-default-service-account.yaml
new file mode 100644
index 0000000000..94e73ae948
--- /dev/null
+++ b/content/en/examples/priority-and-fairness/list-events-default-service-account.yaml
@@ -0,0 +1,25 @@
+apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
+kind: FlowSchema
+metadata:
+  name: list-events-default-service-account
+spec:
+  distinguisherMethod:
+    type: ByUser
+  matchingPrecedence: 8000
+  priorityLevelConfiguration:
+    name: catch-all
+  rules:
+    - resourceRules:
+      - apiGroups:
+          - '*'
+        namespaces:
+          - default
+        resources:
+          - events
+        verbs:
+          - list
+      subjects:
+        - kind: ServiceAccount
+          serviceAccount:
+            name: default
+            namespace: default
\ No newline at end of file
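
The latency example in the "Workload modifications" section (requests that normally take 1s starting to take 60s) can be made concrete with a rough back-of-the-envelope sketch. It assumes steady-state load and applies Little's law: average seats in use is roughly the request rate multiplied by the time each request holds its seats and by the seats per request. The function name `estimated_seats_in_use` and the sample rate of 5 requests per second are illustrative assumptions; they are not part of APF or of this patch, and real seat accounting in the API server is more detailed.

```python
def estimated_seats_in_use(requests_per_second: float,
                           seconds_per_request: float,
                           seats_per_request: int = 1) -> float:
    """Steady-state estimate of seats occupied at one priority level.

    Little's law: average concurrency = arrival rate x time in system.
    Each in-flight request is assumed to hold `seats_per_request` seats
    for its whole duration, which simplifies how APF accounts for seats.
    """
    return requests_per_second * seconds_per_request * seats_per_request


# Hypothetical workload: 5 requests per second, 1 seat each.
normal = estimated_seats_in_use(5, 1)      # requests take 1s
degraded = estimated_seats_in_use(5, 60)   # same rate, requests take 60s
print(normal, degraded)
```

With the request rate unchanged, the estimated seat demand grows by the same factor as the latency (60 times here), which is why a control plane slowdown alone can push a priority level past its concurrency limit.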