From 053dc4823440e07e6d8085268a6f68f897a140ef Mon Sep 17 00:00:00 2001 From: Mike Spreitzer Date: Mon, 8 Aug 2022 17:09:46 -0400 Subject: [PATCH] Describe the APF tweaks for LIST and WATCH --- .../cluster-administration/flow-control.md | 44 +++++++++++++++++-- 1 file changed, 40 insertions(+), 4 deletions(-) diff --git a/content/en/docs/concepts/cluster-administration/flow-control.md b/content/en/docs/concepts/cluster-administration/flow-control.md index ccf1eb5887..7168c54e85 100644 --- a/content/en/docs/concepts/cluster-administration/flow-control.md +++ b/content/en/docs/concepts/cluster-administration/flow-control.md @@ -31,10 +31,12 @@ use informers and react to failures of API requests with exponential back-off, and other clients that also work this way. {{< caution >}} -Requests classified as "long-running" — primarily watches — are not -subject to the API Priority and Fairness filter. This is also true for -the `--max-requests-inflight` flag without the API Priority and -Fairness feature enabled. +Some requests classified as "long-running" — such as remote command +execution or log tailing — are not subject to the API Priority and +Fairness filter. This is also true for the `--max-requests-inflight` +flag without the API Priority and Fairness feature enabled. WATCH +requests are considered long-running if API Priority and Fairness is +disabled, NOT long-running if it enabled. {{< /caution >}} @@ -93,6 +95,40 @@ Pods. This means that an ill-behaved Pod that floods the API server with requests cannot prevent leader election or actions by the built-in controllers from succeeding. +### Request Width + +The above description of concurrency management is the baseline story. +In it, all requests have equal "width": each takes up one "seat", one +unit of concurrency. + +But some requests take up more than one seat. Some of these are LIST +requests that the server estimates will return a large number of +objects. These have been found to put an exceptionally heavy burden +on the server, among requests that take a similar amount of time to +run. For this reason, the server estimates the number of objects that +will be returned and considers the request to take a number of seats +that is proportional to that estimated number. + +### Execution Time Tweaks for WATCH + +API Priority and Fairness manages WATCH requests but this involves a +couple more excursions from the baseline behavior. The first concerns +how long a WATCH request is considered to occupy its seat. Depending +on request parameters, the response to a WATCH request may or may not +begin with CREATE notifications for all the relevant pre-existing +objects. API Priority and Fairness considers a WATCH request to be +done with its seat once that initial burst of notifications, if any, +is over. + +The normal notifications are sent in a concurrent burst to all +relevant WATCH response streams whenever the server is notified of an +object create/update/delete. To account for this work, API Priority +and Fairness consiers every write request to spend some additional +time occupying seats after the actual writing is done. The server +estimates the number of notifications to be sent and adjusts the write +request's number of seats and seat occupancy time to include this +extra work. + ### Queuing Even within a priority level there may be a large number of distinct sources of