Describe the APF tweaks for LIST and WATCH

2022-08-08 17:09:46 -04:00 · 2022-08-08 17:09:46 -04:00 · 053dc48234
parent 17c3350879
commit 053dc48234
1 changed files with 40 additions and 4 deletions
--- a/content/en/docs/concepts/cluster-administration/flow-control.md
+++ b/content/en/docs/concepts/cluster-administration/flow-control.md
@ -31,10 +31,12 @@ use informers and react to failures of API requests with exponential
 back-off, and other clients that also work this way.
 {{< caution >}}
-Requests classified as "long-running" — primarily watches — are not
+Some requests classified as "long-running" — such as remote command
-subject to the API Priority and Fairness filter. This is also true for
+execution or log tailing — are not subject to the API Priority and
-the `--max-requests-inflight` flag without the API Priority and
+Fairness filter. This is also true for the `--max-requests-inflight`
-Fairness feature enabled.
+flag without the API Priority and Fairness feature enabled.  WATCH
 requests are considered long-running if API Priority and Fairness is
 disabled, NOT long-running if it enabled.
 {{< /caution >}}
 <!-- body -->
@ -93,6 +95,40 @@ Pods. This means that an ill-behaved Pod that floods the API server with
 requests cannot prevent leader election or actions by the built-in controllers
 from succeeding.
 ### Request Width
 The above description of concurrency management is the baseline story.
 In it, all requests have equal "width": each takes up one "seat", one
 unit of concurrency.
 But some requests take up more than one seat.  Some of these are LIST
 requests that the server estimates will return a large number of
 objects.  These have been found to put an exceptionally heavy burden
 on the server, among requests that take a similar amount of time to
 run.  For this reason, the server estimates the number of objects that
 will be returned and considers the request to take a number of seats
 that is proportional to that estimated number.
 ### Execution Time Tweaks for WATCH
 API Priority and Fairness manages WATCH requests but this involves a
 couple more excursions from the baseline behavior.  The first concerns
 how long a WATCH request is considered to occupy its seat.  Depending
 on request parameters, the response to a WATCH request may or may not
 begin with CREATE notifications for all the relevant pre-existing
 objects.  API Priority and Fairness considers a WATCH request to be
 done with its seat once that initial burst of notifications, if any,
 is over.
 The normal notifications are sent in a concurrent burst to all
 relevant WATCH response streams whenever the server is notified of an
 object create/update/delete.  To account for this work, API Priority
 and Fairness consiers every write request to spend some additional
 time occupying seats after the actual writing is done.  The server
 estimates the number of notifications to be sent and adjusts the write
 request's number of seats and seat occupancy time to include this
 extra work.
 ### Queuing
 Even within a priority level there may be a large number of distinct sources of