Describe the APF tweaks for LIST and WATCH

pull/35791/head
Mike Spreitzer 2022-08-08 17:09:46 -04:00
parent 17c3350879
commit 053dc48234
1 changed files with 40 additions and 4 deletions

@ -31,10 +31,12 @@ use informers and react to failures of API requests with exponential
back-off, and other clients that also work this way.
{{< caution >}}
Requests classified as "long-running" — primarily watches — are not
subject to the API Priority and Fairness filter. This is also true for
the `--max-requests-inflight` flag without the API Priority and
Fairness feature enabled.
Some requests classified as "long-running" — such as remote command
execution or log tailing — are not subject to the API Priority and
Fairness filter. This is also true for the `--max-requests-inflight`
flag without the API Priority and Fairness feature enabled. WATCH
requests are considered long-running if API Priority and Fairness is
disabled, and not long-running if it is enabled.
{{< /caution >}}
<!-- body -->
@ -93,6 +95,40 @@ Pods. This means that an ill-behaved Pod that floods the API server with
requests cannot prevent leader election or actions by the built-in controllers
from succeeding.
### Request Width
The above description of concurrency management is the baseline story.
In it, all requests have equal "width": each takes up one "seat", one
unit of concurrency.
But some requests take up more than one seat. Some of these are LIST
requests that the server estimates will return a large number of
objects. These have been found to put an exceptionally heavy burden
on the server, among requests that take a similar amount of time to
run. For this reason, the server estimates the number of objects that
will be returned and considers the request to take a number of seats
that is proportional to that estimated number.
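
As a rough illustration of the proportionality described above, the following Go sketch maps an estimated object count to a seat count. It is not the API server's implementation: the function name `estimateListSeats`, the `objectsPerSeat` divisor, and the `maxSeats` cap are all hypothetical and chosen only for this example.

```go
package main

import "fmt"

// estimateListSeats is a hypothetical helper, not a Kubernetes API.
// It returns a seat count proportional to the estimated number of
// objects a LIST will return. objectsPerSeat and maxSeats are
// assumed tuning knobs for this sketch.
func estimateListSeats(estimatedObjects, objectsPerSeat, maxSeats int) int {
	if objectsPerSeat <= 0 {
		objectsPerSeat = 1
	}
	// Round up so that any partial batch of objects still costs a seat.
	seats := (estimatedObjects + objectsPerSeat - 1) / objectsPerSeat
	if seats < 1 {
		seats = 1 // every request occupies at least one seat
	}
	if maxSeats > 0 && seats > maxSeats {
		seats = maxSeats // assumed cap so one request cannot take everything
	}
	return seats
}

func main() {
	for _, n := range []int{0, 50, 101, 5000} {
		fmt.Printf("%d objects -> %d seats\n", n, estimateListSeats(n, 100, 10))
	}
}
```

Under these assumed parameters a small LIST keeps the baseline width of one seat, while a very large one is charged more seats up to the cap.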
### Execution Time Tweaks for WATCH
API Priority and Fairness manages WATCH requests, but this involves a
couple more excursions from the baseline behavior. The first concerns
how long a WATCH request is considered to occupy its seat. Depending
on request parameters, the response to a WATCH request may or may not
begin with CREATE notifications for all the relevant pre-existing
objects. API Priority and Fairness considers a WATCH request to be
done with its seat once that initial burst of notifications, if any,
is over.
The normal notifications are sent in a concurrent burst to all
relevant WATCH response streams whenever the server is notified of an
object create/update/delete. To account for this work, API Priority
and Fairness considers every write request to spend some additional
time occupying seats after the actual writing is done. The server
estimates the number of notifications to be sent and adjusts the write
request's number of seats and seat occupancy time to include this
extra work.
### Queuing
Even within a priority level there may be a large number of distinct sources of