Update scheduler framework document (#19606)

* Update scheduler framework document

* address comments

* Add page redirect and keep old link anchors

* address comment
pull/19698/head
Wei Huang 2020-03-16 16:26:22 -07:00 committed by GitHub
parent 8b6dad9c69
commit 098400c958
GPG Key ID: 4AEE18F83AFDEB23
4 changed files with 66 additions and 119 deletions

@@ -1,7 +1,7 @@
 ---
 title: Kubernetes Scheduler
 content_template: templates/concept
-weight: 60
+weight: 50
 ---
 {{% capture overview %}}

@@ -3,14 +3,14 @@ reviewers:
 - ahg-g
 title: Scheduling Framework
 content_template: templates/concept
-weight: 70
+weight: 60
 ---
 {{% capture overview %}}
 {{< feature-state for_k8s_version="1.15" state="alpha" >}}
-The scheduling framework is a new pluggable architecture for Kubernetes Scheduler
+The scheduling framework is a pluggable architecture for Kubernetes Scheduler
 that makes scheduler customizations easy. It adds a new set of "plugin" APIs to
 the existing scheduler. Plugins are compiled into the scheduler. The APIs
 allow most scheduling features to be implemented as plugins, while keeping the
@@ -56,16 +56,16 @@ stateful tasks.
 {{< figure src="/images/docs/scheduling-framework-extensions.png" title="scheduling framework extension points" >}}
-### Queue sort
+### QueueSort {#queue-sort}
 These plugins are used to sort Pods in the scheduling queue. A queue sort plugin
-essentially will provide a "less(Pod1, Pod2)" function. Only one queue sort
+essentially provides a `Less(Pod1, Pod2)` function. Only one queue sort
 plugin may be enabled at a time.
-### Pre-filter
+### PreFilter {#pre-filter}
 These plugins are used to pre-process info about the Pod, or to check certain
-conditions that the cluster or the Pod must meet. If a pre-filter plugin returns
+conditions that the cluster or the Pod must meet. If a PreFilter plugin returns
 an error, the scheduling cycle is aborted.
 ### Filter
@@ -75,28 +75,25 @@ node, the scheduler will call filter plugins in their configured order. If any
 filter plugin marks the node as infeasible, the remaining plugins will not be
 called for that node. Nodes may be evaluated concurrently.
-### Post-filter
-This is an informational extension point. Plugins will be called with a list of
-nodes that passed the filtering phase. A plugin may use this data to update
-internal state or to generate logs/metrics.
-**Note:** Plugins wishing to perform "pre-scoring" work should use the
-post-filter extension point.
-### Scoring
+### PreScore {#pre-score}
+These plugins are used to perform "pre-scoring" work, which generates a sharable
+state for Score plugins to use. If a PreScore plugin returns an error, the
+scheduling cycle is aborted.
+### Score {#scoring}
 These plugins are used to rank nodes that have passed the filtering phase. The
 scheduler will call each scoring plugin for each node. There will be a well
 defined range of integers representing the minimum and maximum scores. After the
-[normalize scoring](#normalize-scoring) phase, the scheduler will combine node
+[NormalizeScore](#normalize-scoring) phase, the scheduler will combine node
 scores from all plugins according to the configured plugin weights.
-### Normalize scoring
+### NormalizeScore {#normalize-scoring}
 These plugins are used to modify scores before the scheduler computes a final
 ranking of Nodes. A plugin that registers for this extension point will be
-called with the [scoring](#scoring) results from the same plugin. This is called
+called with the [Score](#scoring) results from the same plugin. This is called
 once per plugin per scheduling cycle.
 For example, suppose a plugin `BlinkingLightScorer` ranks Nodes based on how
@@ -104,7 +101,7 @@ many blinking lights they have.
 ```go
 func ScoreNode(_ *v1.pod, n *v1.Node) (int, error) {
     return getBlinkingLightCount(n)
 }
 ```
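The `ScoreNode` snippet in this hunk refers to the unexported `v1.pod` (the API type is `v1.Pod`) and to a `getBlinkingLightCount` helper that is never defined, so it reads best as pseudocode. Below is a self-contained sketch of the same idea. It assumes hypothetical `Pod` and `Node` stand-ins with a made-up `BlinkingLights` field in place of the Kubernetes API types.

```go
package main

import (
	"errors"
	"fmt"
)

// Pod and Node are hypothetical stand-ins for v1.Pod and v1.Node.
type Pod struct{ Name string }

type Node struct {
	Name           string
	BlinkingLights int // hypothetical field read by the scorer
}

// getBlinkingLightCount is a hypothetical helper matching the name in the doc.
func getBlinkingLightCount(n *Node) int {
	return n.BlinkingLights
}

// ScoreNode returns a raw, not yet normalized score for one node.
func ScoreNode(_ *Pod, n *Node) (int, error) {
	if n == nil {
		return 0, errors.New("nil node")
	}
	return getBlinkingLightCount(n), nil
}

func main() {
	nodes := []*Node{{Name: "node-a", BlinkingLights: 3}, {Name: "node-b", BlinkingLights: 7}}
	for _, n := range nodes {
		score, err := ScoreNode(&Pod{Name: "web"}, n)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(n.Name, score)
	}
}
```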
@@ -114,21 +111,23 @@ extension point.
 ```go
 func NormalizeScores(scores map[string]int) {
     highest := 0
     for _, score := range scores {
         highest = max(highest, score)
     }
     for node, score := range scores {
         scores[node] = score*NodeScoreMax/highest
     }
 }
 ```
-If any normalize-scoring plugin returns an error, the scheduling cycle is
+If any NormalizeScore plugin returns an error, the scheduling cycle is
 aborted.
-**Note:** Plugins wishing to perform "pre-reserve" work should use the
-normalize-scoring extension point.
+{{< note >}}
+Plugins wishing to perform "pre-reserve" work should use the
+NormalizeScore extension point.
+{{< /note >}}
 ### Reserve
@@ -140,53 +139,53 @@ to prevent race conditions while the scheduler waits for the bind to succeed.
 This is the last step in a scheduling cycle. Once a Pod is in the reserved
 state, it will either trigger [Unreserve](#unreserve) plugins (on failure) or
-[Post-bind](#post-bind) plugins (on success) at the end of the binding cycle.
-*Note: This concept used to be referred to as "assume".*
+[PostBind](#post-bind) plugins (on success) at the end of the binding cycle.
 ### Permit
-These plugins are used to prevent or delay the binding of a Pod. A permit plugin
-can do one of three things.
+_Permit_ plugins are invoked at the end of the scheduling cycle for each Pod, to
+prevent or delay the binding to the candidate node. A permit plugin can do one of
+the three things:
 1. **approve** \
-Once all permit plugins approve a Pod, it is sent for binding.
+Once all Permit plugins approve a Pod, it is sent for binding.
 1. **deny** \
-If any permit plugin denies a Pod, it is returned to the scheduling queue.
+If any Permit plugin denies a Pod, it is returned to the scheduling queue.
 This will trigger [Unreserve](#unreserve) plugins.
 1. **wait** (with a timeout) \
-If a permit plugin returns "wait", then the Pod is kept in the permit phase
-until a [plugin approves it](#frameworkhandle). If a timeout occurs, **wait**
-becomes **deny** and the Pod is returned to the scheduling queue, triggering
-[Unreserve](#unreserve) plugins.
-**Approving a Pod binding**
-While any plugin can access the list of "waiting" Pods from the cache and
-approve them (see [`FrameworkHandle`](#frameworkhandle)) we expect only the permit
-plugins to approve binding of reserved Pods that are in "waiting" state. Once a
-Pod is approved, it is sent to the pre-bind phase.
-### Pre-bind
+If a Permit plugin returns "wait", then the Pod is kept in an internal "waiting"
+Pods list, and the binding cycle of this Pod starts but directly blocks until it
+gets [approved](#frameworkhandle). If a timeout occurs, **wait** becomes **deny**
+and the Pod is returned to the scheduling queue, triggering [Unreserve](#unreserve)
+plugins.
+{{< note >}}
+While any plugin can access the list of "waiting" Pods and approve them
+(see [`FrameworkHandle`](#frameworkhandle)), we expect only the permit
+plugins to approve binding of reserved Pods that are in "waiting" state. Once a Pod
+is approved, it is sent to the [PreBind](#pre-bind) phase.
+{{< /note >}}
+### PreBind {#pre-bind}
 These plugins are used to perform any work required before a Pod is bound. For
 example, a pre-bind plugin may provision a network volume and mount it on the
 target node before allowing the Pod to run there.
-If any pre-bind plugin returns an error, the Pod is [rejected](#unreserve) and
+If any PreBind plugin returns an error, the Pod is [rejected](#unreserve) and
 returned to the scheduling queue.
 ### Bind
 These plugins are used to bind a Pod to a Node. Bind plugins will not be called
-until all pre-bind plugins have completed. Each bind plugin is called in the
+until all PreBind plugins have completed. Each bind plugin is called in the
 configured order. A bind plugin may choose whether or not to handle the given
 Pod. If a bind plugin chooses to handle a Pod, **the remaining bind plugins are
 skipped**.
-### Post-bind
+### PostBind {#post-bind}
 This is an informational extension point. Post-bind plugins are called after a
 Pod is successfully bound. This is the end of a binding cycle, and can be used
@@ -209,88 +208,35 @@ interfaces have the following form.
 ```go
 type Plugin interface {
     Name() string
 }
 type QueueSortPlugin interface {
     Plugin
     Less(*v1.pod, *v1.pod) bool
 }
 type PreFilterPlugin interface {
     Plugin
-    PreFilter(PluginContext, *v1.pod) error
+    PreFilter(context.Context, *framework.CycleState, *v1.pod) error
 }
 // ...
 ```
-# Plugin Configuration
-Plugins can be enabled in the scheduler configuration. Also, default plugins can
-be disabled in the configuration. In 1.15, there are no default plugins for the
-scheduling framework.
-The scheduler configuration can include configuration for plugins as well. Such
-configurations are passed to the plugins at the time the scheduler initializes
-them. The configuration is an arbitrary value. The receiving plugin should
-decode and process the configuration.
-The following example shows a scheduler configuration that enables some
-plugins at `reserve` and `preBind` extension points and disables a plugin. It
-also provides a configuration to plugin `foo`.
-```yaml
-apiVersion: kubescheduler.config.k8s.io/v1alpha1
-kind: KubeSchedulerConfiguration
-...
-plugins:
-  reserve:
-    enabled:
-    - name: foo
-    - name: bar
-    disabled:
-    - name: baz
-  preBind:
-    enabled:
-    - name: foo
-    disabled:
-    - name: baz
-pluginConfig:
-- name: foo
-  args: >
-    Arbitrary set of args to plugin foo
-```
-When an extension point is omitted from the configuration default plugins for
-that extension points are used. When an extension point exists and `enabled` is
-provided, the `enabled` plugins are called in addition to default plugins.
-Default plugins are called first and then the additional enabled plugins are
-called in the same order specified in the configuration. If a different order of
-calling default plugins is desired, default plugins must be `disabled` and
-`enabled` in the desired order.
-Assuming there is a default plugin called `foo` at `reserve` and we are adding
-plugin `bar` that we want to be invoked before `foo`, we should disable `foo`
-and enable `bar` and `foo` in order. The following example shows the
-configuration that achieves this:
-```yaml
-apiVersion: kubescheduler.config.k8s.io/v1alpha1
-kind: KubeSchedulerConfiguration
-...
-plugins:
-  reserve:
-    enabled:
-    - name: bar
-    - name: foo
-    disabled:
-    - name: foo
-```
+## Plugin configuration
+You can enable or disable plugins in the scheduler configuration. If you are using
+Kubernetes v1.18 or later, most scheduling
+[plugins](/docs/reference/scheduling/profiles/#scheduling-plugins) are in use and
+enabled by default.
+In addition to default plugins, you can also implement your own scheduling
+plugins and get them configured along with default plugins. You can visit
+[scheduler-plugins](https://github.com/kubernetes-sigs/scheduler-plugins) for more details.
+If you are using Kubernetes v1.18 or later, you can configure a set of plugins as
+a scheduler profile and then define multiple profiles to fit various kinds of workload.
+Learn more at [multiple profiles](/docs/reference/scheduling/profiles/#multiple-profiles).
 {{% /capture %}}

@@ -109,6 +109,7 @@
 /docs/concepts/configuration/container-command-arg/ /docs/tasks/inject-data-application/define-command-argument-container/ 301
 /docs/concepts/configuration/container-command-args/ /docs/tasks/inject-data-application/define-command-argument-container/ 301
 /docs/concepts/configuration/scheduler-perf-tuning/ /docs/concepts/scheduling/scheduler-perf-tuning/ 301
+/docs/concepts/configuration/scheduling-framework/ /docs/concepts/scheduling/scheduling-framework/ 301
 /docs/concepts/ecosystem/thirdpartyresource/ /docs/tasks/access-kubernetes-api/extend-api-third-party-resource/ 301
 /docs/concepts/jobs/cron-jobs/ /docs/concepts/workloads/controllers/cron-jobs/ 301
 /docs/concepts/jobs/run-to-completion-finite-workloads/ /docs/concepts/workloads/controllers/jobs-run-to-completion/ 301

Binary file not shown. Image size: 73 KiB before, 55 KiB after.