Update scheduler framework document (#19606)
* Update scheduler framework document
* address comments
* Add page redirect and keep old link anchors
* address comment

pull/19698/head
parent 8b6dad9c69
commit 098400c958
|
@ -1,7 +1,7 @@
|
||||||
---
|
---
|
||||||
title: Kubernetes Scheduler
|
title: Kubernetes Scheduler
|
||||||
content_template: templates/concept
|
content_template: templates/concept
|
||||||
weight: 60
|
weight: 50
|
||||||
---
|
---
|
||||||
|
|
||||||
{{% capture overview %}}
|
{{% capture overview %}}
|
||||||
|
|
|
@ -3,14 +3,14 @@ reviewers:
|
||||||
- ahg-g
|
- ahg-g
|
||||||
title: Scheduling Framework
|
title: Scheduling Framework
|
||||||
content_template: templates/concept
|
content_template: templates/concept
|
||||||
weight: 70
|
weight: 60
|
||||||
---
|
---
|
||||||
|
|
||||||
{{% capture overview %}}
|
{{% capture overview %}}
|
||||||
|
|
||||||
{{< feature-state for_k8s_version="1.15" state="alpha" >}}
|
{{< feature-state for_k8s_version="1.15" state="alpha" >}}
|
||||||
|
|
||||||
The scheduling framework is a new pluggable architecture for Kubernetes Scheduler
|
The scheduling framework is a pluggable architecture for Kubernetes Scheduler
|
||||||
that makes scheduler customizations easy. It adds a new set of "plugin" APIs to
|
that makes scheduler customizations easy. It adds a new set of "plugin" APIs to
|
||||||
the existing scheduler. Plugins are compiled into the scheduler. The APIs
|
the existing scheduler. Plugins are compiled into the scheduler. The APIs
|
||||||
allow most scheduling features to be implemented as plugins, while keeping the
|
allow most scheduling features to be implemented as plugins, while keeping the
|
||||||
|
@ -56,16 +56,16 @@ stateful tasks.
|
||||||
|
|
||||||
{{< figure src="/images/docs/scheduling-framework-extensions.png" title="scheduling framework extension points" >}}
|
{{< figure src="/images/docs/scheduling-framework-extensions.png" title="scheduling framework extension points" >}}
|
||||||
|
|
||||||
### Queue sort
|
### QueueSort {#queue-sort}
|
||||||
|
|
||||||
These plugins are used to sort Pods in the scheduling queue. A queue sort plugin
|
These plugins are used to sort Pods in the scheduling queue. A queue sort plugin
|
||||||
essentially will provide a "less(Pod1, Pod2)" function. Only one queue sort
|
essentially provides a `Less(Pod1, Pod2)` function. Only one queue sort
|
||||||
plugin may be enabled at a time.
|
plugin may be enabled at a time.
|
||||||
|
|
||||||
### Pre-filter
|
### PreFilter {#pre-filter}
|
||||||
|
|
||||||
These plugins are used to pre-process info about the Pod, or to check certain
|
These plugins are used to pre-process info about the Pod, or to check certain
|
||||||
conditions that the cluster or the Pod must meet. If a pre-filter plugin returns
|
conditions that the cluster or the Pod must meet. If a PreFilter plugin returns
|
||||||
an error, the scheduling cycle is aborted.
|
an error, the scheduling cycle is aborted.
|
||||||
|
|
||||||
### Filter
|
### Filter
|
||||||
|
@ -75,28 +75,25 @@ node, the scheduler will call filter plugins in their configured order. If any
|
||||||
filter plugin marks the node as infeasible, the remaining plugins will not be
|
filter plugin marks the node as infeasible, the remaining plugins will not be
|
||||||
called for that node. Nodes may be evaluated concurrently.
|
called for that node. Nodes may be evaluated concurrently.
|
||||||
|
|
||||||
### Post-filter
|
### PreScore {#pre-score}
|
||||||
|
|
||||||
This is an informational extension point. Plugins will be called with a list of
|
These plugins are used to perform "pre-scoring" work, which generates a sharable
|
||||||
nodes that passed the filtering phase. A plugin may use this data to update
|
state for Score plugins to use. If a PreScore plugin returns an error, the
|
||||||
internal state or to generate logs/metrics.
|
scheduling cycle is aborted.
|
||||||
|
|
||||||
**Note:** Plugins wishing to perform "pre-scoring" work should use the
|
### Score {#scoring}
|
||||||
post-filter extension point.
|
|
||||||
|
|
||||||
### Scoring
|
|
||||||
|
|
||||||
These plugins are used to rank nodes that have passed the filtering phase. The
|
These plugins are used to rank nodes that have passed the filtering phase. The
|
||||||
scheduler will call each scoring plugin for each node. There will be a well
|
scheduler will call each scoring plugin for each node. There will be a well
|
||||||
defined range of integers representing the minimum and maximum scores. After the
|
defined range of integers representing the minimum and maximum scores. After the
|
||||||
[normalize scoring](#normalize-scoring) phase, the scheduler will combine node
|
[NormalizeScore](#normalize-scoring) phase, the scheduler will combine node
|
||||||
scores from all plugins according to the configured plugin weights.
|
scores from all plugins according to the configured plugin weights.
|
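The Score rows above say that node scores from all plugins are combined according to the configured plugin weights. A rough sketch of one way to do that follows. It assumes a plain weighted sum and a default weight of 1 for plugins without a configured weight; the function name and map shapes are hypothetical, and the real scheduler may combine scores differently.

```go
package main

// combineScores adds up each plugin's per-node scores, scaled by that
// plugin's weight. perPlugin maps plugin name -> node name -> score.
// Assumption: a plugin missing from weights gets weight 1.
func combineScores(perPlugin map[string]map[string]int64, weights map[string]int64) map[string]int64 {
	total := make(map[string]int64)
	for plugin, nodeScores := range perPlugin {
		w, ok := weights[plugin]
		if !ok {
			w = 1
		}
		for node, score := range nodeScores {
			total[node] += score * w
		}
	}
	return total
}
```

With plugin `A` (weight 1) scoring `n1=100, n2=50` and plugin `B` (weight 2) scoring `n1=0, n2=100`, this sketch yields `n1=100` and `n2=250`, so `n2` ranks first.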
||||||
|
|
||||||
### Normalize scoring
|
### NormalizeScore {#normalize-scoring}
|
||||||
|
|
||||||
These plugins are used to modify scores before the scheduler computes a final
|
These plugins are used to modify scores before the scheduler computes a final
|
||||||
ranking of Nodes. A plugin that registers for this extension point will be
|
ranking of Nodes. A plugin that registers for this extension point will be
|
||||||
called with the [scoring](#scoring) results from the same plugin. This is called
|
called with the [Score](#scoring) results from the same plugin. This is called
|
||||||
once per plugin per scheduling cycle.
|
once per plugin per scheduling cycle.
|
||||||
|
|
||||||
For example, suppose a plugin `BlinkingLightScorer` ranks Nodes based on how
|
For example, suppose a plugin `BlinkingLightScorer` ranks Nodes based on how
|
||||||
|
@ -104,7 +101,7 @@ many blinking lights they have.
|
||||||
|
|
||||||
```go
|
```go
|
||||||
func ScoreNode(_ *v1.Pod, n *v1.Node) (int, error) {
|
func ScoreNode(_ *v1.Pod, n *v1.Node) (int, error) {
|
||||||
return getBlinkingLightCount(n)
|
return getBlinkingLightCount(n)
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
|
@ -114,21 +111,23 @@ extension point.
|
||||||
|
|
||||||
```go
|
```go
|
||||||
func NormalizeScores(scores map[string]int) {
|
func NormalizeScores(scores map[string]int) {
|
||||||
highest := 0
|
highest := 0
|
||||||
for _, score := range scores {
|
for _, score := range scores {
|
||||||
highest = max(highest, score)
|
highest = max(highest, score)
|
||||||
}
|
}
|
||||||
for node, score := range scores {
|
for node, score := range scores {
|
||||||
scores[node] = score*NodeScoreMax/highest
|
scores[node] = score*NodeScoreMax/highest
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
If any normalize-scoring plugin returns an error, the scheduling cycle is
|
If any NormalizeScore plugin returns an error, the scheduling cycle is
|
||||||
aborted.
|
aborted.
|
||||||
|
|
||||||
**Note:** Plugins wishing to perform "pre-reserve" work should use the
|
{{< note >}}
|
||||||
normalize-scoring extension point.
|
Plugins wishing to perform "pre-reserve" work should use the
|
||||||
|
NormalizeScore extension point.
|
||||||
|
{{< /note >}}
|
||||||
|
|
||||||
### Reserve
|
### Reserve
|
||||||
|
|
||||||
|
@ -140,53 +139,53 @@ to prevent race conditions while the scheduler waits for the bind to succeed.
|
||||||
|
|
||||||
This is the last step in a scheduling cycle. Once a Pod is in the reserved
|
This is the last step in a scheduling cycle. Once a Pod is in the reserved
|
||||||
state, it will either trigger [Unreserve](#unreserve) plugins (on failure) or
|
state, it will either trigger [Unreserve](#unreserve) plugins (on failure) or
|
||||||
[Post-bind](#post-bind) plugins (on success) at the end of the binding cycle.
|
[PostBind](#post-bind) plugins (on success) at the end of the binding cycle.
|
||||||
|
|
||||||
*Note: This concept used to be referred to as "assume".*
|
|
||||||
|
|
||||||
### Permit
|
### Permit
|
||||||
|
|
||||||
These plugins are used to prevent or delay the binding of a Pod. A permit plugin
|
_Permit_ plugins are invoked at the end of the scheduling cycle for each Pod, to
|
||||||
can do one of three things.
|
prevent or delay the binding to the candidate node. A permit plugin can do one of
|
||||||
|
the three things:
|
||||||
|
|
||||||
1. **approve** \
|
1. **approve** \
|
||||||
Once all permit plugins approve a Pod, it is sent for binding.
|
Once all Permit plugins approve a Pod, it is sent for binding.
|
||||||
|
|
||||||
1. **deny** \
|
1. **deny** \
|
||||||
If any permit plugin denies a Pod, it is returned to the scheduling queue.
|
If any Permit plugin denies a Pod, it is returned to the scheduling queue.
|
||||||
This will trigger [Unreserve](#unreserve) plugins.
|
This will trigger [Unreserve](#unreserve) plugins.
|
||||||
|
|
||||||
1. **wait** (with a timeout) \
|
1. **wait** (with a timeout) \
|
||||||
If a permit plugin returns "wait", then the Pod is kept in the permit phase
|
If a Permit plugin returns "wait", then the Pod is kept in an internal "waiting"
|
||||||
until a [plugin approves it](#frameworkhandle). If a timeout occurs, **wait**
|
Pods list, and the binding cycle of this Pod starts but directly blocks until it
|
||||||
becomes **deny** and the Pod is returned to the scheduling queue, triggering
|
gets [approved](#frameworkhandle). If a timeout occurs, **wait** becomes **deny**
|
||||||
[Unreserve](#unreserve) plugins.
|
and the Pod is returned to the scheduling queue, triggering [Unreserve](#unreserve)
|
||||||
|
plugins.
|
||||||
|
|
||||||
**Approving a Pod binding**
|
{{< note >}}
|
||||||
|
While any plugin can access the list of "waiting" Pods and approve them
|
||||||
|
(see [`FrameworkHandle`](#frameworkhandle)), we expect only the permit
|
||||||
|
plugins to approve binding of reserved Pods that are in "waiting" state. Once a Pod
|
||||||
|
is approved, it is sent to the [PreBind](#pre-bind) phase.
|
||||||
|
{{< /note >}}
|
||||||
|
|
||||||
While any plugin can access the list of "waiting" Pods from the cache and
|
### PreBind {#pre-bind}
|
||||||
approve them (see [`FrameworkHandle`](#frameworkhandle)) we expect only the permit
|
|
||||||
plugins to approve binding of reserved Pods that are in "waiting" state. Once a
|
|
||||||
Pod is approved, it is sent to the pre-bind phase.
|
|
||||||
|
|
||||||
### Pre-bind
|
|
||||||
|
|
||||||
These plugins are used to perform any work required before a Pod is bound. For
|
These plugins are used to perform any work required before a Pod is bound. For
|
||||||
example, a pre-bind plugin may provision a network volume and mount it on the
|
example, a pre-bind plugin may provision a network volume and mount it on the
|
||||||
target node before allowing the Pod to run there.
|
target node before allowing the Pod to run there.
|
||||||
|
|
||||||
If any pre-bind plugin returns an error, the Pod is [rejected](#unreserve) and
|
If any PreBind plugin returns an error, the Pod is [rejected](#unreserve) and
|
||||||
returned to the scheduling queue.
|
returned to the scheduling queue.
|
||||||
|
|
||||||
### Bind
|
### Bind
|
||||||
|
|
||||||
These plugins are used to bind a Pod to a Node. Bind plugins will not be called
|
These plugins are used to bind a Pod to a Node. Bind plugins will not be called
|
||||||
until all pre-bind plugins have completed. Each bind plugin is called in the
|
until all PreBind plugins have completed. Each bind plugin is called in the
|
||||||
configured order. A bind plugin may choose whether or not to handle the given
|
configured order. A bind plugin may choose whether or not to handle the given
|
||||||
Pod. If a bind plugin chooses to handle a Pod, **the remaining bind plugins are
|
Pod. If a bind plugin chooses to handle a Pod, **the remaining bind plugins are
|
||||||
skipped**.
|
skipped**.
|
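The Bind rule above (plugins run in configured order, and once one chooses to handle the Pod the rest are skipped) is a first-match chain. A minimal sketch, assuming a hypothetical `binder` function type in place of the real plugin interface:

```go
package main

// binder is a hypothetical bind plugin. It returns handled=false when it
// chooses not to handle the Pod, leaving it to the next plugin.
type binder func(pod, node string) (handled bool, err error)

// runBind calls binders in order and stops at the first one that handles
// the Pod. It returns that binder's index, or -1 if none handled it.
func runBind(binders []binder, pod, node string) (int, error) {
	for i, b := range binders {
		handled, err := b(pod, node)
		if handled {
			return i, err
		}
	}
	return -1, nil
}
```

Returning the index makes it easy to see in logs or tests which plugin actually performed the bind.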
||||||
|
|
||||||
### Post-bind
|
### PostBind {#post-bind}
|
||||||
|
|
||||||
This is an informational extension point. Post-bind plugins are called after a
|
This is an informational extension point. Post-bind plugins are called after a
|
||||||
Pod is successfully bound. This is the end of a binding cycle, and can be used
|
Pod is successfully bound. This is the end of a binding cycle, and can be used
|
||||||
|
@ -209,88 +208,35 @@ interfaces have the following form.
|
||||||
|
|
||||||
```go
|
```go
|
||||||
type Plugin interface {
|
type Plugin interface {
|
||||||
Name() string
|
Name() string
|
||||||
}
|
}
|
||||||
|
|
||||||
type QueueSortPlugin interface {
|
type QueueSortPlugin interface {
|
||||||
Plugin
|
Plugin
|
||||||
Less(*v1.Pod, *v1.Pod) bool
|
Less(*v1.Pod, *v1.Pod) bool
|
||||||
}
|
}
|
||||||
|
|
||||||
type PreFilterPlugin interface {
|
type PreFilterPlugin interface {
|
||||||
Plugin
|
Plugin
|
||||||
PreFilter(PluginContext, *v1.Pod) error
|
PreFilter(context.Context, *framework.CycleState, *v1.Pod) error
|
||||||
}
|
}
|
||||||
|
|
||||||
// ...
|
// ...
|
||||||
```
|
```
|
||||||
|
|
||||||
# Plugin Configuration
|
## Plugin configuration
|
||||||
|
|
||||||
Plugins can be enabled in the scheduler configuration. Also, default plugins can
|
You can enable or disable plugins in the scheduler configuration. If you are using
|
||||||
be disabled in the configuration. In 1.15, there are no default plugins for the
|
Kubernetes v1.18 or later, most scheduling
|
||||||
scheduling framework.
|
[plugins](/docs/reference/scheduling/profiles/#scheduling-plugins) are in use and
|
||||||
|
enabled by default.
|
||||||
|
|
||||||
The scheduler configuration can include configuration for plugins as well. Such
|
In addition to default plugins, you can also implement your own scheduling
|
||||||
configurations are passed to the plugins at the time the scheduler initializes
|
plugins and get them configured along with default plugins. You can visit
|
||||||
them. The configuration is an arbitrary value. The receiving plugin should
|
[scheduler-plugins](https://github.com/kubernetes-sigs/scheduler-plugins) for more details.
|
||||||
decode and process the configuration.
|
|
||||||
|
|
||||||
The following example shows a scheduler configuration that enables some
|
If you are using Kubernetes v1.18 or later, you can configure a set of plugins as
|
||||||
plugins at `reserve` and `preBind` extension points and disables a plugin. It
|
a scheduler profile and then define multiple profiles to fit various kinds of workload.
|
||||||
also provides a configuration to plugin `foo`.
|
Learn more at [multiple profiles](/docs/reference/scheduling/profiles/#multiple-profiles).
|
||||||
|
|
||||||
```yaml
|
|
||||||
apiVersion: kubescheduler.config.k8s.io/v1alpha1
|
|
||||||
kind: KubeSchedulerConfiguration
|
|
||||||
|
|
||||||
...
|
|
||||||
|
|
||||||
plugins:
|
|
||||||
reserve:
|
|
||||||
enabled:
|
|
||||||
- name: foo
|
|
||||||
- name: bar
|
|
||||||
disabled:
|
|
||||||
- name: baz
|
|
||||||
preBind:
|
|
||||||
enabled:
|
|
||||||
- name: foo
|
|
||||||
disabled:
|
|
||||||
- name: baz
|
|
||||||
|
|
||||||
pluginConfig:
|
|
||||||
- name: foo
|
|
||||||
args: >
|
|
||||||
Arbitrary set of args to plugin foo
|
|
||||||
```
|
|
||||||
|
|
||||||
When an extension point is omitted from the configuration default plugins for
|
|
||||||
that extension points are used. When an extension point exists and `enabled` is
|
|
||||||
provided, the `enabled` plugins are called in addition to default plugins.
|
|
||||||
Default plugins are called first and then the additional enabled plugins are
|
|
||||||
called in the same order specified in the configuration. If a different order of
|
|
||||||
calling default plugins is desired, default plugins must be `disabled` and
|
|
||||||
`enabled` in the desired order.
|
|
||||||
|
|
||||||
Assuming there is a default plugin called `foo` at `reserve` and we are adding
|
|
||||||
plugin `bar` that we want to be invoked before `foo`, we should disable `foo`
|
|
||||||
and enable `bar` and `foo` in order. The following example shows the
|
|
||||||
configuration that achieves this:
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
apiVersion: kubescheduler.config.k8s.io/v1alpha1
|
|
||||||
kind: KubeSchedulerConfiguration
|
|
||||||
|
|
||||||
...
|
|
||||||
|
|
||||||
plugins:
|
|
||||||
reserve:
|
|
||||||
enabled:
|
|
||||||
- name: bar
|
|
||||||
- name: foo
|
|
||||||
disabled:
|
|
||||||
- name: foo
|
|
||||||
```
|
|
||||||
|
|
||||||
{{% /capture %}}
|
{{% /capture %}}
|
|
@ -109,6 +109,7 @@
|
||||||
/docs/concepts/configuration/container-command-arg/ /docs/tasks/inject-data-application/define-command-argument-container/ 301
|
/docs/concepts/configuration/container-command-arg/ /docs/tasks/inject-data-application/define-command-argument-container/ 301
|
||||||
/docs/concepts/configuration/container-command-args/ /docs/tasks/inject-data-application/define-command-argument-container/ 301
|
/docs/concepts/configuration/container-command-args/ /docs/tasks/inject-data-application/define-command-argument-container/ 301
|
||||||
/docs/concepts/configuration/scheduler-perf-tuning/ /docs/concepts/scheduling/scheduler-perf-tuning/ 301
|
/docs/concepts/configuration/scheduler-perf-tuning/ /docs/concepts/scheduling/scheduler-perf-tuning/ 301
|
||||||
|
/docs/concepts/configuration/scheduling-framework/ /docs/concepts/scheduling/scheduling-framework/ 301
|
||||||
/docs/concepts/ecosystem/thirdpartyresource/ /docs/tasks/access-kubernetes-api/extend-api-third-party-resource/ 301
|
/docs/concepts/ecosystem/thirdpartyresource/ /docs/tasks/access-kubernetes-api/extend-api-third-party-resource/ 301
|
||||||
/docs/concepts/jobs/cron-jobs/ /docs/concepts/workloads/controllers/cron-jobs/ 301
|
/docs/concepts/jobs/cron-jobs/ /docs/concepts/workloads/controllers/cron-jobs/ 301
|
||||||
/docs/concepts/jobs/run-to-completion-finite-workloads/ /docs/concepts/workloads/controllers/jobs-run-to-completion/ 301
|
/docs/concepts/jobs/run-to-completion-finite-workloads/ /docs/concepts/workloads/controllers/jobs-run-to-completion/ 301
|
||||||
|
|
Binary file not shown.
Before: Size 73 KiB. After: Size 55 KiB.