Doc for Alpha feature PodSchedulingReadiness
parent
b8fc810198
commit
21a7c4cc7e
@ -28,6 +28,7 @@ of terminating one or more Pods on Nodes.
* [Scheduling Framework](/docs/concepts/scheduling-eviction/scheduling-framework)
* [Scheduler Performance Tuning](/docs/concepts/scheduling-eviction/scheduler-perf-tuning/)
* [Resource Bin Packing for Extended Resources](/docs/concepts/scheduling-eviction/resource-bin-packing/)
* [Pod Scheduling Readiness](/docs/concepts/scheduling-eviction/pod-scheduling-readiness/)

## Pod Disruption

@ -0,0 +1,110 @@
---
title: Pod Scheduling Readiness
content_type: concept
weight: 40
---

<!-- overview -->

{{< feature-state for_k8s_version="v1.26" state="alpha" >}}

Pods were considered ready for scheduling once created. The Kubernetes scheduler
does its due diligence to find nodes to place all pending Pods. However, in
real-world cases, some Pods may stay in a "miss-essential-resources" state for a long period.
These Pods churn the scheduler (and downstream integrators like Cluster Autoscaler)
unnecessarily.

By specifying/removing a Pod's `.spec.schedulingGates`, you can control when a Pod is ready
to be considered for scheduling.

<!-- body -->

## Configuring Pod schedulingGates

The `schedulingGates` field contains a list of named gates (each entry has a `name` string), and each
gate is treated as a criterion that the Pod must satisfy before it is considered schedulable. This field
can be initialized only when a Pod is created (either by the client, or mutated during admission). After
creation, each scheduling gate can be removed in arbitrary order, but adding a new scheduling gate is disallowed.

{{<mermaid>}}
stateDiagram-v2
    s1: pod created
    s2: pod scheduling gated
    s3: pod scheduling ready
    s4: pod running
    if: empty scheduling gates?
    state if <<choice>>
    [*] --> s1
    s1 --> if
    s2 --> if: scheduling gate removed
    if --> s2: no
    if --> s3: yes
    s3 --> s4
    s4 --> [*]
{{< /mermaid >}}

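The rule described above (after creation, gates may only be removed, and a Pod becomes ready for scheduling once no gates remain) can be sketched as a small check. This is an illustrative sketch only: `gates_update_allowed` and `is_scheduling_ready` are hypothetical helper names, not part of kubectl or the Kubernetes API, and each gate list is modelled as a space-separated string of gate names.

```bash
# Hypothetical helper (an assumption for illustration, not a Kubernetes API):
# succeeds when an update from the old gate list ($1) to the new gate list ($2)
# only removes gates. Both arguments are space-separated gate names.
gates_update_allowed() {
  old_gates="$1"
  new_gates="$2"
  for gate in $new_gates; do
    case " $old_gates " in
      *" $gate "*) ;;      # gate was already present at creation: allowed
      *) return 1 ;;       # gate is new: adding gates after creation is disallowed
    esac
  done
  return 0
}

# Hypothetical helper: succeeds when the gate list ($1) is empty, which is the
# "empty scheduling gates?" choice in the state diagram.
is_scheduling_ready() {
  [ -z "$(printf '%s' "$1" | tr -d ' ')" ]
}
```

For example, `gates_update_allowed "foo bar" "bar"` succeeds because it only removes `foo`, while `gates_update_allowed "foo bar" "foo bar baz"` fails because it adds `baz`.
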
## Usage example

To mark a Pod as not ready for scheduling, you can create it with one or more scheduling gates like this:

{{< codenew file="pods/pod-with-scheduling-gates.yaml" >}}

After the Pod's creation, you can check its state using:

```bash
kubectl get pod test-pod
```

The output shows that the Pod is in the `SchedulingGated` state:

```bash
NAME       READY   STATUS            RESTARTS   AGE
test-pod   0/1     SchedulingGated   0          7s
```

You can also check its `schedulingGates` field by running:

```bash
kubectl get pod test-pod -o jsonpath='{.spec.schedulingGates}'
```

The output is:

```bash
[{"name":"foo"},{"name":"bar"}]
```

To inform the scheduler that this Pod is ready for scheduling, you can remove its `schedulingGates` entirely
by re-applying a modified manifest:

{{< codenew file="pods/pod-without-scheduling-gates.yaml" >}}

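Because gates can be removed in arbitrary order, you may prefer to remove one gate at a time instead of re-applying the whole manifest. The following is a minimal sketch under stated assumptions: `gate_removal_patch` is a hypothetical helper that only builds a JSON Patch document, and the commented-out `kubectl patch` call assumes a cluster with the `PodSchedulingReadiness` feature gate enabled and the `test-pod` Pod from this page. It has not been verified against every cluster version.

```bash
# Hypothetical helper (an assumption for illustration): prints a JSON Patch
# document that removes the scheduling gate at the given zero-based index.
gate_removal_patch() {
  index="$1"
  printf '[{"op":"remove","path":"/spec/schedulingGates/%s"}]\n' "$index"
}

# Example use against a live cluster, left commented out on purpose:
# kubectl patch pod test-pod --type=json -p "$(gate_removal_patch 0)"
```

With the example Pod, index `0` refers to the `foo` gate. After it is removed, `bar` moves to index `0`.
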
You can check whether the `schedulingGates` field is cleared by running:

```bash
kubectl get pod test-pod -o jsonpath='{.spec.schedulingGates}'
```

The output is expected to be empty. You can then check the Pod's latest status by running:

```bash
kubectl get pod test-pod -o wide
```

Given that test-pod doesn't request any CPU/memory resources, its state is expected to
transition from the previous `SchedulingGated` to `Running`:

```bash
NAME       READY   STATUS    RESTARTS   AGE   IP         NODE
test-pod   1/1     Running   0          15s   10.0.0.4   node-2
```

## Observability

The metric `scheduler_pending_pods` comes with a new value `"gated"` for its `queue` label, to distinguish
whether a Pod has been tried for scheduling but found unschedulable, or has been explicitly marked as not
ready for scheduling. You can use `scheduler_pending_pods{queue="gated"}` to check the metric result.

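If you have the scheduler's metrics available as Prometheus text output, the gated count can be picked out with a small filter. This is a sketch only: `gated_pending_pods` is a hypothetical helper, and how you obtain the metrics text (shown here as a file named `scheduler-metrics.txt`) is an assumption that depends on your cluster setup.

```bash
# Hypothetical helper (an assumption for illustration): reads Prometheus
# text-format metrics on stdin and prints the value of the
# scheduler_pending_pods series whose queue label is "gated".
gated_pending_pods() {
  awk '/^scheduler_pending_pods[{]/ && /queue="gated"/ { print $NF }'
}

# Example use with a saved metrics dump, left commented out on purpose:
# gated_pending_pods < scheduler-metrics.txt
```
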
## {{% heading "whatsnext" %}}

* Read the [PodSchedulingReadiness KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-scheduling/3521-pod-scheduling-readiness) for more details
@ -152,6 +152,7 @@ For a reference to old feature gates that are removed, please refer to
| `PodDeletionCost` | `true` | Beta | 1.22 | |
| `PodDisruptionConditions` | `false` | Alpha | 1.25 | - |
| `PodHasNetworkCondition` | `false` | Alpha | 1.25 | |
| `PodSchedulingReadiness` | `false` | Alpha | 1.26 | |
| `ProbeTerminationGracePeriod` | `false` | Alpha | 1.21 | 1.21 |
| `ProbeTerminationGracePeriod` | `false` | Beta | 1.22 | 1.24 |
| `ProbeTerminationGracePeriod` | `true` | Beta | 1.25 | |
@ -652,6 +653,7 @@ Each feature gate is designed for enabling/disabling a specific feature:
  pod stats from the CRI container runtime rather than gathering them from cAdvisor.
- `PodDisruptionConditions`: Enables support for appending a dedicated pod condition indicating that the pod is being deleted due to a disruption.
- `PodHasNetworkCondition`: Enable the kubelet to mark the [PodHasNetwork](/docs/concepts/workloads/pods/pod-lifecycle/#pod-has-network) condition on pods.
- `PodSchedulingReadiness`: Enable setting the `schedulingGates` field to control a Pod's [scheduling readiness](/docs/concepts/scheduling-eviction/pod-scheduling-readiness).
- `PodSecurity`: Enables the `PodSecurity` admission plugin.
- `PreferNominatedNode`: This flag tells the scheduler whether the nominated
  nodes will be checked first before looping through all the other nodes in
@ -0,0 +1,11 @@
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  schedulingGates:
  - name: foo
  - name: bar
  containers:
  - name: pause
    image: registry.k8s.io/pause:3.6
@ -0,0 +1,8 @@
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
  - name: pause
    image: registry.k8s.io/pause:3.6