Fix trailing whitespace in scheduler section
parent e839bf7aee
commit 2f298d2077
@ -11,11 +11,11 @@ using a client of the {{<glossary_tooltip term_id="kube-apiserver" text="API ser
creates an `Eviction` object, which causes the API server to terminate the Pod.

API-initiated evictions respect your configured [`PodDisruptionBudgets`](/docs/tasks/run-application/configure-pdb/)
and [`terminationGracePeriodSeconds`](/docs/concepts/workloads/pods/pod-lifecycle#pod-termination).

Using the API to create an Eviction object for a Pod is like performing a
policy-controlled [`DELETE` operation](/docs/reference/kubernetes-api/workload-resources/pod-v1/#delete-delete-a-pod)
on the Pod.
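As a rough sketch of what that request carries, the `Eviction` body you submit to the Pod's `eviction` subresource takes approximately this shape (the Pod name and namespace below are placeholders, not taken from this page):

```yaml
apiVersion: policy/v1
kind: Eviction
metadata:
  name: quux          # placeholder: the Pod you want to evict
  namespace: default  # placeholder: that Pod's namespace
```

You `POST` this body to the Pod's `eviction` subresource rather than creating it as a standalone object.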
## Calling the Eviction API
@ -75,13 +75,13 @@ checks and responds in one of the following ways:
* `429 Too Many Requests`: the eviction is not currently allowed because of the
  configured {{<glossary_tooltip term_id="pod-disruption-budget" text="PodDisruptionBudget">}}.
  You may be able to attempt the eviction again later. You might also see this
  response because of API rate limiting.
* `500 Internal Server Error`: the eviction is not allowed because there is a
  misconfiguration, like if multiple PodDisruptionBudgets reference the same Pod.

If the Pod you want to evict isn't part of a workload that has a
PodDisruptionBudget, the API server always returns `200 OK` and allows the
eviction.
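For context, a PodDisruptionBudget that could produce the `429` response above might look roughly like this (the name, selector, and `minAvailable` value are illustrative assumptions):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-pdb   # illustrative name
spec:
  minAvailable: 2     # evictions that would leave fewer than 2 ready Pods get a 429
  selector:
    matchLabels:
      app: example    # illustrative label; must match the Pods being protected
```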
If the API server allows the eviction, the Pod is deleted as follows:
@ -103,12 +103,12 @@ If the API server allows the eviction, the Pod is deleted as follows:
## Troubleshooting stuck evictions

In some cases, your applications may enter a broken state, where the Eviction
API will only return `429` or `500` responses until you intervene. This can
happen if, for example, a ReplicaSet creates pods for your application but new
pods do not enter a `Ready` state. You may also notice this behavior in cases
where the last evicted Pod had a long termination grace period.

If you notice stuck evictions, try one of the following solutions:

* Abort or pause the automated operation causing the issue. Investigate the stuck
  application before you restart the operation.
@ -96,7 +96,7 @@ define. Some of the benefits of affinity and anti-affinity include:
The affinity feature consists of two types of affinity:

- *Node affinity* functions like the `nodeSelector` field but is more expressive and
  allows you to specify soft rules.
- *Inter-pod affinity/anti-affinity* allows you to constrain Pods against labels
  on other Pods.
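As a minimal sketch of the node affinity style described in the first item (the zone values and the preferred label key are placeholder assumptions), a Pod's `affinity` stanza can combine a hard rule with a soft one:

```yaml
affinity:
  nodeAffinity:
    # Hard rule: only consider nodes in one of these zones.
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values:
          - zone-a        # placeholder zone names
          - zone-b
    # Soft rule: prefer nodes that also carry this (placeholder) label.
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 1
      preference:
        matchExpressions:
        - key: example.com/preferred-label
          operator: In
          values:
          - "true"
```

The hard rule must be satisfied for the Pod to schedule at all, while the soft rule only influences node scoring.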
@ -305,22 +305,22 @@ Pod affinity rule uses the "hard"
`requiredDuringSchedulingIgnoredDuringExecution`, while the anti-affinity rule
uses the "soft" `preferredDuringSchedulingIgnoredDuringExecution`.

The affinity rule specifies that the scheduler is allowed to place the example Pod
on a node only if that node belongs to a specific [zone](/docs/concepts/scheduling-eviction/topology-spread-constraints/)
where other Pods have been labeled with `security=S1`.
For instance, if we have a cluster with a designated zone, let's call it "Zone V,"
consisting of nodes labeled with `topology.kubernetes.io/zone=V`, the scheduler can
assign the Pod to any node within Zone V, as long as there is at least one Pod within
Zone V already labeled with `security=S1`. Conversely, if there are no Pods with `security=S1`
labels in Zone V, the scheduler will not assign the example Pod to any node in that zone.

The anti-affinity rule specifies that the scheduler should try to avoid scheduling the Pod
on a node if that node belongs to a specific [zone](/docs/concepts/scheduling-eviction/topology-spread-constraints/)
where other Pods have been labeled with `security=S2`.
For instance, if we have a cluster with a designated zone, let's call it "Zone R,"
consisting of nodes labeled with `topology.kubernetes.io/zone=R`, the scheduler should avoid
assigning the Pod to any node within Zone R, as long as there is at least one Pod within
Zone R already labeled with `security=S2`. Conversely, the anti-affinity rule does not impact
scheduling into Zone R if there are no Pods with `security=S2` labels.
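Expressed as a Pod spec fragment, the rules described above could look roughly like the following; treat this as an illustrative sketch rather than this page's own manifest:

```yaml
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:     # the "hard" affinity rule
    - labelSelector:
        matchExpressions:
        - key: security
          operator: In
          values:
          - S1
      topologyKey: topology.kubernetes.io/zone          # zones are the topology domains
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:    # the "soft" anti-affinity rule
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S2
        topologyKey: topology.kubernetes.io/zone
```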
To get yourself more familiar with the examples of Pod affinity and anti-affinity,
@ -371,12 +371,12 @@ When you want to use it, you have to enable it via the
{{< /note >}}

Kubernetes includes an optional `matchLabelKeys` field for Pod affinity
or anti-affinity. The field specifies keys for the labels that should match with the incoming Pod's labels
when satisfying the Pod (anti)affinity.

The keys are used to look up values from the pod labels; those key-value labels are combined
(using `AND`) with the match restrictions defined using the `labelSelector` field. The combined
filtering selects the set of existing pods that will be taken into account for the Pod (anti)affinity calculation.

A common use case is to use `matchLabelKeys` with `pod-template-hash` (set on Pods
managed as part of a Deployment, where the value is unique for each revision).
@ -405,7 +405,7 @@ spec:
# Only Pods from a given rollout are taken into consideration when calculating pod affinity.
# If you update the Deployment, the replacement Pods follow their own affinity rules
# (if there are any defined in the new Pod template)
matchLabelKeys:
- pod-template-hash
```
@ -422,7 +422,7 @@ When you want to use it, you have to enable it via the
{{< /note >}}

Kubernetes includes an optional `mismatchLabelKeys` field for Pod affinity
or anti-affinity. The field specifies keys for the labels that should **not** match with the incoming Pod's labels
when satisfying the Pod (anti)affinity.

One example use case is to ensure Pods go to the topology domain (node, zone, etc.) where only Pods from the same tenant or team are scheduled.
@ -438,22 +438,22 @@ metadata:
...
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      # ensure that pods associated with this tenant land on the correct node pool
      - matchLabelKeys:
          - tenant
        topologyKey: node-pool
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      # ensure that pods associated with this tenant can't schedule to nodes used for another tenant
      - mismatchLabelKeys:
          - tenant # whatever the value of the "tenant" label for this Pod, prevent
                   # scheduling to nodes in any pool where any Pod from a different
                   # tenant is running.
        labelSelector:
          # We have to have the labelSelector which selects only Pods with the tenant label,
          # otherwise this Pod would hate Pods from daemonsets as well, for example,
          # which aren't supposed to have the tenant label.
          matchExpressions:
          - key: tenant
@ -633,13 +633,13 @@ The following operators can only be used with `nodeAffinity`.
| Operator | Behaviour |
| :------------: | :-------------: |
| `Gt` | The supplied value will be parsed as an integer, and that integer is less than the integer that results from parsing the value of a label named by this selector |
| `Lt` | The supplied value will be parsed as an integer, and that integer is greater than the integer that results from parsing the value of a label named by this selector |

{{<note>}}
`Gt` and `Lt` operators will not work with non-integer values. If the given value
doesn't parse as an integer, the pod will fail to get scheduled. Also, `Gt` and `Lt`
are not available for `podAffinity`.
{{</note>}}
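As a quick sketch of `Gt` in a node affinity term (the label key and threshold below are made-up assumptions, not from this page):

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: example.com/cpu-cores   # hypothetical node label holding an integer value
          operator: Gt
          values:
          - "8"                        # only nodes whose label value parses to more than 8 match
```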
@ -64,7 +64,7 @@ and it cannot be prefixed with `system-`.
A PriorityClass object can have any 32-bit integer value smaller than or equal
to 1 billion. This means that the range of values for a PriorityClass object is
from -2147483648 to 1000000000 inclusive. Larger numbers are reserved for
built-in PriorityClasses that represent critical system Pods. A cluster
admin should create one PriorityClass object for each such mapping that they want.
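For instance, a user-defined PriorityClass inside that range might look roughly like this (the name, value, and description are illustrative assumptions):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority   # illustrative; user-defined names cannot be prefixed with system-
value: 1000000          # any 32-bit integer up to 1000000000
globalDefault: false    # at most one PriorityClass in the cluster may set this to true
description: "Use for important service Pods only."   # illustrative
```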
@ -256,9 +256,9 @@ the Node is not considered for preemption.
If a pending Pod has inter-pod {{< glossary_tooltip text="affinity" term_id="affinity" >}}
to one or more of the lower-priority Pods on the Node, the inter-Pod affinity
rule cannot be satisfied in the absence of those lower-priority Pods. In this case,
the scheduler does not preempt any Pods on the Node. Instead, it looks for another
Node. The scheduler might find a suitable Node or it might not. There is no
guarantee that the pending Pod can be scheduled.

Our recommended solution for this problem is to create inter-Pod affinity only
@ -361,7 +361,7 @@ to get evicted. The kubelet ranks pods for eviction based on the following facto
1. Whether the starved resource usage exceeds requests
1. Pod Priority
1. Amount of resource usage relative to requests

See [Pod selection for kubelet eviction](/docs/concepts/scheduling-eviction/node-pressure-eviction/#pod-selection-for-kubelet-eviction)
for more details.
@ -9,7 +9,7 @@ weight: 40
{{< feature-state for_k8s_version="v1.27" state="beta" >}}

Pods were considered ready for scheduling once created. The Kubernetes scheduler
does its due diligence to find nodes to place all pending Pods. However, in a
real-world case, some Pods may stay in a "miss-essential-resources" state for a long period.
These Pods actually churn the scheduler (and downstream integrators like Cluster Autoscaler)
in an unnecessary manner.
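As a rough sketch (the gate name and image are placeholder assumptions), a Pod that should be held back from scheduling can be created with one or more scheduling gates:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod                # placeholder name
spec:
  schedulingGates:
  - name: example.com/foo       # placeholder gate; the Pod stays SchedulingGated until every gate is removed
  containers:
  - name: pause
    image: registry.k8s.io/pause:3.6
```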
@ -79,7 +79,7 @@ Given the test-pod doesn't request any CPU/memory resources, it's expected that
transitioned from the previous `SchedulingGated` to `Running`:

```none
NAME       READY   STATUS    RESTARTS   AGE   IP         NODE
test-pod   1/1     Running   0          15s   10.0.0.4   node-2
```
@ -94,8 +94,8 @@ scheduling. You can use `scheduler_pending_pods{queue="gated"}` to check the met
{{< feature-state for_k8s_version="v1.27" state="beta" >}}

You can mutate scheduling directives of Pods while they have scheduling gates, with certain constraints.
At a high level, you can only tighten the scheduling directives of a Pod. In other words, the updated
directives would cause the Pod to only be able to be scheduled on a subset of the nodes that it would
previously match. More concretely, the rules for updating a Pod's scheduling directives are as follows:

1. For `.spec.nodeSelector`, only additions are allowed. If absent, it will be allowed to be set.
@ -107,8 +107,8 @@ previously match. More concretely, the rules for updating a Pod's scheduling dir
or `fieldExpressions` are allowed, and no changes to existing `matchExpressions`
and `fieldExpressions` will be allowed. This is because the terms in
`.requiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms` are ORed
while the expressions in `nodeSelectorTerms[].matchExpressions` and
`nodeSelectorTerms[].fieldExpressions` are ANDed.

4. For `.preferredDuringSchedulingIgnoredDuringExecution`, all updates are allowed.
   This is because preferred terms are not authoritative, and so policy controllers
@ -57,8 +57,8 @@ the `NodeResourcesFit` score function can be controlled by the
Within the `scoringStrategy` field, you can configure two parameters: `requestedToCapacityRatio` and
`resources`. The `shape` in the `requestedToCapacityRatio`
parameter allows the user to tune the function as least requested or most
requested based on `utilization` and `score` values. The `resources` parameter
comprises both the `name` of the resource to be considered during scoring and
its corresponding `weight`, which specifies the weight of each resource.

Below is an example configuration that sets
@ -83,7 +83,7 @@ the Pod is put into the active queue or the backoff queue
so that the scheduler will retry the scheduling of the Pod.

{{< note >}}
QueueingHint evaluation during scheduling is a beta-level feature.
The v1.28 release series initially enabled the associated feature gate; however, after the
discovery of an excessive memory footprint, the Kubernetes project set that feature gate
to be disabled by default. In Kubernetes {{< skew currentVersion >}}, this feature gate is
@ -99,7 +99,7 @@ your cluster. Those fields are:
{{< note >}}
The `MinDomainsInPodTopologySpread` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
enables `minDomains` for pod topology spread. Starting from v1.28,
the `MinDomainsInPodTopologySpread` gate
is enabled by default. In older Kubernetes clusters it might be explicitly
disabled or the field might not be available.
{{< /note >}}
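As a rough sketch of how `minDomains` sits inside a constraint (the label selector and the numbers are illustrative assumptions):

```yaml
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    minDomains: 3                      # treat fewer than 3 matching zones as needing more spread
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule   # minDomains only takes effect with DoNotSchedule
    labelSelector:
      matchLabels:
        app: example                   # illustrative label
```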