commit
5638f85cff
|
@ -105,30 +105,24 @@ If you do not specify either, then the DaemonSet controller will create Pods on
|
|||
|
||||
## How Daemon Pods are scheduled
|
||||
|
||||
### Scheduled by default scheduler
|
||||
A DaemonSet ensures that all eligible nodes run a copy of a Pod. The DaemonSet
|
||||
controller creates a Pod for each eligible node and adds the
|
||||
`spec.affinity.nodeAffinity` field of the Pod to match the target host. After
|
||||
the Pod is created, the default scheduler typically takes over and then binds
|
||||
the Pod to the target host by setting the `.spec.nodeName` field. If the new
|
||||
Pod cannot fit on the node, the default scheduler may preempt (evict) some of
|
||||
the existing Pods based on the
|
||||
[priority](/docs/concepts/scheduling-eviction/pod-priority-preemption/#pod-priority)
|
||||
of the new Pod.
|
||||
|
||||
{{< feature-state for_k8s_version="1.17" state="stable" >}}
|
||||
The user can specify a different scheduler for the Pods of the DamonSet, by
|
||||
setting the `.spec.template.spec.schedulerName` field of the DaemonSet.
|
||||
|
||||
A DaemonSet ensures that all eligible nodes run a copy of a Pod. Normally, the
|
||||
node that a Pod runs on is selected by the Kubernetes scheduler. However,
|
||||
DaemonSet pods are created and scheduled by the DaemonSet controller instead.
|
||||
That introduces the following issues:
|
||||
|
||||
* Inconsistent Pod behavior: Normal Pods waiting to be scheduled are created
|
||||
and in `Pending` state, but DaemonSet pods are not created in `Pending`
|
||||
state. This is confusing to the user.
|
||||
* [Pod preemption](/docs/concepts/scheduling-eviction/pod-priority-preemption/)
|
||||
is handled by default scheduler. When preemption is enabled, the DaemonSet controller
|
||||
will make scheduling decisions without considering pod priority and preemption.
|
||||
|
||||
`ScheduleDaemonSetPods` allows you to schedule DaemonSets using the default
|
||||
scheduler instead of the DaemonSet controller, by adding the `NodeAffinity` term
|
||||
to the DaemonSet pods, instead of the `.spec.nodeName` term. The default
|
||||
scheduler is then used to bind the pod to the target host. If node affinity of
|
||||
the DaemonSet pod already exists, it is replaced (the original node affinity was
|
||||
taken into account before selecting the target host). The DaemonSet controller only
|
||||
performs these operations when creating or modifying DaemonSet pods, and no
|
||||
changes are made to the `spec.template` of the DaemonSet.
|
||||
The original node affinity specified at the
|
||||
`.spec.template.spec.affinity.nodeAffinity` field (if specified) is taken into
|
||||
consideration by the DaemonSet controller when evaluating the eligible nodes,
|
||||
but is replaced on the created Pod with the node affinity that matches the name
|
||||
of the eligible node.
|
||||
|
||||
```yaml
|
||||
nodeAffinity:
|
||||
|
@ -141,25 +135,40 @@ nodeAffinity:
|
|||
- target-host-name
|
||||
```
|
||||
|
||||
In addition, `node.kubernetes.io/unschedulable:NoSchedule` toleration is added
|
||||
automatically to DaemonSet Pods. The default scheduler ignores
|
||||
`unschedulable` Nodes when scheduling DaemonSet Pods.
|
||||
|
||||
### Taints and Tolerations
|
||||
### Taints and tolerations
|
||||
|
||||
Although Daemon Pods respect
|
||||
[taints and tolerations](/docs/concepts/scheduling-eviction/taint-and-toleration/),
|
||||
the following tolerations are added to DaemonSet Pods automatically according to
|
||||
the related features.
|
||||
The DaemonSet controller automatically adds a set of {{< glossary_tooltip
|
||||
text="tolerations" term_id="toleration" >}} to DaemonSet Pods:
|
||||
|
||||
| Toleration Key | Effect | Version | Description |
|
||||
| ---------------------------------------- | ---------- | ------- | ----------- |
|
||||
| `node.kubernetes.io/not-ready` | NoExecute | 1.13+ | DaemonSet pods will not be evicted when there are node problems such as a network partition. |
|
||||
| `node.kubernetes.io/unreachable` | NoExecute | 1.13+ | DaemonSet pods will not be evicted when there are node problems such as a network partition. |
|
||||
| `node.kubernetes.io/disk-pressure` | NoSchedule | 1.8+ | DaemonSet pods tolerate disk-pressure attributes by default scheduler. |
|
||||
| `node.kubernetes.io/memory-pressure` | NoSchedule | 1.8+ | DaemonSet pods tolerate memory-pressure attributes by default scheduler. |
|
||||
| `node.kubernetes.io/unschedulable` | NoSchedule | 1.12+ | DaemonSet pods tolerate unschedulable attributes by default scheduler. |
|
||||
| `node.kubernetes.io/network-unavailable` | NoSchedule | 1.12+ | DaemonSet pods, who uses host network, tolerate network-unavailable attributes by default scheduler. |
|
||||
{{< table caption="Tolerations for DaemonSet pods" >}}
|
||||
|
||||
| Toleration key | Effect | Details |
|
||||
| --------------------------------------------------------------------------------------------------------------------- | ------------ | --------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| [`node.kubernetes.io/not-ready`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-not-ready) | `NoExecute` | DaemonSet Pods can be scheduled onto nodes that are not healthy or ready to accept Pods. Any DaemonSet Pods running on such nodes will not be evicted. |
|
||||
| [`node.kubernetes.io/unreachable`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-unreachable) | `NoExecute` | DaemonSet Pods can be scheduled onto nodes that are unreachable from the node controller. Any DaemonSet Pods running on such nodes will not be evicted. |
|
||||
| [`node.kubernetes.io/disk-pressure`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-disk-pressure) | `NoSchedule` | DaemonSet Pods can be scheduled onto nodes with disk pressure issues. |
|
||||
| [`node.kubernetes.io/memory-pressure`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-memory-pressure) | `NoSchedule` | DaemonSet Pods can be scheduled onto nodes with memory pressure issues. |
|
||||
| [`node.kubernetes.io/pid-pressure`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-pid-pressure) | `NoSchedule` | DaemonSet Pods can be scheduled onto nodes with process pressure issues. |
|
||||
| [`node.kubernetes.io/unschedulable`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-unschedulable) | `NoSchedule` | DaemonSet Pods can be scheduled onto nodes that are unschedulable. |
|
||||
| [`node.kubernetes.io/network-unavailable`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-network-unavailable) | `NoSchedule` | **Only added for DaemonSet Pods that request host networking**, i.e., Pods having `spec.hostNetwork: true`. Such DaemonSet Pods can be scheduled onto nodes with unavailable network.|
|
||||
|
||||
{{< /table >}}
|
||||
|
||||
You can add your own tolerations to the Pods of a Daemonset as well, by
|
||||
defining these in the Pod template of the DaemonSet.
|
||||
|
||||
Because the DaemonSet controller sets the
|
||||
`node.kubernetes.io/unschedulable:NoSchedule` toleration automatically,
|
||||
Kubernetes can run DaemonSet Pods on nodes that are marked as _unschedulable_.
|
||||
|
||||
If you use a DaemonSet to provide an important node-level function, such as
|
||||
[cluster networking](/docs/concepts/cluster-administration/networking/), it is
|
||||
helpful that Kubernetes places DaemonSet Pods on nodes before they are ready.
|
||||
For example, without that special toleration, you could end up in a deadlock
|
||||
situation where the node is not marked as ready because the network plugin is
|
||||
not running there, and at the same time the network plugin is not running on
|
||||
that node because the node is not yet ready.
|
||||
|
||||
## Communicating with Daemon Pods
|
||||
|
||||
|
|
Loading…
Reference in New Issue