---
reviewers:
- erictune
- foxish
- davidopp
title: Disruptions
content_template: templates/concept
weight: 60
---

{{% capture overview %}}
This guide is for application owners who want to build
highly available applications, and thus need to understand
what types of disruptions can happen to Pods.

It is also for Cluster Administrators who want to perform automated
cluster actions, like upgrading and autoscaling clusters.
{{% /capture %}}

{{% capture body %}}

## Voluntary and Involuntary Disruptions

Pods do not disappear until someone (a person or a controller) destroys them, or
there is an unavoidable hardware or system software error.

We call these unavoidable cases *involuntary disruptions* to
an application. Examples are:

- a hardware failure of the physical machine backing the node
- cluster administrator deletes VM (instance) by mistake
- cloud provider or hypervisor failure makes VM disappear
- a kernel panic
- the node disappears from the cluster due to cluster network partition
- eviction of a pod due to the node being [out-of-resources](/docs/tasks/administer-cluster/out-of-resource/).

Except for the out-of-resources condition, all these conditions
should be familiar to most users; they are not specific
to Kubernetes.

We call other cases *voluntary disruptions*. These include both
actions initiated by the application owner and those initiated by a Cluster
Administrator. Typical application owner actions include:

- deleting the deployment or other controller that manages the pod
- updating a deployment's pod template causing a restart
- directly deleting a pod (e.g. by accident)

Cluster Administrator actions include:

- [Draining a node](/docs/tasks/administer-cluster/safely-drain-node/) for repair or upgrade.
- Draining a node from a cluster to scale the cluster down (learn about
  [Cluster Autoscaling](/docs/tasks/administer-cluster/cluster-management/#cluster-autoscaler)).
- Removing a pod from a node to permit something else to fit on that node.

These actions might be taken directly by the cluster administrator, or by automation
run by the cluster administrator, or by your cluster hosting provider.

Ask your cluster administrator or consult your cloud provider or distribution documentation
to determine if any sources of voluntary disruptions are enabled for your cluster.
If none are enabled, you can skip creating Pod Disruption Budgets.

## Dealing with Disruptions

Here are some ways to mitigate involuntary disruptions:

- Ensure your pod [requests the resources](/docs/tasks/configure-pod-container/assign-cpu-ram-container) it needs
  (see the sketch after this list).
- Replicate your application if you need higher availability. (Learn about running replicated
  [stateless](/docs/tasks/run-application/run-stateless-application-deployment/)
  and [stateful](/docs/tasks/run-application/run-replicated-stateful-application/) applications.)
- For even higher availability when running replicated applications,
  spread applications across racks (using
  [anti-affinity](/docs/user-guide/node-selection/#inter-pod-affinity-and-anti-affinity-beta-feature))
  or across zones (if using a
  [multi-zone cluster](/docs/setup/multiple-zones).)
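
For illustration, here is a minimal sketch of a Deployment that applies the first and third
points above: its pods request resources and use pod anti-affinity to spread replicas across
nodes. All names, labels, images and values are made up for this example; spreading across
racks or zones would use a corresponding topology label as the `topologyKey`.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend                  # example name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: frontend
            # spread across nodes; a rack or zone label could be used instead
            topologyKey: kubernetes.io/hostname
      containers:
      - name: web
        image: nginx:1.14         # example image
        resources:
          requests:               # ensures the scheduler reserves capacity for the pod
            cpu: 100m
            memory: 200Mi
```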

The frequency of voluntary disruptions varies. On a basic Kubernetes cluster, there are
no voluntary disruptions at all. However, your cluster administrator or hosting provider
may run some additional services which cause voluntary disruptions. For example,
rolling out node software updates can cause voluntary disruptions. Also, some implementations
of cluster (node) autoscaling may cause voluntary disruptions to defragment and compact nodes.
Your cluster administrator or hosting provider should have documented what level of voluntary
disruptions, if any, to expect.

Kubernetes offers features to help run highly available applications at the same
time as frequent voluntary disruptions. We call this set of features
*Disruption Budgets*.

## How Disruption Budgets Work

An Application Owner can create a `PodDisruptionBudget` object (PDB) for each application.
A PDB limits the number of pods of a replicated application that are down simultaneously from
voluntary disruptions. For example, a quorum-based application would
like to ensure that the number of replicas running is never brought below the
number needed for a quorum. A web front end might want to
ensure that the number of replicas serving load never falls below a certain
percentage of the total.

Cluster managers and hosting providers should use tools which
respect Pod Disruption Budgets by calling the [Eviction API](/docs/tasks/administer-cluster/safely-drain-node/#the-eviction-api)
instead of directly deleting pods. Examples are the `kubectl drain` command
and the Kubernetes-on-GCE cluster upgrade script (`cluster/gce/upgrade.sh`).
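
Under the hood, an eviction is requested by submitting an `Eviction` object for the pod
(via the pod's `eviction` subresource) rather than deleting the pod directly. A minimal
sketch of such an object, with a placeholder pod name and namespace:

```yaml
apiVersion: policy/v1beta1
kind: Eviction
metadata:
  name: pod-a          # placeholder: the pod to evict
  namespace: default   # placeholder: the pod's namespace
```

If the eviction would violate a Pod Disruption Budget, the API server rejects the request
and the client is expected to retry later.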

When a cluster administrator wants to drain a node,
they use the `kubectl drain` command. That tool tries to evict all
the pods on the machine. The eviction request may be temporarily rejected,
and the tool periodically retries all failed requests until all pods
are terminated, or until a configurable timeout is reached.
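
For example, a drain with a bounded wait might look like this (the node name is a
placeholder; see `kubectl drain --help` for the options available in your version):

```shell
# Cordon node-1 and evict its pods, retrying evictions that are
# temporarily rejected by a PodDisruptionBudget, for up to five minutes.
kubectl drain node-1 --ignore-daemonsets --timeout=5m
```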

A PDB specifies the number of replicas that an application can tolerate having, relative to how
many it is intended to have. For example, a Deployment which has a `.spec.replicas: 5` is
supposed to have 5 pods at any given time. If its PDB allows for there to be 4 at a time,
then the Eviction API will allow voluntary disruption of one, but not two pods, at a time.

The group of pods that comprise the application is specified using a label selector, the same
as the one used by the application's controller (deployment, stateful-set, etc.).

The "intended" number of pods is computed from the `.spec.replicas` of the pods' controller.
The controller is discovered from the pods using the `.metadata.ownerReferences` of the object.
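
Putting this together, a sketch of a PDB for the 5-replica Deployment discussed above could
look like the following. The name and label are illustrative; the selector must match the
labels on the Deployment's pods.

```yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb          # illustrative name
spec:
  minAvailable: 4           # with .spec.replicas: 5, at most one pod may be down voluntarily
  selector:
    matchLabels:
      app: my-app           # must match the labels on the Deployment's pods
```

In this case, `maxUnavailable: 1` would express the same budget.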

PDBs cannot prevent [involuntary disruptions](#voluntary-and-involuntary-disruptions) from
occurring, but they do count against the budget.

Pods which are deleted or unavailable due to a rolling upgrade to an application do count
against the disruption budget, but controllers (like deployment and stateful-set)
are not limited by PDBs when doing rolling upgrades -- the handling of failures
during application updates is configured in the controller spec.
(Learn about [updating a deployment](/docs/concepts/workloads/controllers/deployment/#updating-a-deployment).)

When a pod is evicted using the eviction API, it is gracefully terminated (see
`terminationGracePeriodSeconds` in [PodSpec](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#podspec-v1-core).)
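
As a sketch, a pod that needs extra time to shut down cleanly can set a longer grace period
in its spec (the name, image, and value below are only examples):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: slow-shutdown-app             # example name
spec:
  terminationGracePeriodSeconds: 60   # default is 30 seconds
  containers:
  - name: app
    image: my-app:1.0                 # example image
```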

## PDB Example

Consider a cluster with 3 nodes, `node-1` through `node-3`.
The cluster is running several applications. One of them has 3 replicas, initially called
`pod-a`, `pod-b`, and `pod-c`. Another, unrelated pod without a PDB, called `pod-x`, is also shown.
Initially, the pods are laid out as follows:

|       node-1       |       node-2       |       node-3       |
|:------------------:|:------------------:|:------------------:|
| pod-a *available*  | pod-b *available*  | pod-c *available*  |
| pod-x *available*  |                    |                    |

All 3 pods are part of a deployment, and they collectively have a PDB which requires
there be at least 2 of the 3 pods to be available at all times.
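
A PDB expressing that requirement could look roughly like this (the PDB name and the `app`
label are made up for the example; the deployment's pod template would carry the matching label):

```yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: example-pdb        # made-up name
spec:
  minAvailable: 2          # at least 2 of the 3 replicas must stay available
  selector:
    matchLabels:
      app: example-app     # made-up label shared by pod-a, pod-b, and pod-c
```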

For example, assume the cluster administrator wants to reboot into a new kernel version to fix a bug in the kernel.
The cluster administrator first tries to drain `node-1` using the `kubectl drain` command.
That tool tries to evict `pod-a` and `pod-x`. This succeeds immediately.
Both pods go into the `terminating` state at the same time.
This puts the cluster in this state:

|  node-1 *draining*  |       node-2       |       node-3       |
|:-------------------:|:------------------:|:------------------:|
| pod-a *terminating* | pod-b *available*  | pod-c *available*  |
| pod-x *terminating* |                    |                    |

The deployment notices that one of the pods is terminating, so it creates a replacement
called `pod-d`. Since `node-1` is cordoned, it lands on another node. Something has
also created `pod-y` as a replacement for `pod-x`.

(Note: for a StatefulSet, `pod-a`, which would be called something like `pod-1`, would need
to terminate completely before its replacement, which is also called `pod-1` but has a
different UID, could be created. Otherwise, the example applies to a StatefulSet as well.)

Now the cluster is in this state:

|  node-1 *draining*  |       node-2       |       node-3       |
|:-------------------:|:------------------:|:------------------:|
| pod-a *terminating* | pod-b *available*  | pod-c *available*  |
| pod-x *terminating* | pod-d *starting*   | pod-y              |

At some point, the pods terminate, and the cluster looks like this:

|  node-1 *drained*   |       node-2       |       node-3       |
|:-------------------:|:------------------:|:------------------:|
|                     | pod-b *available*  | pod-c *available*  |
|                     | pod-d *starting*   | pod-y              |

At this point, if an impatient cluster administrator tries to drain `node-2` or
`node-3`, the drain command will block, because there are only 2 available
pods for the deployment, and its PDB requires at least 2. After some time passes, `pod-d` becomes available.
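
While waiting, the administrator can check how many more voluntary disruptions the budget
currently allows (the exact output columns vary by Kubernetes version):

```shell
# Lists PodDisruptionBudgets with their minimum availability
# and the number of disruptions currently allowed.
kubectl get poddisruptionbudgets
```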

The cluster state now looks like this:

|  node-1 *drained*   |       node-2       |       node-3       |
|:-------------------:|:------------------:|:------------------:|
|                     | pod-b *available*  | pod-c *available*  |
|                     | pod-d *available*  | pod-y              |

Now, the cluster administrator tries to drain `node-2`.
The drain command will try to evict the two pods in some order, say
`pod-b` first and then `pod-d`. It will succeed at evicting `pod-b`.
But, when it tries to evict `pod-d`, it will be refused because that would leave only
one pod available for the deployment.

The deployment creates a replacement for `pod-b` called `pod-e`.
Because there are not enough resources in the cluster to schedule
`pod-e`, the drain will again block. The cluster may end up in this
state:

|  node-1 *drained*   |       node-2       |       node-3       |     *no node*      |
|:-------------------:|:------------------:|:------------------:|:------------------:|
|                     | pod-b *available*  | pod-c *available*  | pod-e *pending*    |
|                     | pod-d *available*  | pod-y              |                    |

At this point, the cluster administrator needs to
add a node back to the cluster to proceed with the upgrade.

You can see how Kubernetes varies the rate at which disruptions
can happen, according to:

- how many replicas an application needs
- how long it takes to gracefully shut down an instance
- how long it takes a new instance to start up
- the type of controller
- the cluster's resource capacity

## Separating Cluster Owner and Application Owner Roles

Often, it is useful to think of the Cluster Manager
and Application Owner as separate roles with limited knowledge
of each other. This separation of responsibilities
may make sense in these scenarios:

- when there are many application teams sharing a Kubernetes cluster, and
  there is natural specialization of roles
- when third-party tools or services are used to automate cluster management

Pod Disruption Budgets support this separation of roles by providing an
interface between the roles.

If you do not have such a separation of responsibilities in your organization,
you may not need to use Pod Disruption Budgets.

## How to perform Disruptive Actions on your Cluster

If you are a Cluster Administrator, and you need to perform a disruptive action on all
the nodes in your cluster, such as a node or system software upgrade, here are some options:

- Accept downtime during the upgrade.
- Fail over to another complete replica cluster.
  - No downtime, but may be costly both for the duplicated nodes
    and for the human effort to orchestrate the switchover.
- Write disruption-tolerant applications and use PDBs.
  - No downtime.
  - Minimal resource duplication.
  - Allows more automation of cluster administration.
  - Writing disruption-tolerant applications is tricky, but the work to tolerate voluntary
    disruptions largely overlaps with work to support autoscaling and tolerating
    involuntary disruptions.

{{% /capture %}}

{{% capture whatsnext %}}

* Follow steps to protect your application by [configuring a Pod Disruption Budget](/docs/tasks/run-application/configure-pdb/).

* Learn more about [draining nodes](/docs/tasks/administer-cluster/safely-drain-node/).

{{% /capture %}}