---
reviewers:
- jlowdermilk
- justinsb
- quinton-hoole
title: Running in multiple zones
weight: 20
content_type: concept
---

<!-- overview -->

This page describes running Kubernetes across multiple zones.

<!-- body -->

## Background

Kubernetes is designed so that a single Kubernetes cluster can run
across multiple failure zones, typically where these zones fit within
a logical grouping called a _region_. Major cloud providers define a region
as a set of failure zones (also called _availability zones_) that provide
a consistent set of features: within a region, each zone offers the same
APIs and services.

Typical cloud architectures aim to minimize the chance that a failure in
one zone also impairs services in another zone.

## Control plane behavior

All [control plane components](/docs/concepts/overview/components/#control-plane-components)
support running as a pool of interchangeable resources, replicated per
component.

When you deploy a cluster control plane, place replicas of
control plane components across multiple failure zones. If availability is
an important concern, select at least three failure zones and replicate
each individual control plane component (API server, scheduler, etcd,
cluster controller manager) across at least three failure zones.
If you are running a cloud controller manager, then you should
also replicate this across all the failure zones you selected.

{{< note >}}
Kubernetes does not provide cross-zone resilience for the API server
endpoints. You can use various techniques to improve availability for
the cluster API server, including DNS round-robin, SRV records, or
a third-party load balancing solution with health checking.
{{< /note >}}

## Node behavior

Kubernetes automatically spreads the Pods for
workload resources (such as {{< glossary_tooltip text="Deployment" term_id="deployment" >}}
or {{< glossary_tooltip text="StatefulSet" term_id="statefulset" >}})
across different nodes in a cluster. This spreading helps
reduce the impact of failures.

When nodes start up, the kubelet on each node automatically adds
{{< glossary_tooltip text="labels" term_id="label" >}} to the Node object
that represents that specific kubelet in the Kubernetes API.
These labels can include
[zone information](/docs/reference/labels-annotations-taints/#topologykubernetesiozone).
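
For example, a trimmed-down view of a Node in a zone-aware cluster might look like the
following; the node name, region, and zone values are placeholders, and the exact labels
present depend on your provider and Kubernetes version:

```yaml
apiVersion: v1
kind: Node
metadata:
  name: worker-0                                    # hypothetical node name
  labels:
    kubernetes.io/hostname: worker-0
    topology.kubernetes.io/region: example-region   # placeholder region name
    topology.kubernetes.io/zone: example-region-a   # placeholder zone name
```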

If your cluster spans multiple zones or regions, you can use node labels
in conjunction with
[Pod topology spread constraints](/docs/concepts/workloads/pods/pod-topology-spread-constraints/)
to control how Pods are spread across your cluster among fault domains:
regions, zones, and even specific nodes.
These hints enable the
{{< glossary_tooltip text="scheduler" term_id="kube-scheduler" >}} to place
Pods for better expected availability, reducing the risk that a correlated
failure affects your whole workload.

For example, you can set a constraint to make sure that the
3 replicas of a StatefulSet are all running in different zones from each
other, whenever that is feasible. You can define this declaratively
without explicitly defining which availability zones are in use for
each workload.
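
As a rough sketch, such a constraint could look like the StatefulSet below; the workload
name, labels, and container image are illustrative placeholders:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example-db                        # hypothetical workload name
spec:
  replicas: 3
  serviceName: example-db
  selector:
    matchLabels:
      app: example-db
  template:
    metadata:
      labels:
        app: example-db
    spec:
      # Keep the per-zone replica counts within one of each other, so the
      # three replicas land in different zones whenever enough zones have
      # eligible nodes. Use DoNotSchedule instead for a hard requirement.
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: example-db
      containers:
        - name: app
          image: registry.example/app:1.0 # placeholder image
```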

### Distributing nodes across zones

Kubernetes' core does not create nodes for you; you need to do that yourself,
or use a tool such as the [Cluster API](https://cluster-api.sigs.k8s.io/) to
manage nodes on your behalf.

Using tools such as the Cluster API you can define sets of machines to run as
worker nodes for your cluster across multiple failure domains, and rules to
automatically heal the cluster in case of whole-zone service disruption.

## Manual zone assignment for Pods

You can apply [node selector constraints](/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector)
to Pods that you create, as well as to Pod templates in workload resources
such as Deployment, StatefulSet, or Job.
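
For example, a minimal sketch of a Pod pinned to a single zone might use a node selector
like the one below; the Pod name, image, and zone value are placeholders for values from
your own cluster:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: zone-pinned-pod                             # hypothetical Pod name
spec:
  nodeSelector:
    topology.kubernetes.io/zone: example-region-a   # placeholder zone label value
  containers:
    - name: app
      image: registry.example/app:1.0               # placeholder image
```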

## Storage access for zones

When persistent volumes are created, the `PersistentVolumeLabel`
[admission controller](/docs/reference/access-authn-authz/admission-controllers/)
automatically adds zone labels to any PersistentVolumes that are linked to a specific
zone. The {{< glossary_tooltip text="scheduler" term_id="kube-scheduler" >}} then ensures,
through its `NoVolumeZoneConflict` predicate, that Pods which claim a given PersistentVolume
are only placed into the same zone as that volume.

You can specify a {{< glossary_tooltip text="StorageClass" term_id="storage-class" >}}
for PersistentVolumeClaims that specifies the failure domains (zones) that the
storage in that class may use.
To learn about configuring a StorageClass that is aware of failure domains or zones,
see [Allowed topologies](/docs/concepts/storage/storage-classes/#allowed-topologies).
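
As a brief sketch, a zone-restricted StorageClass could look like the following; the class
name, provisioner, and zone names are placeholders for your own storage driver and zones:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zoned-standard                    # hypothetical class name
provisioner: csi.example.vendor.com       # placeholder: your storage driver
volumeBindingMode: WaitForFirstConsumer   # delay binding until a Pod is scheduled
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.kubernetes.io/zone
        values:
          - example-region-a              # placeholder zone names
          - example-region-b
```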

## Networking

By itself, Kubernetes does not include zone-aware networking. You can use a
[network plugin](/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/)
to configure cluster networking, and that network solution might have zone-specific
elements. For example, if your cloud provider supports Services with
`type=LoadBalancer`, the load balancer might only send traffic to Pods running in the
same zone as the load balancer element processing a given connection.
Check your cloud provider's documentation for details.
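
For reference, a Service of that kind is declared as shown below; the name, selector, and
ports are illustrative, and how the resulting load balancer spreads traffic across zones
is up to the provider:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-lb       # hypothetical Service name
spec:
  type: LoadBalancer     # the cloud provider provisions the load balancer
  selector:
    app: example-db      # illustrative Pod label
  ports:
    - port: 80
      targetPort: 8080
```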

For custom or on-premises deployments, similar considerations apply.
{{< glossary_tooltip text="Service" term_id="service" >}} and
{{< glossary_tooltip text="Ingress" term_id="ingress" >}} behavior, including handling
of different failure zones, does vary depending on exactly how your cluster is set up.

## Fault recovery

When you set up your cluster, you might also need to consider whether and how
your setup can restore service if all the failure zones in a region go
offline at the same time. For example, do you rely on there being at least
one node able to run Pods in a zone?
Make sure that any cluster-critical repair work does not rely
on there being at least one healthy node in your cluster. For example: if all nodes
are unhealthy, you might need to run a repair Job with a special
{{< glossary_tooltip text="toleration" term_id="toleration" >}} so that the repair
can complete enough to bring at least one node into service.
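
As a rough sketch, assuming the nodes are unschedulable because the node controller has
marked them not ready or unreachable, such a repair Job might carry tolerations like
these; the Job name and image are placeholders:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: cluster-repair                            # hypothetical Job name
spec:
  template:
    spec:
      # Tolerate the taints that keep ordinary Pods off unhealthy nodes,
      # so the repair Pod can still be scheduled onto one of them.
      tolerations:
        - key: node.kubernetes.io/not-ready
          operator: Exists
        - key: node.kubernetes.io/unreachable
          operator: Exists
      restartPolicy: Never
      containers:
        - name: repair
          image: registry.example/repair-tool:1.0 # placeholder image
```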

Kubernetes doesn't come with an answer for this challenge; however, it's
something to consider.

## {{% heading "whatsnext" %}}

To learn how the scheduler places Pods in a cluster, honoring the configured constraints,
visit [Scheduling and Eviction](/docs/concepts/scheduling-eviction/).