diff --git a/_data/guides.yml b/_data/guides.yml
index 59f44d7123..4dcb59fece 100644
--- a/_data/guides.yml
+++ b/_data/guides.yml
@@ -120,6 +120,8 @@ toc:
     path: /docs/admin/multi-cluster/
   - title: Using Large Clusters
     path: /docs/admin/cluster-large/
+  - title: Running in Multiple Zones
+    path: /docs/admin/multiple-zones/
   - title: Building High-Availability Clusters
     path: /docs/admin/high-availability/
   - title: Accessing Clusters
diff --git a/docs/admin/multiple-zones.md b/docs/admin/multiple-zones.md
new file mode 100644
index 0000000000..e420bda304
--- /dev/null
+++ b/docs/admin/multiple-zones.md
@@ -0,0 +1,313 @@
---
---

## Introduction

Kubernetes 1.2 adds support for running a single cluster in multiple failure zones
(GCE calls them simply "zones", AWS calls them "availability zones"; here we'll refer
to them as "zones"). This is a lightweight version of a broader effort to federate
multiple Kubernetes clusters together (sometimes referred to by the affectionate
nickname ["Ubernetes"](https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/federation.md)).
Full federation will allow combining separate Kubernetes clusters running in
different regions or clouds. However, many users simply want to run a more
available Kubernetes cluster in multiple zones of their cloud provider, and this
is what the multizone support in 1.2 allows (we nickname this "Ubernetes Lite").

Multizone support is deliberately limited: a single Kubernetes cluster can run
in multiple zones, but only within the same region (and cloud provider). Only
GCE and AWS are currently supported automatically (though it is easy to
add similar support for other clouds or even bare metal, by simply arranging
for the appropriate labels to be added to nodes and volumes).

* TOC
{:toc}

## Functionality

When nodes are started, the kubelet automatically adds labels to them with
zone information.

Kubernetes automatically spreads the pods in a replication controller
or service across nodes in a single-zone cluster (to reduce the impact of
failures). With multiple-zone clusters, this spreading behaviour is
extended across zones (to reduce the impact of zone failures); this is
achieved via `SelectorSpreadPriority`. It is a best-effort placement, so
if the zones in your cluster are heterogeneous (e.g. different numbers of
nodes, different types of nodes, or different pod resource requirements),
your pods might not be spread perfectly evenly across zones. If desired,
you can use homogeneous zones (same number and types of nodes) to reduce
the probability of unequal spreading.

When persistent volumes are created, the `PersistentVolumeLabel`
admission controller automatically adds zone labels to them. The scheduler (via the
`VolumeZonePredicate` predicate) will then ensure that pods that claim a
given volume are only placed into the same zone as that volume, as volumes
cannot be attached across zones.
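
Because the kubelet exposes the zone as an ordinary node label, you can also pin a
pod to a particular zone yourself with a `nodeSelector`. The following is a minimal
sketch; the pod name, image, and zone here are illustrative placeholders, not part
of the walkthrough below:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: zone-pinned-pod   # placeholder name
spec:
  # Only schedule onto nodes that the kubelet labeled with this zone
  nodeSelector:
    failure-domain.beta.kubernetes.io/zone: us-central1-b
  containers:
  - name: web
    image: nginx
```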

## Limitations

There are some important limitations of the multizone support:

* We assume that the different zones are located close to each other in the
network, so we don't perform any zone-aware routing. In particular, traffic
that goes via services might cross zones (even if some pods backing that service
exist in the same zone as the client), and this may incur additional latency and cost.

* Volume zone-affinity will only work with a `PersistentVolume`, and will not
work if you directly specify an EBS volume in the pod spec (for example).

* Clusters cannot span clouds or regions (this functionality will require full
federation support).

* Although your nodes are in multiple zones, kube-up currently builds
a single master node by default. While services are highly
available and can tolerate the loss of a zone, the control plane is
located in a single zone. Users that want a highly available control
plane should follow the [high availability](/docs/admin/high-availability) instructions.


## Walkthrough

We're now going to walk through setting up and using a multi-zone
cluster on both GCE & AWS. To do so, you bring up a full cluster
(specifying `MULTIZONE=1`), and then you add nodes in additional zones
by running `kube-up` again (specifying `KUBE_USE_EXISTING_MASTER=true`).

### Bringing up your cluster

Create the cluster as normal, but pass `MULTIZONE=1` to tell the cluster to
manage multiple zones; this creates nodes in us-central1-a (GCE) or us-west-2a (AWS).

GCE:

```shell
curl -sS https://get.k8s.io | MULTIZONE=1 KUBERNETES_PROVIDER=gce KUBE_GCE_ZONE=us-central1-a NUM_NODES=3 bash
```

AWS:

```shell
curl -sS https://get.k8s.io | MULTIZONE=1 KUBERNETES_PROVIDER=aws KUBE_AWS_ZONE=us-west-2a NUM_NODES=3 bash
```

This step brings up a cluster as normal, still running in a single zone
(but `MULTIZONE=1` has enabled multi-zone capabilities).

### Nodes are labeled

View the nodes; you can see that they are labeled with zone information.
They are all in `us-central1-a` (GCE) or `us-west-2a` (AWS) so far. The
labels are `failure-domain.beta.kubernetes.io/region` for the region,
and `failure-domain.beta.kubernetes.io/zone` for the zone:

```shell
> kubectl get nodes --show-labels

NAME                     STATUS                     AGE       LABELS
kubernetes-master        Ready,SchedulingDisabled   6m        beta.kubernetes.io/instance-type=n1-standard-1,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=kubernetes-master
kubernetes-minion-87j9   Ready                      6m        beta.kubernetes.io/instance-type=n1-standard-2,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=kubernetes-minion-87j9
kubernetes-minion-9vlv   Ready                      6m        beta.kubernetes.io/instance-type=n1-standard-2,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=kubernetes-minion-9vlv
kubernetes-minion-a12q   Ready                      6m        beta.kubernetes.io/instance-type=n1-standard-2,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=kubernetes-minion-a12q
```
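
Since the zone is exposed as a normal label, you can also ask kubectl for just the
nodes in one zone using a label selector. A quick usage sketch (substitute whichever
zone you want to inspect):

```shell
# List only the nodes that the kubelet labeled as being in us-central1-a
kubectl get nodes -l failure-domain.beta.kubernetes.io/zone=us-central1-a
```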

### Add more nodes in a second zone

Let's add another set of nodes to the existing cluster, reusing the
existing master, running in a different zone (us-central1-b or us-west-2b).
We run kube-up again, but by specifying `KUBE_USE_EXISTING_MASTER=true`
kube-up will reuse the existing master instead of creating a new one.

GCE:

```shell
KUBE_USE_EXISTING_MASTER=true MULTIZONE=1 KUBERNETES_PROVIDER=gce KUBE_GCE_ZONE=us-central1-b NUM_NODES=3 kubernetes/cluster/kube-up.sh
```

On AWS we also need to specify the network CIDR for the additional
subnet, along with the master internal IP address:

```shell
KUBE_USE_EXISTING_MASTER=true MULTIZONE=1 KUBERNETES_PROVIDER=aws KUBE_AWS_ZONE=us-west-2b NUM_NODES=3 KUBE_SUBNET_CIDR=172.20.1.0/24 MASTER_INTERNAL_IP=172.20.0.9 kubernetes/cluster/kube-up.sh
```

View the nodes again; 3 more nodes should have launched and be labeled
as being in us-central1-b:

```shell
> kubectl get nodes --show-labels

NAME                     STATUS                     AGE       LABELS
kubernetes-master        Ready,SchedulingDisabled   16m       beta.kubernetes.io/instance-type=n1-standard-1,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=kubernetes-master
kubernetes-minion-281d   Ready                      2m        beta.kubernetes.io/instance-type=n1-standard-2,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-b,kubernetes.io/hostname=kubernetes-minion-281d
kubernetes-minion-87j9   Ready                      16m       beta.kubernetes.io/instance-type=n1-standard-2,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=kubernetes-minion-87j9
kubernetes-minion-9vlv   Ready                      16m       beta.kubernetes.io/instance-type=n1-standard-2,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=kubernetes-minion-9vlv
kubernetes-minion-a12q   Ready                      17m       beta.kubernetes.io/instance-type=n1-standard-2,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=kubernetes-minion-a12q
kubernetes-minion-pp2f   Ready                      2m        beta.kubernetes.io/instance-type=n1-standard-2,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-b,kubernetes.io/hostname=kubernetes-minion-pp2f
kubernetes-minion-wf8i   Ready                      2m        beta.kubernetes.io/instance-type=n1-standard-2,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-b,kubernetes.io/hostname=kubernetes-minion-wf8i
```

### Volume affinity

Create a volume (only PersistentVolumes are supported for zone
affinity), using the new dynamic volume creation:

```json
kubectl create -f - <<EOF
{
  "kind": "PersistentVolumeClaim",
  "apiVersion": "v1",
  "metadata": {
    "name": "claim1",
    "annotations": {
      "volume.alpha.kubernetes.io/storage-class": "foo"
    }
  },
  "spec": {
    "accessModes": ["ReadWriteOnce"],
    "resources": {
      "requests": {
        "storage": "5Gi"
      }
    }
  }
}
EOF
```

The PV that is dynamically created for this claim is labeled with the zone
and region it was created in:

```shell
> kubectl get pv --show-labels
NAME           CAPACITY   ACCESSMODES   STATUS    CLAIM            REASON    AGE       LABELS
pv-gce-mj4gm   5Gi        RWO           Bound     default/claim1             46s       failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a
```

So now we will create a pod that uses the persistent volume claim.
Because GCE PDs / AWS EBS volumes cannot be attached across zones,
the scheduler can only place this pod in the same zone as the volume:

```yaml
kubectl create -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    - name: myfrontend   # any container that mounts the claim will do
      image: nginx
      volumeMounts:
      - mountPath: "/var/www/html"
        name: mypd
  volumes:
    - name: mypd
      persistentVolumeClaim:
        claimName: claim1
EOF
```

Note that the pod is scheduled in the same zone as the volume, as cross-zone
attachments are not generally permitted by cloud providers:

```shell
> kubectl describe pod mypod | grep Node
Node: kubernetes-minion-9vlv/10.240.0.5
> kubectl get node kubernetes-minion-9vlv --show-labels
NAME                     STATUS    AGE       LABELS
kubernetes-minion-9vlv   Ready     22m       beta.kubernetes.io/instance-type=n1-standard-2,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=kubernetes-minion-9vlv
```

### Pods are spread across zones

Pods in a replication controller or service are automatically spread
across zones. First, let's launch more nodes in a third zone:

GCE:

```shell
KUBE_USE_EXISTING_MASTER=true MULTIZONE=1 KUBERNETES_PROVIDER=gce KUBE_GCE_ZONE=us-central1-f NUM_NODES=3 kubernetes/cluster/kube-up.sh
```

AWS:

```shell
KUBE_USE_EXISTING_MASTER=true MULTIZONE=1 KUBERNETES_PROVIDER=aws KUBE_AWS_ZONE=us-west-2c NUM_NODES=3 KUBE_SUBNET_CIDR=172.20.2.0/24 MASTER_INTERNAL_IP=172.20.0.9 kubernetes/cluster/kube-up.sh
```

Verify that you now have nodes in 3 zones:

```shell
kubectl get nodes --show-labels
```

Create the guestbook-go example, which includes an RC of size 3, running a simple web app:

```shell
find kubernetes/examples/guestbook-go/ -name '*.json' | xargs -I {} kubectl create -f {}
```

The pods should be spread across all 3 zones:

```shell
> kubectl describe pod -l app=guestbook | grep Node
Node: kubernetes-minion-9vlv/10.240.0.5
Node: kubernetes-minion-281d/10.240.0.8
Node: kubernetes-minion-olsh/10.240.0.11

> kubectl get node kubernetes-minion-9vlv kubernetes-minion-281d kubernetes-minion-olsh --show-labels
NAME                     STATUS    AGE       LABELS
kubernetes-minion-9vlv   Ready     34m       beta.kubernetes.io/instance-type=n1-standard-2,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=kubernetes-minion-9vlv
kubernetes-minion-281d   Ready     20m       beta.kubernetes.io/instance-type=n1-standard-2,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-b,kubernetes.io/hostname=kubernetes-minion-281d
kubernetes-minion-olsh   Ready     3m        beta.kubernetes.io/instance-type=n1-standard-2,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-f,kubernetes.io/hostname=kubernetes-minion-olsh
```


Load-balancers span all zones in a cluster; the guestbook-go example
includes an example load-balanced service:

```shell
> kubectl describe service guestbook | grep LoadBalancer.Ingress
LoadBalancer Ingress: 130.211.126.21

> ip=130.211.126.21

> curl -s http://${ip}:3000/env | grep HOSTNAME
  "HOSTNAME": "guestbook-44sep",

> (for i in `seq 20`; do curl -s http://${ip}:3000/env | grep HOSTNAME; done) | sort | uniq
  "HOSTNAME": "guestbook-44sep",
  "HOSTNAME": "guestbook-hum5n",
  "HOSTNAME": "guestbook-ppm40",
```

The load balancer correctly targets all the pods, even though they are in multiple zones.

### Shutting down the cluster

When you're done, clean up:

GCE:

```shell
KUBERNETES_PROVIDER=gce KUBE_USE_EXISTING_MASTER=true KUBE_GCE_ZONE=us-central1-f kubernetes/cluster/kube-down.sh
KUBERNETES_PROVIDER=gce KUBE_USE_EXISTING_MASTER=true KUBE_GCE_ZONE=us-central1-b kubernetes/cluster/kube-down.sh
KUBERNETES_PROVIDER=gce KUBE_GCE_ZONE=us-central1-a kubernetes/cluster/kube-down.sh
```

AWS:

```shell
KUBERNETES_PROVIDER=aws KUBE_USE_EXISTING_MASTER=true KUBE_AWS_ZONE=us-west-2c kubernetes/cluster/kube-down.sh
KUBERNETES_PROVIDER=aws KUBE_USE_EXISTING_MASTER=true KUBE_AWS_ZONE=us-west-2b kubernetes/cluster/kube-down.sh
KUBERNETES_PROVIDER=aws KUBE_AWS_ZONE=us-west-2a kubernetes/cluster/kube-down.sh
```
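
If you want to confirm that nothing was left behind, the cloud provider's own CLI
can list any leftover instances. A minimal sketch for GCE, assuming `gcloud` is
configured for the right project and that the instances use the default
`kubernetes-` name prefix seen in the outputs above:

```shell
# Should print only the fallback message once all three kube-down
# invocations have completed.
gcloud compute instances list | grep kubernetes- || echo "no kubernetes instances remaining"
```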