---
assignees:
- jlowdermilk
- justinsb
- quinton-hoole
---

## Introduction

Kubernetes 1.2 adds support for running a single cluster in multiple failure zones
(GCE calls them simply "zones", AWS calls them "availability zones", here we'll refer to them as "zones").
This is a lightweight version of a broader Cluster Federation feature (previously referred to by the affectionate
nickname ["Ubernetes"](https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/federation.md)).
Full Cluster Federation allows combining separate
Kubernetes clusters running in different regions or cloud providers
(or on-premise data centers). However, many
users simply want to run a more available Kubernetes cluster in multiple zones
of their single cloud provider, and this is what the multizone support in 1.2 allows
(this previously went by the nickname "Ubernetes Lite").

Multizone support is deliberately limited: a single Kubernetes cluster can run
in multiple zones, but only within the same region (and cloud provider). Only
GCE and AWS are currently supported automatically (though it is easy to
add similar support for other clouds or even bare metal, by simply arranging
for the appropriate labels to be added to nodes and volumes).
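
For clouds or bare-metal setups that are not supported automatically, those labels can be applied by hand. A minimal sketch, where the node name, PV name, region, and zone values are placeholders you would replace with your own:

```shell
# Label a node with its region and zone so the scheduler treats it like
# an automatically-labeled GCE/AWS node (names here are examples only).
kubectl label nodes my-node-1 \
  failure-domain.beta.kubernetes.io/region=my-region \
  failure-domain.beta.kubernetes.io/zone=my-region-a

# PersistentVolumes need the same labels for zone-aware volume placement.
kubectl label pv my-pv-1 \
  failure-domain.beta.kubernetes.io/region=my-region \
  failure-domain.beta.kubernetes.io/zone=my-region-a
```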

* TOC
{:toc}

## Functionality

When nodes are started, the kubelet automatically adds labels to them with
zone information.
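
You can use these labels to list the nodes in a particular zone; for example (using the GCE zone names from the walkthrough below):

```shell
# List only the nodes that the kubelet labeled as being in us-central1-a.
kubectl get nodes -l failure-domain.beta.kubernetes.io/zone=us-central1-a
```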

Kubernetes will automatically spread the pods in a replication controller
or service across nodes in a single-zone cluster (to reduce the impact of
failures). With multiple-zone clusters, this spreading behaviour is
extended across zones (to reduce the impact of zone failures). (This is
achieved via `SelectorSpreadPriority`.) This is a best-effort
placement, and so if the zones in your cluster are heterogeneous
(e.g. different numbers of nodes, different types of nodes, or
different pod resource requirements), this might prevent perfectly
even spreading of your pods across zones. If desired, you can use
homogeneous zones (same number and types of nodes) to reduce the
probability of unequal spreading.
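
If best-effort spreading is not enough and you want to pin a workload to one zone, you can select the zone label directly with a `nodeSelector`. A minimal sketch (the pod name and image are placeholders):

```shell
kubectl create -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: zone-pinned-pod
spec:
  nodeSelector:
    # Only schedule onto nodes that the kubelet labeled with this zone.
    failure-domain.beta.kubernetes.io/zone: us-central1-a
  containers:
  - name: web
    image: nginx
EOF
```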

When persistent volumes are created, the `PersistentVolumeLabel`
admission controller automatically adds zone labels to them. The scheduler (via the
`VolumeZonePredicate` predicate) will then ensure that pods that claim a
given volume are only placed into the same zone as that volume, as volumes
cannot be attached across zones.

## Limitations

There are some important limitations of the multizone support:

* We assume that the different zones are located close to each other in the
network, so we don't perform any zone-aware routing. In particular, traffic
that goes via services might cross zones (even if some pods backing that service
exist in the same zone as the client), and this may incur additional latency and cost.

* Volume zone-affinity will only work with a `PersistentVolume`, and will not
work if you directly specify an EBS volume in the pod spec (for example).

* Clusters cannot span clouds or regions (this functionality will require full
federation support).

* Although your nodes are in multiple zones, kube-up currently builds
a single master node by default. While services are highly
available and can tolerate the loss of a zone, the control plane is
located in a single zone. Users that want a highly available control
plane should follow the [high availability](/docs/admin/high-availability) instructions.

## Walkthrough

We're now going to walk through setting up and using a multi-zone
cluster on both GCE & AWS. To do so, you bring up a full cluster
(specifying `MULTIZONE=true`), and then you add nodes in additional zones
by running `kube-up` again (specifying `KUBE_USE_EXISTING_MASTER=true`).

### Bringing up your cluster

Create the cluster as normal, but pass `MULTIZONE=true` to tell the cluster to manage multiple zones; this creates the initial nodes in us-central1-a (GCE) or us-west-2a (AWS).

GCE:

```shell
curl -sS https://get.k8s.io | MULTIZONE=true KUBERNETES_PROVIDER=gce KUBE_GCE_ZONE=us-central1-a NUM_NODES=3 bash
```

AWS:

```shell
curl -sS https://get.k8s.io | MULTIZONE=true KUBERNETES_PROVIDER=aws KUBE_AWS_ZONE=us-west-2a NUM_NODES=3 bash
```

This step brings up a cluster as normal, still running in a single zone
(but `MULTIZONE=true` has enabled multi-zone capabilities).
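
Before adding more zones, you may want to confirm that the cluster is up and that the initial nodes are ready:

```shell
# All nodes should report Ready before you extend the cluster into new zones.
kubectl get nodes
```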

### Nodes are labeled

View the nodes; you can see that they are labeled with zone information.
They are all in `us-central1-a` (GCE) or `us-west-2a` (AWS) so far. The
labels are `failure-domain.beta.kubernetes.io/region` for the region,
and `failure-domain.beta.kubernetes.io/zone` for the zone:

```shell
> kubectl get nodes --show-labels

NAME                     STATUS                     AGE       LABELS
kubernetes-master        Ready,SchedulingDisabled   6m        beta.kubernetes.io/instance-type=n1-standard-1,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=kubernetes-master
kubernetes-minion-87j9   Ready                      6m        beta.kubernetes.io/instance-type=n1-standard-2,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=kubernetes-minion-87j9
kubernetes-minion-9vlv   Ready                      6m        beta.kubernetes.io/instance-type=n1-standard-2,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=kubernetes-minion-9vlv
kubernetes-minion-a12q   Ready                      6m        beta.kubernetes.io/instance-type=n1-standard-2,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=kubernetes-minion-a12q
```

### Add more nodes in a second zone

Let's add another set of nodes to the existing cluster, reusing the
existing master, running in a different zone (us-central1-b or us-west-2b).
We run kube-up again; by specifying `KUBE_USE_EXISTING_MASTER=true`,
kube-up will not create a new master, but will instead reuse the one that
was previously created.

GCE:

```shell
KUBE_USE_EXISTING_MASTER=true MULTIZONE=true KUBERNETES_PROVIDER=gce KUBE_GCE_ZONE=us-central1-b NUM_NODES=3 kubernetes/cluster/kube-up.sh
```

On AWS we also need to specify the network CIDR for the additional
subnet, along with the master internal IP address:

```shell
KUBE_USE_EXISTING_MASTER=true MULTIZONE=true KUBERNETES_PROVIDER=aws KUBE_AWS_ZONE=us-west-2b NUM_NODES=3 KUBE_SUBNET_CIDR=172.20.1.0/24 MASTER_INTERNAL_IP=172.20.0.9 kubernetes/cluster/kube-up.sh
```

View the nodes again; 3 more nodes should have launched and be tagged
in us-central1-b:

```shell
> kubectl get nodes --show-labels

NAME                     STATUS                     AGE       LABELS
kubernetes-master        Ready,SchedulingDisabled   16m       beta.kubernetes.io/instance-type=n1-standard-1,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=kubernetes-master
kubernetes-minion-281d   Ready                      2m        beta.kubernetes.io/instance-type=n1-standard-2,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-b,kubernetes.io/hostname=kubernetes-minion-281d
kubernetes-minion-87j9   Ready                      16m       beta.kubernetes.io/instance-type=n1-standard-2,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=kubernetes-minion-87j9
kubernetes-minion-9vlv   Ready                      16m       beta.kubernetes.io/instance-type=n1-standard-2,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=kubernetes-minion-9vlv
kubernetes-minion-a12q   Ready                      17m       beta.kubernetes.io/instance-type=n1-standard-2,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=kubernetes-minion-a12q
kubernetes-minion-pp2f   Ready                      2m        beta.kubernetes.io/instance-type=n1-standard-2,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-b,kubernetes.io/hostname=kubernetes-minion-pp2f
kubernetes-minion-wf8i   Ready                      2m        beta.kubernetes.io/instance-type=n1-standard-2,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-b,kubernetes.io/hostname=kubernetes-minion-wf8i
```

### Volume affinity

Create a volume (only PersistentVolumes are supported for zone
affinity), using the new dynamic volume creation:

```json
kubectl create -f - <<EOF
{
  "kind": "PersistentVolumeClaim",
  "apiVersion": "v1",
  "metadata": {
    "name": "claim1",
    "annotations": {
      "volume.alpha.kubernetes.io/storage-class": "foo"
    }
  },
  "spec": {
    "accessModes": [
      "ReadWriteOnce"
    ],
    "resources": {
      "requests": {
        "storage": "5Gi"
      }
    }
  }
}
EOF
```

The PV is also labeled with the zone & region it was created in. For
version 1.2, dynamic persistent volumes are always created in the zone
of the cluster master (here us-central1-a / us-west-2a); this will
be improved in a future version (issue [#23330](https://github.com/kubernetes/kubernetes/issues/23330)).

```shell
> kubectl get pv --show-labels
NAME           CAPACITY   ACCESSMODES   STATUS    CLAIM            REASON    AGE       LABELS
pv-gce-mj4gm   5Gi        RWO           Bound     default/claim1             46s       failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a
```

So now we will create a pod that uses the persistent volume claim.
Because GCE PDs / AWS EBS volumes cannot be attached across zones,
this means that this pod can only be created in the same zone as the volume:

```yaml
kubectl create -f - <<EOF
kind: Pod
apiVersion: v1
metadata:
  name: mypod
spec:
  containers:
    - name: myfrontend
      image: nginx
      volumeMounts:
      - mountPath: "/var/www/html"
        name: mypd
  volumes:
    - name: mypd
      persistentVolumeClaim:
        claimName: claim1
EOF
```

Note that the pod was automatically created in the same zone as the volume, as
cross-zone attachments are not generally permitted by cloud providers:

```shell
> kubectl describe pod mypod | grep Node
Node:        kubernetes-minion-9vlv/10.240.0.5
> kubectl get node kubernetes-minion-9vlv --show-labels
NAME                     STATUS    AGE       LABELS
kubernetes-minion-9vlv   Ready     22m       beta.kubernetes.io/instance-type=n1-standard-2,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=kubernetes-minion-9vlv
```

### Pods are spread across zones

Pods in a replication controller or service are automatically spread
across zones. First, let's launch more nodes in a third zone:

GCE:

```shell
KUBE_USE_EXISTING_MASTER=true MULTIZONE=true KUBERNETES_PROVIDER=gce KUBE_GCE_ZONE=us-central1-f NUM_NODES=3 kubernetes/cluster/kube-up.sh
```

AWS:

```shell
KUBE_USE_EXISTING_MASTER=true MULTIZONE=true KUBERNETES_PROVIDER=aws KUBE_AWS_ZONE=us-west-2c NUM_NODES=3 KUBE_SUBNET_CIDR=172.20.2.0/24 MASTER_INTERNAL_IP=172.20.0.9 kubernetes/cluster/kube-up.sh
```

Verify that you now have nodes in 3 zones:

```shell
kubectl get nodes --show-labels
```
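
If you only want the zone of each node rather than the full label list, the `-L` flag prints a label as its own column:

```shell
# Show each node with a dedicated zone column instead of the full label set.
kubectl get nodes -L failure-domain.beta.kubernetes.io/zone
```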

Create the guestbook-go example, which includes an RC of size 3, running a simple web app:

```shell
find kubernetes/examples/guestbook-go/ -name '*.json' | xargs -I {} kubectl create -f {}
```

The pods should be spread across all 3 zones:

```shell
> kubectl describe pod -l app=guestbook | grep Node
Node:        kubernetes-minion-9vlv/10.240.0.5
Node:        kubernetes-minion-281d/10.240.0.8
Node:        kubernetes-minion-olsh/10.240.0.11

> kubectl get node kubernetes-minion-9vlv kubernetes-minion-281d kubernetes-minion-olsh --show-labels
NAME                     STATUS    AGE       LABELS
kubernetes-minion-9vlv   Ready     34m       beta.kubernetes.io/instance-type=n1-standard-2,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=kubernetes-minion-9vlv
kubernetes-minion-281d   Ready     20m       beta.kubernetes.io/instance-type=n1-standard-2,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-b,kubernetes.io/hostname=kubernetes-minion-281d
kubernetes-minion-olsh   Ready     3m        beta.kubernetes.io/instance-type=n1-standard-2,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-f,kubernetes.io/hostname=kubernetes-minion-olsh
```
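
To summarize the spread without reading labels by eye, you can join each pod's node to its zone label. A rough sketch, assuming the pods carry the `app=guestbook` label as in the example above:

```shell
# For every guestbook pod, print the zone of the node it landed on,
# then count how many pods ended up in each zone.
for node in $(kubectl get pods -l app=guestbook -o jsonpath='{.items[*].spec.nodeName}'); do
  kubectl get node "$node" --show-labels --no-headers | \
    grep -o 'failure-domain.beta.kubernetes.io/zone=[^,]*'
done | sort | uniq -c
```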

Load-balancers span all zones in a cluster; the guestbook-go example
includes an example load-balanced service:

```shell
> kubectl describe service guestbook | grep LoadBalancer.Ingress
LoadBalancer Ingress:   130.211.126.21

> ip=130.211.126.21

> curl -s http://${ip}:3000/env | grep HOSTNAME
  "HOSTNAME": "guestbook-44sep",

> (for i in `seq 20`; do curl -s http://${ip}:3000/env | grep HOSTNAME; done)  | sort | uniq
  "HOSTNAME": "guestbook-44sep",
  "HOSTNAME": "guestbook-hum5n",
  "HOSTNAME": "guestbook-ppm40",
```

The load balancer correctly targets all the pods, even though they are in multiple zones.
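
If you want to double-check which pods are behind the service, you can list its endpoints and compare them with the pod IPs and nodes:

```shell
# The endpoints object lists the pod IPs currently backing the service.
kubectl get endpoints guestbook

# Pod IPs and the nodes (and hence zones) they are running on.
kubectl get pods -l app=guestbook -o wide
```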

### Shutting down the cluster

When you're done, clean up:

GCE:

```shell
KUBERNETES_PROVIDER=gce KUBE_USE_EXISTING_MASTER=true KUBE_GCE_ZONE=us-central1-f kubernetes/cluster/kube-down.sh
KUBERNETES_PROVIDER=gce KUBE_USE_EXISTING_MASTER=true KUBE_GCE_ZONE=us-central1-b kubernetes/cluster/kube-down.sh
KUBERNETES_PROVIDER=gce KUBE_GCE_ZONE=us-central1-a kubernetes/cluster/kube-down.sh
```

AWS:

```shell
KUBERNETES_PROVIDER=aws KUBE_USE_EXISTING_MASTER=true KUBE_AWS_ZONE=us-west-2c kubernetes/cluster/kube-down.sh
KUBERNETES_PROVIDER=aws KUBE_USE_EXISTING_MASTER=true KUBE_AWS_ZONE=us-west-2b kubernetes/cluster/kube-down.sh
KUBERNETES_PROVIDER=aws KUBE_AWS_ZONE=us-west-2a kubernetes/cluster/kube-down.sh
```