---
reviewers:
- lavalamp
- thockin
title: Cluster Management
content_template: templates/concept
---
{{% capture overview %}}
This document describes several topics related to the lifecycle of a cluster: creating a new cluster,
upgrading your cluster's
master and worker nodes, performing node maintenance (e.g. kernel upgrades), and upgrading the Kubernetes API version of a
running cluster.
{{% /capture %}}

{{< toc >}}

{{% capture body %}}
## Creating and configuring a Cluster
To install Kubernetes on a set of machines, consult one of the existing [Getting Started guides](/docs/setup/) depending on your environment.
## Upgrading a cluster

The current state of cluster upgrades is provider dependent, and some releases may require special care when upgrading. It is recommended that administrators consult both the [release notes](https://git.k8s.io/kubernetes/CHANGELOG.md) and the version-specific upgrade notes prior to upgrading their clusters.

* [Upgrading to 1.6](/docs/admin/upgrade-1-6)

### Upgrading an Azure Kubernetes Service (AKS) cluster
Azure Kubernetes Service enables easy self-service upgrades of the control plane and nodes in your cluster. The process is
currently user-initiated and is described in the [Azure AKS documentation](https://docs.microsoft.com/en-us/azure/aks/upgrade-cluster).
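For reference, a minimal sketch of this flow with the Azure CLI (the resource group, cluster name, and version below are placeholders, and the exact commands may change between CLI versions):

```shell
# List the Kubernetes versions the cluster can be upgraded to (placeholder names)
az aks get-upgrades --resource-group myResourceGroup --name myAKSCluster --output table

# Trigger a user-initiated upgrade to a chosen version (placeholder version)
az aks upgrade --resource-group myResourceGroup --name myAKSCluster --kubernetes-version 1.11.1
```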
### Upgrading Google Compute Engine clusters
Google Compute Engine Open Source (GCE-OSS) supports master upgrades by deleting and
recreating the master, while maintaining the same Persistent Disk (PD) to ensure that data is retained across the
upgrade.

Node upgrades for GCE use a [Managed Instance Group](https://cloud.google.com/compute/docs/instance-groups/); each node
is sequentially destroyed and then recreated with new software. Any Pods that are running on that node need to be
controlled by a Replication Controller, or manually re-created after the rollout.

Upgrades on open source Google Compute Engine (GCE) clusters are controlled by the `cluster/gce/upgrade.sh` script.
Get its usage by running `cluster/gce/upgrade.sh -h`.

For example, to upgrade just your master to a specific version (v1.0.2):
```shell
cluster/gce/upgrade.sh -M v1.0.2
```
Alternatively, to upgrade your entire cluster to the latest stable release:
```shell
cluster/gce/upgrade.sh release/stable
```
### Upgrading Google Kubernetes Engine clusters
Google Kubernetes Engine automatically updates master components (e.g. `kube-apiserver`, `kube-scheduler`) to the latest version. It also handles upgrading the operating system and other components that the master runs on.

The node upgrade process is user-initiated and is described in the [Google Kubernetes Engine documentation](https://cloud.google.com/kubernetes-engine/docs/clusters/upgrade).
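If you prefer the command line over the Cloud Console, node pool upgrades can also be started with `gcloud`; a minimal sketch, with the cluster, zone, node pool, and version as placeholders:

```shell
# Check which Kubernetes versions are available in the zone (placeholder zone)
gcloud container get-server-config --zone us-central1-b

# Upgrade the nodes of one node pool to a chosen version (placeholder names)
gcloud container clusters upgrade mytestcluster --zone us-central1-b \
  --node-pool default-pool --cluster-version 1.11.1
```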
### Upgrading clusters on other platforms
Different providers and tools manage upgrades differently. It is recommended that you consult their main documentation regarding upgrades.
* [kops](https://github.com/kubernetes/kops)
* [kubespray](https://github.com/kubernetes-incubator/kubespray)
* [CoreOS Tectonic](https://coreos.com/tectonic/docs/latest/admin/upgrade.html)
* ...
## Resizing a cluster
If your cluster runs short on resources, you can easily add more machines to it if your cluster is running in [Node self-registration mode](/docs/admin/node/#self-registration-of-nodes).

If you're using GCE or Google Kubernetes Engine, this is done by resizing the Instance Group managing your Nodes. It can be accomplished by modifying the number of instances on the `Compute > Compute Engine > Instance groups > your group > Edit group` [Google Cloud Console page](https://console.developers.google.com) or by using the gcloud CLI:
```shell
gcloud compute instance-groups managed resize kubernetes-minion-group --size=42 --zone=$ZONE
```
The Instance Group will take care of putting the appropriate image on new machines and starting them, while the Kubelet will register its Node with the API server to make it available for scheduling. If you scale the instance group down, the system will randomly choose Nodes to kill.

In other environments you may need to configure the machine yourself and tell the Kubelet on which machine the API server is running.
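On Google Kubernetes Engine you can also resize a node pool directly with `gcloud` instead of editing the underlying instance group; a minimal sketch, with the cluster name, node pool, and zone as placeholders (older gcloud releases use `--size` instead of `--num-nodes`):

```shell
gcloud container clusters resize mytestcluster --node-pool default-pool --num-nodes 5 --zone us-central1-b
```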
### Resizing an Azure Kubernetes Service (AKS) cluster
Azure Kubernetes Service enables user-initiated resizing of the cluster from either the CLI or the Azure Portal and is described in the [Azure AKS documentation](https://docs.microsoft.com/en-us/azure/aks/scale-cluster).
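A minimal sketch of the same operation with the Azure CLI, using placeholder resource group and cluster names:

```shell
az aks scale --resource-group myResourceGroup --name myAKSCluster --node-count 5
```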
### Cluster autoscaling
If you are using GCE or Google Kubernetes Engine, you can configure your cluster so that it is automatically rescaled based on
pod needs.

As described in [Compute Resource](/docs/concepts/configuration/manage-compute-resources-container/), users can reserve how much CPU and memory is allocated to pods.
This information is used by the Kubernetes scheduler to find a place to run the pod. If there is
no node with enough free capacity (or that matches the pod's other requirements), then the pod has
to wait until some pods are terminated or a new node is added.

Cluster autoscaler looks for the pods that cannot be scheduled and checks if adding a new node, similar
to the others in the cluster, would help. If so, it resizes the cluster to accommodate the waiting pods.
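To see which pods are currently waiting for capacity, you can list pods stuck in the `Pending` phase; this is purely illustrative and not something the autoscaler requires:

```shell
kubectl get pods --all-namespaces --field-selector=status.phase=Pending
```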
Cluster autoscaler also scales down the cluster if it notices that one or more nodes are no longer needed for
an extended period of time (10 minutes, but this may change in the future).

Cluster autoscaler is configured per instance group (GCE) or node pool (Google Kubernetes Engine).

If you are using GCE, you can enable cluster autoscaler while creating a cluster with the kube-up.sh script.
To configure cluster autoscaler, you have to set three environment variables:
* `KUBE_ENABLE_CLUSTER_AUTOSCALER` - enables cluster autoscaler if set to true.
* `KUBE_AUTOSCALER_MIN_NODES` - minimum number of nodes in the cluster.
* `KUBE_AUTOSCALER_MAX_NODES` - maximum number of nodes in the cluster.
Example:
```shell
KUBE_ENABLE_CLUSTER_AUTOSCALER=true KUBE_AUTOSCALER_MIN_NODES=3 KUBE_AUTOSCALER_MAX_NODES=10 NUM_NODES=5 ./cluster/kube-up.sh
```
On Google Kubernetes Engine you configure cluster autoscaler either on cluster creation or update, or when creating a particular node pool
(which you want to be autoscaled), by passing the flags `--enable-autoscaling`, `--min-nodes`, and `--max-nodes`
to the corresponding `gcloud` commands.
Examples:
```shell
gcloud container clusters create mytestcluster --zone=us-central1-b --enable-autoscaling --min-nodes=3 --max-nodes=10 --num-nodes=5
```
```shell
gcloud container clusters update mytestcluster --enable-autoscaling --min-nodes=1 --max-nodes=15
```
**Cluster autoscaler expects that nodes have not been manually modified (e.g. by adding labels via kubectl) as those properties would not be propagated to the new nodes within the same instance group.**

For more details about how the cluster autoscaler decides whether, when, and how
to scale a cluster, please refer to the [FAQ](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md)
documentation from the autoscaler project.
## Maintenance on a Node
If you need to reboot a node (such as for a kernel upgrade, libc upgrade, hardware repair, etc.), and the downtime is
brief, then when the Kubelet restarts, it will attempt to restart the pods scheduled to it. If the reboot takes longer
(the default time is 5 minutes, controlled by `--pod-eviction-timeout` on the controller-manager),
then the node controller will terminate the pods that are bound to the unavailable node. If there is a corresponding
replica set (or replication controller), then a new copy of the pod will be started on a different node. So, in the case where all
pods are replicated, upgrades can be done without special coordination, assuming that not all nodes will go down at the same time.
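Before taking a node down, it can help to check which pods are running on it; a minimal sketch using a standard field selector, where `$NODENAME` is the name of the node in question:

```shell
kubectl get pods --all-namespaces --field-selector spec.nodeName=$NODENAME
```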
If you want more control over the upgrading process, you may use the following workflow:
Use `kubectl drain` to gracefully terminate all pods on the node while marking the node as unschedulable:
```shell
kubectl drain $NODENAME
```
This keeps new pods from landing on the node while you are trying to get them off.
For pods with a replica set, the pod will be replaced by a new pod which will be scheduled to a new node. Additionally, if the pod is part of a service, then clients will automatically be redirected to the new pod.
For pods with no replica set, you need to bring up a new copy of the pod, and assuming it is not part of a service, redirect clients to it.
Perform maintenance work on the node.
Make the node schedulable again:
```shell
kubectl uncordon $NODENAME
```
If you deleted the node's VM instance and created a new one, then a new schedulable node resource will
be created automatically (if you're using a cloud provider that supports
node discovery; currently this is only Google Compute Engine, not including CoreOS on Google Compute Engine using kube-register). See [Node](/docs/admin/node) for more details.
## Advanced Topics
### Upgrading to a different API version
When a new API version is released, you may need to upgrade a cluster to support the new API version (e.g. switching from 'v1' to 'v2' when 'v2' is launched).
This is an infrequent event, but it requires careful management. There is a sequence of steps to upgrade to a new API version.
1. Turn on the new API version.
1. Upgrade the cluster's storage to use the new version.
1. Upgrade all config files. Identify users of the old API version endpoints.
1. Update existing objects in the storage to new version by running `cluster/update-storage-objects.sh`.
1. Turn off the old API version.
### Turn on or off an API version for your cluster
Specific API versions can be turned on or off by passing the `--runtime-config=api/<version>` flag while bringing up the API server. For example, to turn off the v1 API, pass `--runtime-config=api/v1=false`.
`runtime-config` also supports two special keys: `api/all` and `api/legacy`, to control all and legacy APIs respectively.
For example, to turn off all API versions except v1, pass `--runtime-config=api/all=false,api/v1=true`.
For the purposes of these flags, _legacy_ APIs are those APIs which have been explicitly deprecated (e.g. `v1beta3`).
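As an illustration only (every other flag is omitted and depends entirely on your deployment), the flag is simply part of the `kube-apiserver` invocation, and `kubectl api-versions` can be used afterwards to verify which API groups and versions are being served:

```shell
# Example kube-apiserver invocation; all other required flags are omitted for brevity
kube-apiserver --runtime-config=api/all=false,api/v1=true ...

# After the API server is up, list the API versions it serves
kubectl api-versions
```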
### Switching your cluster's storage API version
The objects that are stored to disk for a cluster's internal representation of the Kubernetes resources active in the cluster are written using a particular version of the API.
When the supported API changes, these objects may need to be rewritten in the newer API. Failure to do this will eventually result in resources that are no longer decodable or usable
by the Kubernetes API server.
### Switching your config files to a new API version
You can use the `kubectl convert` command to convert config files between different API versions.
```shell
kubectl convert -f pod.yaml --output-version v1
```
For more options, please refer to the usage of the [kubectl convert](/docs/user-guide/kubectl/{{page.version}}/#convert) command.

{{% /capture %}}