Merge pull request #1170 from piosz/rescheduler

Added Rescheduler documentation
2016-09-16 11:08:33 -07:00 · 2016-09-16 11:08:33 -07:00 · 56de4ad1e5
parent 6dd2eae873 ebd9318f37
commit 56de4ad1e5
1 changed files with 58 additions and 0 deletions
--- a/docs/admin/rescheduler.md
+++ b/docs/admin/rescheduler.md
@@ -0,0 +1,58 @@
+---
+assignees:
+- davidopp
+- filipg
+- piosz
+
+---
+
+* TOC
+{:toc}
+
+# "Guaranteed" scheduling of critical add-on pods
+
+## Critical add-ons
+
+In addition to the Kubernetes core components that run on a master machine (such as the api-server, scheduler, and controller-manager),
+there are a number of add-ons which, for various reasons, must run on a regular cluster node (rather than the Kubernetes master).
+Some of these add-ons, such as Heapster, DNS, and UI, are critical to a fully functional cluster.
+A cluster may stop working properly if a critical add-on is evicted (either manually or as a side effect of another operation such as an upgrade)
+and becomes pending. This can happen, for example, when the cluster is highly utilized and either other pending pods are scheduled into the space
+vacated by the evicted critical add-on pod, or the amount of resources available on the node has changed for some other reason.
+
+## Rescheduler: guaranteed scheduling of critical add-ons
+
+The rescheduler ensures that critical add-ons are always scheduled
+(assuming the cluster has enough resources to run the critical add-on pods in the absence of regular pods).
+If the scheduler determines that no node has enough free resources to run a critical add-on pod
+given the pods that are already running in the cluster
+(indicated by the critical add-on pod's `PodScheduled` condition being set to false with the reason `Unschedulable`),
+the rescheduler tries to free up space for the add-on by evicting some pods; the scheduler then schedules the add-on pod.
+
+To avoid a situation in which another pod is scheduled into the space prepared for the critical add-on,
+the chosen node gets a temporary taint `CriticalAddonsOnly` before the eviction(s)
+(see [more details](https://github.com/kubernetes/kubernetes/blob/master/docs/design/taint-toleration-dedicated.md)).
+Each critical add-on has to tolerate this taint;
+other pods shouldn't tolerate it. The taint is removed once the add-on is successfully scheduled.
+
+*Warning:* currently there is no guarantee about which node is chosen or which pods are killed
+in order to schedule a critical pod, so if the rescheduler is enabled, your pods might occasionally be
+killed for this purpose.
+
+## Config
+
+The rescheduler doesn't have any user-facing configuration (component config) or API.
+It's enabled by default. It can be disabled:
+* during cluster setup, by setting the `ENABLE_RESCHEDULER` flag to `false`
+* on a running cluster, by deleting its manifest from the master node
+(default path `/etc/kubernetes/manifests/rescheduler.manifest`)
+
+### Marking add-on as critical
+
+To be critical, an add-on has to run in the `kube-system` namespace (configurable via a flag)
+and have the following annotations specified:
+* `scheduler.alpha.kubernetes.io/critical-pod` set to empty string
+* `scheduler.alpha.kubernetes.io/tolerations` set to `[{"key":"CriticalAddonsOnly", "operator":"Exists"}]`
+
+The first one marks a pod as critical. The second one is required by the rescheduler algorithm.
+
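
A note on the "Marking add-on as critical" section of the added file: the two annotations can be easier to get right when they are generated rather than typed by hand, since the tolerations value is a JSON string embedded in an annotation. The sketch below is illustrative only and is not part of the rescheduler. The annotation keys, the taint key, and the `kube-system` default are taken from the documentation above; the helper names, the pod name `example-addon`, and the use of plain Python dicts in place of a real client library are assumptions. It also assumes that "set to empty string" means the value must be exactly `""`.

```python
import json

# Keys and values quoted from the documentation in the diff above.
CRITICAL_POD_ANNOTATION = "scheduler.alpha.kubernetes.io/critical-pod"
TOLERATIONS_ANNOTATION = "scheduler.alpha.kubernetes.io/tolerations"
CRITICAL_TAINT_KEY = "CriticalAddonsOnly"


def critical_addon_annotations():
    """Hypothetical helper: build the two annotations required on a critical add-on."""
    tolerations = [{"key": CRITICAL_TAINT_KEY, "operator": "Exists"}]
    return {
        CRITICAL_POD_ANNOTATION: "",  # empty string marks the pod as critical
        TOLERATIONS_ANNOTATION: json.dumps(tolerations),  # JSON-encoded list
    }


def is_marked_critical(pod, namespace="kube-system"):
    """Hypothetical helper: check a pod manifest (a plain dict) against the documented rules."""
    metadata = pod.get("metadata", {})
    annotations = metadata.get("annotations", {})
    if metadata.get("namespace") != namespace:
        return False
    # Assumption: the annotation must be present and exactly the empty string.
    if annotations.get(CRITICAL_POD_ANNOTATION) != "":
        return False
    try:
        tolerations = json.loads(annotations.get(TOLERATIONS_ANNOTATION, "[]"))
    except ValueError:
        return False  # the annotation value is not valid JSON
    if not isinstance(tolerations, list):
        return False
    return any(
        isinstance(t, dict)
        and t.get("key") == CRITICAL_TAINT_KEY
        and t.get("operator") == "Exists"
        for t in tolerations
    )


# A minimal pod manifest as a dict; the name is a placeholder.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "example-addon",
        "namespace": "kube-system",
        "annotations": critical_addon_annotations(),
    },
}
```

Calling `is_marked_critical(pod)` on the manifest above returns `True`, while the same annotations on a pod in another namespace fail the check unless that namespace is passed explicitly, mirroring the "configurable via a flag" remark in the documentation.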