Updated HPA user guide and example
parent 3a2026677d · commit 664f570450

This document describes the current state of Horizontal Pod Autoscaler in Kubernetes.

## What is Horizontal Pod Autoscaler?

Horizontal pod autoscaling allows you to automatically scale the number of pods
in a replication controller, deployment or replica set based on observed CPU utilization.

The autoscaler is implemented as a Kubernetes API resource and a controller.
The resource describes the behavior of the controller.
Further details of the autoscaling algorithm are given [here](https://github.com/kubernetes/kubernetes/blob/{{page.githubbranch}}/docs/design/horizontal-pod-autoscaler.md#autoscaling-algorithm).

Autoscaler uses heapster to collect CPU utilization.
Therefore, it is required to deploy heapster monitoring in your cluster for autoscaling to work.

Autoscaler accesses the corresponding replication controller, deployment or replica set by the scale sub-resource.
Scale is an interface that allows you to dynamically set the number of replicas and to learn their current state.
More details on the scale sub-resource can be found [here](https://github.com/kubernetes/kubernetes/blob/{{page.githubbranch}}/docs/design/horizontal-pod-autoscaler.md#scale-subresource).
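As a rough illustration of what "accessing by the scale sub-resource" means at the REST level, the controller talks to a per-resource `/scale` endpoint. A minimal sketch of how such a path is composed (the `extensions/v1beta1` prefix here is an assumption for illustration; the actual group/version depends on the resource being scaled):

```python
# Hypothetical sketch: composing the REST path of a scale sub-resource.
# The "/apis/extensions/v1beta1" prefix is an illustrative assumption;
# the real prefix depends on the API group serving the scaled resource.
def scale_subresource_path(namespace, resource_plural, name):
    return "/apis/extensions/v1beta1/namespaces/%s/%s/%s/scale" % (
        namespace, resource_plural, name)

print(scale_subresource_path("default", "deployments", "php-apache"))
```

A GET on such a path returns the current and desired replica counts, and a PUT updates the desired count; this is the interface the autoscaler uses regardless of whether it scales a replication controller, deployment or replica set.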

## API Object

Horizontal pod autoscaler is a top-level resource in the Kubernetes REST API.
In Kubernetes 1.2 HPA graduated from beta to stable (more details about [api versioning](/docs/api/#api-versioning)) with compatibility between versions.
The stable version is available in the `autoscaling/v1` api group whereas the beta version is available in the `extensions/v1beta1` api group as before.
The transition plan is to deprecate the beta version of HPA in Kubernetes 1.3 and remove it completely in Kubernetes 1.4.

**Warning!** Please keep in mind that all Kubernetes components still use HPA in version `extensions/v1beta1` in Kubernetes 1.2.

More details about the API object can be found at
[HorizontalPodAutoscaler Object](https://github.com/kubernetes/kubernetes/blob/{{page.githubbranch}}/docs/design/horizontal-pod-autoscaler.md#horizontalpodautoscaler-object).

---
---

Horizontal pod autoscaling allows you to automatically scale the number of pods
in a replication controller, deployment or replica set based on observed CPU utilization.
In the future other metrics will also be supported.

In this document we explain how this feature works by walking you through an example of enabling horizontal pod autoscaling for the php-apache server.

## Prerequisites

This example requires a running Kubernetes cluster and kubectl, version 1.2 or later.
[Heapster](https://github.com/kubernetes/heapster) monitoring needs to be deployed in the cluster
as horizontal pod autoscaler uses it to collect metrics
(if you followed the [getting started on GCE guide](/docs/getting-started-guides/gce),
heapster monitoring will be turned on by default).

## Step One: Run & expose php-apache server

To demonstrate horizontal pod autoscaler we will use a custom docker image based on the php-apache server.
The image can be found [here](/docs/user-guide/horizontal-pod-autoscaling/image).
It defines an [index.php](/docs/user-guide/horizontal-pod-autoscaling/image/index.php) page which performs some CPU intensive computations.

<a name="kubectl-run"></a>
First, we will start a deployment running the image and expose it as a service:

```shell
$ kubectl run php-apache --image=gcr.io/google_containers/hpa-example --requests=cpu=200m --expose --port=80
service "php-apache" created
deployment "php-apache" created
```

## Step Two: Create horizontal pod autoscaler

Now that the server is running, we will create the autoscaler using
[kubectl autoscale](https://github.com/kubernetes/kubernetes/blob/{{page.githubbranch}}/docs/user-guide/kubectl/kubectl_autoscale.md).
The following command will create a horizontal pod autoscaler that maintains between 1 and 10 replicas of the Pods
controlled by the php-apache deployment we created in the first step of these instructions.
Roughly speaking, the horizontal autoscaler will increase and decrease the number of replicas
(via the deployment) to maintain an average CPU utilization across all Pods of 50%
(since each pod requests 200 milli-cores by [kubectl run](#kubectl-run), this means an average CPU usage of 100 milli-cores).
See [here](https://github.com/kubernetes/kubernetes/blob/{{page.githubbranch}}/docs/design/horizontal-pod-autoscaler.md#autoscaling-algorithm) for more details on the algorithm.

```shell
$ kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
deployment "php-apache" autoscaled
```
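Roughly, the algorithm linked above rounds up the ratio of observed CPU usage to the per-pod target. A minimal sketch of that rule, assuming the simple formula from the design doc (the real controller adds tolerances and rate limiting):

```python
import math

# Sketch of the core rule: sum the pods' observed CPU usage,
# divide by the per-pod target, and round up.
def desired_replicas(pod_cpu_millicores, target_millicores):
    return int(math.ceil(sum(pod_cpu_millicores) / float(target_millicores)))

# With a 200m request and a 50% target, the per-pod target is 100m,
# so two pods using 150m each call for a third replica.
print(desired_replicas([150, 150], 100))  # 3
```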

We may check the current status of the autoscaler by running:

```shell
$ kubectl get hpa
NAME         REFERENCE                     TARGET    CURRENT   MINPODS   MAXPODS   AGE
php-apache   Deployment/php-apache/scale   50%       0%        1         10        18s
```

Please note that the current CPU consumption is 0% as we are not sending any requests to the server
(the ``CURRENT`` column shows the average across all the pods controlled by the corresponding deployment).

## Step Three: Increase load

Now we will see how the autoscaler reacts to increased load on the server.
We will start a container with the `busybox` image and run an infinite loop of queries to our server inside it (please run it in a different terminal):

```shell
$ kubectl run -i --tty load-generator --image=busybox /bin/sh

Hit enter for command prompt

$ while true; do wget -q -O- http://php-apache.default.svc.cluster.local; done
```

We may examine how the CPU load increased by executing (the result usually appears within about 1 minute):

```shell
$ kubectl get hpa
NAME         REFERENCE                     TARGET    CURRENT   MINPODS   MAXPODS   AGE
php-apache   Deployment/php-apache/scale   50%       305%      1         10        3m
```

In the case presented here, the load bumped CPU consumption to 305% of the request.
As a result, the deployment was resized to 7 replicas:

```shell
$ kubectl get deployment php-apache
NAME         DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
php-apache   7         7         7            7           19m
```
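The resize to 7 replicas is consistent with the rough rule described in Step Two: round up the ratio of observed to target utilization. A small worked check of the arithmetic (a sketch, not the controller's actual code):

```python
import math

current_replicas = 1         # replicas before the load was applied
current_utilization = 305.0  # percent of requested CPU, from kubectl get hpa
target_utilization = 50.0    # percent of requested CPU

# ceil(305 / 50 * 1) = ceil(6.1) = 7
new_replicas = int(math.ceil(
    current_utilization / target_utilization * current_replicas))
print(new_replicas)  # 7
```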

**Warning!** Sometimes it may take a few iterations for the number of replicas to stabilize.
Since the amount of load is not controlled in any way, the final number of replicas may
differ from this example.

## Step Four: Stop load

We will finish our example by stopping the user load.

In the terminal where we created the container with the `busybox` image, terminate
the infinite ``while`` loop by sending a `SIGINT` signal,
which can be done using the `<Ctrl> + C` combination.

Then we will verify the result state:

```shell
$ kubectl get hpa
NAME         REFERENCE                     TARGET    CURRENT   MINPODS   MAXPODS   AGE
php-apache   Deployment/php-apache/scale   50%       0%        1         10        11m

$ kubectl get deployment php-apache
NAME         DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
php-apache   1         1         1            1           27m
```

As we can see, in the presented case CPU utilization dropped to 0, and the number of replicas dropped to 1.
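With no load the raw utilization ratio would drive the replica count to zero, so the bounds given to `kubectl autoscale` matter. A sketch of the clamping behavior, assuming the desired count is simply limited to the `--min`/`--max` range:

```python
import math

# Sketch: clamp the raw desired count to the [min, max] bounds.
# At 0% utilization the raw target is 0, so minReplicas=1 keeps one pod.
def clamped_replicas(utilization_pct, target_pct, current, min_r, max_r):
    raw = int(math.ceil(utilization_pct / float(target_pct) * current))
    return max(min_r, min(raw, max_r))

print(clamped_replicas(0.0, 50.0, 7, 1, 10))  # 1
```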

**Warning!** Sometimes dropping the number of replicas may take a few iterations.

## Appendix: Other possible scenarios

### Creating the autoscaler from a .yaml file

Instead of using the `kubectl autoscale` command, we can use the [hpa-php-apache.yaml](/docs/user-guide/horizontal-pod-autoscaling/hpa-php-apache.yaml) file, which looks like this:

```yaml
apiVersion: extensions/v1beta1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
  namespace: default
spec:
  scaleRef:
    kind: Deployment
    name: php-apache
    namespace: default
    subresource: scale
  minReplicas: 1
  maxReplicas: 10
  cpuUtilization:
    targetPercentage: 50
```

We will create the autoscaler by executing the following command:

```shell
$ kubectl create -f docs/user-guide/horizontal-pod-autoscaling/hpa-php-apache.yaml
horizontalpodautoscaler "php-apache" created
```