Updated HPA user guide and example

pull/133/head
Piotr Szczesniak 2016-03-15 15:28:17 +01:00
parent 3a2026677d
commit 664f570450
3 changed files with 112 additions and 144 deletions

View File

@ -5,9 +5,8 @@ This document describes the current state of Horizontal Pod Autoscaler in Kubern
## What is Horizontal Pod Autoscaler?
Horizontal pod autoscaling allows the number of pods in a replication controller or deployment
to scale automatically based on observed CPU utilization.
It is a [beta](/docs/api/)#api-versioning) feature in Kubernetes 1.1.
Horizontal pod autoscaling allows to automatically scale the number of pods
in a replication controller, deployment or replica set based on observed CPU utilization.
The autoscaler is implemented as a Kubernetes API resource and a controller.
The resource describes behavior of the controller.
@ -33,14 +32,20 @@ Further details of the autoscaling algorithm are given [here](https://github.com
Autoscaler uses heapster to collect CPU utilization.
Therefore, it is required to deploy heapster monitoring in your cluster for autoscaling to work.
Autoscaler accesses corresponding replication controller or deployment by scale sub-resource.
Autoscaler accesses corresponding replication controller, deployment or replica set by scale sub-resource.
Scale is an interface which allows to dynamically set the number of replicas and to learn the current state of them.
More details on scale sub-resource can be found [here](https://github.com/kubernetes/kubernetes/blob/{{page.githubbranch}}/docs/design/horizontal-pod-autoscaler.md#scale-subresource).
## API Object
Horizontal pod autoscaler is a top-level resource in the Kubernetes REST API (currently in [beta](/docs/api/)#api-versioning)).
Horizontal pod autoscaler is a top-level resource in the Kubernetes REST API.
In Kubernetes 1.2 HPA was graduated from beta to stable (more details about [api versioning](/docs/api/#api-versioning)) with compatibility between versions.
The stable version is available in `autoscaling/v1` api group whereas the beta vesion is available in `extensions/v1beta1` api group as before.
The transition plan is to depracate beta verion of HPA in Kubernetes 1.3 and get it rid off completely in Kubernetes 1.4.
**Warning!** Please have in mind that all Kubernetes components still use HPA in version `extensions/v1beta1` in Kubernetes 1.2.
More details about the API object can be found at
[HorizontalPodAutoscaler Object](https://github.com/kubernetes/kubernetes/blob/{{page.githubbranch}}/docs/design/horizontal-pod-autoscaler.md#horizontalpodautoscaler-object).

View File

@ -5,7 +5,7 @@ metadata:
namespace: default
spec:
scaleRef:
kind: ReplicationController
kind: Deployment
name: php-apache
subresource: scale
minReplicas: 1

View File

@ -1,75 +1,126 @@
---
---
Horizontal pod autoscaling is a [beta](/docs/api/#api-versioning) feature in Kubernetes 1.1.
It allows the number of pods in a replication controller or deployment to scale automatically based on observed CPU usage.
Horizontal pod autoscaling allows to automatically scale the number of pods
in a replication controller, deployment or replica set based on observed CPU utilization.
In the future also other metrics will be supported.
In this document we explain how this feature works by walking you through an example of enabling horizontal pod autoscaling with the php-apache server.
In this document we explain how this feature works by walking you through an example of enabling horizontal pod autoscaling for the php-apache server.
## Prerequisites
This example requires a running Kubernetes cluster and kubectl in the version at least 1.1.
This example requires a running Kubernetes cluster and kubectl in the version at least 1.2.
[Heapster](https://github.com/kubernetes/heapster) monitoring needs to be deployed in the cluster
as horizontal pod autoscaler uses it to collect metrics
(if you followed [getting started on GCE guide](/docs/getting-started-guides/gce),
heapster monitoring will be turned-on by default).
## Step One: Run & expose php-apache server
To demonstrate horizontal pod autoscaler we will use a custom docker image based on php-apache server.
The image can be found [here](https://releases.k8s.io/{{page.githubbranch}}/docs/user-guide/horizontal-pod-autoscaling/image).
The image can be found [here](/docs/user-guide/horizontal-pod-autoscaling/image).
It defines [index.php](/docs/user-guide/horizontal-pod-autoscaling/image/index.php) page which performs some CPU intensive computations.
First, we will start a replication controller running the image and expose it as an external service:
<a name="kubectl-run"></a>
First, we will start a deployment running the image and expose it as a service:
```shell
$ kubectl run php-apache --image=gcr.io/google_containers/hpa-example --requests=cpu=200m
replicationcontroller "php-apache" created
$ kubectl expose rc php-apache --port=80 --type=LoadBalancer
service "php-apache" exposed
```
Now, we will wait some time and verify that both the replication controller and the service were correctly created and are running. We will also determine the IP address of the service:
```shell
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
php-apache-wa3t1 1/1 Running 0 12m
$ kubectl describe services php-apache | grep "LoadBalancer Ingress"
LoadBalancer Ingress: 146.148.24.244
```
We may now check that php-apache server works correctly by calling `curl` with the service's IP:
```shell
$ curl http://146.148.24.244
OK!
```
Please notice that when exposing the service we assumed that our cluster runs on a provider which supports load balancers (e.g.: on GCE).
If load balancers are not supported (e.g.: on Vagrant), we can expose php-apache service as ``ClusterIP`` and connect to it using the proxy on the master:
```shell
$ kubectl expose rc php-apache --port=80 --type=ClusterIP
service "php-apache" exposed
$ kubectl cluster-info | grep master
Kubernetes master is running at https://146.148.6.215
$ curl -k -u <admin>:<password> https://146.148.6.215/api/v1/proxy/namespaces/default/services/php-apache/
OK!
$ kubectl run php-apache --image=gcr.io/google_containers/hpa-example --requests=cpu=200m --expose --port=80
service "php-apache" created
deployment "php-apache" created
```
## Step Two: Create horizontal pod autoscaler
Now that the server is running, we will create a horizontal pod autoscaler for it.
To create it, we will use the [hpa-php-apache.yaml](/docs/user-guide/horizontal-pod-autoscaling/hpa-php-apache.yaml) file, which looks like this:
Now that the server is running, we will create the autoscaler using
[kubectl autoscale](https://github.com/kubernetes/kubernetes/blob/{{page.githubbranch}}/docs/user-guide/kubectl/kubectl_autoscale.md).
The following command will create a horizontal pod autoscaler that maintains between 1 and 10 replicas of the Pods
controlled by the php-apache deployment we created in the first step of these instructions.
Roughly speaking, the horizontal autoscaler will increase and decrease the number of replicas
(via the deployment) to maintain an average CPU utilization across all Pods of 50%
(since each pod requests 200 milli-cores by [kubectl run](#kubectl-run), this means average CPU usage of 100 milli-cores).
See [here](https://github.com/kubernetes/kubernetes/blob/{{page.githubbranch}}/docs/design/horizontal-pod-autoscaler.md#autoscaling-algorithm) for more details on the algorithm.
```shell
$ kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
deployment "php-apache" autoscaled
```
We may check the current status of autoscaler by running:
```shell
$ kubectl get hpa
NAME REFERENCE TARGET CURRENT MINPODS MAXPODS AGE
php-apache Deployment/php-apache/scale 50% 0% 1 10 18s
```
Please note that the current CPU consumption is 0% as we are not sending any requests to the server
(the ``CURRENT`` column shows the average across all the pods controlled by the corresponding deployment).
## Step Three: Increase load
Now, we will see how the autoscaler reacts on the increased load on the server.
We will start a container with `busybox` image and an infinite loop of queries to our server inside (please run it in a different terminal):
```shell
$ kubectl run -i --tty load-generator --image=busybox /bin/sh
Hit enter for command prompt
$ while true; do wget -q -O- http://php-apache.default.svc.cluster.local; done
```
We may examine, how CPU load was increased by executing (it usually takes 1 minute):
```shell
$ kubectl get hpa
NAME REFERENCE TARGET CURRENT MINPODS MAXPODS AGE
php-apache Deployment/php-apache/scale 50% 305% 1 10 3m
```
In the case presented here, it bumped CPU consumption to 305% of the request.
As a result, the deployment was resized to 7 replicas:
```shell
$ kubectl get deployment php-apache
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
php-apache 7 7 7 7 19m
```
**Warning!** Sometimes it may take few steps to stabilize the number of replicas.
Since the amount of load is not controlled in any way it may happen that the final number of replicas will
differ from this example.
## Step Four: Stop load
We will finish our example by stopping the user load.
In the terminal where we created container with `busybox` image we will terminate
infinite ``while`` loop by sending `SIGINT` signal,
which can be done using `<Ctrl> + C` combination.
Then we will verify the result state:
```shell
$ kubectl get hpa
NAME REFERENCE TARGET CURRENT MINPODS MAXPODS AGE
php-apache Deployment/php-apache/scale 50% 0% 1 10 11m
$ kubectl get deployment php-apache
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
php-apache 1 1 1 1 27m
```
As we see, in the presented case CPU utilization dropped to 0, and the number of replicas dropped to 1.
**Warning!** Sometimes dropping number of replicas may take few steps.
## Appendix: Other possible scenarios
### Creating the autoscaler from a .yaml file
Instead of using `kubectl autoscale` command we can use the [hpa-php-apache.yaml](/docs/user-guide/horizontal-pod-autoscaling/hpa-php-apache.yaml) file, which looks like this:
```yaml
apiVersion: extensions/v1beta1
@ -79,106 +130,18 @@ metadata:
namespace: default
spec:
scaleRef:
kind: ReplicationController
kind: Deployment
name: php-apache
namespace: default
subresource: scale
minReplicas: 1
maxReplicas: 10
cpuUtilization:
targetPercentage: 50
```
This defines a horizontal pod autoscaler that maintains between 1 and 10 replicas of the Pods
controlled by the php-apache replication controller we created in the first step of these instructions.
Roughly speaking, the horizontal autoscaler will increase and decrease the number of replicas
(via the replication controller) so as to maintain an average CPU utilization across all Pods of 50%
(since each pod requests 200 milli-cores by [kubectl run](#kubectl-run), this means average CPU utilization of 100 milli-cores).
See [here](https://github.com/kubernetes/kubernetes/blob/{{page.githubbranch}}/docs/design/horizontal-pod-autoscaler.md#autoscaling-algorithm) for more details on the algorithm.
We will create the autoscaler by executing the following command:
```shell
$ kubectl create -f docs/user-guide/horizontal-pod-autoscaling/hpa-php-apache.yaml
horizontalpodautoscaler "php-apache" created
```
Alternatively, we can create the autoscaler using [kubectl autoscale](https://github.com/kubernetes/kubernetes/blob/{{page.githubbranch}}/docs/user-guide/kubectl/kubectl_autoscale.md).
The following command will create the equivalent autoscaler as defined in the [hpa-php-apache.yaml](/docs/user-guide/horizontal-pod-autoscaling/hpa-php-apache.yaml) file:
```shell
$ kubectl autoscale rc php-apache --cpu-percent=50 --min=1 --max=10
replicationcontroller "php-apache" autoscaled
```
We may check the current status of autoscaler by running:
```shell
$ kubectl get hpa
NAME REFERENCE TARGET CURRENT MINPODS MAXPODS AGE
php-apache ReplicationController/default/php-apache/ 50% 0% 1 10 27s
```
Please note that the current CPU consumption is 0% as we are not sending any requests to the server
(the ``CURRENT`` column shows the average across all the pods controlled by the corresponding replication controller).
## Step Three: Increase load
Now, we will see how the autoscaler reacts on the increased load of the server.
We will start an infinite loop of queries to our server (please run it in a different terminal):
```shell
$ while true; do curl http://146.148.6.244; done
```
We may examine, how CPU load was increased (the results should be visible after about 3-4 minutes) by executing:
```shell
$ kubectl get hpa
NAME REFERENCE TARGET CURRENT MINPODS MAXPODS AGE
php-apache ReplicationController/default/php-apache/ 50% 305% 1 10 4m
```
In the case presented here, it bumped CPU consumption to 305% of the request.
As a result, the replication controller was resized to 7 replicas:
```shell
$ kubectl get rc
CONTROLLER CONTAINER(S) IMAGE(S) SELECTOR REPLICAS AGE
php-apache php-apache gcr.io/google_containers/hpa-example run=php-apache 7 18m
```
Now, we may increase the load even more by running yet another infinite loop of queries (in yet another terminal):
```shell
$ while true; do curl http://146.148.6.244; done
```
In the case presented here, it increased the number of serving pods to 10:
```shell
$ kubectl get hpa
NAME REFERENCE TARGET CURRENT MINPODS MAXPODS AGE
php-apache ReplicationController/default/php-apache/ 50% 65% 1 10 14m
$ kubectl get rc
CONTROLLER CONTAINER(S) IMAGE(S) SELECTOR REPLICAS AGE
php-apache php-apache gcr.io/google_containers/hpa-example run=php-apache 10 24m
```
## Step Four: Stop load
We will finish our example by stopping the user load.
We will terminate both infinite ``while`` loops sending requests to the server and verify the result state:
```shell
$ kubectl get hpa
NAME REFERENCE TARGET CURRENT MINPODS MAXPODS AGE
php-apache ReplicationController/default/php-apache/ 50% 0% 1 10 21m
$ kubectl get rc
CONTROLLER CONTAINER(S) IMAGE(S) SELECTOR REPLICAS AGE
php-apache php-apache gcr.io/google_containers/hpa-example run=php-apache 1 31m
```
As we see, in the presented case CPU utilization dropped to 0, and the number of replicas dropped to 1.