---
reviewers:
- fgrzadkowski
- jszczepkowski
- justinsb
- directxman12
title: HorizontalPodAutoscaler Walkthrough
content_type: task
weight: 100
min-kubernetes-server-version: 1.23
---

<!-- overview -->

A [HorizontalPodAutoscaler](/docs/tasks/run-application/horizontal-pod-autoscale/)
(HPA for short)
automatically updates a workload resource (such as
a {{< glossary_tooltip text="Deployment" term_id="deployment" >}} or
{{< glossary_tooltip text="StatefulSet" term_id="statefulset" >}}), with the
aim of automatically scaling the workload to match demand.

Horizontal scaling means that the response to increased load is to deploy more
{{< glossary_tooltip text="Pods" term_id="pod" >}}.
This is different from _vertical_ scaling, which for Kubernetes would mean
assigning more resources (for example: memory or CPU) to the Pods that are already
running for the workload.

If the load decreases, and the number of Pods is above the configured minimum,
the HorizontalPodAutoscaler instructs the workload resource (the Deployment, StatefulSet,
or other similar resource) to scale back down.

This document walks you through an example of enabling HorizontalPodAutoscaler to
automatically manage scale for an example web app. This example workload is Apache
httpd running some PHP code.

## {{% heading "prerequisites" %}}

{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}} If you're running an older
release of Kubernetes, refer to the version of the documentation for that release (see
[available documentation versions](/docs/home/supported-doc-versions/)).

To follow this walkthrough, you also need to use a cluster that has a
[Metrics Server](https://github.com/kubernetes-sigs/metrics-server#readme) deployed and configured.
The Kubernetes Metrics Server collects resource metrics from
the {{<glossary_tooltip term_id="kubelet" text="kubelets">}} in your cluster, and exposes those metrics
through the [Kubernetes API](/docs/concepts/overview/kubernetes-api/),
using an [APIService](/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/) to add
new kinds of resource that represent metric readings.

To learn how to deploy the Metrics Server, see the
[metrics-server documentation](https://github.com/kubernetes-sigs/metrics-server#deployment).

If you are running {{< glossary_tooltip term_id="minikube" >}}, run the following command to enable metrics-server:

```shell
minikube addons enable metrics-server
```

<!-- steps -->

## Run and expose php-apache server

To demonstrate a HorizontalPodAutoscaler, you will first start a Deployment that runs a container using the
`hpa-example` image, and expose it as a {{< glossary_tooltip term_id="service">}}
using the following manifest:

{{% code_sample file="application/php-apache.yaml" %}}

To do so, run the following command:

```shell
kubectl apply -f https://k8s.io/examples/application/php-apache.yaml
```

```
deployment.apps/php-apache created
service/php-apache created
```

## Create the HorizontalPodAutoscaler {#create-horizontal-pod-autoscaler}

Now that the server is running, create the autoscaler using `kubectl`. The
[`kubectl autoscale`](/docs/reference/generated/kubectl/kubectl-commands#autoscale) subcommand
helps you do this.

You will shortly run a command that creates a HorizontalPodAutoscaler that maintains
between 1 and 10 replicas of the Pods controlled by the php-apache Deployment that
you created in the first step of these instructions.

Roughly speaking, the HPA {{<glossary_tooltip text="controller" term_id="controller">}} will increase and decrease
the number of replicas (by updating the Deployment) to maintain an average CPU utilization across all Pods of 50%.
The Deployment then updates the ReplicaSet - this is part of how all Deployments work in Kubernetes -
and then the ReplicaSet either adds or removes Pods based on the change to its `.spec`.

Since each Pod requests 200 milli-cores of CPU, a 50% utilization target corresponds to an
average CPU usage of 100 milli-cores per Pod.
See [Algorithm details](/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details) for more details
on the algorithm.

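As a rough illustration of that arithmetic, the following Python sketch relates a utilization
percentage to absolute CPU usage. The 200 milli-core request and the 50% target come from this
walkthrough; the helper names are hypothetical and are not part of any Kubernetes API.

```python
# Sketch only: relates a CPU utilization target to absolute usage.
# Helper names are hypothetical; the numbers come from this walkthrough.

def target_usage_millicores(request_millicores: int, target_percent: int) -> float:
    """Average per-Pod usage (milli-cores) that corresponds to the target."""
    return request_millicores * target_percent / 100


def utilization_percent(usage_millicores: float, request_millicores: int) -> float:
    """Usage expressed as a percentage of the requested CPU."""
    return 100 * usage_millicores / request_millicores


print(target_usage_millicores(200, 50))  # 100.0
print(utilization_percent(610, 200))     # 305.0
```

Under these assumptions, a Pod using 610 milli-cores against a 200 milli-core request would be
reported at 305% utilization, well above the 50% target.
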
Create the HorizontalPodAutoscaler:

```shell
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
```

```
horizontalpodautoscaler.autoscaling/php-apache autoscaled
```

You can check the current status of the newly-made HorizontalPodAutoscaler by running:

```shell
# You can use "hpa" or "horizontalpodautoscaler"; either name works OK.
kubectl get hpa
```

The output is similar to:

```
NAME         REFERENCE                     TARGET    MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   0% / 50%  1         10        1          18s
```

(If you see other HorizontalPodAutoscalers with different names, they already existed;
this isn't usually a problem.)

Note that the current CPU consumption is 0% because no clients are sending requests to the server
(the `TARGET` column shows the average across all the Pods controlled by the corresponding Deployment).

## Increase the load {#increase-load}

Next, see how the autoscaler reacts to increased load.
To do this, you'll start a different Pod to act as a client. The container within the client Pod
runs in an infinite loop, sending queries to the php-apache service.

```shell
# Run this in a separate terminal
# so that the load generation continues and you can carry on with the rest of the steps
kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
```

Now run:

```shell
# type Ctrl+C to end the watch when you're ready
kubectl get hpa php-apache --watch
```

Within a minute or so, you should see the higher CPU load; for example:

```
NAME         REFERENCE                     TARGET      MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   305% / 50%  1         10        1          3m
```

and then, more replicas. For example:

```
NAME         REFERENCE                     TARGET      MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   305% / 50%  1         10        7          3m
```

Here, CPU consumption has increased to 305% of the request.
As a result, the Deployment was resized to 7 replicas:

```shell
kubectl get deployment php-apache
```

You should see the replica count matching the figure from the HorizontalPodAutoscaler:

```
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
php-apache   7/7     7            7           19m
```

{{< note >}}
It may take a few minutes for the number of replicas to stabilize. Because the amount
of load is not controlled in any way, the final number of replicas may
differ from this example.
{{< /note >}}

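The jump from 1 to 7 replicas is consistent with the ratio-based calculation described in
the algorithm details linked earlier. The Python sketch below is a simplified model of that
calculation, not the controller's actual code: the function name is hypothetical, the 0.1
tolerance is assumed to be the default, and the sketch ignores Pod readiness, missing metrics,
and stabilization windows.

```python
import math

# Simplified, hypothetical model of the HPA replica calculation.
# Assumes a default tolerance of 0.1 and ignores readiness and stabilization.

def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    ratio = current_value / target_value
    # Within tolerance of the target: keep the current replica count.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    proposed = math.ceil(current_replicas * ratio)
    # Clamp the proposal to the configured minimum and maximum.
    return max(min_replicas, min(max_replicas, proposed))


print(desired_replicas(1, 305, 50))  # 7
print(desired_replicas(7, 0, 50))    # 1
```

The first call mirrors the loaded state shown above (305% against a 50% target); the second
mirrors what happens once utilization falls back to 0% and the count is clamped to the minimum.
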
## Stop generating load {#stop-load}

To finish the example, stop sending the load.

In the terminal where you created the Pod that runs a `busybox` image, terminate
the load generation by typing `Ctrl+C`.

Then verify the resulting state (after a minute or so):

```shell
# type Ctrl+C to end the watch when you're ready
kubectl get hpa php-apache --watch
```

The output is similar to:

```
NAME         REFERENCE                     TARGET    MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   0% / 50%  1         10        1          11m
```

and the Deployment also shows that it has scaled down:

```shell
kubectl get deployment php-apache
```

```
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
php-apache   1/1     1            1           27m
```

Once CPU utilization dropped to 0, the HPA automatically scaled the number of replicas back down to 1.

Autoscaling the replicas may take a few minutes.

<!-- discussion -->

## Autoscaling on multiple metrics and custom metrics

You can introduce additional metrics to use when autoscaling the `php-apache` Deployment
by making use of the `autoscaling/v2` API version.

First, get the YAML of your HorizontalPodAutoscaler in the `autoscaling/v2` form:

```shell
kubectl get hpa php-apache -o yaml > /tmp/hpa-v2.yaml
```

Open the `/tmp/hpa-v2.yaml` file in an editor, and you should see YAML which looks like this:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
status:
  observedGeneration: 1
  lastScaleTime: <some-time>
  currentReplicas: 1
  desiredReplicas: 1
  currentMetrics:
  - type: Resource
    resource:
      name: cpu
      current:
        averageUtilization: 0
        averageValue: 0
```

Notice that the `targetCPUUtilizationPercentage` field has been replaced with an array called `metrics`.
The CPU utilization metric is a *resource metric*, since it is represented as a percentage of a resource
specified on pod containers. Notice that you can specify other resource metrics besides CPU. By default,
the only other supported resource metric is memory. These resources do not change names from cluster
to cluster, and should always be available, as long as the `metrics.k8s.io` API is available.

You can also specify resource metrics in terms of direct values, instead of as percentages of the
requested value, by using a `target.type` of `AverageValue` instead of `Utilization`, and
setting the corresponding `target.averageValue` field instead of the `target.averageUtilization`.

There are two other types of metrics, both of which are considered *custom metrics*: pod metrics and
object metrics. These metrics may have names which are cluster specific, and require a more
advanced cluster monitoring setup.

The first of these alternative metric types is *pod metrics*. These metrics describe Pods, and
are averaged together across Pods and compared with a target value to determine the replica count.
They work much like resource metrics, except that they *only* support a `target` type of `AverageValue`.

Pod metrics are specified using a metric block like this:

```yaml
type: Pods
pods:
  metric:
    name: packets-per-second
  target:
    type: AverageValue
    averageValue: 1k
```

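To make the averaging step concrete, here is a small Python sketch of how a pod metric might
translate into a proposed replica count. The per-Pod readings are invented for illustration, the
function name is hypothetical, and the sketch leaves out tolerance and the minimum and maximum
bounds.

```python
import math

# Hypothetical sketch: average a per-Pod metric and compare it with the
# AverageValue target. Readings are made up for illustration.

def replicas_for_pods_metric(per_pod_values, target_average):
    current_replicas = len(per_pod_values)
    average = sum(per_pod_values) / current_replicas
    return math.ceil(current_replicas * average / target_average)


# Three Pods handling 1500, 1200 and 1800 packets per second; target is 1k.
print(replicas_for_pods_metric([1500, 1200, 1800], 1000))  # 5
```
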
The second alternative metric type is *object metrics*. These metrics describe a different
object in the same namespace, instead of describing Pods. The metrics are not necessarily
fetched from the object; they only describe it. Object metrics support `target` types of
both `Value` and `AverageValue`. With `Value`, the target is compared directly to the returned
metric from the API. With `AverageValue`, the value returned from the custom metrics API is divided
by the number of Pods before being compared to the target. The following example is the YAML
representation of the `requests-per-second` metric.

```yaml
type: Object
object:
  metric:
    name: requests-per-second
  describedObject:
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    name: main-route
  target:
    type: Value
    value: 2k
```

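The difference between the `Value` and `AverageValue` target types can be sketched in Python as
follows. This is an illustrative model under simplifying assumptions (no tolerance, no replica
bounds); the function name and the sample numbers are hypothetical.

```python
import math

# Hypothetical sketch contrasting the two target types for an object metric.

def replicas_for_object_metric(current_replicas, metric_value, target, target_type):
    if target_type == "Value":
        # Compare the metric directly with the target.
        ratio = metric_value / target
    elif target_type == "AverageValue":
        # Divide the metric by the number of Pods before comparing.
        ratio = (metric_value / current_replicas) / target
    else:
        raise ValueError(f"unsupported target type: {target_type}")
    return math.ceil(current_replicas * ratio)


# 4 Pods behind an Ingress that reports 3000 requests per second in total.
print(replicas_for_object_metric(4, 3000, 2000, "Value"))        # 6
print(replicas_for_object_metric(4, 3000, 500, "AverageValue"))  # 6
```

In this sketch, a `Value` target of 2000 for the whole object and an `AverageValue` target of
500 per Pod happen to produce the same proposal, because the 4 current Pods each carry 750
requests per second on average.
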
If you provide multiple such metric blocks, the HorizontalPodAutoscaler will consider each metric in turn.
The HorizontalPodAutoscaler will calculate proposed replica counts for each metric, and then choose the
one with the highest replica count.

For example, if you had your monitoring system collecting metrics about network traffic,
you could update the definition above using `kubectl edit` to look like this:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Pods
    pods:
      metric:
        name: packets-per-second
      target:
        type: AverageValue
        averageValue: 1k
  - type: Object
    object:
      metric:
        name: requests-per-second
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: main-route
      target:
        type: Value
        value: 10k
status:
  observedGeneration: 1
  lastScaleTime: <some-time>
  currentReplicas: 1
  desiredReplicas: 1
  currentMetrics:
  - type: Resource
    resource:
      name: cpu
      current:
        averageUtilization: 0
        averageValue: 0
  - type: Object
    object:
      metric:
        name: requests-per-second
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: main-route
      current:
        value: 10k
```

Then, your HorizontalPodAutoscaler would attempt to ensure that each pod was consuming roughly
50% of its requested CPU, serving 1000 packets per second, and that all pods behind the main-route
Ingress were serving a total of 10000 requests per second.

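The "highest proposal wins" rule for the three metrics above can be sketched in Python. The
current readings below are invented for illustration, the helper names are hypothetical, and the
sketch omits tolerance and stabilization behavior.

```python
import math

# Hypothetical sketch: one proposal per metric, then take the largest.

def propose(current_replicas, current_value, target_value):
    return math.ceil(current_replicas * current_value / target_value)


def combine(proposals, min_replicas, max_replicas):
    return max(min_replicas, min(max_replicas, max(proposals)))


current = 2
proposals = [
    propose(current, 40, 50),        # CPU: 40% utilization vs 50% target
    propose(current, 1500, 1000),    # packets-per-second per Pod vs 1k
    propose(current, 8000, 10000),   # requests-per-second total vs 10k
]
print(proposals)                  # [2, 3, 2]
print(combine(proposals, 1, 10))  # 3
```

Here only the packets-per-second metric is above its target, and it alone drives the scale-up to
3 replicas.
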
### Autoscaling on more specific metrics

Many metrics pipelines allow you to describe metrics either by name or by a set of additional
descriptors called _labels_. For all non-resource metric types (pod, object, and external,
described below), you can specify an additional label selector which is passed to your metric
pipeline. For instance, if you collect a metric `http_requests` with the `verb`
label, you can specify the following metric block to scale only on GET requests:

```yaml
type: Object
object:
  metric:
    name: http_requests
    selector: {matchLabels: {verb: GET}}
```

This selector uses the same syntax as the full Kubernetes label selectors. The monitoring pipeline
determines how to collapse multiple series into a single value, if the name and selector
match multiple series. The selector is additive, and cannot select metrics
that describe objects that are **not** the target object (the target pods in the case of the `Pods`
type, and the described object in the case of the `Object` type).

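As a loose illustration of what a `matchLabels` selector does, the Python sketch below filters a
set of made-up time series by label. The series data and the function name are hypothetical, and
how the matching series are then collapsed into one value depends on your monitoring pipeline,
so that step is deliberately left out.

```python
# Hypothetical sketch of matchLabels-style filtering over made-up series.

def matches(selector_labels, series_labels):
    """True when every selector label is present with the same value."""
    return all(series_labels.get(key) == value
               for key, value in selector_labels.items())


series = [
    {"labels": {"verb": "GET", "code": "200"}, "value": 120},
    {"labels": {"verb": "GET", "code": "500"}, "value": 5},
    {"labels": {"verb": "POST", "code": "200"}, "value": 40},
]

selected = [s for s in series if matches({"verb": "GET"}, s["labels"])]
print([s["value"] for s in selected])  # [120, 5]
```
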
### Autoscaling on metrics not related to Kubernetes objects

Applications running on Kubernetes may need to autoscale based on metrics that don't have an obvious
relationship to any object in the Kubernetes cluster, such as metrics describing a hosted service with
no direct correlation to Kubernetes namespaces. In Kubernetes 1.10 and later, you can address this use case
with *external metrics*.

Using external metrics requires knowledge of your monitoring system; the setup is
similar to that required when using custom metrics. External metrics allow you to autoscale your workload
based on any metric available in your monitoring system. Provide a `metric` block with a
`name` and `selector`, as above, and use the `External` metric type instead of `Object`.
If multiple time series are matched by the `selector`,
the sum of their values is used by the HorizontalPodAutoscaler.
External metrics support both the `Value` and `AverageValue` target types, which function exactly the same
as when you use the `Object` type.

For example, if your application processes tasks from a hosted queue service, you could add the following
section to your HorizontalPodAutoscaler manifest to specify that you need one worker per 30 outstanding tasks.

```yaml
- type: External
  external:
    metric:
      name: queue_messages_ready
      selector:
        matchLabels:
          queue: "worker_tasks"
    target:
      type: AverageValue
      averageValue: 30
```

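With an `AverageValue` target, the queue length is divided across the workers, so the proposal
reduces to the queue length divided by 30, rounded up. The Python sketch below shows that
arithmetic. The function name is hypothetical, and the bounds of 1 and 10 are assumed for
illustration because the fragment above does not state `minReplicas` or `maxReplicas`.

```python
import math

# Hypothetical sketch: one worker per 30 outstanding tasks.
# The replica bounds are assumptions, not taken from the manifest fragment.

def workers_needed(outstanding_tasks, tasks_per_worker=30,
                   min_replicas=1, max_replicas=10):
    proposed = math.ceil(outstanding_tasks / tasks_per_worker)
    return max(min_replicas, min(max_replicas, proposed))


print(workers_needed(100))  # 4
print(workers_needed(0))    # 1
print(workers_needed(900))  # 10
```
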
When possible, it's preferable to use the custom metric target types instead of external metrics, since it's
easier for cluster administrators to secure the custom metrics API. The external metrics API potentially allows
access to any metric, so cluster administrators should take care when exposing it.

## Appendix: Horizontal Pod Autoscaler Status Conditions

When using the `autoscaling/v2` form of the HorizontalPodAutoscaler, you will be able to see
*status conditions* set by Kubernetes on the HorizontalPodAutoscaler. These status conditions indicate
whether or not the HorizontalPodAutoscaler is able to scale, and whether or not it is currently restricted
in any way.

The conditions appear in the `status.conditions` field. To see the conditions affecting a HorizontalPodAutoscaler,
you can use `kubectl describe hpa`:

```shell
kubectl describe hpa cm-test
```

```
Name:                           cm-test
Namespace:                      prom
Labels:                         <none>
Annotations:                    <none>
CreationTimestamp:              Fri, 16 Jun 2017 18:09:22 +0000
Reference:                      ReplicationController/cm-test
Metrics:                        ( current / target )
  "http_requests" on pods:      66m / 500m
Min replicas:                   1
Max replicas:                   4
ReplicationController pods:     1 current / 1 desired
Conditions:
  Type                  Status  Reason                  Message
  ----                  ------  ------                  -------
  AbleToScale           True    ReadyForNewScale        the last scale time was sufficiently old as to warrant a new scale
  ScalingActive         True    ValidMetricFound        the HPA was able to successfully calculate a replica count from pods metric http_requests
  ScalingLimited        False   DesiredWithinRange      the desired replica count is within the acceptable range
Events:
```

For this HorizontalPodAutoscaler, you can see several conditions in a healthy state. The first,
`AbleToScale`, indicates whether or not the HPA is able to fetch and update scales, as well as
whether or not any backoff-related conditions would prevent scaling. The second, `ScalingActive`,
indicates whether or not the HPA is enabled (i.e. the replica count of the target is not zero) and
is able to calculate desired scales. When it is `False`, it generally indicates problems with
fetching metrics. Finally, the last condition, `ScalingLimited`, indicates that the desired scale
was capped by the maximum or minimum of the HorizontalPodAutoscaler. This is an indication that
you may wish to raise or lower the minimum or maximum replica count constraints on your
HorizontalPodAutoscaler.

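If you want to check these conditions from a script, one possible approach is to read the
`status.conditions` list from JSON output such as `kubectl get hpa cm-test -o json`. The Python
sketch below works on a hand-written excerpt shaped like that output; the excerpt, the
`HEALTHY` table and the function name are illustrative assumptions rather than captured data or
an official API.

```python
import json

# Hand-written excerpt modeled on the conditions shown above (not real output).
hpa_json = """
{"status": {"conditions": [
  {"type": "AbleToScale", "status": "True", "reason": "ReadyForNewScale"},
  {"type": "ScalingActive", "status": "True", "reason": "ValidMetricFound"},
  {"type": "ScalingLimited", "status": "False", "reason": "DesiredWithinRange"}
]}}
"""

# Status values that this sketch treats as healthy for each condition type.
HEALTHY = {"AbleToScale": "True", "ScalingActive": "True", "ScalingLimited": "False"}


def unhealthy_conditions(hpa):
    """Return the condition types whose status differs from the healthy value."""
    conditions = hpa.get("status", {}).get("conditions", [])
    return [c["type"] for c in conditions
            if c["type"] in HEALTHY and c["status"] != HEALTHY[c["type"]]]


print(unhealthy_conditions(json.loads(hpa_json)))  # []
```
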
## Quantities

All metrics in the HorizontalPodAutoscaler and metrics APIs are specified using
a special whole-number notation known in Kubernetes as a
{{< glossary_tooltip term_id="quantity" text="quantity">}}. For example,
the quantity `10500m` would be written as `10.5` in decimal notation. The metrics APIs
will return whole numbers without a suffix when possible, and will generally return
quantities in milli-units otherwise. This means you might see your metric value fluctuate
between `1` and `1500m`, or `1` and `1.5` when written in decimal notation.

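To see how the suffixes map to decimal values, here is a deliberately minimal Python sketch. It
handles only plain numbers and the `m`, `k` and `M` suffixes that are enough for the values on
this page; it is a hypothetical helper, not a full parser for the Kubernetes quantity format.

```python
from fractions import Fraction

# Minimal, hypothetical parser: supports only plain numbers and m, k, M.
SCALE = {"m": Fraction(1, 1000), "k": Fraction(1000), "M": Fraction(1000000)}


def parse_quantity(text):
    """Convert a quantity string such as '10500m' or '1k' to an exact Fraction."""
    if text and text[-1] in SCALE:
        return Fraction(text[:-1]) * SCALE[text[-1]]
    return Fraction(text)


print(float(parse_quantity("10500m")))  # 10.5
print(float(parse_quantity("1500m")))   # 1.5
print(float(parse_quantity("1k")))      # 1000.0
print(float(parse_quantity("1")))       # 1.0
```

Using `Fraction` keeps the conversion exact, which avoids floating-point rounding when comparing
milli-unit values.
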
## Other possible scenarios

### Creating the autoscaler declaratively

Instead of using the `kubectl autoscale` command to create a HorizontalPodAutoscaler imperatively, you
can use the following manifest to create it declaratively:

{{% code_sample file="application/hpa/php-apache.yaml" %}}

Then, create the autoscaler by executing the following command:

```shell
kubectl create -f https://k8s.io/examples/application/hpa/php-apache.yaml
```

```
horizontalpodautoscaler.autoscaling/php-apache created
```