---
reviewers:
- fgrzadkowski
- jszczepkowski
- justinsb
- directxman12
title: HorizontalPodAutoscaler Walkthrough
content_type: task
weight: 100
min-kubernetes-server-version: 1.23
---
<!-- overview -->
A [HorizontalPodAutoscaler](/docs/tasks/run-application/horizontal-pod-autoscale/)
(HPA for short)
automatically updates a workload resource (such as
a {{< glossary_tooltip text="Deployment" term_id="deployment" >}} or
{{< glossary_tooltip text="StatefulSet" term_id="statefulset" >}}), with the
aim of automatically scaling the workload to match demand.
Horizontal scaling means that the response to increased load is to deploy more
{{< glossary_tooltip text="Pods" term_id="pod" >}}.
This is different from _vertical_ scaling, which for Kubernetes would mean
assigning more resources (for example: memory or CPU) to the Pods that are already
running for the workload.
If the load decreases, and the number of Pods is above the configured minimum,
the HorizontalPodAutoscaler instructs the workload resource (the Deployment, StatefulSet,
or other similar resource) to scale back down.
This document walks you through an example of enabling HorizontalPodAutoscaler to
automatically manage scale for an example web app. This example workload is Apache
httpd running some PHP code.
## {{% heading "prerequisites" %}}
{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}} If you're running an older
release of Kubernetes, refer to the version of the documentation for that release (see
[available documentation versions](/docs/home/supported-doc-versions/)).
To follow this walkthrough, you also need to use a cluster that has a
[Metrics Server](https://github.com/kubernetes-sigs/metrics-server#readme) deployed and configured.
The Kubernetes Metrics Server collects resource metrics from
the {{<glossary_tooltip term_id="kubelet" text="kubelets">}} in your cluster, and exposes those metrics
through the [Kubernetes API](/docs/concepts/overview/kubernetes-api/),
using an [APIService](/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/) to add
new kinds of resource that represent metric readings.
To learn how to deploy the Metrics Server, see the
[metrics-server documentation](https://github.com/kubernetes-sigs/metrics-server#deployment).
If you are running {{< glossary_tooltip term_id="minikube" >}}, run the following command to enable metrics-server:
```shell
minikube addons enable metrics-server
```
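Before continuing, you can confirm that the Metrics Server is registered and serving resource metrics:
```shell
# Check that the resource metrics APIService is available
kubectl get apiservice v1beta1.metrics.k8s.io
# If it is, queries like this one should return node metrics
kubectl top nodes
```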
<!-- steps -->
## Run and expose php-apache server
To demonstrate a HorizontalPodAutoscaler, you will first start a Deployment that runs a container using the
`hpa-example` image, and expose it as a {{< glossary_tooltip term_id="service">}}
using the following manifest:
{{% code_sample file="application/php-apache.yaml" %}}
To do so, run the following command:
```shell
kubectl apply -f https://k8s.io/examples/application/php-apache.yaml
```
```
deployment.apps/php-apache created
service/php-apache created
```
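Optionally, verify that both objects exist before moving on:
```shell
# List the Deployment and the Service created from the manifest
kubectl get deployment,service php-apache
```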
## Create the HorizontalPodAutoscaler {#create-horizontal-pod-autoscaler}
Now that the server is running, create the autoscaler using `kubectl`. The
[`kubectl autoscale`](/docs/reference/generated/kubectl/kubectl-commands#autoscale) subcommand,
part of `kubectl`, helps you do this.
You will shortly run a command that creates a HorizontalPodAutoscaler that maintains
between 1 and 10 replicas of the Pods controlled by the php-apache Deployment that
you created in the first step of these instructions.
Roughly speaking, the HPA {{<glossary_tooltip text="controller" term_id="controller">}} will increase and decrease
the number of replicas (by updating the Deployment) to maintain an average CPU utilization across all Pods of 50%.
The Deployment then updates the ReplicaSet - this is part of how all Deployments work in Kubernetes -
and then the ReplicaSet either adds or removes Pods based on the change to its `.spec`.
Since each Pod requests 200 milli-cores (via the `resources` section of the Deployment manifest), this means an average CPU usage of 100 milli-cores.
See [Algorithm details](/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details) for more details
on the algorithm.
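Roughly, for each metric the controller derives a desired replica count from the ratio between the current metric value and the target value:
```
desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
```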
Create the HorizontalPodAutoscaler:
```shell
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
```
```
horizontalpodautoscaler.autoscaling/php-apache autoscaled
```
You can check the current status of the newly-made HorizontalPodAutoscaler by running:
```shell
# You can use "hpa" or "horizontalpodautoscaler"; either name works OK.
kubectl get hpa
```
The output is similar to:
```
NAME         REFERENCE                     TARGET     MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   0% / 50%   1         10        1          18s
```
(If you see other HorizontalPodAutoscalers with different names, that means they already existed,
and that isn't usually a problem.)
Note that the current CPU consumption is 0% as there are no clients sending requests to the server
(the `TARGET` column shows the average across all the Pods controlled by the corresponding deployment).
## Increase the load {#increase-load}
Next, see how the autoscaler reacts to increased load.
To do this, you'll start a different Pod to act as a client. The container within the client Pod
runs in an infinite loop, sending queries to the php-apache service.
```shell
# Run this in a separate terminal
# so that the load generation continues and you can carry on with the rest of the steps
kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
```
Now run:
```shell
# type Ctrl+C to end the watch when you're ready
kubectl get hpa php-apache --watch
```
Within a minute or so, you should see the higher CPU load; for example:
```
NAME         REFERENCE                     TARGET       MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   305% / 50%   1         10        1          3m
```
and then, more replicas. For example:
```
NAME         REFERENCE                     TARGET       MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   305% / 50%   1         10        7          3m
```
Here, CPU consumption has increased to 305% of the request.
As a result, the Deployment was resized to 7 replicas:
```shell
kubectl get deployment php-apache
```
You should see the replica count matching the figure from the HorizontalPodAutoscaler:
```
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
php-apache   7/7     7            7           19m
```
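You can also watch the individual Pods as they are created; this assumes the `run: php-apache` label used by the example manifest:
```shell
# In another terminal, watch Pods appear as the HPA scales the Deployment up
kubectl get pods -l run=php-apache --watch
```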
{{< note >}}
It may take a few minutes for the number of replicas to stabilize. Since the amount
of load is not controlled in any way, the final number of replicas may differ
from this example.
{{< /note >}}
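To understand why the autoscaler made each decision, you can inspect its events and status conditions:
```shell
# The Events section records each scale-up and scale-down along with its reason
kubectl describe hpa php-apache
```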
## Stop generating load {#stop-load}
To finish the example, stop sending the load.
In the terminal where you created the Pod that runs a `busybox` image, terminate
the load generation by typing `<Ctrl> + C`.
Then verify the result state (after a minute or so):
```shell
# type Ctrl+C to end the watch when you're ready
kubectl get hpa php-apache --watch
```
The output is similar to:
```
NAME         REFERENCE                     TARGET     MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   0% / 50%   1         10        1          11m
```
and the Deployment also shows that it has scaled down:
```shell
kubectl get deployment php-apache
```
```
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
php-apache   1/1     1            1           27m
```
Once CPU utilization dropped to 0, the HPA automatically scaled the number of replicas back down to 1.
Scaling back down may take a few minutes.
<!-- discussion -->
## Autoscaling on multiple metrics and custom metrics
You can introduce additional metrics to use when autoscaling the `php-apache` Deployment
by making use of the `autoscaling/v2` API version.
First, get the YAML of your HorizontalPodAutoscaler in the `autoscaling/v2` form:
```shell
kubectl get hpa php-apache -o yaml > /tmp/hpa-v2.yaml
```
Open the `/tmp/hpa-v2.yaml` file in an editor, and you should see YAML which looks like this:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
status:
  observedGeneration: 1
  lastScaleTime: <some-time>
  currentReplicas: 1
  desiredReplicas: 1
  currentMetrics:
  - type: Resource
    resource:
      name: cpu
      current:
        averageUtilization: 0
        averageValue: 0
```
Notice that the `targetCPUUtilizationPercentage` field has been replaced with an array called `metrics`.
The CPU utilization metric is a *resource metric*, since it is represented as a percentage of a resource
specified on pod containers. Notice that you can specify other resource metrics besides CPU. By default,
the only other supported resource metric is memory. These resources do not change names from cluster
to cluster, and should always be available, as long as the `metrics.k8s.io` API is available.
You can also specify resource metrics in terms of direct values, instead of as percentages of the
requested value, by using a `target.type` of `AverageValue` instead of `Utilization`, and
setting the corresponding `target.averageValue` field instead of the `target.averageUtilization`.
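For example, here is a sketch of a metric block that targets an absolute average memory usage of `500Mi` per Pod instead of a percentage of the request (the value is illustrative):
```yaml
- type: Resource
  resource:
    name: memory
    target:
      type: AverageValue
      averageValue: 500Mi
```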
There are two other types of metrics, both of which are considered *custom metrics*: pod metrics and
object metrics. These metrics may have names which are cluster specific, and require a more
advanced cluster monitoring setup.
The first of these alternative metric types is *pod metrics*. These metrics describe Pods, and
are averaged together across Pods and compared with a target value to determine the replica count.
They work much like resource metrics, except that they *only* support a `target` type of `AverageValue`.
Pod metrics are specified using a metric block like this:
```yaml
type: Pods
pods:
  metric:
    name: packets-per-second
  target:
    type: AverageValue
    averageValue: 1k
```
The second alternative metric type is *object metrics*. These metrics describe a different
object in the same namespace, instead of describing Pods. The metrics are not necessarily
fetched from the object; they only describe it. Object metrics support `target` types of
both `Value` and `AverageValue`. With `Value`, the target is compared directly to the returned
metric from the API. With `AverageValue`, the value returned from the custom metrics API is divided
by the number of Pods before being compared to the target. The following example is the YAML
representation of the `requests-per-second` metric.
```yaml
type: Object
object:
  metric:
    name: requests-per-second
  describedObject:
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    name: main-route
  target:
    type: Value
    value: 2k
```
If you provide multiple such metric blocks, the HorizontalPodAutoscaler will consider each metric in turn.
The HorizontalPodAutoscaler will calculate proposed replica counts for each metric, and then choose the
one with the highest replica count.
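As an illustration, suppose the Deployment currently has 3 replicas, CPU is at 90% of a 50% target, and the Pods report 1200 packets-per-second against a 1k target; the CPU metric proposes the most replicas, so it wins:
```
cpuReplicas     = ceil(3 * 90 / 50)     = 6
packetReplicas  = ceil(3 * 1200 / 1000) = 4
desiredReplicas = max(6, 4)             = 6
```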
For example, if you had your monitoring system collecting metrics about network traffic,
you could update the definition above using `kubectl edit` to look like this:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Pods
    pods:
      metric:
        name: packets-per-second
      target:
        type: AverageValue
        averageValue: 1k
  - type: Object
    object:
      metric:
        name: requests-per-second
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: main-route
      target:
        type: Value
        value: 10k
status:
  observedGeneration: 1
  lastScaleTime: <some-time>
  currentReplicas: 1
  desiredReplicas: 1
  currentMetrics:
  - type: Resource
    resource:
      name: cpu
      current:
        averageUtilization: 0
        averageValue: 0
  - type: Object
    object:
      metric:
        name: requests-per-second
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: main-route
      current:
        value: 10k
```
Then, your HorizontalPodAutoscaler would attempt to ensure that each pod was consuming roughly
50% of its requested CPU, serving 1000 packets per second, and that all pods behind the main-route
Ingress were serving a total of 10000 requests per second.
### Autoscaling on more specific metrics
Many metrics pipelines allow you to describe metrics either by name or by a set of additional
descriptors called _labels_. For all non-resource metric types (pod, object, and external,
described below), you can specify an additional label selector which is passed to your metric
pipeline. For instance, if you collect a metric `http_requests` with the `verb`
label, you can specify the following metric block to scale only on GET requests:
```yaml
type: Object
object:
  metric:
    name: http_requests
    selector: {matchLabels: {verb: GET}}
```
This selector uses the same syntax as the full Kubernetes label selectors. The monitoring pipeline
determines how to collapse multiple series into a single value, if the name and selector
match multiple series. The selector is additive, and cannot select metrics
that describe objects that are **not** the target object (the target pods in the case of the `Pods`
type, and the described object in the case of the `Object` type).
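If you want to check what your metrics pipeline is actually reporting, you can query the custom metrics API directly. This is a sketch only: it assumes an adapter serving `custom.metrics.k8s.io/v1beta2` and an `http_requests` metric collected for Pods in the `default` namespace:
```shell
# Raw query for the http_requests metric, restricted to series labeled verb=GET
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta2/namespaces/default/pods/*/http_requests?metricLabelSelector=verb%3DGET"
```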
### Autoscaling on metrics not related to Kubernetes objects
Applications running on Kubernetes may need to autoscale based on metrics that don't have an obvious
relationship to any object in the Kubernetes cluster, such as metrics describing a hosted service with
no direct correlation to Kubernetes namespaces. In Kubernetes 1.10 and later, you can address this use case
with *external metrics*.
Using external metrics requires knowledge of your monitoring system; the setup is
similar to that required when using custom metrics. External metrics allow you to autoscale your cluster
based on any metric available in your monitoring system. Provide a `metric` block with a
`name` and `selector`, as above, and use the `External` metric type instead of `Object`.
If the `selector` matches multiple time series,
the sum of their values is used by the HorizontalPodAutoscaler.
External metrics support both the `Value` and `AverageValue` target types, which function exactly the same
as when you use the `Object` type.
For example, if your application processes tasks from a hosted queue service, you could add the following
section to your HorizontalPodAutoscaler manifest to specify that you need one worker per 30 outstanding tasks.
```yaml
- type: External
  external:
    metric:
      name: queue_messages_ready
      selector:
        matchLabels:
          queue: "worker_tasks"
    target:
      type: AverageValue
      averageValue: 30
```
When possible, it's preferable to use the custom metric target types instead of external metrics, since it's
easier for cluster administrators to secure the custom metrics API. The external metrics API potentially allows
access to any metric, so cluster administrators should take care when exposing it.
## Appendix: Horizontal Pod Autoscaler Status Conditions
When using the `autoscaling/v2` form of the HorizontalPodAutoscaler, you will be able to see
*status conditions* set by Kubernetes on the HorizontalPodAutoscaler. These status conditions indicate
whether or not the HorizontalPodAutoscaler is able to scale, and whether or not it is currently restricted
in any way.
The conditions appear in the `status.conditions` field. To see the conditions affecting a HorizontalPodAutoscaler,
you can use `kubectl describe hpa`:
```shell
kubectl describe hpa cm-test
```
```
Name:                           cm-test
Namespace:                      prom
Labels:                         <none>
Annotations:                    <none>
CreationTimestamp:              Fri, 16 Jun 2017 18:09:22 +0000
Reference:                      ReplicationController/cm-test
Metrics:                        ( current / target )
  "http_requests" on pods:      66m / 500m
Min replicas:                   1
Max replicas:                   4
ReplicationController pods:     1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    the last scale time was sufficiently old as to warrant a new scale
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from pods metric http_requests
  ScalingLimited  False   DesiredWithinRange  the desired replica count is within the acceptable range
Events:
```
For this HorizontalPodAutoscaler, you can see several conditions in a healthy state. The first,
`AbleToScale`, indicates whether or not the HPA is able to fetch and update scales, as well as
whether or not any backoff-related conditions would prevent scaling. The second, `ScalingActive`,
indicates whether or not the HPA is enabled (i.e. the replica count of the target is not zero) and
is able to calculate desired scales. When it is `False`, it generally indicates problems with
fetching metrics. Finally, the last condition, `ScalingLimited`, indicates that the desired scale
was capped by the maximum or minimum of the HorizontalPodAutoscaler. This is an indication that
you may wish to raise or lower the minimum or maximum replica count constraints on your
HorizontalPodAutoscaler.
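If you only want the conditions, a JSONPath query also works; this example assumes the `php-apache` HorizontalPodAutoscaler from earlier in this page:
```shell
# Print each condition as type=status (reason)
kubectl get hpa php-apache -o jsonpath='{range .status.conditions[*]}{.type}={.status} ({.reason}){"\n"}{end}'
```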
## Quantities
All metrics in the HorizontalPodAutoscaler and metrics APIs are specified using
a special whole-number notation known in Kubernetes as a
{{< glossary_tooltip term_id="quantity" text="quantity">}}. For example,
the quantity `10500m` would be written as `10.5` in decimal notation. The metrics APIs
will return whole numbers without a suffix when possible, and will generally return
quantities in milli-units otherwise. This means you might see your metric value fluctuate
between `1` and `1500m`, or `1` and `1.5` when written in decimal notation.
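A few examples of the notation:
```
500m    # 0.5 in decimal notation
1500m   # 1.5 in decimal notation
1       # exactly one unit
2k      # 2000 (decimal SI suffix)
```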
## Other possible scenarios
### Creating the autoscaler declaratively
Instead of using the `kubectl autoscale` command to create a HorizontalPodAutoscaler imperatively, you
can use the following manifest to create it declaratively:
{{% code_sample file="application/hpa/php-apache.yaml" %}}
Then, create the autoscaler by executing the following command:
```shell
kubectl create -f https://k8s.io/examples/application/hpa/php-apache.yaml
```
```
horizontalpodautoscaler.autoscaling/php-apache created
```