From 230c74dc256c0229c0d50435751e94059f10700f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jean-Marc=20Fran=C3=A7ois?= Date: Tue, 4 Mar 2025 11:18:18 -0500 Subject: [PATCH 1/7] Add HPA 'configurable tolerance' blog post (KEP-4951). --- .../XXXX-XX-XX-hpa-configurable-tolerance.md | 73 +++++++++++++++++++ 1 file changed, 73 insertions(+) create mode 100644 content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md diff --git a/content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md b/content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md new file mode 100644 index 0000000000..860fa021b8 --- /dev/null +++ b/content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md @@ -0,0 +1,73 @@ +--- +layout: blog +title: "Horizontal Autoscaling Configurable Tolerance" +slug: hpa-configurable-tolerance +date: XXXX-XX-XX +author: "Jean-Marc François" +--- + +This post describes _Configurable Tolerance for Pod Horizontal Autoscaling_, +a new alpha feature first available in Kubernetes 1.33. + +## What is it? + +Horizontal Pod Autoscaling (HPA) is a well-known Kubernetes feature that +allows your workload to automatically resize by adding or removing replicas +based on resource utilization. + +To decide how many replicas a workload requires, users configure their HPA +with a metric (e.g. CPU utilization) and an expected value for this metric (e.g. +80%). The HPA updates the number of replica based on the ratio between the +current and desired metric value. (For example, if there are currently 100 +replicas, the CPU utilization is 88%, and the desired utilization is 80%, the +HPA will ask for `100 * (88/80)` replicas). + +In order to avoid replicas being created or deleted whenever a small metric +fluctuation occurs, Kubernetes require that the current and desired metrics +differ by more than 10%. + +This tolerance of 10% is cluster-wide and typically cannot be fine-tuned. 
It's +a suitable value for most usage, but too coarse for large deployments, where a +10% tolerance represents tens of pods. As a result, +[users have long asked](https://github.com/kubernetes/kubernetes/issues/116984) +to be able to tune this value. + +Thanks to the _Configurable Tolerance for Pod Horizontal Autoscaling_ feature, +this is now possible. + +## How do I use it? + +Just add the tolerance you want an HPA to use to your `HorizontalPodAutoscaler` +resource. + +Tolerances appear under the `spec.behavior.scaleDown` and +`spec.behavior.scaleUp` fields and can thus be different for scale up and scale +down. A typical usage would be to specify a small tolerance on scale up (to +react quickly to spikes), but lower on scale down (to avoid adding and removing +replicas too quickly in response to small metric fluctuations). + +For example, an HPA with a tolerance of 5% on scale-down, and no +tolerance on scale-up, would look like the following: + +```yaml +apiVersion: autoscaling/v2beta2 +kind: HorizontalPodAutoscaler +metadata: + name: my-app +spec: + behavior: + scaleDown: + tolerance: 0.05 + scaleUp: + tolerance: 0 +``` + +Note: This feature is in alpha in Kubernetes 1.33, gated by the +`HPAConfigurableTolerance` flag. + +## I want all the details! + +Get all the technical details by reading +[KEP-4951](https://github.com/kubernetes/enhancements/tree/master/keps/sig-autoscaling/4951-configurable-hpa-tolerance) +and follow [issue 4951](https://github.com/kubernetes/enhancements/issues/4951) +to be notified of the feature graduation. 
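The ratio-based update this patch describes can be sketched in a few lines. The snippet below is illustrative only (the function name and signature are invented for this example); the real HPA controller additionally accounts for pod readiness, missing metrics, and scaling policies:

```python
import math

# Illustrative sketch of the ratio-based replica update described in the
# blog post; not the HPA controller's actual code.
def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float) -> int:
    # Multiply before dividing so the post's integer examples stay exact.
    return math.ceil(current_replicas * current_utilization / target_utilization)

# Example from the post: 100 replicas at 88% CPU with an 80% target.
print(desired_replicas(100, 88, 80))  # 110
```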
From ec23d5e3d77fbcfe139f2275803e5a1404e6e1b1 Mon Sep 17 00:00:00 2001 From: jm-franc Date: Tue, 4 Mar 2025 16:01:18 -0500 Subject: [PATCH 2/7] Apply suggestions from code review Co-authored-by: Tim Bannister --- .../XXXX-XX-XX-hpa-configurable-tolerance.md | 26 +++++++++++++------ 1 file changed, 18 insertions(+), 8 deletions(-) diff --git a/content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md b/content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md index 860fa021b8..b7bea1b2f7 100644 --- a/content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md +++ b/content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md @@ -1,17 +1,24 @@ --- layout: blog -title: "Horizontal Autoscaling Configurable Tolerance" +title: "Kubernetes v1.33 and Horizontal Autoscaling Configurable Tolerance" slug: hpa-configurable-tolerance +# after the v1.33 release, set a future publication date and remove the draft marker +# the release comms team can confirm which date has been assigned +# +# PRs to remove the draft marker should be opened BEFORE release day +draft: true +math: true # for formulae date: XXXX-XX-XX author: "Jean-Marc François" --- -This post describes _Configurable Tolerance for Pod Horizontal Autoscaling_, +This post describes _configurable tolerance for horizontal Pod autoscaling_, a new alpha feature first available in Kubernetes 1.33. ## What is it? -Horizontal Pod Autoscaling (HPA) is a well-known Kubernetes feature that +[Horizontal Pod autoscaling](/docs/tasks/run-application/horizontal-pod-autoscale/) (HPA) is a +well-known Kubernetes feature that allows your workload to automatically resize by adding or removing replicas based on resource utilization. @@ -23,16 +30,18 @@ replicas, the CPU utilization is 88%, and the desired utilization is 80%, the HPA will ask for `100 * (88/80)` replicas). 
In order to avoid replicas being created or deleted whenever a small metric -fluctuation occurs, Kubernetes require that the current and desired metrics +fluctuation occurs, Kubernetes applies a form of hysteresis: it only changes the number of replicas +when the the current and desired metric values differ by more than 10%. -This tolerance of 10% is cluster-wide and typically cannot be fine-tuned. It's +This default tolerance of 10% is cluster-wide; in older Kubernetes releases, it could not be fine-tuned. It's a suitable value for most usage, but too coarse for large deployments, where a 10% tolerance represents tens of pods. As a result, -[users have long asked](https://github.com/kubernetes/kubernetes/issues/116984) +users have long [asked](https://github.com/kubernetes/kubernetes/issues/116984) to be able to tune this value. -Thanks to the _Configurable Tolerance for Pod Horizontal Autoscaling_ feature, +In Kubernetes v1.33, +`` this is now possible. ## How do I use it? @@ -50,11 +59,12 @@ For example, an HPA with a tolerance of 5% on scale-down, and no tolerance on scale-up, would look like the following: ```yaml -apiVersion: autoscaling/v2beta2 +apiVersion: autoscaling/v2 kind: HorizontalPodAutoscaler metadata: name: my-app spec: + … behavior: scaleDown: tolerance: 0.05 From fae8cf5d5d703a21c0a94b18122a04146d630ec4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jean-Marc=20Fran=C3=A7ois?= Date: Tue, 4 Mar 2025 16:16:14 -0500 Subject: [PATCH 3/7] Update following review. 
--- .../XXXX-XX-XX-hpa-configurable-tolerance.md | 49 +++++++++---------- 1 file changed, 23 insertions(+), 26 deletions(-) diff --git a/content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md b/content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md index b7bea1b2f7..fc0a5e5c9b 100644 --- a/content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md +++ b/content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md @@ -6,10 +6,10 @@ slug: hpa-configurable-tolerance # the release comms team can confirm which date has been assigned # # PRs to remove the draft marker should be opened BEFORE release day -draft: true +draft: true math: true # for formulae date: XXXX-XX-XX -author: "Jean-Marc François" +author: "Jean-Marc François (Google)" --- This post describes _configurable tolerance for horizontal Pod autoscaling_, @@ -17,37 +17,37 @@ a new alpha feature first available in Kubernetes 1.33. ## What is it? -[Horizontal Pod autoscaling](/docs/tasks/run-application/horizontal-pod-autoscale/) (HPA) is a -well-known Kubernetes feature that -allows your workload to automatically resize by adding or removing replicas -based on resource utilization. +[Horizontal Pod autoscaling](/docs/tasks/run-application/horizontal-pod-autoscale/) +(HPA) is a well-known Kubernetes feature that allows your workload to +automatically resize by adding or removing replicas based on resource +utilization. To decide how many replicas a workload requires, users configure their HPA with a metric (e.g. CPU utilization) and an expected value for this metric (e.g. 80%). The HPA updates the number of replica based on the ratio between the current and desired metric value. (For example, if there are currently 100 replicas, the CPU utilization is 88%, and the desired utilization is 80%, the -HPA will ask for `100 * (88/80)` replicas). +HPA will ask for \\(100 \times (88/80)\)) replicas). 
In order to avoid replicas being created or deleted whenever a small metric -fluctuation occurs, Kubernetes applies a form of hysteresis: it only changes the number of replicas -when the the current and desired metric values -differ by more than 10%. +fluctuation occurs, Kubernetes applies a form of hysteresis: it only changes the +number of replicas when the the current and desired metric values differ by more +than 10%. -This default tolerance of 10% is cluster-wide; in older Kubernetes releases, it could not be fine-tuned. It's -a suitable value for most usage, but too coarse for large deployments, where a -10% tolerance represents tens of pods. As a result, -users have long [asked](https://github.com/kubernetes/kubernetes/issues/116984) -to be able to tune this value. +This default tolerance of 10% is cluster-wide; in older Kubernetes releases, it +could not be fine-tuned. It's a suitable value for most usage, but too coarse +for large deployments, where a 10% tolerance represents tens of pods. As a +result, users have long +[asked](https://github.com/kubernetes/kubernetes/issues/116984) to be able to +tune this value. -In Kubernetes v1.33, -`` -this is now possible. +In Kubernetes v1.33, this is now possible. ## How do I use it? -Just add the tolerance you want an HPA to use to your `HorizontalPodAutoscaler` -resource. +Enable the `HPAConfigurableTolerance` feature flag in your Kubernetes 1.33 +cluster, then add the tolerance you want an HPA to use to your +HorizontalPodAutoscaler object. Tolerances appear under the `spec.behavior.scaleDown` and `spec.behavior.scaleUp` fields and can thus be different for scale up and scale @@ -55,8 +55,8 @@ down. A typical usage would be to specify a small tolerance on scale up (to react quickly to spikes), but lower on scale down (to avoid adding and removing replicas too quickly in response to small metric fluctuations). 
-For example, an HPA with a tolerance of 5% on scale-down, and no -tolerance on scale-up, would look like the following: +For example, an HPA with a tolerance of 5% on scale-down, and no tolerance on +scale-up, would look like the following: ```yaml apiVersion: autoscaling/v2 @@ -64,7 +64,7 @@ kind: HorizontalPodAutoscaler metadata: name: my-app spec: - … + ... behavior: scaleDown: tolerance: 0.05 @@ -72,9 +72,6 @@ spec: tolerance: 0 ``` -Note: This feature is in alpha in Kubernetes 1.33, gated by the -`HPAConfigurableTolerance` flag. - ## I want all the details! Get all the technical details by reading From bccde8b0cc5baffeb383bb474e7a76a45a0705c1 Mon Sep 17 00:00:00 2001 From: jm-franc Date: Thu, 13 Mar 2025 11:36:06 -0400 Subject: [PATCH 4/7] Replace 'feature flag' with 'feature gate'. Co-authored-by: Tim Bannister --- content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md b/content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md index fc0a5e5c9b..45b5cd13ef 100644 --- a/content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md +++ b/content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md @@ -45,7 +45,7 @@ In Kubernetes v1.33, this is now possible. ## How do I use it? -Enable the `HPAConfigurableTolerance` feature flag in your Kubernetes 1.33 +Enable the `HPAConfigurableTolerance` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) in your Kubernetes 1.33 cluster, then add the tolerance you want an HPA to use to your HorizontalPodAutoscaler object. From 05d813118f2b5c6c1ab617eb3208dd146a262b3a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jean-Marc=20Fran=C3=A7ois?= Date: Tue, 18 Mar 2025 14:15:44 -0400 Subject: [PATCH 5/7] Updates following review (small fixes, added example). 
--- .../XXXX-XX-XX-hpa-configurable-tolerance.md | 20 ++++++++++++------- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md b/content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md index 45b5cd13ef..b31a82ef32 100644 --- a/content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md +++ b/content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md @@ -17,7 +17,7 @@ a new alpha feature first available in Kubernetes 1.33. ## What is it? -[Horizontal Pod autoscaling](/docs/tasks/run-application/horizontal-pod-autoscale/) +[Horizontal Pod Autoscaling](/docs/tasks/run-application/horizontal-pod-autoscale/) (HPA) is a well-known Kubernetes feature that allows your workload to automatically resize by adding or removing replicas based on resource utilization. @@ -26,8 +26,8 @@ To decide how many replicas a workload requires, users configure their HPA with a metric (e.g. CPU utilization) and an expected value for this metric (e.g. 80%). The HPA updates the number of replica based on the ratio between the current and desired metric value. (For example, if there are currently 100 -replicas, the CPU utilization is 88%, and the desired utilization is 80%, the -HPA will ask for \\(100 \times (88/80)\)) replicas). +replicas, the CPU utilization is 84%, and the desired utilization is 80%, the +HPA will ask for \\(100 \times (84/80)\\)) replicas). In order to avoid replicas being created or deleted whenever a small metric fluctuation occurs, Kubernetes applies a form of hysteresis: it only changes the @@ -37,7 +37,7 @@ than 10%. This default tolerance of 10% is cluster-wide; in older Kubernetes releases, it could not be fine-tuned. It's a suitable value for most usage, but too coarse for large deployments, where a 10% tolerance represents tens of pods. 
As a
-result, users have long
+result, the community has long
 [asked](https://github.com/kubernetes/kubernetes/issues/116984) to be able to
 tune this value.
 
@@ -45,14 +45,15 @@ In Kubernetes v1.33, this is now possible.
 
 ## How do I use it?
 
-Enable the `HPAConfigurableTolerance` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) in your Kubernetes 1.33
-cluster, then add the tolerance you want an HPA to use to your
+After enabling the `HPAConfigurableTolerance`
+[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) in
+your Kubernetes v1.33 cluster, you can add your desired tolerance to your
 HorizontalPodAutoscaler object.
 
 Tolerances appear under the `spec.behavior.scaleDown` and
 `spec.behavior.scaleUp` fields and can thus be different for scale up and scale
 down. A typical usage would be to specify a small tolerance on scale up (to
-react quickly to spikes), but lower on scale down (to avoid adding and removing
+react quickly to spikes), but higher on scale down (to avoid adding and removing
 replicas too quickly in response to small metric fluctuations).
 
 For example, an HPA with a tolerance of 5% on scale-down, and no tolerance on
@@ -72,6 +73,11 @@ spec:
       tolerance: 0
 ```
 
+Consider the previous scenario where the ratio of current to desired metric
+values is \\(84/80\\), a 5% increase. With the default 10% scale-up tolerance,
+no scaling occurs. However, with the HPA configured as shown, featuring a 0%
+scale-up tolerance, the 5% increase triggers scaling.
+
 ## I want all the details!
 
 Get all the technical details by reading

From 30fc1259ee4488f2cf8f4ee625ccce32d0196bc0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jean-Marc=20Fran=C3=A7ois?=
Date: Tue, 18 Mar 2025 14:52:21 -0400
Subject: [PATCH 6/7] Improved example following review.
--- .../XXXX-XX-XX-hpa-configurable-tolerance.md | 38 ++++++++++++-------
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md b/content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md
index b31a82ef32..48cc92c070 100644
--- a/content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md
+++ b/content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md
@@ -18,21 +18,36 @@ a new alpha feature first available in Kubernetes 1.33.
 ## What is it?
 
 [Horizontal Pod Autoscaling](/docs/tasks/run-application/horizontal-pod-autoscale/)
-(HPA) is a well-known Kubernetes feature that allows your workload to
+is a well-known Kubernetes feature that allows your workload to
 automatically resize by adding or removing replicas based on resource
 utilization.
 
-To decide how many replicas a workload requires, users configure their HPA
-with a metric (e.g. CPU utilization) and an expected value for this metric (e.g.
-80%). The HPA updates the number of replica based on the ratio between the
-current and desired metric value. (For example, if there are currently 100
-replicas, the CPU utilization is 84%, and the desired utilization is 80%, the
-HPA will ask for \\(100 \times (84/80)\\)) replicas).
+Let's say you have a web application running in a Kubernetes cluster with 50
+replicas. You configure the Horizontal Pod Autoscaler (HPA) to scale based on
+CPU utilization, with a target of 75% utilization. Now, imagine that the current
+CPU utilization across all replicas is 90%, which is higher than the desired
+75%. The HPA will calculate the required number of replicas using the formula:
+```math
+desiredReplicas = \left\lceil currentReplicas \times \frac{currentMetricValue}{desiredMetricValue} \right\rceil
+```
+
+In this example:
+```math
+50 \times (90/75) = 60
+```
+
+So, the HPA will increase the number of replicas from 50 to 60 to reduce the
+load on each pod.
Similarly, if the CPU utilization were to drop below 75%, the
+HPA would scale down the number of replicas accordingly. The Kubernetes
+documentation provides a
+[detailed description of the scaling algorithm](/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details).
 
 In order to avoid replicas being created or deleted whenever a small metric
 fluctuation occurs, Kubernetes applies a form of hysteresis: it only changes the
-number of replicas when the the current and desired metric values differ by more
-than 10%.
+number of replicas when the current and desired metric values differ by more
+than 10%. In the example above, the ratio between the current and desired
+metric values is \\(90/75\\), a 20% difference that exceeds the 10% tolerance,
+so the scale-up action proceeds.
 
 This default tolerance of 10% is cluster-wide; in older Kubernetes releases, it
 could not be fine-tuned. It's a suitable value for most usage, but too coarse
@@ -73,11 +88,6 @@ spec:
       tolerance: 0
 ```
 
-Consider the previous scenario where the ratio of current to desired metric
-values is \\(84/80\\), a 5% increase. With the default 10% scale-up tolerance,
-no scaling occurs. However, with the HPA configured as shown, featuring a 0%
-scale-up tolerance, the 5% increase triggers scaling.
-
 ## I want all the details!
 
 Get all the technical details by reading

From 3408bccabeedd1e248bc8b81863258c0561e7328 Mon Sep 17 00:00:00 2001
From: jm-franc
Date: Thu, 20 Mar 2025 10:58:53 -0400
Subject: [PATCH 7/7] Update metadata to align with other articles

Following review.
Co-authored-by: Graziano Casto --- .../en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md b/content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md index 48cc92c070..10ec1630f9 100644 --- a/content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md +++ b/content/en/blog/_posts/XXXX-XX-XX-hpa-configurable-tolerance.md @@ -1,7 +1,7 @@ --- layout: blog -title: "Kubernetes v1.33 and Horizontal Autoscaling Configurable Tolerance" -slug: hpa-configurable-tolerance +title: "Kubernetes v1.33: HorizontalPodAutoscaler Configurable Tolerance" +slug: kubernetes-1-33-hpa-configurable-tolerance # after the v1.33 release, set a future publication date and remove the draft marker # the release comms team can confirm which date has been assigned #
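
The tolerance behavior the post describes can be sketched end to end. The following Python snippet is an illustrative model of the documented rule (the function and parameter names are invented for this example; the actual logic lives inside the HPA controller, not in user code):

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     desired_metric: float,
                     tolerance: float = 0.1) -> int:
    """Sketch of the HPA update rule with a configurable tolerance.

    Scaling is skipped when the ratio of current to desired metric value
    is within `tolerance` of 1.0 (the hysteresis described in the post);
    the default mirrors the cluster-wide 10% value.
    """
    ratio = current_metric / desired_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # within tolerance: no change
    # Multiply before dividing so the worked examples stay exact.
    return math.ceil(current_replicas * current_metric / desired_metric)

# Worked example from the post: 50 replicas at 90% CPU, 75% target.
print(desired_replicas(50, 90, 75))   # 60: the 20% deviation exceeds 10%
# A 5% overshoot is ignored at the default 10% tolerance...
print(desired_replicas(100, 84, 80))  # 100: no change
# ...but triggers scaling once the scale-up tolerance is set to 0.
print(desired_replicas(100, 84, 80, tolerance=0.0))  # 105
```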