Merge pull request #37242 from mimowo/retriable-pod-failures-beta

Promote "Retriable and non-retriable pod failures for Jobs" to Beta
2022-11-18 12:08:31 -08:00 · bfe7bd6380
parent b8fc810198 1e4a160b0d
commit bfe7bd6380
4 changed files with 22 additions and 16 deletions
--- a/content/en/docs/concepts/workloads/controllers/job.md
+++ b/content/en/docs/concepts/workloads/controllers/job.md
@@ -290,6 +290,10 @@ starts a new Pod.  This means that your application needs to handle the case whe
 pod.  In particular, it needs to handle temporary files, locks, incomplete output and the like
 caused by previous runs.

+By default, each pod failure is counted towards the `.spec.backoffLimit` limit,
+see [pod backoff failure policy](#pod-backoff-failure-policy). However, you can
+customize handling of pod failures by setting the Job's [pod failure policy](#pod-failure-policy).
+
 Note that even if you specify `.spec.parallelism = 1` and `.spec.completions = 1` and
 `.spec.template.spec.restartPolicy = "Never"`, the same program may
 sometimes be started twice.
@@ -694,7 +698,7 @@ mismatch.

 ### Pod failure policy {#pod-failure-policy}

-{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
+{{< feature-state for_k8s_version="v1.26" state="beta" >}}

 {{< note >}}
 You can only configure a Pod failure policy for a Job if you have the
@@ -703,7 +707,7 @@ enabled in your cluster. Additionally, it is recommended
 to enable the `PodDisruptionConditions` feature gate in order to be able to detect and handle
 Pod disruption conditions in the Pod failure policy (see also:
 [Pod disruption conditions](/docs/concepts/workloads/pods/disruptions#pod-disruption-conditions)). Both feature gates are
-available in Kubernetes v1.25.
+available in Kubernetes {{< skew currentVersion >}}.
 {{< /note >}}

 A Pod failure policy, defined with the `.spec.podFailurePolicy` field, enables
--- a/content/en/docs/concepts/workloads/pods/disruptions.md
+++ b/content/en/docs/concepts/workloads/pods/disruptions.md
@@ -229,12 +229,17 @@ can happen, according to:

 ## Pod disruption conditions {#pod-disruption-conditions}

-{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
+{{< feature-state for_k8s_version="v1.26" state="beta" >}}

 {{< note >}}
-In order to use this behavior, you must enable the `PodDisruptionConditions`
+If you are using an older version of Kubernetes than {{< skew currentVersion >}}
+please refer to the corresponding version of the documentation.
+{{< /note >}}
+
+{{< note >}}
+In order to use this behavior, you must have the `PodDisruptionConditions`
 [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
-in your cluster.
+enabled in your cluster.
 {{< /note >}}

 When enabled, a dedicated Pod `DisruptionTarget` [condition](/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions) is added to indicate
@@ -254,6 +259,9 @@ indicates one of the following reasons for the Pod termination:
 `DeletionByPodGC`
 : Pod, that is bound to a no longer existing Node, is due to be deleted by [Pod garbage collection](/docs/concepts/workloads/pods/pod-lifecycle/#pod-garbage-collection).

+`TerminationByKubelet`
+: Pod has been terminated by the kubelet, because of either {{<glossary_tooltip term_id="node-pressure-eviction" text="node pressure eviction">}} or the [graceful node shutdown](/docs/concepts/architecture/nodes/#graceful-node-shutdown).
+
 {{< note >}}
 A Pod disruption might be interrupted. The control plane might re-attempt to
 continue the disruption of the same Pod, but it is not guaranteed. As a result,
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md
@@ -112,7 +112,8 @@ For a reference to old feature gates that are removed, please refer to
 | `InTreePluginvSphereUnregister` | `false` | Alpha | 1.21 | |
 | `IPTablesOwnershipCleanup` | `false` | Alpha | 1.25 | |
 | `JobMutableNodeSchedulingDirectives` | `true` | Beta | 1.23 | |
-| `JobPodFailurePolicy` | `false` | Alpha | 1.25 | - |
+| `JobPodFailurePolicy` | `false` | Alpha | 1.25 | 1.25 |
+| `JobPodFailurePolicy` | `true` | Beta | 1.26 | |
 | `JobReadyPods` | `false` | Alpha | 1.23 | 1.23 |
 | `JobReadyPods` | `true` | Beta | 1.24 | |
 | `KubeletCredentialProviders` | `false` | Alpha | 1.20 | 1.23 |
@@ -150,7 +151,8 @@ For a reference to old feature gates that are removed, please refer to
 | `PodAndContainerStatsFromCRI` | `false` | Alpha | 1.23 | |
 | `PodDeletionCost` | `false` | Alpha | 1.21 | 1.21 |
 | `PodDeletionCost` | `true` | Beta | 1.22 | |
-| `PodDisruptionConditions` | `false` | Alpha | 1.25 | - |
+| `PodDisruptionConditions` | `false` | Alpha | 1.25 | 1.25 |
+| `PodDisruptionConditions` | `true` | Beta | 1.26 | |
 | `PodHasNetworkCondition` | `false` | Alpha | 1.25 | |
 | `ProbeTerminationGracePeriod` | `false` | Alpha | 1.21 | 1.21 |
 | `ProbeTerminationGracePeriod` | `false` | Beta | 1.22 | 1.24 |
--- a/content/en/docs/tasks/job/pod-failure-policy.md
+++ b/content/en/docs/tasks/job/pod-failure-policy.md
@@ -5,7 +5,7 @@ min-kubernetes-server-version: v1.25
 weight: 60
 ---

-{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
+{{< feature-state for_k8s_version="v1.26" state="beta" >}}

 <!-- overview -->

@@ -28,14 +28,6 @@ You should already be familiar with the basic use of [Job](/docs/concepts/worklo

 {{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}

-<!-- steps -->
-
-{{< note >}}
-As the features are in Alpha, prepare the Kubernetes cluster with the two
-[feature gates](/docs/reference/command-line-tools-reference/feature-gates/)
-enabled: `JobPodFailurePolicy` and `PodDisruptionConditions`.
-{{< /note >}}
-
 ## Using Pod failure policy to avoid unnecessary Pod retries

 With the following example, you can learn how to use Pod failure policy to