Merge pull request #37242 from mimowo/retriable-pod-failures-beta

Promote "Retriable and non-retriable pod failures for Jobs" to Beta
pull/37986/head
Kubernetes Prow Robot 2022-11-18 12:08:31 -08:00 committed by GitHub
commit bfe7bd6380
4 changed files with 22 additions and 16 deletions

@@ -290,6 +290,10 @@ starts a new Pod. This means that your application needs to handle the case whe
 pod. In particular, it needs to handle temporary files, locks, incomplete output and the like
 caused by previous runs.
+By default, each pod failure is counted towards the `.spec.backoffLimit` limit,
+see [pod backoff failure policy](#pod-backoff-failure-policy). However, you can
+customize handling of pod failures by setting the Job's [pod failure policy](#pod-failure-policy).
 Note that even if you specify `.spec.parallelism = 1` and `.spec.completions = 1` and
 `.spec.template.spec.restartPolicy = "Never"`, the same program may
 sometimes be started twice.
@@ -694,7 +698,7 @@ mismatch.
 ### Pod failure policy {#pod-failure-policy}
-{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
+{{< feature-state for_k8s_version="v1.26" state="beta" >}}
 {{< note >}}
 You can only configure a Pod failure policy for a Job if you have the
@@ -703,7 +707,7 @@ enabled in your cluster. Additionally, it is recommended
 to enable the `PodDisruptionConditions` feature gate in order to be able to detect and handle
 Pod disruption conditions in the Pod failure policy (see also:
 [Pod disruption conditions](/docs/concepts/workloads/pods/disruptions#pod-disruption-conditions)). Both feature gates are
-available in Kubernetes v1.25.
+available in Kubernetes {{< skew currentVersion >}}.
 {{< /note >}}
 A Pod failure policy, defined with the `.spec.podFailurePolicy` field, enables
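
For orientation only, and not part of this change: a minimal sketch of a Job manifest that sets the `.spec.podFailurePolicy` field the hunks above document. The Job name, image, command, and exit code 42 are illustrative assumptions; the field names (`rules`, `action`, `onExitCodes`, `onPodConditions`) follow the `batch/v1` Job API, and a Pod failure policy requires `restartPolicy: Never` in the Pod template.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-pod-failure-policy   # hypothetical name
spec:
  backoffLimit: 6
  template:
    spec:
      restartPolicy: Never           # required when podFailurePolicy is set
      containers:
      - name: main
        image: docker.io/library/bash:5   # illustrative image
        command: ["bash", "-c", "exit 42"] # illustrative failing command
  podFailurePolicy:
    rules:
    # Non-retriable failure: fail the whole Job when "main" exits with 42.
    - action: FailJob
      onExitCodes:
        containerName: main
        operator: In
        values: [42]
    # Retriable failure: do not count disruptions towards backoffLimit.
    - action: Ignore
      onPodConditions:
      - type: DisruptionTarget
```

Rules are evaluated in order, so the exit-code rule is listed before the disruption rule. Matching on the `DisruptionTarget` condition assumes the `PodDisruptionConditions` feature gate is enabled.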

@@ -229,12 +229,17 @@ can happen, according to:
 ## Pod disruption conditions {#pod-disruption-conditions}
-{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
+{{< feature-state for_k8s_version="v1.26" state="beta" >}}
 {{< note >}}
-In order to use this behavior, you must enable the `PodDisruptionConditions`
+If you are using an older version of Kubernetes than {{< skew currentVersion >}}
+please refer to the corresponding version of the documentation.
+{{< /note >}}
+{{< note >}}
+In order to use this behavior, you must have the `PodDisruptionConditions`
 [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
-in your cluster.
+enabled in your cluster.
 {{< /note >}}
 When enabled, a dedicated Pod `DisruptionTarget` [condition](/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions) is added to indicate
@@ -254,6 +259,9 @@ indicates one of the following reasons for the Pod termination:
 `DeletionByPodGC`
 : Pod, that is bound to a no longer existing Node, is due to be deleted by [Pod garbage collection](/docs/concepts/workloads/pods/pod-lifecycle/#pod-garbage-collection).
+`TerminationByKubelet`
+: Pod has been terminated by the kubelet, because of either {{<glossary_tooltip term_id="node-pressure-eviction" text="node pressure eviction">}} or the [graceful node shutdown](/docs/concepts/architecture/nodes/#graceful-node-shutdown).
 {{< note >}}
 A Pod disruption might be interrupted. The control plane might re-attempt to
 continue the disruption of the same Pod, but it is not guaranteed. As a result,

@@ -112,7 +112,8 @@ For a reference to old feature gates that are removed, please refer to
 | `InTreePluginvSphereUnregister` | `false` | Alpha | 1.21 | |
 | `IPTablesOwnershipCleanup` | `false` | Alpha | 1.25 | |
 | `JobMutableNodeSchedulingDirectives` | `true` | Beta | 1.23 | |
-| `JobPodFailurePolicy` | `false` | Alpha | 1.25 | - |
+| `JobPodFailurePolicy` | `false` | Alpha | 1.25 | 1.25 |
+| `JobPodFailurePolicy` | `true` | Beta | 1.26 | |
 | `JobReadyPods` | `false` | Alpha | 1.23 | 1.23 |
 | `JobReadyPods` | `true` | Beta | 1.24 | |
 | `KubeletCredentialProviders` | `false` | Alpha | 1.20 | 1.23 |
@@ -150,7 +151,8 @@ For a reference to old feature gates that are removed, please refer to
 | `PodAndContainerStatsFromCRI` | `false` | Alpha | 1.23 | |
 | `PodDeletionCost` | `false` | Alpha | 1.21 | 1.21 |
 | `PodDeletionCost` | `true` | Beta | 1.22 | |
-| `PodDisruptionConditions` | `false` | Alpha | 1.25 | - |
+| `PodDisruptionConditions` | `false` | Alpha | 1.25 | 1.25 |
+| `PodDisruptionConditions` | `true` | Beta | 1.26 | |
 | `PodHasNetworkCondition` | `false` | Alpha | 1.25 | |
 | `ProbeTerminationGracePeriod` | `false` | Alpha | 1.21 | 1.21 |
 | `ProbeTerminationGracePeriod` | `false` | Beta | 1.22 | 1.24 |

@@ -5,7 +5,7 @@ min-kubernetes-server-version: v1.25
 weight: 60
 ---
-{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
+{{< feature-state for_k8s_version="v1.26" state="beta" >}}
 <!-- overview -->
@@ -28,14 +28,6 @@ You should already be familiar with the basic use of [Job](/docs/concepts/worklo
 {{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}
-<!-- steps -->
-{{< note >}}
-As the features are in Alpha, prepare the Kubernetes cluster with the two
-[feature gates](/docs/reference/command-line-tools-reference/feature-gates/)
-enabled: `JobPodFailurePolicy` and `PodDisruptionConditions`.
-{{< /note >}}
 ## Using Pod failure policy to avoid unnecessary Pod retries
 With the following example, you can learn how to use Pod failure policy to