Merge pull request #37242 from mimowo/retriable-pod-failures-beta
Promote "Retriable and non-retriable pod failures for Jobs" to Beta
pull/37986/head
commit bfe7bd6380
|
@ -290,6 +290,10 @@ starts a new Pod. This means that your application needs to handle the case whe
|
||||||
pod. In particular, it needs to handle temporary files, locks, incomplete output and the like
|
pod. In particular, it needs to handle temporary files, locks, incomplete output and the like
|
||||||
caused by previous runs.
|
caused by previous runs.
|
||||||
|
|
||||||
|
By default, each pod failure is counted towards the `.spec.backoffLimit` limit,
|
||||||
|
see [pod backoff failure policy](#pod-backoff-failure-policy). However, you can
|
||||||
|
customize handling of pod failures by setting the Job's [pod failure policy](#pod-failure-policy).
|
||||||
|
|
||||||
Note that even if you specify `.spec.parallelism = 1` and `.spec.completions = 1` and
|
Note that even if you specify `.spec.parallelism = 1` and `.spec.completions = 1` and
|
||||||
`.spec.template.spec.restartPolicy = "Never"`, the same program may
|
`.spec.template.spec.restartPolicy = "Never"`, the same program may
|
||||||
sometimes be started twice.
|
sometimes be started twice.
|
||||||
|
@ -694,7 +698,7 @@ mismatch.
|
||||||
|
|
||||||
### Pod failure policy {#pod-failure-policy}
|
### Pod failure policy {#pod-failure-policy}
|
||||||
|
|
||||||
{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
|
{{< feature-state for_k8s_version="v1.26" state="beta" >}}
|
||||||
|
|
||||||
{{< note >}}
|
{{< note >}}
|
||||||
You can only configure a Pod failure policy for a Job if you have the
|
You can only configure a Pod failure policy for a Job if you have the
|
||||||
|
@ -703,7 +707,7 @@ enabled in your cluster. Additionally, it is recommended
|
||||||
to enable the `PodDisruptionConditions` feature gate in order to be able to detect and handle
|
to enable the `PodDisruptionConditions` feature gate in order to be able to detect and handle
|
||||||
Pod disruption conditions in the Pod failure policy (see also:
|
Pod disruption conditions in the Pod failure policy (see also:
|
||||||
[Pod disruption conditions](/docs/concepts/workloads/pods/disruptions#pod-disruption-conditions)). Both feature gates are
|
[Pod disruption conditions](/docs/concepts/workloads/pods/disruptions#pod-disruption-conditions)). Both feature gates are
|
||||||
available in Kubernetes v1.25.
|
available in Kubernetes {{< skew currentVersion >}}.
|
||||||
{{< /note >}}
|
{{< /note >}}
|
||||||
|
|
||||||
A Pod failure policy, defined with the `.spec.podFailurePolicy` field, enables
|
A Pod failure policy, defined with the `.spec.podFailurePolicy` field, enables
|
||||||
|
|
|
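A Pod failure policy lets a Job decide, per failed Pod, whether the failure counts towards `.spec.backoffLimit`. The following is a minimal Python sketch of that decision, assuming a simplified model of the API: an ordered list of rules, each with an `action` (`FailJob`, `Ignore`, or `Count`) and either an exit-code matcher (`In` / `NotIn`) or a list of Pod condition types. The first matching rule wins, and a failure that matches no rule is counted as usual. The dictionary keys and the functions `rule_matches` and `decide` are hypothetical and do not reproduce the Job controller's code.

```python
# Hypothetical, simplified model of a pod failure policy (not the real Job API
# types). Each rule has an "action" and either an "on_exit_codes" matcher
# ({"operator": "In" | "NotIn", "values": [...]}) or an "on_pod_conditions"
# list of condition types that must be present with status "True".


def rule_matches(rule: dict, exit_codes: list[int], conditions: dict[str, str]) -> bool:
    """Return True if this rule matches the failed Pod."""
    matcher = rule.get("on_exit_codes")
    if matcher is not None:
        # Only consider containers that actually failed (non-zero exit code).
        failed = [code for code in exit_codes if code != 0]
        if matcher["operator"] == "In":
            return any(code in matcher["values"] for code in failed)
        if matcher["operator"] == "NotIn":
            return any(code not in matcher["values"] for code in failed)
        raise ValueError(f"unknown operator: {matcher['operator']}")
    # Condition-based rule: match if any listed condition type is "True".
    for cond_type in rule.get("on_pod_conditions", []):
        if conditions.get(cond_type) == "True":
            return True
    return False


def decide(policy: list[dict], exit_codes: list[int], conditions: dict[str, str]) -> str:
    """Return the action of the first matching rule, or "Count" by default.

    "Count" stands for the default behaviour: the failure counts towards
    .spec.backoffLimit.
    """
    for rule in policy:
        if rule_matches(rule, exit_codes, conditions):
            return rule["action"]
    return "Count"


policy = [
    # A non-retriable software bug: fail the whole Job immediately.
    {"action": "FailJob", "on_exit_codes": {"operator": "In", "values": [42]}},
    # A disruption (for example, an eviction): retry without counting it.
    {"action": "Ignore", "on_pod_conditions": ["DisruptionTarget"]},
]

print(decide(policy, [42], {}))                             # FailJob
print(decide(policy, [137], {"DisruptionTarget": "True"}))  # Ignore
print(decide(policy, [1], {}))                              # Count
```

Ordering matters in this model: because evaluation stops at the first match, a more specific rule (such as a known fatal exit code) should be listed before broader ones.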
@ -229,12 +229,17 @@ can happen, according to:
|
||||||
|
|
||||||
## Pod disruption conditions {#pod-disruption-conditions}
|
## Pod disruption conditions {#pod-disruption-conditions}
|
||||||
|
|
||||||
{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
|
{{< feature-state for_k8s_version="v1.26" state="beta" >}}
|
||||||
|
|
||||||
{{< note >}}
|
{{< note >}}
|
||||||
In order to use this behavior, you must enable the `PodDisruptionConditions`
|
If you are using an older version of Kubernetes than {{< skew currentVersion >}}
|
||||||
|
please refer to the corresponding version of the documentation.
|
||||||
|
{{< /note >}}
|
||||||
|
|
||||||
|
{{< note >}}
|
||||||
|
In order to use this behavior, you must have the `PodDisruptionConditions`
|
||||||
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
|
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
|
||||||
in your cluster.
|
enabled in your cluster.
|
||||||
{{< /note >}}
|
{{< /note >}}
|
||||||
|
|
||||||
When enabled, a dedicated Pod `DisruptionTarget` [condition](/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions) is added to indicate
|
When enabled, a dedicated Pod `DisruptionTarget` [condition](/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions) is added to indicate
|
||||||
|
@ -254,6 +259,9 @@ indicates one of the following reasons for the Pod termination:
|
||||||
`DeletionByPodGC`
|
`DeletionByPodGC`
|
||||||
: Pod, that is bound to a no longer existing Node, is due to be deleted by [Pod garbage collection](/docs/concepts/workloads/pods/pod-lifecycle/#pod-garbage-collection).
|
: Pod, that is bound to a no longer existing Node, is due to be deleted by [Pod garbage collection](/docs/concepts/workloads/pods/pod-lifecycle/#pod-garbage-collection).
|
||||||
|
|
||||||
|
`TerminationByKubelet`
|
||||||
|
: Pod has been terminated by the kubelet, because of either {{<glossary_tooltip term_id="node-pressure-eviction" text="node pressure eviction">}} or the [graceful node shutdown](/docs/concepts/architecture/nodes/#graceful-node-shutdown).
|
||||||
|
|
||||||
{{< note >}}
|
{{< note >}}
|
||||||
A Pod disruption might be interrupted. The control plane might re-attempt to
|
A Pod disruption might be interrupted. The control plane might re-attempt to
|
||||||
continue the disruption of the same Pod, but it is not guaranteed. As a result,
|
continue the disruption of the same Pod, but it is not guaranteed. As a result,
|
||||||
|
|
|
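When the `PodDisruptionConditions` feature gate is enabled, a disrupted Pod carries a `DisruptionTarget` condition whose `reason` names the cause, for example `DeletionByPodGC` or `TerminationByKubelet`. The sketch below, assuming the Pod is available as a dictionary shaped like the JSON output of `kubectl get pod <name> -o json`, extracts that reason. The helper name `disruption_reason` and the sample Pod are hypothetical.

```python
import json
from typing import Optional


def disruption_reason(pod: dict) -> Optional[str]:
    """Return the reason of a DisruptionTarget condition with status "True", if any."""
    for condition in pod.get("status", {}).get("conditions", []):
        if condition.get("type") == "DisruptionTarget" and condition.get("status") == "True":
            return condition.get("reason")
    return None


# Hypothetical, trimmed Pod object, as it might appear in `kubectl get pod -o json`.
pod_json = """
{
  "status": {
    "phase": "Failed",
    "conditions": [
      {"type": "Ready", "status": "False"},
      {"type": "DisruptionTarget", "status": "True", "reason": "TerminationByKubelet"}
    ]
  }
}
"""

print(disruption_reason(json.loads(pod_json)))  # TerminationByKubelet
print(disruption_reason({"status": {}}))        # None
```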
@ -112,7 +112,8 @@ For a reference to old feature gates that are removed, please refer to
|
||||||
| `InTreePluginvSphereUnregister` | `false` | Alpha | 1.21 | |
|
| `InTreePluginvSphereUnregister` | `false` | Alpha | 1.21 | |
|
||||||
| `IPTablesOwnershipCleanup` | `false` | Alpha | 1.25 | |
|
| `IPTablesOwnershipCleanup` | `false` | Alpha | 1.25 | |
|
||||||
| `JobMutableNodeSchedulingDirectives` | `true` | Beta | 1.23 | |
|
| `JobMutableNodeSchedulingDirectives` | `true` | Beta | 1.23 | |
|
||||||
| `JobPodFailurePolicy` | `false` | Alpha | 1.25 | - |
|
| `JobPodFailurePolicy` | `false` | Alpha | 1.25 | 1.25 |
|
||||||
|
| `JobPodFailurePolicy` | `true` | Beta | 1.26 | |
|
||||||
| `JobReadyPods` | `false` | Alpha | 1.23 | 1.23 |
|
| `JobReadyPods` | `false` | Alpha | 1.23 | 1.23 |
|
||||||
| `JobReadyPods` | `true` | Beta | 1.24 | |
|
| `JobReadyPods` | `true` | Beta | 1.24 | |
|
||||||
| `KubeletCredentialProviders` | `false` | Alpha | 1.20 | 1.23 |
|
| `KubeletCredentialProviders` | `false` | Alpha | 1.20 | 1.23 |
|
||||||
|
@ -150,7 +151,8 @@ For a reference to old feature gates that are removed, please refer to
|
||||||
| `PodAndContainerStatsFromCRI` | `false` | Alpha | 1.23 | |
|
| `PodAndContainerStatsFromCRI` | `false` | Alpha | 1.23 | |
|
||||||
| `PodDeletionCost` | `false` | Alpha | 1.21 | 1.21 |
|
| `PodDeletionCost` | `false` | Alpha | 1.21 | 1.21 |
|
||||||
| `PodDeletionCost` | `true` | Beta | 1.22 | |
|
| `PodDeletionCost` | `true` | Beta | 1.22 | |
|
||||||
| `PodDisruptionConditions` | `false` | Alpha | 1.25 | - |
|
| `PodDisruptionConditions` | `false` | Alpha | 1.25 | 1.25 |
|
||||||
|
| `PodDisruptionConditions` | `true` | Beta | 1.26 | |
|
||||||
| `PodHasNetworkCondition` | `false` | Alpha | 1.25 | |
|
| `PodHasNetworkCondition` | `false` | Alpha | 1.25 | |
|
||||||
| `ProbeTerminationGracePeriod` | `false` | Alpha | 1.21 | 1.21 |
|
| `ProbeTerminationGracePeriod` | `false` | Alpha | 1.21 | 1.21 |
|
||||||
| `ProbeTerminationGracePeriod` | `false` | Beta | 1.22 | 1.24 |
|
| `ProbeTerminationGracePeriod` | `false` | Beta | 1.22 | 1.24 |
|
||||||
|
|
|
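The table hunk above makes `JobPodFailurePolicy` and `PodDisruptionConditions` Beta in 1.26 with a default of `true`, which is consistent with the task page dropping its note about enabling them explicitly. A minimal Python sketch of how an explicit `--feature-gates=Name=bool,...` value overlays those defaults follows. It is a simplification that accepts only case-insensitive `true` / `false` values, and the names `DEFAULTS_1_26`, `parse_feature_gates`, and `effective_gates` are hypothetical.

```python
# Defaults for the two gates at v1.26, taken from the feature gate table above.
DEFAULTS_1_26 = {
    "JobPodFailurePolicy": True,
    "PodDisruptionConditions": True,
}


def parse_feature_gates(flag_value: str) -> dict[str, bool]:
    """Parse a --feature-gates style value such as "A=true,B=false"."""
    gates: dict[str, bool] = {}
    # Skip empty entries, e.g. from an empty string or a trailing comma.
    for item in filter(None, (part.strip() for part in flag_value.split(","))):
        name, sep, value = item.partition("=")
        if not sep or value.strip().lower() not in ("true", "false"):
            raise ValueError(f"malformed feature gate entry: {item!r}")
        gates[name.strip()] = value.strip().lower() == "true"
    return gates


def effective_gates(defaults: dict[str, bool], flag_value: str = "") -> dict[str, bool]:
    """Explicit settings override defaults; unmentioned gates keep their default."""
    return {**defaults, **parse_feature_gates(flag_value)}


print(effective_gates(DEFAULTS_1_26))
# {'JobPodFailurePolicy': True, 'PodDisruptionConditions': True}
print(effective_gates(DEFAULTS_1_26, "PodDisruptionConditions=false"))
# {'JobPodFailurePolicy': True, 'PodDisruptionConditions': False}
```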
@ -5,7 +5,7 @@ min-kubernetes-server-version: v1.25
|
||||||
weight: 60
|
weight: 60
|
||||||
---
|
---
|
||||||
|
|
||||||
{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
|
{{< feature-state for_k8s_version="v1.26" state="beta" >}}
|
||||||
|
|
||||||
<!-- overview -->
|
<!-- overview -->
|
||||||
|
|
||||||
|
@ -28,14 +28,6 @@ You should already be familiar with the basic use of [Job](/docs/concepts/worklo
|
||||||
|
|
||||||
{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}
|
{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}
|
||||||
|
|
||||||
<!-- steps -->
|
|
||||||
|
|
||||||
{{< note >}}
|
|
||||||
As the features are in Alpha, prepare the Kubernetes cluster with the two
|
|
||||||
[feature gates](/docs/reference/command-line-tools-reference/feature-gates/)
|
|
||||||
enabled: `JobPodFailurePolicy` and `PodDisruptionConditions`.
|
|
||||||
{{< /note >}}
|
|
||||||
|
|
||||||
## Using Pod failure policy to avoid unnecessary Pod retries
|
## Using Pod failure policy to avoid unnecessary Pod retries
|
||||||
|
|
||||||
With the following example, you can learn how to use Pod failure policy to
|
With the following example, you can learn how to use Pod failure policy to
|
||||||
|
|