Merge pull request #37242 from mimowo/retriable-pod-failures-beta
Promote "Retriable and non-retriable pod failures for Jobs" to Beta
pull/37986/head
commit bfe7bd6380
@@ -290,6 +290,10 @@ starts a new Pod. This means that your application needs to handle the case whe
 pod. In particular, it needs to handle temporary files, locks, incomplete output and the like
 caused by previous runs.

+By default, each pod failure is counted towards the `.spec.backoffLimit` limit,
+see [pod backoff failure policy](#pod-backoff-failure-policy). However, you can
+customize handling of pod failures by setting the Job's [pod failure policy](#pod-failure-policy).
+
 Note that even if you specify `.spec.parallelism = 1` and `.spec.completions = 1` and
 `.spec.template.spec.restartPolicy = "Never"`, the same program may
 sometimes be started twice.
@@ -694,7 +698,7 @@ mismatch.

 ### Pod failure policy {#pod-failure-policy}

-{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
+{{< feature-state for_k8s_version="v1.26" state="beta" >}}

 {{< note >}}
 You can only configure a Pod failure policy for a Job if you have the
@@ -703,7 +707,7 @@ enabled in your cluster. Additionally, it is recommended
 to enable the `PodDisruptionConditions` feature gate in order to be able to detect and handle
 Pod disruption conditions in the Pod failure policy (see also:
 [Pod disruption conditions](/docs/concepts/workloads/pods/disruptions#pod-disruption-conditions)). Both feature gates are
-available in Kubernetes v1.25.
+available in Kubernetes {{< skew currentVersion >}}.
 {{< /note >}}

 A Pod failure policy, defined with the `.spec.podFailurePolicy` field, enables

@@ -229,12 +229,17 @@ can happen, according to:

 ## Pod disruption conditions {#pod-disruption-conditions}

-{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
+{{< feature-state for_k8s_version="v1.26" state="beta" >}}

 {{< note >}}
-In order to use this behavior, you must enable the `PodDisruptionConditions`
+If you are using an older version of Kubernetes than {{< skew currentVersion >}}
+please refer to the corresponding version of the documentation.
+{{< /note >}}
+
+{{< note >}}
+In order to use this behavior, you must have the `PodDisruptionConditions`
 [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
-in your cluster.
+enabled in your cluster.
 {{< /note >}}

 When enabled, a dedicated Pod `DisruptionTarget` [condition](/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions) is added to indicate
@@ -254,6 +259,9 @@ indicates one of the following reasons for the Pod termination:
 `DeletionByPodGC`
 : Pod, that is bound to a no longer existing Node, is due to be deleted by [Pod garbage collection](/docs/concepts/workloads/pods/pod-lifecycle/#pod-garbage-collection).

+`TerminationByKubelet`
+: Pod has been terminated by the kubelet, because of either {{<glossary_tooltip term_id="node-pressure-eviction" text="node pressure eviction">}} or the [graceful node shutdown](/docs/concepts/architecture/nodes/#graceful-node-shutdown).
+
 {{< note >}}
 A Pod disruption might be interrupted. The control plane might re-attempt to
 continue the disruption of the same Pod, but it is not guaranteed. As a result,

@@ -112,7 +112,8 @@ For a reference to old feature gates that are removed, please refer to
 | `InTreePluginvSphereUnregister` | `false` | Alpha | 1.21 | |
 | `IPTablesOwnershipCleanup` | `false` | Alpha | 1.25 | |
 | `JobMutableNodeSchedulingDirectives` | `true` | Beta | 1.23 | |
-| `JobPodFailurePolicy` | `false` | Alpha | 1.25 | - |
+| `JobPodFailurePolicy` | `false` | Alpha | 1.25 | 1.25 |
+| `JobPodFailurePolicy` | `true` | Beta | 1.26 | |
 | `JobReadyPods` | `false` | Alpha | 1.23 | 1.23 |
 | `JobReadyPods` | `true` | Beta | 1.24 | |
 | `KubeletCredentialProviders` | `false` | Alpha | 1.20 | 1.23 |
@@ -150,7 +151,8 @@ For a reference to old feature gates that are removed, please refer to
 | `PodAndContainerStatsFromCRI` | `false` | Alpha | 1.23 | |
 | `PodDeletionCost` | `false` | Alpha | 1.21 | 1.21 |
 | `PodDeletionCost` | `true` | Beta | 1.22 | |
-| `PodDisruptionConditions` | `false` | Alpha | 1.25 | - |
+| `PodDisruptionConditions` | `false` | Alpha | 1.25 | 1.25 |
+| `PodDisruptionConditions` | `true` | Beta | 1.26 | |
 | `PodHasNetworkCondition` | `false` | Alpha | 1.25 | |
 | `ProbeTerminationGracePeriod` | `false` | Alpha | 1.21 | 1.21 |
 | `ProbeTerminationGracePeriod` | `false` | Beta | 1.22 | 1.24 |

@@ -5,7 +5,7 @@ min-kubernetes-server-version: v1.25
 weight: 60
 ---

-{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
+{{< feature-state for_k8s_version="v1.26" state="beta" >}}

 <!-- overview -->

@@ -28,14 +28,6 @@ You should already be familiar with the basic use of [Job](/docs/concepts/worklo

 {{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}

 <!-- steps -->

-{{< note >}}
-As the features are in Alpha, prepare the Kubernetes cluster with the two
-[feature gates](/docs/reference/command-line-tools-reference/feature-gates/)
-enabled: `JobPodFailurePolicy` and `PodDisruptionConditions`.
-{{< /note >}}
-
 ## Using Pod failure policy to avoid unnecessary Pod retries

 With the following example, you can learn how to use Pod failure policy to