add suggestions

parent cd0de2832a
commit 4afba1c609
@@ -458,9 +458,9 @@ ensures that deleted pods have their finalizers removed by the Job controller.
 {{< /note >}}

 {{< note >}}
-Since Kubernetes v1.28, when pod failure policy is used, the Job controller recreates
-terminating pods only once they reach the terminal `Failed` phase. This behavior is analogous
-to when using `podRecreationPolicy: Failed`, see [pod replacement policy](#pod-replacement-policy) for more details.
+Starting with Kubernetes v1.28, when Pod failure policy is used, the Job controller recreates
+terminating Pods only once these Pods reach the terminal `Failed` phase. This behavior is similar
+to `podReplacementPolicy: Failed`. For more information, see [Pod replacement policy](#pod-replacement-policy).
 {{< /note >}}

 ## Job termination and cleanup
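For illustration, a Job that sets a Pod failure policy and leaves `podReplacementPolicy` unset might look like the sketch below. It assumes a v1.28 or later cluster with the `JobPodReplacementPolicy` feature gate enabled; the Job name, image, command, and exit code `42` are hypothetical placeholders. Because `podFailurePolicy` is set, `podReplacementPolicy` should default to `Failed`, so terminating Pods are replaced only once they reach the `Failed` phase.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pod-failure-policy-example   # hypothetical name
spec:
  completions: 4
  parallelism: 2
  backoffLimit: 6
  # podReplacementPolicy is intentionally omitted: with a podFailurePolicy set
  # (and the JobPodReplacementPolicy feature gate enabled) it defaults to Failed,
  # which is also the only value accepted for such a Job.
  podFailurePolicy:
    rules:
    - action: FailJob            # fail the whole Job on this exit code
      onExitCodes:
        containerName: main
        operator: In
        values: [42]             # hypothetical "non-retriable" exit code
    - action: Ignore             # do not count disruptions against backoffLimit
      onPodConditions:
      - type: DisruptionTarget
  template:
    spec:
      restartPolicy: Never       # required when podFailurePolicy is used
      containers:
      - name: main
        image: docker.io/library/bash:5
        command: ["bash", "-c", "sleep 30"]   # hypothetical workload
```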
@@ -873,7 +873,7 @@ is disabled, `.spec.completions` is immutable.
 Use cases for elastic Indexed Jobs include batch workloads which require
 scaling an indexed Job, such as MPI, Horovod, Ray, and PyTorch training jobs.

-### Pod Replacement Policy
+### Delayed creation of replacement pods {#pod-replacement-policy}

 {{< feature-state for_k8s_version="v1.28" state="alpha" >}}

@@ -881,12 +881,19 @@ scaling an indexed Job, such as MPI, Horovod, Ray, and PyTorch training jobs.
 You can only set `podReplacementPolicy` on Jobs if you enable the `JobPodReplacementPolicy` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
 {{< /note >}}

-By default, the Job controller recreates pods as soon they are either failed or terminating (have a deletion timestamp).
-This means that, at a given time, the number of running Pods for the Jobs can be greater than `parallelism` or, if using Indexed Jobs, more than one running Pod per index, if some of the Pods are terminating.
+By default, the Job controller recreates Pods as soon as they either fail or are terminating (have a deletion timestamp).
+This means that, at a given time, when some of the Pods are terminating, the number of running Pods for a Job can be greater than `parallelism` or, for Indexed Jobs, greater than one Pod per index.

-You may choose to create replacement pods only when the terminating pod is fully terminal (has `status.phase: Failed`). To do this, set the `.spec.podReplacementPolicy: Failed`.
-This will only recreate pods once they are terminated.
-The default policy is `FailedOrTerminating`, meaning that the control plane creates replacement Pods upon deletion (`DeletionTimestamp != nil`).
+You may choose to create replacement Pods only when the terminating Pod is fully terminal (has `status.phase: Failed`). To do this, set `.spec.podReplacementPolicy` to `Failed`.
+With this policy, the Job controller recreates Pods only once they have fully terminated.
+The default replacement policy depends on whether the Job has a `podFailurePolicy` set.
+With no Pod failure policy defined for a Job, omitting the `podReplacementPolicy` field selects the
+`TerminatingOrFailed` replacement policy:
+the control plane creates replacement Pods immediately upon Pod deletion
+(as soon as the control plane sees that a Pod for this Job has `deletionTimestamp` set).
+For Jobs with a Pod failure policy set, the default `podReplacementPolicy` is `Failed`, and no other
+value is permitted.
+See [Pod failure policy](#pod-failure-policy) to learn more about Pod failure policies for Jobs.

 ```yaml
 kind: Job
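As a minimal sketch of opting in explicitly, the manifest below sets `podReplacementPolicy: Failed` on an Indexed Job that has no Pod failure policy. It assumes the `JobPodReplacementPolicy` feature gate is enabled; the name, image, and command are hypothetical. With this setting, a replacement Pod for an index should be created only after the previous Pod for that index reaches `status.phase: Failed`, rather than as soon as it gets a `deletionTimestamp`.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: delayed-replacement-example   # hypothetical name
spec:
  # Wait for terminating Pods to reach the Failed phase before replacing them.
  # Without this field (and without a podFailurePolicy), replacements are
  # created as soon as a Pod starts terminating.
  podReplacementPolicy: Failed
  completionMode: Indexed
  completions: 4
  parallelism: 2
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: docker.io/library/busybox:1.36
        command: ["sh", "-c", "sleep 30"]   # hypothetical workload
```

Pairing `Failed` with `completionMode: Indexed` is one way to avoid having more than one running Pod per index while old Pods shut down, at the cost of slower replacement.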
@@ -898,8 +905,8 @@ spec:
 ...
 ```

-You can inspect a new field in the JobStatus called `terminating`.
-This will report the number pods that are currently terminating and is easily viewable in the status.
+Provided your cluster has the `JobPodReplacementPolicy` feature gate enabled, you can inspect the `.status.terminating` field of a Job.
+The value of the field is the number of Pods owned by the Job that are currently terminating.

 ```shell
 kubectl get jobs/myjob -o yaml
@@ -910,13 +917,9 @@ apiVersion: batch/v1
 kind: Job
 # .metadata and .spec omitted
 status:
-  terminating: 1 # if pod is terminating
+  terminating: 3 # three Pods are terminating and have not yet reached the Failed phase
 ```

-When you use a [podFailurePolicy](#pod-failure-policy) in a Job, the Job will have a default `podReplacementPolicy` value of `Failed`, and this is the only policy allowed.
-If `JobPodReplacementPolicy` is disabled and `podFailurePolicy` is enabled, a Job will wait for terminating pods to be fully terminated before marking the pod as failed.
-In this case, you will not be able to inspect the `terminating` field.
-
 ## Alternatives

 ### Bare Pods
@@ -552,7 +552,7 @@ Each feature gate is designed for enabling/disabling a specific feature:
   the pod template of [Job](/docs/concepts/workloads/controllers/job).
 - `JobPodFailurePolicy`: Allow users to specify handling of pod failures based on container
   exit codes and pod conditions.
-- `JobPodReplacementPolicy`: Allows users to specify pod replacement for terminating pods in a [Job](/docs/concepts/workloads/controllers/job)
+- `JobPodReplacementPolicy`: Allows you to specify pod replacement for terminating pods in a [Job](/docs/concepts/workloads/controllers/job).
 - `JobReadyPods`: Enables tracking the number of Pods that have a `Ready`
   [condition](/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions).
   The count of `Ready` pods is recorded in the
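On a kubeadm-managed cluster, one possible way to turn on the gate is through the kubeadm `ClusterConfiguration` (`kubeadm.k8s.io/v1beta3` API), passing it to both kube-apiserver, which serves the `podReplacementPolicy` field, and kube-controller-manager, which runs the Job controller. This is a sketch under that assumption; other installers and managed services expose feature gates differently.

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  extraArgs:
    # Lets the API server accept and persist .spec.podReplacementPolicy.
    feature-gates: "JobPodReplacementPolicy=true"
controllerManager:
  extraArgs:
    # Lets the Job controller honor the policy and report .status.terminating.
    feature-gates: "JobPodReplacementPolicy=true"
```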