From 4afba1c60990420016905a930ba9f7156c1b71a8 Mon Sep 17 00:00:00 2001 From: Kevin Hannon Date: Mon, 31 Jul 2023 14:15:34 -0400 Subject: [PATCH] add suggestions --- .../concepts/workloads/controllers/job.md | 35 ++++++++++--------- .../feature-gates.md | 2 +- 2 files changed, 20 insertions(+), 17 deletions(-) diff --git a/content/en/docs/concepts/workloads/controllers/job.md b/content/en/docs/concepts/workloads/controllers/job.md index 2075380420..eb303741cd 100644 --- a/content/en/docs/concepts/workloads/controllers/job.md +++ b/content/en/docs/concepts/workloads/controllers/job.md @@ -458,9 +458,9 @@ ensures that deleted pods have their finalizers removed by the Job controller. {{< /note >}} {{< note >}} -Since Kubernetes v1.28, when pod failure policy is used, the Job controller recreates -terminating pods only once they reach the terminal `Failed` phase. This behavior is analogous -to when using `podRecreationPolicy: Failed`, see [pod replacement policy](#pod-replacement-policy) for more details. +Starting with Kubernetes v1.28, when Pod failure policy is used, the Job controller recreates +terminating Pods only once these Pods reach the terminal `Failed` phase. This behavior is similar +to `podRecreationPolicy: Failed`. For more information, see [Pod replacement policy](#pod-replacement-policy). {{< /note >}} ## Job termination and cleanup @@ -873,7 +873,7 @@ is disabled, `.spec.completions` is immutable. Use cases for elastic Indexed Jobs include batch workloads which require scaling an indexed Job, such as MPI, Horovord, Ray, and PyTorch training jobs. -### Pod Replacement Policy +### Delayed creation of replacement pods {{< feature-state for_k8s_version="v1.28" state="alpha" >}} @@ -881,12 +881,19 @@ scaling an indexed Job, such as MPI, Horovord, Ray, and PyTorch training jobs. You can only set `podReplacementPolicy` on Jobs if you enable the `JobPodReplacementPolicy` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/). {{< /note >}} -By default, the Job controller recreates pods as soon they are either failed or terminating (have a deletion timestamp). -This means that, at a given time, the number of running Pods for the Jobs can be greater than `parallelism` or, if using Indexed Jobs, more than one running Pod per index, if some of the Pods are terminating. +By default, the Job controller recreates Pods as soon they either fail or are terminating (have a deletion timestamp). +This means that, at a given time, when some of the Pods are terminating, the number of running Pods for the Jobs can be greater than `parallelism` or greater than one Pod per index (if using Indexed Jobs). -You may choose to create replacement pods only when the terminating pod is fully terminal (has `status.phase: Failed`). To do this, set the `.spec.podReplacementPolicy: Failed`. -This will only recreate pods once they are terminated. -The default policy is `FailedOrTerminating`, meaning that the control plane creates replacement Pods upon deletion (`DeletionTimestamp != nil`). +You may choose to create replacement Pods only when the terminating Pod is fully terminal (has `status.phase: Failed`). To do this, set the `.spec.podReplacementPolicy: Failed`. +This will only recreate Pods once they are terminated. +The default replacement policy depends on whether the Job has a `podFailurePolicy` set. +With no Pod failure policy defined for a Job, omitting the `podReplacementPolicy` field selects the +`FailedOrTerminating` replacement policy: +the control plane creates replacement Pods immediately upon Pod deletion +(as soon as the control plane sees that a Pod for this Job has `deletionTimestamp` set). +For Jobs with a Pod failure policy set, the default `podReplacementPolicy` is `Failed`, and no other +value is permitted. +See [Pod failure policy](#pod-failure-policy) to learn more about Pod failure policies for Jobs. ```yaml kind: Job @@ -898,8 +905,8 @@ spec: ... ``` -You can inspect a new field in the JobStatus called `terminating`. -This will report the number pods that are currently terminating and is easily viewable in the status. +Provided your cluster has the feature gate enabled, you can inspect the `.status.terminating` field of a Job. +The value of the field is the number of Pods owned by the Job that are currently terminating. ```shell kubectl get jobs/myjob -o yaml @@ -910,13 +917,9 @@ apiVersion: batch/v1 kind: Job # .metadata and .spec omitted status: - terminating: 1 # if pod is terminating + terminating: 3 # three Pods are terminating and have not yet reached the Failed phase ``` -When you use a [podFailurePolicy](#pod-failure-policy) in a Job, the Job will have a default `podReplacementPolicy` value of `Failed`, and this is the only policy allowed. -If `JobPodReplacementPolicy` is disabled and `podFailurePolicy` is enabled, a Job will wait for terminating pods to be fully terminated before marking the pod as failed. -In this case, you will not be able to inspect the `terminating` field. - ## Alternatives ### Bare Pods diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md index a2ced7aca1..cf720e4070 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md @@ -552,7 +552,7 @@ Each feature gate is designed for enabling/disabling a specific feature: the pod template of [Job](/docs/concepts/workloads/controllers/job). - `JobPodFailurePolicy`: Allow users to specify handling of pod failures based on container exit codes and pod conditions. -- `JobPodReplacementPolicy`: Allows users to specify pod replacement for terminating pods in a [Job](/docs/concepts/workloads/controllers/job) +- `JobPodReplacementPolicy`: Allows you to specify pod replacement for terminating pods in a [Job](/docs/concepts/workloads/controllers/job) - `JobReadyPods`: Enables tracking the number of Pods that have a `Ready` [condition](/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions). The count of `Ready` pods is recorded in the