---

<!--
reviewers:
- alculquicondor
- erictune
- soltysh
title: Jobs
-->

<!-- overview -->

A Job creates one or more Pods and will continue to retry execution of the Pods until a specified
number of them successfully terminate.
As pods successfully complete, the Job tracks the successful completions. When a specified number
of successful completions is reached, the task (ie, Job) is complete.

Check on the status of the Job with `kubectl`:

{{< tabs name="Check status of Job" >}}
{{< tab name="kubectl describe job pi" codelang="bash" >}}
Name:             pi
Namespace:        default
Selector:         controller-uid=0cd26dd5-88a2-4a5f-a203-ea19a1d5d578
Labels:           controller-uid=0cd26dd5-88a2-4a5f-a203-ea19a1d5d578
                  job-name=pi
Annotations:      batch.kubernetes.io/job-tracking:
Parallelism:      1
Completions:      1
Completion Mode:  NonIndexed
Start Time:       Fri, 28 Oct 2022 13:05:18 +0530
Completed At:     Fri, 28 Oct 2022 13:05:21 +0530
Duration:         3s
Pods Statuses:    0 Active / 1 Succeeded / 0 Failed
Pod Template:
  Labels:  controller-uid=0cd26dd5-88a2-4a5f-a203-ea19a1d5d578
           job-name=pi
  Containers:
   pi:
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  21s   job-controller  Created pod: pi-xf9p4
  Normal  Completed         18s   job-controller  Job completed
{{< /tab >}}
{{< tab name="kubectl get job pi -o yaml" codelang="bash" >}}
apiVersion: batch/v1
kind: Job
metadata:
  annotations:
    batch.kubernetes.io/job-tracking: ""
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"batch/v1","kind":"Job","metadata":{"annotations":{},"name":"pi","namespace":"default"},"spec":{"backoffLimit":4,"template":{"spec":{"containers":[{"command":["perl","-Mbignum=bpi","-wle","print bpi(2000)"],"image":"perl:5.34.0","name":"pi"}],"restartPolicy":"Never"}}}}
  creationTimestamp: "2022-11-10T17:53:53Z"
  generation: 1
  labels:
    controller-uid: 204fb678-040b-497f-9266-35ffa8716d14
    job-name: pi
  name: pi
  namespace: default
  resourceVersion: "4751"
  uid: 204fb678-040b-497f-9266-35ffa8716d14
spec:
  backoffLimit: 4
  completionMode: NonIndexed
  completions: 1
  parallelism: 1
  selector:
    matchLabels:
      controller-uid: 204fb678-040b-497f-9266-35ffa8716d14
  suspend: false
  template:
    metadata:
      creationTimestamp: null
      labels:
        controller-uid: 204fb678-040b-497f-9266-35ffa8716d14
        job-name: pi
    spec:
      containers:
      - command:
        - perl
        - -Mbignum=bpi
        - -wle
        - print bpi(2000)
        image: perl:5.34.0
        imagePullPolicy: IfNotPresent
        name: pi
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Never
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  active: 1
  ready: 0
  startTime: "2022-11-10T17:53:57Z"
  uncountedTerminatedPods: {}
{{< /tab >}}
{{< /tabs >}}

To view completed Pods of a Job, use `kubectl get pods`.

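The examples above label these Pods with `job-name=pi`; as a sketch, one way to collect just the names of the Pods that belong to this Job is with that selector (the jsonpath expression is one common choice, not the only one):

```shell
# Collect the names of all Pods created by the Job "pi"
pods=$(kubectl get pods --selector=job-name=pi --output=jsonpath='{.items[*].metadata.name}')
echo $pods
```
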
## Writing a Job spec

As with all other Kubernetes config, a Job needs `apiVersion`, `kind`, and `metadata` fields.

When the control plane creates new Pods for a Job, the `.metadata.name` of the
Job is part of the basis for naming those Pods. The name of a Job must be a valid
[DNS subdomain](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names)
value, but this can produce unexpected results for the Pod hostnames. For best compatibility,
the name should follow the more restrictive rules for a
[DNS label](/docs/concepts/overview/working-with-objects/names#dns-label-names).
Even when the name is a DNS subdomain, the name must be no longer than 63
characters.

A Job also needs a [`.spec` section](https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status).

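For illustration, here is a minimal manifest that follows these naming rules; it mirrors the `pi` Job whose `last-applied-configuration` appears in the output above:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi              # a short, DNS-label-compliant name
spec:
  backoffLimit: 4
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
```
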
  the deterministic hostnames to address each other via DNS. For more information about
  how to configure this, see [Job with Pod-to-Pod Communication](/docs/tasks/job/job-with-pod-to-pod-communication/).
- From the containerized task, in the environment variable `JOB_COMPLETION_INDEX`.

The Job is considered complete when there is one successfully completed Pod
for each index. For more information about how to use this mode, see
[Indexed Job for Parallel Processing with Static Work Assignment](/docs/tasks/job/indexed-parallel-processing-static/).

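A minimal sketch of an Indexed Job that reads its assignment from that environment variable (the Job name, image, and echo command are illustrative, not from the original page):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: indexed-demo         # illustrative name
spec:
  completions: 3             # one successful Pod required for each index 0, 1, 2
  parallelism: 3
  completionMode: Indexed    # each Pod gets a JOB_COMPLETION_INDEX env var
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox:1.36
        # Each Pod picks up the work item matching its completion index.
        command: ["sh", "-c", "echo processing work item $JOB_COMPLETION_INDEX"]
```
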
## Handling Pod and container failures

The container in a Pod may fail for a number of reasons, such as because the process in it exited
with a non-zero exit code, or the container was killed for exceeding a memory limit, etc. If this
happens, and the `.spec.template.spec.restartPolicy = "OnFailure"`, then the Pod stays
on the node, but the container is re-run. Therefore, your program needs to handle the case when it is
restarted locally, or else specify `.spec.template.spec.restartPolicy = "Never"`.
See [pod lifecycle](/docs/concepts/workloads/pods/pod-lifecycle/#example-states) for more information on `restartPolicy`.

This means that your application needs to handle the case when it is restarted in a new pod.
In particular, it needs to handle temporary files, locks, incomplete output and the like
caused by previous runs.

By default, each pod failure is counted towards the `.spec.backoffLimit` limit,
see [pod backoff failure policy](#pod-backoff-failure-policy). However, you can
customize handling of pod failures by setting the Job's [pod failure policy](#pod-failure-policy).

Note that even if you specify `.spec.parallelism = 1` and `.spec.completions = 1` and
`.spec.template.spec.restartPolicy = "Never"`, the same program may
sometimes be started twice.

If you do specify `.spec.parallelism` and `.spec.completions` both greater than 1, then there may be
multiple pods running at once. Therefore, your pods must also be tolerant of concurrency.

When the [feature gates](/docs/reference/command-line-tools-reference/feature-gates/)
`PodDisruptionConditions` and `JobPodFailurePolicy` are both enabled,
and the `.spec.podFailurePolicy` field is set, the Job controller does not consider a terminating
Pod (a pod that has a `.metadata.deletionTimestamp` field set) as a failure until that Pod is
terminal (its `.status.phase` is `Failed` or `Succeeded`). However, the Job controller
creates a replacement Pod as soon as the termination becomes apparent. Once the
pod terminates, the Job controller evaluates `.backoffLimit` and `.podFailurePolicy`
for the relevant Job, taking this now-terminated Pod into consideration.

If either of these requirements is not satisfied, the Job controller counts
a terminating Pod as an immediate failure, even if that Pod later terminates
with `phase: "Succeeded"`.

### Pod backoff failure policy

There are situations where you want to fail a Job after some amount of retries
due to a logical error in configuration etc.
To do so, set `.spec.backoffLimit` to specify the number of retries before
considering a Job as failed. The back-off limit is set by default to 6. Failed
Pods associated with the Job are recreated by the Job controller with an
exponential back-off delay (10s, 20s, 40s ...) capped at six minutes.

The number of retries is calculated in two ways:

- The number of Pods with `.status.phase = "Failed"`.

If either of the calculations reaches the `.spec.backoffLimit`, the Job is
considered failed.

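A small sketch to observe this back-off behavior (the Job name and the always-failing command are made up for illustration):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: backoff-demo        # illustrative name
spec:
  backoffLimit: 2           # mark the Job failed after the retry counter reaches 2
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: fail
        image: busybox:1.36
        command: ["sh", "-c", "exit 1"]   # always fails, so each run counts toward backoffLimit
```
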
{{< note >}}
If your job has `restartPolicy = "OnFailure"`, keep in mind that your Pod running the Job
will be terminated once the job backoff limit has been reached. This can make debugging
the Job's executable more difficult. We suggest setting `restartPolicy = "Never"` when
debugging the Job or using a logging system to ensure output from failed Jobs is not lost
inadvertently.
{{< /note >}}

## Job termination and cleanup

When a Job completes, no more Pods are created, but the Pods are [usually](#pod-backoff-failure-policy)
not deleted either. Keeping them around
allows you to still view the logs of completed pods to check for errors, warnings, or other diagnostic output.
The job object also remains after it is completed so that you can view its status. It is up to the user to delete
old jobs after noting their status. Delete the job with `kubectl` (e.g. `kubectl delete jobs/pi` or `kubectl delete -f ./job.yaml`). When you delete the job using `kubectl`, all the pods it created are deleted too.

The Job `pi-with-ttl` will be eligible to be automatically deleted, `100` seconds after it finishes.
If the field is set to `0`, the Job will be eligible to be automatically deleted immediately
after it finishes. If the field is unset, this Job won't be cleaned up by the TTL controller after
it finishes.

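A sketch of what the `pi-with-ttl` Job looks like with this field set (reconstructed from the `pi` example; treat the details as illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-ttl
spec:
  ttlSecondsAfterFinished: 100   # eligible for automatic cleanup 100s after finishing
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
```
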
{{< note >}}
It is recommended to set `ttlSecondsAfterFinished` field because unmanaged jobs
(Jobs that you created directly, and not indirectly through other workload APIs
such as CronJob) have a default deletion
policy of `orphanDependents` causing Pods created by an unmanaged Job to be left around
after that Job is fully deleted.
Even though the {{< glossary_tooltip text="control plane" term_id="control-plane" >}} eventually
[garbage collects](/docs/concepts/workloads/pods/pod-lifecycle/#pod-garbage-collection)
the Pods from a deleted Job after they either fail or complete, sometimes those
lingering pods may cause cluster performance degradation or in worst case cause the
cluster to go offline due to this degradation.

You can use [LimitRanges](/docs/concepts/policy/limit-range/) and
[ResourceQuotas](/docs/concepts/policy/resource-quotas/) to place a
cap on the amount of resources that a particular namespace can
consume.

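For instance, a minimal sketch of an object-count quota that caps how many Jobs a namespace may hold (the name and namespace are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: job-count-quota     # illustrative name
  namespace: my-team        # illustrative namespace
spec:
  hard:
    count/jobs.batch: "30"  # at most 30 Job objects in this namespace
```
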
{{< /note >}}

## Job patterns

The tradeoffs are summarized here, with columns 2 to 4 corresponding to the above tradeoffs.
The pattern names are also links to examples and more detailed description.

| Pattern | Single Job object | Fewer pods than work items? | Use app unmodified? |
| ----------------------------------------------- |:-----------------:|:---------------------------:|:-------------------:|
| [Queue with Pod Per Work Item] | ✓ | | sometimes |
| [Queue with Variable Pod Count] | ✓ | ✓ | |
| [Indexed Job with Static Work Assignment] | ✓ | | ✓ |
| [Job Template Expansion] | | | ✓ |
| [Job with Pod-to-Pod Communication] | ✓ | sometimes | sometimes |

The table below shows the required settings for `.spec.parallelism` and `.spec.completions`
for each of the patterns. Here, `W` is the number of work items.

| Pattern | `.spec.completions` | `.spec.parallelism` |
| ----------------------------------------------- |:-------------------:|:--------------------:|
| [Queue with Pod Per Work Item] | W | any |
| [Queue with Variable Pod Count] | null | any |
| [Indexed Job with Static Work Assignment] | W | any |
| [Job Template Expansion] | 1 | should be 1 |
| [Job with Pod-to-Pod Communication] | W | W |

### Pod failure policy

{{< feature-state for_k8s_version="v1.26" state="beta" >}}

{{< note >}}
You can configure a Pod failure policy for a Job only if you have the
`JobPodFailurePolicy` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
enabled in your cluster. Additionally, it is recommended
to enable the `PodDisruptionConditions` feature gate in order to be able to detect and handle
Pod disruption conditions in the Pod failure policy (see also:
[Pod disruption conditions](/docs/concepts/workloads/pods/disruptions#pod-disruption-conditions)). Both feature gates are
available in Kubernetes {{< skew currentVersion >}}.
{{< /note >}}

In some situations, you may want to have a better control when handling Pod failures
than the control offered by the [Pod backoff failure policy](#pod-backoff-failure-policy),
which is based on the Job's `.spec.backoffLimit`. These are some examples of use cases:

* To optimize costs of running workloads by avoiding unnecessary Pod restarts, you can
  terminate a Job as soon as one of its Pods fails with an exit code indicating a software bug.
* To guarantee that your Job finishes even if there are disruptions, you can
  ignore Pod failures caused by disruptions (such as {{< glossary_tooltip text="preemption" term_id="preemption" >}},
  {{< glossary_tooltip text="API-initiated eviction" term_id="api-eviction" >}}
  or {{< glossary_tooltip text="taint" term_id="taint" >}}-based eviction) so
  that they don't count towards the `.spec.backoffLimit` limit of retries.

These are the rules for the `main` container specifically:

- an exit code of 0 means that the container succeeded
- an exit code of 42 means that the **entire Job** failed
- any other exit code represents that the container failed, and hence the entire
  Pod. The Pod will be re-created if the total number of restarts is
  below `backoffLimit`. If the `backoffLimit` is reached the **entire Job** failed.

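A sketch of a Job whose `podFailurePolicy` implements that exit-code rule (the Job name, image, and the deliberately failing command are illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-pod-failure-policy-example   # illustrative name
spec:
  backoffLimit: 6
  template:
    spec:
      restartPolicy: Never               # required when using .spec.podFailurePolicy
      containers:
      - name: main
        image: docker.io/library/bash:5
        command: ["bash", "-c", "echo 'Hello world!' && sleep 5 && exit 42"]
  podFailurePolicy:
    rules:
    - action: FailJob                    # exit code 42 fails the entire Job
      onExitCodes:
        containerName: main
        operator: In
        values: [42]
```
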
These are some requirements and semantics of the API:

- if you want to use the `.spec.podFailurePolicy` field for a Job, you must
  also define that Job's pod template with `.spec.restartPolicy` set to `Never`.
- `Ignore`: use to indicate that the counter towards the `.spec.backoffLimit`
  should not be incremented and a replacement Pod should be created.
- `Count`: use to indicate that the Pod should be handled in the default way.
  The counter towards the `.spec.backoffLimit` should be incremented.

|
|||
-->
|
||||
### 使用 Finalizer 追踪 Job {#job-tracking-with-finalizers}
|
||||
|
||||
{{< feature-state for_k8s_version="v1.23" state="beta" >}}
|
||||
{{< feature-state for_k8s_version="v1.26" state="stable" >}}
|
||||
|
||||
{{< note >}}
|
||||
<!--
|
||||
In order to use this behavior, you must enable the `JobTrackingWithFinalizers`
|
||||
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
|
||||
on the [API server](/docs/reference/command-line-tools-reference/kube-apiserver/)
|
||||
and the [controller manager](/docs/reference/command-line-tools-reference/kube-controller-manager/).
|
||||
It is enabled by default.
|
||||
The control plane doesn't track Jobs using finalizers, if the Jobs were created
|
||||
when the feature gate `JobTrackingWithFinalizers` was disabled, even after you
|
||||
upgrade the control plane to 1.26.
|
||||
-->
|
||||
要使用该行为,你必须为 [API 服务器](/zh-cn/docs/reference/command-line-tools-reference/kube-apiserver/)
|
||||
和[控制器管理器](/zh-cn/docs/reference/command-line-tools-reference/kube-controller-manager/)启用
|
||||
`JobTrackingWithFinalizers`
|
||||
[特性门控](/zh-cn/docs/reference/command-line-tools-reference/feature-gates/)。
|
||||
该特性默认是启用的。
|
||||
|
||||
<!--
|
||||
When enabled, the control plane tracks new Jobs using the behavior described
|
||||
below. Jobs created before the feature was enabled are unaffected. As a user,
|
||||
the only difference you would see is that the control plane tracking of Job
|
||||
completion is more accurate.
|
||||
-->
|
||||
启用后,控制面基于下述行为追踪新的 Job。在启用该特性之前创建的 Job 不受影响。
|
||||
作为用户,你会看到的唯一区别是控制面对 Job 完成情况的跟踪更加准确。
|
||||
如果 Job 是在特性门控 `JobTrackingWithFinalizers` 被禁用时创建的,即使你将控制面升级到 1.26,
|
||||
控制面也不会使用 Finalizer 跟踪 Job。
|
||||
{{< /note >}}
|
||||
|
||||
<!--
|
||||
When this feature isn't enabled, the Job {{< glossary_tooltip term_id="controller" >}}
|
||||
relies on counting the Pods that exist in the cluster to track the Job status,
|
||||
that is, to keep the counters for `succeeded` and `failed` Pods.
|
||||
However, Pods can be removed for a number of reasons, including:
|
||||
- The garbage collector that removes orphan Pods when a Node goes down.
|
||||
- The garbage collector that removes finished Pods (in `Succeeded` or `Failed`
|
||||
phase) after a threshold.
|
||||
- Human intervention to delete Pods belonging to a Job.
|
||||
- An external controller (not provided as part of Kubernetes) that removes or
|
||||
replaces Pods.
|
||||
The control plane keeps track of the Pods that belong to any Job and notices if
|
||||
any such Pod is removed from the API server. To do that, the Job controller
|
||||
creates Pods with the finalizer `batch.kubernetes.io/job-tracking`. The
|
||||
controller removes the finalizer only after the Pod has been accounted for in
|
||||
the Job status, allowing the Pod to be removed by other controllers or users.
|
||||
|
||||
Jobs created before upgrading to Kubernetes 1.26 or before the feature gate
|
||||
`JobTrackingWithFinalizers` is enabled are tracked without the use of Pod
|
||||
finalizers.
|
||||
The Job {{< glossary_tooltip term_id="controller" text="controller" >}} updates
|
||||
the status counters for `succeeded` and `failed` Pods based only on the Pods
|
||||
that exist in the cluster. The contol plane can lose track of the progress of
|
||||
the Job if Pods are deleted from the cluster.
|
||||
-->
|
||||
该功能未启用时,Job {{< glossary_tooltip term_id="controller" >}} 依靠计算集群中存在的 Pod 来跟踪作业状态。
|
||||
也就是说,维持一个统计 `succeeded` 和 `failed` 的 Pod 的计数器。
|
||||
然而,Pod 可以因为一些原因被移除,包括:
|
||||
- 当一个节点宕机时,垃圾收集器会删除孤立(Orphan)Pod。
|
||||
- 垃圾收集器在某个阈值后删除已完成的 Pod(处于 `Succeeded` 或 `Failed` 阶段)。
|
||||
- 人工干预删除 Job 的 Pod。
|
||||
- 一个外部控制器(不包含于 Kubernetes)来删除或取代 Pod。
|
||||
控制面会跟踪属于任何 Job 的 Pod,并通知是否有任何这样的 Pod 被从 API 服务器中移除。
|
||||
为了实现这一点,Job 控制器创建的 Pod 带有 Finalizer `batch.kubernetes.io/job-tracking`。
|
||||
控制器只有在 Pod 被记入 Job 状态后才会移除 Finalizer,允许 Pod 可以被其他控制器或用户移除。
|
||||
|
||||
在升级到 Kubernetes 1.26 之前或在启用特性门控 `JobTrackingWithFinalizers`
|
||||
之前创建的 Job 被跟踪时不使用 Pod Finalizer。
|
||||
Job {{< glossary_tooltip term_id="controller" text="控制器" >}}仅根据集群中存在的 Pod
|
||||
更新 `succeeded` 和 `failed` Pod 的状态计数器。如果 Pod 被从集群中删除,控制面可能无法跟踪 Job 的进度。
|
||||
|
||||
<!--
|
||||
If you enable the `JobTrackingWithFinalizers` feature for your cluster, the
|
||||
control plane keeps track of the Pods that belong to any Job and notices if any
|
||||
such Pod is removed from the API server. To do that, the Job controller creates Pods with
|
||||
the finalizer `batch.kubernetes.io/job-tracking`. The controller removes the
|
||||
finalizer only after the Pod has been accounted for in the Job status, allowing
|
||||
the Pod to be removed by other controllers or users.
|
||||
|
||||
The Job controller uses the new algorithm for new Jobs only. Jobs created
|
||||
before the feature is enabled are unaffected. You can determine if the Job
|
||||
controller is tracking a Job using Pod finalizers by checking if the Job has the
|
||||
annotation `batch.kubernetes.io/job-tracking`. You should **not** manually add
|
||||
or remove this annotation from Jobs.
|
||||
You can determine if the control plane is tracking a Job using Pod finalizers by
|
||||
checking if the Job has the annotation
|
||||
`batch.kubernetes.io/job-tracking`. You should **not** manually add or remove
|
||||
this annotation from Jobs. Instead, you can recreate the Jobs to ensure they
|
||||
are tracked using Pod finalizers.
|
||||
-->
|
||||
如果你为你的集群启用了 `JobTrackingWithFinalizers` 特性,控制面会跟踪属于任何 Job 的 Pod。
|
||||
并注意是否有任何这样的 Pod 被从 API 服务器上删除。
|
||||
为了实现这一点,Job 控制器创建的 Pod 带有 Finalizer `batch.kubernetes.io/job-tracking`。
|
||||
控制器只有在 Pod 被记入 Job 状态后才会移除 Finalizer,允许 Pod 可以被其他控制器或用户删除。
|
||||
|
||||
Job 控制器只对新的 Job 使用新的算法。在启用该特性之前创建的 Job 不受影响。
|
||||
你可以根据检查 Job 是否含有 `batch.kubernetes.io/job-tracking` 注解,
|
||||
来确定 Job 控制器是否正在使用 Pod Finalizer 追踪 Job。
|
||||
来确定控制面是否正在使用 Pod Finalizer 追踪 Job。
|
||||
你**不**应该给 Job 手动添加或删除该注解。
|
||||
取而代之的是你可以重新创建 Job 以确保使用 Pod Finalizer 跟踪这些 Job。
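A quick way to check, as a sketch (assumes the `pi` Job from the examples above):

```shell
# Look for the batch.kubernetes.io/job-tracking annotation on the Job
kubectl get job pi -o yaml | grep 'batch.kubernetes.io/job-tracking'

# Inspect the finalizers on the Job's Pods
kubectl get pods -l job-name=pi \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.metadata.finalizers}{"\n"}{end}'
```
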

## Alternatives

## {{% heading "whatsnext" %}}

* Learn about [Pods](/docs/concepts/workloads/pods).
* Learn about the different ways of running Jobs:
* Read the [`Job`](/docs/reference/kubernetes-api/workload-resources/job-v1/)
  object definition to understand the API for that resource.
* Read about [`CronJob`](/docs/concepts/workloads/controllers/cron-jobs/), which you
  can use to define a series of Jobs that will run based on a schedule, similar to
  the UNIX tool `cron`.
* Practice how to configure handling of retriable and non-retriable pod failures
  using `podFailurePolicy`, based on the step-by-step [examples](/docs/tasks/job/pod-failure-policy/).