---

<!--
reviewers:
- alculquicondor
- erictune
- soltysh
title: Jobs
-->

<!-- overview -->

A Job creates one or more Pods and will continue to retry execution of the Pods until a specified
number of them successfully terminate.
As pods successfully complete, the Job tracks the successful completions. When a specified number
of successful completions is reached, the task (ie, Job) is complete.

Check on the status of the Job with `kubectl`:

{{< tabs name="Check status of Job" >}}
{{< tab name="kubectl describe job pi" codelang="bash" >}}
Name:             pi
Namespace:        default
Selector:         controller-uid=0cd26dd5-88a2-4a5f-a203-ea19a1d5d578
Labels:           controller-uid=0cd26dd5-88a2-4a5f-a203-ea19a1d5d578
                  job-name=pi
Annotations:      batch.kubernetes.io/job-tracking:
Parallelism:      1
Completions:      1
Completion Mode:  NonIndexed
Start Time:       Fri, 28 Oct 2022 13:05:18 +0530
Completed At:     Fri, 28 Oct 2022 13:05:21 +0530
Duration:         3s
Pods Statuses:    0 Active / 1 Succeeded / 0 Failed
Pod Template:
  Labels:  controller-uid=0cd26dd5-88a2-4a5f-a203-ea19a1d5d578
           job-name=pi
  Containers:
   pi:
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  21s   job-controller  Created pod: pi-xf9p4
  Normal  Completed         18s   job-controller  Job completed
{{< /tab >}}
{{< tab name="kubectl get job pi -o yaml" codelang="bash" >}}
apiVersion: batch/v1
kind: Job
metadata:
  annotations:
    batch.kubernetes.io/job-tracking: ""
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"batch/v1","kind":"Job","metadata":{"annotations":{},"name":"pi","namespace":"default"},"spec":{"backoffLimit":4,"template":{"spec":{"containers":[{"command":["perl","-Mbignum=bpi","-wle","print bpi(2000)"],"image":"perl:5.34.0","name":"pi"}],"restartPolicy":"Never"}}}}
  creationTimestamp: "2022-11-10T17:53:53Z"
  generation: 1
  labels:
    controller-uid: 204fb678-040b-497f-9266-35ffa8716d14
    job-name: pi
  name: pi
  namespace: default
  resourceVersion: "4751"
  uid: 204fb678-040b-497f-9266-35ffa8716d14
spec:
  backoffLimit: 4
  completionMode: NonIndexed
  completions: 1
  parallelism: 1
  selector:
    matchLabels:
      controller-uid: 204fb678-040b-497f-9266-35ffa8716d14
  suspend: false
  template:
    metadata:
      creationTimestamp: null
      labels:
        controller-uid: 204fb678-040b-497f-9266-35ffa8716d14
        job-name: pi
    spec:
      containers:
      - command:
        - perl
        - -Mbignum=bpi
        - -wle
        - print bpi(2000)
        image: perl:5.34.0
        imagePullPolicy: IfNotPresent
        name: pi
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Never
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  active: 1
  ready: 0
  startTime: "2022-11-10T17:53:57Z"
  uncountedTerminatedPods: {}
{{< /tab >}}
{{< /tabs >}}

To view completed Pods of a Job, use `kubectl get pods`.

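The examples above label these Pods with `job-name=pi`; as a sketch, one way to collect just the names of the Pods that belong to this Job is with that selector (the jsonpath expression is one common choice, not the only one):

```shell
# Collect the names of all Pods created by the Job "pi"
pods=$(kubectl get pods --selector=job-name=pi --output=jsonpath='{.items[*].metadata.name}')
echo $pods
```
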
## Writing a Job spec

As with all other Kubernetes config, a Job needs `apiVersion`, `kind`, and `metadata` fields.

When the control plane creates new Pods for a Job, the `.metadata.name` of the
Job is part of the basis for naming those Pods. The name of a Job must be a valid
[DNS subdomain](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names)
value, but this can produce unexpected results for the Pod hostnames. For best compatibility,
the name should follow the more restrictive rules for a
[DNS label](/docs/concepts/overview/working-with-objects/names#dns-label-names).
Even when the name is a DNS subdomain, the name must be no longer than 63
characters.

A Job also needs a [`.spec` section](https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status).

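For illustration, here is a minimal manifest that follows these naming rules; it mirrors the `pi` Job whose `last-applied-configuration` appears in the output above:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi              # a short, DNS-label-compliant name
spec:
  backoffLimit: 4
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
```
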
  the deterministic hostnames to address each other via DNS. For more information about
  how to configure this, see [Job with Pod-to-Pod Communication](/docs/tasks/job/job-with-pod-to-pod-communication/).
- From the containerized task, in the environment variable `JOB_COMPLETION_INDEX`.

The Job is considered complete when there is one successfully completed Pod
for each index. For more information about how to use this mode, see
[Indexed Job for Parallel Processing with Static Work Assignment](/docs/tasks/job/indexed-parallel-processing-static/).

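A minimal sketch of an Indexed Job that reads its assignment from that environment variable (the Job name, image, and echo command are illustrative, not from the original page):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: indexed-demo         # illustrative name
spec:
  completions: 3             # one successful Pod required for each index 0, 1, 2
  parallelism: 3
  completionMode: Indexed    # each Pod gets a JOB_COMPLETION_INDEX env var
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox:1.36
        # Each Pod picks up the work item matching its completion index.
        command: ["sh", "-c", "echo processing work item $JOB_COMPLETION_INDEX"]
```
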
## Handling Pod and container failures

The container in a Pod may fail for a number of reasons, such as because the process in it exited
with a non-zero exit code, or the container was killed for exceeding a memory limit, etc. If this
happens, and the `.spec.template.spec.restartPolicy = "OnFailure"`, then the Pod stays
on the node, but the container is re-run. Therefore, your program needs to handle the case when it is
restarted locally, or else specify `.spec.template.spec.restartPolicy = "Never"`.
See [pod lifecycle](/docs/concepts/workloads/pods/pod-lifecycle/#example-states) for more information on `restartPolicy`.

This means that your application needs to handle the case when it is restarted in a new pod.
In particular, it needs to handle temporary files, locks, incomplete output and the like
caused by previous runs.

By default, each pod failure is counted towards the `.spec.backoffLimit` limit,
see [pod backoff failure policy](#pod-backoff-failure-policy). However, you can
customize handling of pod failures by setting the Job's [pod failure policy](#pod-failure-policy).

Note that even if you specify `.spec.parallelism = 1` and `.spec.completions = 1` and
`.spec.template.spec.restartPolicy = "Never"`, the same program may
sometimes be started twice.

If you do specify `.spec.parallelism` and `.spec.completions` both greater than 1, then there may be
multiple pods running at once. Therefore, your pods must also be tolerant of concurrency.

When the [feature gates](/docs/reference/command-line-tools-reference/feature-gates/)
`PodDisruptionConditions` and `JobPodFailurePolicy` are both enabled,
and the `.spec.podFailurePolicy` field is set, the Job controller does not consider a terminating
Pod (a pod that has a `.metadata.deletionTimestamp` field set) as a failure until that Pod is
terminal (its `.status.phase` is `Failed` or `Succeeded`). However, the Job controller
creates a replacement Pod as soon as the termination becomes apparent. Once the
pod terminates, the Job controller evaluates `.backoffLimit` and `.podFailurePolicy`
for the relevant Job, taking this now-terminated Pod into consideration.

If either of these requirements is not satisfied, the Job controller counts
a terminating Pod as an immediate failure, even if that Pod later terminates
with `phase: "Succeeded"`.

### Pod backoff failure policy

There are situations where you want to fail a Job after some amount of retries
due to a logical error in configuration etc.
To do so, set `.spec.backoffLimit` to specify the number of retries before
considering a Job as failed. The back-off limit is set by default to 6. Failed
Pods associated with the Job are recreated by the Job controller with an
exponential back-off delay (10s, 20s, 40s ...) capped at six minutes.

The number of retries is calculated in two ways:

- The number of Pods with `.status.phase = "Failed"`.

If either of the calculations reaches the `.spec.backoffLimit`, the Job is
considered failed.

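A small sketch to observe this back-off behavior (the Job name and the always-failing command are made up for illustration):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: backoff-demo        # illustrative name
spec:
  backoffLimit: 2           # mark the Job failed after the retry counter reaches 2
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: fail
        image: busybox:1.36
        command: ["sh", "-c", "exit 1"]   # always fails, so each run counts toward backoffLimit
```
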
{{< note >}}
If your job has `restartPolicy = "OnFailure"`, keep in mind that your Pod running the Job
will be terminated once the job backoff limit has been reached. This can make debugging
the Job's executable more difficult. We suggest setting `restartPolicy = "Never"` when
debugging the Job or using a logging system to ensure output from failed Jobs is not lost
inadvertently.
{{< /note >}}

## Job termination and cleanup

When a Job completes, no more Pods are created, but the Pods are [usually](#pod-backoff-failure-policy)
not deleted either. Keeping them around
allows you to still view the logs of completed pods to check for errors, warnings, or other diagnostic output.
The job object also remains after it is completed so that you can view its status. It is up to the user to delete
old jobs after noting their status. Delete the job with `kubectl` (e.g. `kubectl delete jobs/pi` or `kubectl delete -f ./job.yaml`). When you delete the job using `kubectl`, all the pods it created are deleted too.

The Job `pi-with-ttl` will be eligible to be automatically deleted, `100` seconds after it finishes.
If the field is set to `0`, the Job will be eligible to be automatically deleted immediately
after it finishes. If the field is unset, this Job won't be cleaned up by the TTL controller after
it finishes.

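A sketch of what the `pi-with-ttl` Job looks like with this field set (reconstructed from the `pi` example; treat the details as illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-ttl
spec:
  ttlSecondsAfterFinished: 100   # eligible for automatic cleanup 100s after finishing
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
```
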
{{< note >}}
It is recommended to set `ttlSecondsAfterFinished` field because unmanaged jobs
(Jobs that you created directly, and not indirectly through other workload APIs
such as CronJob) have a default deletion
policy of `orphanDependents` causing Pods created by an unmanaged Job to be left around
after that Job is fully deleted.
Even though the {{< glossary_tooltip text="control plane" term_id="control-plane" >}} eventually
[garbage collects](/docs/concepts/workloads/pods/pod-lifecycle/#pod-garbage-collection)
the Pods from a deleted Job after they either fail or complete, sometimes those
lingering pods may cause cluster performance degradation or in worst case cause the
cluster to go offline due to this degradation.

You can use [LimitRanges](/docs/concepts/policy/limit-range/) and
[ResourceQuotas](/docs/concepts/policy/resource-quotas/) to place a
cap on the amount of resources that a particular namespace can
consume.

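For instance, a minimal sketch of an object-count quota that caps how many Jobs a namespace may hold (the name and namespace are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: job-count-quota     # illustrative name
  namespace: my-team        # illustrative namespace
spec:
  hard:
    count/jobs.batch: "30"  # at most 30 Job objects in this namespace
```
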
{{< /note >}}

## Job patterns

The tradeoffs are summarized here, with columns 2 to 4 corresponding to the above tradeoffs.
The pattern names are also links to examples and more detailed description.

| Pattern | Single Job object | Fewer pods than work items? | Use app unmodified? |
| ----------------------------------------------- |:-----------------:|:---------------------------:|:-------------------:|
| [Queue with Pod Per Work Item] | ✓ | | sometimes |
| [Queue with Variable Pod Count] | ✓ | ✓ | |
| [Indexed Job with Static Work Assignment] | ✓ | | ✓ |
| [Job Template Expansion] | | | ✓ |
| [Job with Pod-to-Pod Communication] | ✓ | sometimes | sometimes |

The table below shows the required settings for `.spec.parallelism` and `.spec.completions`
for each of the patterns. Here, `W` is the number of work items.

| Pattern | `.spec.completions` | `.spec.parallelism` |
| ----------------------------------------------- |:-------------------:|:--------------------:|
| [Queue with Pod Per Work Item] | W | any |
| [Queue with Variable Pod Count] | null | any |
| [Indexed Job with Static Work Assignment] | W | any |
| [Job Template Expansion] | 1 | should be 1 |
| [Job with Pod-to-Pod Communication] | W | W |

### Pod failure policy

{{< feature-state for_k8s_version="v1.26" state="beta" >}}

{{< note >}}
You can configure a Pod failure policy for a Job only if you have the
`JobPodFailurePolicy` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
enabled in your cluster. Additionally, it is recommended
to enable the `PodDisruptionConditions` feature gate in order to be able to detect and handle
Pod disruption conditions in the Pod failure policy (see also:
[Pod disruption conditions](/docs/concepts/workloads/pods/disruptions#pod-disruption-conditions)). Both feature gates are
available in Kubernetes {{< skew currentVersion >}}.
{{< /note >}}

In some situations, you may want to have a better control when handling Pod failures
than the control offered by the [Pod backoff failure policy](#pod-backoff-failure-policy),
which is based on the Job's `.spec.backoffLimit`. These are some examples of use cases:

* To optimize costs of running workloads by avoiding unnecessary Pod restarts, you can
  terminate a Job as soon as one of its Pods fails with an exit code indicating a software bug.
* To guarantee that your Job finishes even if there are disruptions, you can
  ignore Pod failures caused by disruptions (such as {{< glossary_tooltip text="preemption" term_id="preemption" >}},
  {{< glossary_tooltip text="API-initiated eviction" term_id="api-eviction" >}}
  or {{< glossary_tooltip text="taint" term_id="taint" >}}-based eviction) so
  that they don't count towards the `.spec.backoffLimit` limit of retries.

These are the rules for the `main` container specifically:

- an exit code of 0 means that the container succeeded
- an exit code of 42 means that the **entire Job** failed
- any other exit code represents that the container failed, and hence the entire
  Pod. The Pod will be re-created if the total number of restarts is
  below `backoffLimit`. If the `backoffLimit` is reached the **entire Job** failed.

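A sketch of a Job whose `podFailurePolicy` implements that exit-code rule (the Job name, image, and the deliberately failing command are illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-pod-failure-policy-example   # illustrative name
spec:
  backoffLimit: 6
  template:
    spec:
      restartPolicy: Never               # required when using .spec.podFailurePolicy
      containers:
      - name: main
        image: docker.io/library/bash:5
        command: ["bash", "-c", "echo 'Hello world!' && sleep 5 && exit 42"]
  podFailurePolicy:
    rules:
    - action: FailJob                    # exit code 42 fails the entire Job
      onExitCodes:
        containerName: main
        operator: In
        values: [42]
```
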
These are some requirements and semantics of the API:

- if you want to use the `.spec.podFailurePolicy` field for a Job, you must
  also define that Job's pod template with `.spec.restartPolicy` set to `Never`.
- `Ignore`: use to indicate that the counter towards the `.spec.backoffLimit`
  should not be incremented and a replacement Pod should be created.
- `Count`: use to indicate that the Pod should be handled in the default way.
  The counter towards the `.spec.backoffLimit` should be incremented.

|
|||
-->
|
||||
### 使用 Finalizer 追踪 Job {#job-tracking-with-finalizers}
|
||||
|
||||
{{< feature-state for_k8s_version="v1.23" state="beta" >}}
|
||||
{{< feature-state for_k8s_version="v1.26" state="stable" >}}
|
||||
|
||||
{{< note >}}
|
||||
<!--
|
||||
In order to use this behavior, you must enable the `JobTrackingWithFinalizers`
|
||||
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
|
||||
on the [API server](/docs/reference/command-line-tools-reference/kube-apiserver/)
|
||||
and the [controller manager](/docs/reference/command-line-tools-reference/kube-controller-manager/).
|
||||
It is enabled by default.
|
||||
The control plane doesn't track Jobs using finalizers, if the Jobs were created
|
||||
when the feature gate `JobTrackingWithFinalizers` was disabled, even after you
|
||||
upgrade the control plane to 1.26.
|
||||
-->
|
||||
要使用该行为,你必须为 [API 服务器](/zh-cn/docs/reference/command-line-tools-reference/kube-apiserver/)
|
||||
和[控制器管理器](/zh-cn/docs/reference/command-line-tools-reference/kube-controller-manager/)启用
|
||||
`JobTrackingWithFinalizers`
|
||||
[特性门控](/zh-cn/docs/reference/command-line-tools-reference/feature-gates/)。
|
||||
该特性默认是启用的。
|
||||
|
||||
<!--
|
||||
When enabled, the control plane tracks new Jobs using the behavior described
|
||||
below. Jobs created before the feature was enabled are unaffected. As a user,
|
||||
the only difference you would see is that the control plane tracking of Job
|
||||
completion is more accurate.
|
||||
-->
|
||||
启用后,控制面基于下述行为追踪新的 Job。在启用该特性之前创建的 Job 不受影响。
|
||||
作为用户,你会看到的唯一区别是控制面对 Job 完成情况的跟踪更加准确。
|
||||
如果 Job 是在特性门控 `JobTrackingWithFinalizers` 被禁用时创建的,即使你将控制面升级到 1.26,
|
||||
控制面也不会使用 Finalizer 跟踪 Job。
|
||||
{{< /note >}}
|
||||
|
||||
<!--
|
||||
When this feature isn't enabled, the Job {{< glossary_tooltip term_id="controller" >}}
|
||||
relies on counting the Pods that exist in the cluster to track the Job status,
|
||||
that is, to keep the counters for `succeeded` and `failed` Pods.
|
||||
However, Pods can be removed for a number of reasons, including:
|
||||
- The garbage collector that removes orphan Pods when a Node goes down.
|
||||
- The garbage collector that removes finished Pods (in `Succeeded` or `Failed`
|
||||
phase) after a threshold.
|
||||
- Human intervention to delete Pods belonging to a Job.
|
||||
- An external controller (not provided as part of Kubernetes) that removes or
|
||||
replaces Pods.
|
||||
The control plane keeps track of the Pods that belong to any Job and notices if
|
||||
any such Pod is removed from the API server. To do that, the Job controller
|
||||
creates Pods with the finalizer `batch.kubernetes.io/job-tracking`. The
|
||||
controller removes the finalizer only after the Pod has been accounted for in
|
||||
the Job status, allowing the Pod to be removed by other controllers or users.
|
||||
|
||||
Jobs created before upgrading to Kubernetes 1.26 or before the feature gate
|
||||
`JobTrackingWithFinalizers` is enabled are tracked without the use of Pod
|
||||
finalizers.
|
||||
The Job {{< glossary_tooltip term_id="controller" text="controller" >}} updates
|
||||
the status counters for `succeeded` and `failed` Pods based only on the Pods
|
||||
that exist in the cluster. The contol plane can lose track of the progress of
|
||||
the Job if Pods are deleted from the cluster.
|
||||
-->
|
||||
该功能未启用时,Job {{< glossary_tooltip term_id="controller" >}} 依靠计算集群中存在的 Pod 来跟踪作业状态。
|
||||
也就是说,维持一个统计 `succeeded` 和 `failed` 的 Pod 的计数器。
|
||||
然而,Pod 可以因为一些原因被移除,包括:
|
||||
- 当一个节点宕机时,垃圾收集器会删除孤立(Orphan)Pod。
|
||||
- 垃圾收集器在某个阈值后删除已完成的 Pod(处于 `Succeeded` 或 `Failed` 阶段)。
|
||||
- 人工干预删除 Job 的 Pod。
|
||||
- 一个外部控制器(不包含于 Kubernetes)来删除或取代 Pod。
|
||||
控制面会跟踪属于任何 Job 的 Pod,并通知是否有任何这样的 Pod 被从 API 服务器中移除。
|
||||
为了实现这一点,Job 控制器创建的 Pod 带有 Finalizer `batch.kubernetes.io/job-tracking`。
|
||||
控制器只有在 Pod 被记入 Job 状态后才会移除 Finalizer,允许 Pod 可以被其他控制器或用户移除。
|
||||
|
||||
在升级到 Kubernetes 1.26 之前或在启用特性门控 `JobTrackingWithFinalizers`
|
||||
之前创建的 Job 被跟踪时不使用 Pod Finalizer。
|
||||
Job {{< glossary_tooltip term_id="controller" text="控制器" >}}仅根据集群中存在的 Pod
|
||||
更新 `succeeded` 和 `failed` Pod 的状态计数器。如果 Pod 被从集群中删除,控制面可能无法跟踪 Job 的进度。
|
||||
|
||||
<!--
|
||||
If you enable the `JobTrackingWithFinalizers` feature for your cluster, the
|
||||
control plane keeps track of the Pods that belong to any Job and notices if any
|
||||
such Pod is removed from the API server. To do that, the Job controller creates Pods with
|
||||
the finalizer `batch.kubernetes.io/job-tracking`. The controller removes the
|
||||
finalizer only after the Pod has been accounted for in the Job status, allowing
|
||||
the Pod to be removed by other controllers or users.
|
||||
|
||||
The Job controller uses the new algorithm for new Jobs only. Jobs created
|
||||
before the feature is enabled are unaffected. You can determine if the Job
|
||||
controller is tracking a Job using Pod finalizers by checking if the Job has the
|
||||
annotation `batch.kubernetes.io/job-tracking`. You should **not** manually add
|
||||
or remove this annotation from Jobs.
|
||||
You can determine if the control plane is tracking a Job using Pod finalizers by
|
||||
checking if the Job has the annotation
|
||||
`batch.kubernetes.io/job-tracking`. You should **not** manually add or remove
|
||||
this annotation from Jobs. Instead, you can recreate the Jobs to ensure they
|
||||
are tracked using Pod finalizers.
|
||||
-->
|
||||
如果你为你的集群启用了 `JobTrackingWithFinalizers` 特性,控制面会跟踪属于任何 Job 的 Pod。
|
||||
并注意是否有任何这样的 Pod 被从 API 服务器上删除。
|
||||
为了实现这一点,Job 控制器创建的 Pod 带有 Finalizer `batch.kubernetes.io/job-tracking`。
|
||||
控制器只有在 Pod 被记入 Job 状态后才会移除 Finalizer,允许 Pod 可以被其他控制器或用户删除。
|
||||
|
||||
Job 控制器只对新的 Job 使用新的算法。在启用该特性之前创建的 Job 不受影响。
|
||||
你可以根据检查 Job 是否含有 `batch.kubernetes.io/job-tracking` 注解,
|
||||
来确定 Job 控制器是否正在使用 Pod Finalizer 追踪 Job。
|
||||
来确定控制面是否正在使用 Pod Finalizer 追踪 Job。
|
||||
你**不**应该给 Job 手动添加或删除该注解。
|
||||
取而代之的是你可以重新创建 Job 以确保使用 Pod Finalizer 跟踪这些 Job。
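A quick way to check, as a sketch (assumes the `pi` Job from the examples above):

```shell
# Look for the batch.kubernetes.io/job-tracking annotation on the Job
kubectl get job pi -o yaml | grep 'batch.kubernetes.io/job-tracking'

# Inspect the finalizers on the Job's Pods
kubectl get pods -l job-name=pi \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.metadata.finalizers}{"\n"}{end}'
```
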

## Alternatives

## {{% heading "whatsnext" %}}

* Learn about [Pods](/docs/concepts/workloads/pods).
* Learn about the different ways of running Jobs:
* Read the [`Job`](/docs/reference/kubernetes-api/workload-resources/job-v1/)
  object definition to understand the API for that resource.
* Read about [`CronJob`](/docs/concepts/workloads/controllers/cron-jobs/), which you
  can use to define a series of Jobs that will run based on a schedule, similar to
  the UNIX tool `cron`.
* Practice how to configure handling of retriable and non-retriable pod failures
  using `podFailurePolicy`, based on the step-by-step [examples](/docs/tasks/job/pod-failure-policy/).