Update for KEP3329: "Retriable and non-retriable Pod failures for Jobs"

Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>
pull/39809/head
Michal Wozniak 2023-03-06 12:43:19 +01:00
parent e2526aa6c4
commit 801b556183
5 changed files with 149 additions and 5 deletions


@@ -807,6 +807,17 @@ These are some requirements and semantics of the API:
- `Count`: use to indicate that the Pod should be handled in the default way.
The counter towards the `.spec.backoffLimit` should be incremented.
{{< note >}}
When you use a `podFailurePolicy`, the job controller only matches Pods in the
`Failed` phase. Pods with a deletion timestamp that are not in a terminal phase
(`Failed` or `Succeeded`) are considered still terminating. This implies that
terminating pods retain a [tracking finalizer](#job-tracking-with-finalizers)
until they reach a terminal phase.
Since Kubernetes 1.27, the kubelet transitions deleted pods to a terminal phase
(see: [Pod Phase](/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase)). This
ensures that deleted pods have their finalizers removed by the Job controller.
{{< /note >}}
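
For illustration only, a rule that uses the `Count` action described above might look like the following sketch; the container name and exit code here are hypothetical and not taken from any example on this page:

```yaml
podFailurePolicy:
  rules:
  # Hypothetical rule: a failure of the "main" container with exit code 42
  # is handled the default way and counted towards .spec.backoffLimit.
  - action: Count
    onExitCodes:
      containerName: main
      operator: In
      values: [42]
```
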
### Job tracking with finalizers
{{< feature-state for_k8s_version="v1.26" state="stable" >}}


@@ -231,11 +231,6 @@ can happen, according to:
{{< feature-state for_k8s_version="v1.26" state="beta" >}}
{{< note >}}
If you are using an older version of Kubernetes than {{< skew currentVersion >}},
please refer to the corresponding version of the documentation.
{{< /note >}}
{{< note >}}
In order to use this behavior, you must have the `PodDisruptionConditions`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)


@@ -91,6 +91,12 @@ A Pod is granted a term to terminate gracefully, which defaults to 30 seconds.
You can use the flag `--force` to [terminate a Pod by force](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination-forced).
{{< /note >}}
Since Kubernetes 1.27, the kubelet transitions deleted pods, except for
[static pods](/docs/tasks/configure-pod-container/static-pod/) and
[force-deleted pods](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination-forced)
without a finalizer, to a terminal phase (`Failed` or `Succeeded` depending on
the exit statuses of the pod containers) before their deletion from the API server.
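
For example, a deleted pod whose container exited with a non-zero code would end up with a status fragment roughly like the following sketch (the container name and exit code are illustrative) before the object is removed:

```yaml
status:
  phase: Failed
  containerStatuses:
  - name: main              # illustrative container name
    state:
      terminated:
        exitCode: 1         # a non-zero exit code results in the Failed phase
        reason: Error
```
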
If a node dies or is disconnected from the rest of the cluster, Kubernetes
applies a policy for setting the `phase` of all Pods on the lost node to Failed.
@@ -476,6 +482,8 @@ An example flow:
1. When the grace period expires, the kubelet triggers forcible shutdown. The container runtime sends
`SIGKILL` to any processes still running in any container in the Pod.
The kubelet also cleans up a hidden `pause` container if that container runtime uses one.
1. The kubelet transitions the pod into a terminal phase (`Failed` or `Succeeded` depending on
the end state of its containers). This step is guaranteed since version 1.27.
1. The kubelet triggers forcible removal of the Pod object from the API server, by setting the grace period
to 0 (immediate deletion).
1. The API server deletes the Pod's API object, which is then no longer visible from any client.
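
To observe this sequence yourself, a quick illustrative check such as the one below can show the pod reaching a terminal phase shortly before it disappears; the pod name `my-pod` is a placeholder:

```sh
# Start the graceful deletion without waiting for it to finish,
# then watch the pod's phase until the object is removed.
kubectl delete pod my-pod --wait=false
kubectl get pod my-pod -o jsonpath='{.status.phase}{"\n"}' --watch
```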


@@ -28,6 +28,9 @@ You should already be familiar with the basic use of [Job](/docs/concepts/worklo
{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}
Ensure that the [feature gates](/docs/reference/command-line-tools-reference/feature-gates/)
`PodDisruptionConditions` and `JobPodFailurePolicy` are both enabled in your cluster.
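
How you enable feature gates depends on how your cluster is deployed. As one illustration (assuming a local minikube test cluster, which is not a requirement of this task), both gates can be turned on at startup:

```sh
# Example only: enable the required feature gates on a minikube test cluster.
minikube start --feature-gates=PodDisruptionConditions=true,JobPodFailurePolicy=true
```
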
## Using Pod failure policy to avoid unnecessary Pod retries
With the following example, you can learn how to use Pod failure policy to
@@ -129,6 +132,114 @@ kubectl delete jobs/job-pod-failure-policy-ignore
The cluster automatically cleans up the Pods.
## Using Pod failure policy to avoid unnecessary Pod retries based on custom Pod Conditions
With the following example, you can learn how to use Pod failure policy to
avoid unnecessary Pod retries based on custom Pod conditions.
{{< note >}}
The example below requires Kubernetes 1.27 or later, as it relies on deleted
pods in the `Pending` phase transitioning to a terminal phase
(see [Pod phase](/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase)).
{{< /note >}}
1. First, create a Job based on the config:
{{< codenew file="/controllers/job-pod-failure-policy-config-issue.yaml" >}}
by running:
```sh
kubectl create -f job-pod-failure-policy-config-issue.yaml
```
Note that the image is misconfigured, as it does not exist.
2. Inspect the status of the job's Pods by running:
```sh
kubectl get pods -l job-name=job-pod-failure-policy-config-issue -o yaml
```
You will see output similar to this:
```yaml
containerStatuses:
- image: non-existing-repo/non-existing-image:example
  ...
  state:
    waiting:
      message: Back-off pulling image "non-existing-repo/non-existing-image:example"
      reason: ImagePullBackOff
...
phase: Pending
```
Note that the pod remains in the `Pending` phase as it fails to pull the
misconfigured image. This could, in principle, be a transient issue and the
image could still get pulled. However, in this case, the image does not exist,
so we indicate this fact with a custom condition.
3. Add the custom condition. First prepare the patch by running:
```sh
cat <<EOF > patch.yaml
status:
  conditions:
  - type: ConfigIssue
    status: "True"
    reason: "NonExistingImage"
    lastTransitionTime: "$(date -u +"%Y-%m-%dT%H:%M:%SZ")"
EOF
```
Second, select one of the pods created by the job by running:
```sh
podName=$(kubectl get pods -l job-name=job-pod-failure-policy-config-issue -o jsonpath='{.items[0].metadata.name}')
```
Then, apply the patch to the selected pod by running the following command:
```sh
kubectl patch pod $podName --subresource=status --patch-file=patch.yaml
```
If applied successfully, you will get a notification like this:
```sh
pod/job-pod-failure-policy-config-issue-k6pvp patched
```
4. Delete the pod to transition it to the `Failed` phase by running the command:
```sh
kubectl delete pods/$podName
```
5. Inspect the status of the Job by running:
```sh
kubectl get jobs -l job-name=job-pod-failure-policy-config-issue -o yaml
```
In the Job status, check for a `Failed` condition with the `reason` field
equal to `PodFailurePolicy`. Additionally, the `message` field contains
more detailed information about the Job termination, such as:
`Pod default/job-pod-failure-policy-config-issue-k6pvp has condition ConfigIssue matching FailJob rule at index 0`.
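
The relevant fragment of the Job status would look roughly like the following sketch (the pod name and other details will differ in your cluster):

```yaml
status:
  conditions:
  - type: Failed
    status: "True"
    reason: PodFailurePolicy
    message: Pod default/job-pod-failure-policy-config-issue-k6pvp has condition ConfigIssue matching FailJob rule at index 0
```
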
{{< note >}}
In a production environment, steps 3 and 4 should be automated by a
user-provided controller.
{{< /note >}}
### Cleaning up
Delete the Job you created:
```sh
kubectl delete jobs/job-pod-failure-policy-config-issue
```
The cluster automatically cleans up the Pods.
## Alternatives
You could rely solely on the


@@ -0,0 +1,19 @@
apiVersion: batch/v1
kind: Job
metadata:
  name: job-pod-failure-policy-config-issue
spec:
  completions: 8
  parallelism: 2
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: "non-existing-repo/non-existing-image:example"
  backoffLimit: 6
  podFailurePolicy:
    rules:
    - action: FailJob
      onPodConditions:
      - type: ConfigIssue