---
title: Parallel Processing using Expansions
content_template: templates/task
min-kubernetes-server-version: v1.8
weight: 20
---

{{% capture overview %}}

This task demonstrates running multiple {{< glossary_tooltip text="Jobs" term_id="job" >}}
based on a common template. You can use this approach to process batches of work in
parallel.

For this example there are only three items: _apple_, _banana_, and _cherry_.
The sample Jobs process each item by printing a string and then pausing.

See [using Jobs in real workloads](#using-jobs-in-real-workloads) to learn about how
this pattern fits more realistic use cases.
{{% /capture %}}

{{% capture prerequisites %}}

You should be familiar with the basic,
non-parallel use of [Job](/docs/concepts/jobs/run-to-completion-finite-workloads/).

{{< include "task-tutorial-prereqs.md" >}}

For basic templating you need the command-line utility `sed`.

To follow the advanced templating example, you need a working installation of
[Python](https://www.python.org/) and the Jinja2 template
library for Python.

Once you have Python set up, you can install Jinja2 by running:

```shell
pip install --user jinja2
```
{{% /capture %}}

{{% capture steps %}}

## Create Jobs based on a template

First, download the following template of a Job to a file called `job-tmpl.yaml`.
Here's what you'll download:

{{< codenew file="application/job/job-tmpl.yaml" >}}

```shell
# Use curl to download job-tmpl.yaml
curl -L -s -O https://k8s.io/examples/application/job/job-tmpl.yaml
```

The file you downloaded is not yet a valid Kubernetes
{{< glossary_tooltip text="manifest" term_id="manifest" >}}.
Instead, that template is a YAML representation of a Job object with some placeholders
that need to be filled in before it can be used. The `$ITEM` syntax is not meaningful to Kubernetes.

### Create manifests from the template

The following shell snippet uses `sed` to replace the string `$ITEM` with the loop
variable, writing into a temporary directory named `jobs`. Run this now:

```shell
# Expand the template into multiple files, one for each item to be processed.
mkdir ./jobs
for i in apple banana cherry
do
  cat job-tmpl.yaml | sed "s/\$ITEM/$i/" > ./jobs/job-$i.yaml
done
```

Check that it worked:

```shell
ls jobs/
```

The output is similar to this:

```
job-apple.yaml
job-banana.yaml
job-cherry.yaml
```

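Listing the directory shows only that the files exist. As an extra check, you could search the generated manifests for any `$ITEM` that was not replaced. The sketch below is a hypothetical helper (`check_expanded` is not part of any tool). It assumes bash and the `./jobs` directory created above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail if any generated manifest still contains the literal text $ITEM.
check_expanded() {
  local dir=$1
  local files=("$dir"/*.yaml)

  # Without a match the glob stays unexpanded, so the first entry does not exist.
  if [ ! -e "${files[0]}" ]; then
    echo "no manifests found in $dir" >&2
    return 1
  fi

  # grep -l prints the name of every file that still contains the placeholder.
  if grep -l '\$ITEM' "${files[@]}"; then
    echo "unexpanded \$ITEM placeholders in the files listed above" >&2
    return 1
  fi

  echo "all placeholders expanded in $dir"
}

# Usage: check_expanded ./jobs
```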
You could use any template language (for example, Jinja2 or ERB), or
write a program to generate the Job manifests.

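As an illustration of the "write a program" option, a few lines of bash can perform the same substitution without `sed`. This is a minimal sketch that assumes bash. `render_item` is a hypothetical name, and the function replaces only the literal text `$ITEM`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a copy of the template with every literal $ITEM
# replaced by the given item, using only bash parameter expansion (no sed).
render_item() {
  local tmpl=$1 item=$2 line
  # IFS= and -r keep leading indentation and backslashes intact, which YAML needs;
  # the extra [ -n "$line" ] test handles a final line with no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s\n' "${line//\$ITEM/$item}"
  done < "$tmpl"
}

# Usage, assuming job-tmpl.yaml is in the current directory:
#   mkdir -p ./jobs
#   for i in apple banana cherry; do
#     render_item job-tmpl.yaml "$i" > "./jobs/job-$i.yaml"
#   done
```

Unlike the `sed` loop, this version replaces every occurrence of `$ITEM` on a line, not just the first.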
### Create Jobs from the manifests

Next, create all the Jobs with one kubectl command:

```shell
kubectl create -f ./jobs
```

The output is similar to this:

```
job.batch/process-item-apple created
job.batch/process-item-banana created
job.batch/process-item-cherry created
```

Now, check on the Jobs:

```shell
kubectl get jobs -l jobgroup=jobexample
```

The output is similar to this:

```
NAME                  COMPLETIONS   DURATION   AGE
process-item-apple    1/1           14s        22s
process-item-banana   1/1           12s        21s
process-item-cherry   1/1           12s        20s
```

Using the `-l` option to kubectl selects only the Jobs that are part
of this group of Jobs (there might be other unrelated Jobs in the system).

You can check on the Pods as well using the same
{{< glossary_tooltip text="label selector" term_id="selector" >}}:

```shell
kubectl get pods -l jobgroup=jobexample
```

The output is similar to:

```
NAME                        READY   STATUS      RESTARTS   AGE
process-item-apple-kixwv    0/1     Completed   0          4m
process-item-banana-wrsf7   0/1     Completed   0          4m
process-item-cherry-dnfu9   0/1     Completed   0          4m
```

You can use this single command to check on the output of all Jobs at once:

```shell
kubectl logs -f -l jobgroup=jobexample
```

The output is similar to this:

```
Processing item apple
Processing item banana
Processing item cherry
```

### Clean up {#cleanup-1}

```shell
# Remove the Jobs you created
# Your cluster automatically cleans up their Pods
kubectl delete job -l jobgroup=jobexample
```

## Use advanced template parameters

In the [first example](#create-jobs-based-on-a-template), each instance of the template had one
parameter, and that parameter was also used in the Job's name. However,
[names](/docs/concepts/overview/working-with-objects/names/#names) are restricted
to contain only certain characters.

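To see why a URL cannot double as a name, you could check a candidate against the RFC 1123 label rules that Kubernetes applies to many names: at most 63 characters, only lowercase alphanumerics or `-`, and starting and ending with an alphanumeric. This is a sketch that assumes bash. `is_dns1123_label` is a hypothetical helper, and the check is deliberately conservative, because Job names may instead follow the looser DNS subdomain rules described on the linked page. A name that passes the stricter check also satisfies those looser rules.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the argument is a valid RFC 1123 label.
is_dns1123_label() {
  local name=$1
  local re='^[a-z0-9]([-a-z0-9]*[a-z0-9])?$'
  # Length limit first, then the allowed character pattern.
  [ "${#name}" -le 63 ] && [[ $name =~ $re ]]
}

for candidate in apple jobexample-apple 'http://dbpedia.org/resource/Apple'; do
  if is_dns1123_label "$candidate"; then
    echo "ok:      $candidate"
  else
    echo "invalid: $candidate"
  fi
done
```

The URL fails because of `:` and `/`. That is why the next example keeps the name and the URL as separate parameters.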
This slightly more complex example uses the
[Jinja template language](https://palletsprojects.com/p/jinja/) to generate manifests
and then objects from those manifests, with multiple parameters for each Job.

For this part of the task, you are going to use a one-line Python script to
convert the template to a set of manifests.

First, copy and paste the following template of a Job object into a file called `job.yaml.jinja2`:

```liquid
{%- set params = [{ "name": "apple", "url": "http://dbpedia.org/resource/Apple", },
                  { "name": "banana", "url": "http://dbpedia.org/resource/Banana", },
                  { "name": "cherry", "url": "http://dbpedia.org/resource/Cherry" }]
%}
{%- for p in params %}
{%- set name = p["name"] %}
{%- set url = p["url"] %}
---
apiVersion: batch/v1
kind: Job
metadata:
  name: jobexample-{{ name }}
  labels:
    jobgroup: jobexample
spec:
  template:
    metadata:
      name: jobexample
      labels:
        jobgroup: jobexample
    spec:
      containers:
      - name: c
        image: busybox
        command: ["sh", "-c", "echo Processing URL {{ url }} && sleep 5"]
      restartPolicy: Never
{%- endfor %}
```

The above template defines two parameters for each Job object using a list of
Python dicts (lines 1-4). A `for` loop emits one Job manifest for each
set of parameters (remaining lines).

This example relies on a feature of YAML. One YAML file can contain multiple
documents (Kubernetes manifests, in this case), separated by `---` on a line
by itself.
You can pipe the output directly to `kubectl` to create the Jobs.

Next, use this one-line Python program to expand the template:

```shell
alias render_template='python -c "from jinja2 import Template; import sys; print(Template(sys.stdin.read()).render());"'
```

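By default, bash expands aliases only in interactive shells, so the alias above does not work inside a script. If you want to reuse the expansion in a script, a shell function is one option. This sketch assumes bash and a Python 3 interpreter named `python3` with Jinja2 importable (use `python` if that is how yours is installed). If the alias is already defined in your current shell, remove it first with `unalias render_template`. Otherwise bash expands the alias while reading the function definition.

```shell
#!/usr/bin/env bash
# Function equivalent of the render_template alias: reads a Jinja2 template on stdin
# and writes the rendered text to stdout. Unlike an alias, a function also works in scripts.
render_template() {
  python3 -c 'import sys; from jinja2 import Template; print(Template(sys.stdin.read()).render())'
}

# Usage, as with the alias:
#   cat job.yaml.jinja2 | render_template > jobs.yaml
```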
Use `render_template` to convert the parameters and template into a single
YAML file containing Kubernetes manifests:

```shell
# This requires the alias you defined earlier
cat job.yaml.jinja2 | render_template > jobs.yaml
```

You can view `jobs.yaml` to verify that the `render_template` script worked
correctly.

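Beyond reading the file, you could count the Job documents and confirm that no Jinja2 delimiters survived rendering. This is a minimal sketch that assumes bash and grep. `verify_rendered` is a hypothetical helper. Counting `kind: Job` lines relies on each document starting `kind` at column 0, as the template above does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how many Job documents a rendered file contains,
# fail if any Jinja2 delimiter was left unrendered, and compare the count to the expected one.
verify_rendered() {
  local file=$1 expected=$2 count
  # -c counts matching lines; each rendered document has one top-level "kind: Job".
  count=$(grep -c '^kind: Job$' "$file")
  echo "$count Job documents in $file"
  if grep -q -e '{{' -e '{%' "$file"; then
    echo "unrendered template syntax found in $file" >&2
    return 1
  fi
  [ "$count" -eq "$expected" ]
}

# Usage with the three items in job.yaml.jinja2:
#   verify_rendered jobs.yaml 3
```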
Once you are happy that `render_template` is working as you intend,
you can pipe its output into `kubectl`:

```shell
cat job.yaml.jinja2 | render_template | kubectl apply -f -
```

Kubernetes accepts and runs the Jobs you created.

### Clean up {#cleanup-2}

```shell
# Remove the Jobs you created
# Your cluster automatically cleans up their Pods
kubectl delete job -l jobgroup=jobexample
```

{{% /capture %}}

{{% capture discussion %}}

## Using Jobs in real workloads

In a real use case, each Job performs some substantial computation, such as rendering a frame
of a movie or processing a range of rows in a database. If you were rendering a movie,
you would set `$ITEM` to the frame number. If you were processing rows from a database
table, you would set `$ITEM` to represent the range of database rows to process.

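For the database case, one way to produce the `$ITEM` values is to split the table into fixed-size ranges. This sketch assumes bash, a known total row count, and a `rows-START-END` naming scheme invented for this example. The resulting strings are valid in Job names, so you could feed them to the same `sed` loop used earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one work item per range of rows, e.g. rows-0-999.
# Arguments: total number of rows, rows per Job.
row_ranges() {
  local total=$1 size=$2 start end
  for (( start = 0; start < total; start += size )); do
    end=$(( start + size - 1 ))
    # The last range may be shorter than the others.
    if (( end >= total )); then
      end=$(( total - 1 ))
    fi
    echo "rows-$start-$end"
  done
}

# Usage: expand the template once per range instead of once per fruit.
#   for i in $(row_ranges 2500 1000); do
#     sed "s/\$ITEM/$i/" job-tmpl.yaml > "./jobs/job-$i.yaml"
#   done
```

For example, `row_ranges 2500 1000` prints `rows-0-999`, `rows-1000-1999`, and `rows-2000-2499`, one per line.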
In the task, you ran a command to collect the output from Pods by fetching
their logs. In a real use case, each Pod for a Job writes its output to
durable storage before completing. You can use a PersistentVolume for each Job,
or an external storage service. For example, if you are rendering frames for a movie,
use HTTP to `PUT` the rendered frame data to a URL, using a different URL for each
frame.

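As a sketch of the HTTP approach, each Pod could upload its result with `curl`, whose `--upload-file` option sends an HTTP `PUT`. The base URL, the `frame-NNNNN.png` naming, and the helper names are all hypothetical. Substitute whatever your storage service expects. A real service may also require authentication, which this sketch omits.

```shell
#!/usr/bin/env bash
# Hypothetical storage endpoint; override FRAME_BASE_URL for a real service.
FRAME_BASE_URL=${FRAME_BASE_URL:-https://storage.example.com/movie/frames}

# Build a distinct URL per frame; zero-padding keeps the URLs in frame order when sorted.
frame_url() {
  printf '%s/frame-%05d.png\n' "$FRAME_BASE_URL" "$1"
}

# Upload one rendered frame with HTTP PUT (--upload-file).
# --fail turns HTTP error responses into a non-zero exit status that the caller can act on.
upload_frame() {
  local frame=$1 file=$2
  curl --fail --silent --show-error --upload-file "$file" "$(frame_url "$frame")"
}

# Usage inside the Job's container, after rendering frame 42 to /tmp/out.png:
#   upload_frame 42 /tmp/out.png
```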
## Labels on Jobs and Pods

After you create a Job, Kubernetes automatically adds additional
{{< glossary_tooltip text="labels" term_id="label" >}} that
distinguish one Job's Pods from another Job's Pods.

In this example, each Job and its Pod template have a label:
`jobgroup=jobexample`.

Kubernetes itself pays no attention to labels named `jobgroup`. Setting a label
for all the Jobs you create from a template makes it convenient to operate on all
those Jobs at once.
In the [first example](#create-jobs-based-on-a-template) you used a template to
create several Jobs. The template ensures that each Pod also gets the same label, so
you can check on all Pods for these templated Jobs with a single command.

{{< note >}}
The label key `jobgroup` is not special or reserved.
You can pick your own labelling scheme.
There are [recommended labels](/docs/concepts/overview/working-with-objects/common-labels/#labels)
that you can use if you wish.
{{< /note >}}

## Alternatives

If you plan to create a large number of Job objects, you may find that:

- Even using labels, managing so many Jobs is cumbersome.
- If you create many Jobs in a batch, you might place high load
  on the Kubernetes control plane. Alternatively, the Kubernetes API
  server could rate limit you, temporarily rejecting your requests with a 429 status.
- You might be limited by a {{< glossary_tooltip text="resource quota" term_id="resource-quota" >}}
  on Jobs: when you create a great deal of work in one batch, the API server
  permanently rejects the requests that exceed the quota.

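If you still create many Jobs from generated manifests, one simple way to soften the load is to pace the requests on the client side. This is a hypothetical sketch, not a Kubernetes feature. The batch size and pause are arbitrary, and the `KUBECTL` variable exists only so the loop can be dry-run with `echo`. Pacing does not help with quota rejections, so the loop stops at the first failed request.

```shell
#!/usr/bin/env bash
# Command used to create objects; set KUBECTL=echo to see what would run.
KUBECTL=${KUBECTL:-kubectl}

# Create one Job per manifest in a directory, pausing after every batch.
# Arguments: directory, batch size (default 10), pause in seconds (default 5).
create_in_batches() {
  local dir=$1 batch_size=${2:-10} pause=${3:-5}
  local count=0 manifest
  for manifest in "$dir"/*.yaml; do
    [ -e "$manifest" ] || continue   # skip the literal pattern when nothing matches
    # Stop at the first failure, for example a request rejected by a quota.
    "$KUBECTL" create -f "$manifest" || return 1
    count=$(( count + 1 ))
    if (( count % batch_size == 0 )); then
      sleep "$pause"
    fi
  done
}

# Usage, with the ./jobs directory from the first example:
#   create_in_batches ./jobs 10 5
```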
There are other [job patterns](/docs/concepts/jobs/run-to-completion-finite-workloads/#job-patterns)
that you can use to process large amounts of work without creating very many Job
objects.

You could also consider writing your own [controller](/docs/concepts/architecture/controller/)
to manage Job objects automatically.
{{% /capture %}}