diff --git a/content/en/docs/tasks/job/coarse-parallel-processing-work-queue.md b/content/en/docs/tasks/job/coarse-parallel-processing-work-queue.md index b052a04826..3f94e249eb 100644 --- a/content/en/docs/tasks/job/coarse-parallel-processing-work-queue.md +++ b/content/en/docs/tasks/job/coarse-parallel-processing-work-queue.md @@ -1,6 +1,5 @@ --- title: Coarse Parallel Processing Using a Work Queue -min-kubernetes-server-version: v1.8 content_type: task weight: 20 --- @@ -8,7 +7,7 @@ weight: 20 -In this example, we will run a Kubernetes Job with multiple parallel +In this example, you will run a Kubernetes Job with multiple parallel worker processes. In this example, as each pod is created, it picks up one unit of work @@ -16,7 +15,7 @@ from a task queue, completes it, deletes it from the queue, and exits. Here is an overview of the steps in this example: -1. **Start a message queue service.** In this example, we use RabbitMQ, but you could use another +1. **Start a message queue service.** In this example, you use RabbitMQ, but you could use another one. In practice you would set up a message queue service once and reuse it for many jobs. 1. **Create a queue, and fill it with messages.** Each message represents one task to be done. In this example, a message is an integer that we will do a lengthy computation on. @@ -26,11 +25,16 @@ Here is an overview of the steps in this example: ## {{% heading "prerequisites" %}} -Be familiar with the basic, +You should already be familiar with the basic, non-parallel, use of [Job](/docs/concepts/workloads/controllers/job/). {{< include "task-tutorial-prereqs.md" >}} +You will need a container image registry where you can upload images to run in your cluster. + +This task example also assumes that you have Docker installed locally. + + ## Starting a message queue service @@ -43,21 +47,20 @@ cluster and reuse it for many jobs, as well as for long-running services. 
Start RabbitMQ as follows:

```shell
-kubectl create -f https://raw.githubusercontent.com/kubernetes/kubernetes/release-1.3/examples/celery-rabbitmq/rabbitmq-service.yaml
+# make a Service for the StatefulSet to use
+kubectl create -f https://kubernetes.io/examples/application/job/rabbitmq/rabbitmq-service.yaml
```
```
service "rabbitmq-service" created
```
```shell
-kubectl create -f https://raw.githubusercontent.com/kubernetes/kubernetes/release-1.3/examples/celery-rabbitmq/rabbitmq-controller.yaml
+kubectl create -f https://kubernetes.io/examples/application/job/rabbitmq/rabbitmq-statefulset.yaml
```
```
-replicationcontroller "rabbitmq-controller" created
+statefulset "rabbitmq" created
```
-We will only use the rabbitmq part from the [celery-rabbitmq example](https://github.com/kubernetes/kubernetes/tree/release-1.3/examples/celery-rabbitmq).
-
## Testing the message queue service

Now, we can experiment with accessing the message queue. We will
@@ -68,7 +71,7 @@ First create a temporary interactive Pod.

```shell
# Create a temporary interactive container
-kubectl run -i --tty temp --image ubuntu:18.04
+kubectl run -i --tty temp --image ubuntu:22.04
```
```
Waiting for pod default/temp-loe07 to be running, status is Pending, pod ready: false
@@ -77,76 +80,82 @@ Waiting for pod default/temp-loe07 to be running, status is Pending, pod ready: false
```

Note that your pod name and command prompt will be different.

-Next install the `amqp-tools` so we can work with message queues.
+Next, install the `amqp-tools` so you can work with message queues.
+The next commands show what you need to run inside the interactive shell in that Pod:

```shell
-# Install some tools
-root@temp-loe07:/# apt-get update
-.... [ lots of output ] ....
-root@temp-loe07:/# apt-get install -y curl ca-certificates amqp-tools python dnsutils
-.... [ lots of output ] ....
+apt-get update && apt-get install -y curl ca-certificates amqp-tools python3 dnsutils
```

-Later, we will make a docker image that includes these packages.
+Later, you will make a container image that includes these packages. -Next, we will check that we can discover the rabbitmq service: +Next, you will check that you can discover the Service for RabbitMQ: ``` +# Run these commands inside the Pod # Note the rabbitmq-service has a DNS name, provided by Kubernetes: - -root@temp-loe07:/# nslookup rabbitmq-service +nslookup rabbitmq-service +``` +``` Server: 10.0.0.10 Address: 10.0.0.10#53 Name: rabbitmq-service.default.svc.cluster.local Address: 10.0.147.152 - -# Your address will vary. ``` +(the IP addresses will vary) -If Kube-DNS is not set up correctly, the previous step may not work for you. -You can also find the service IP in an env var: - -``` -# env | grep RABBIT | grep HOST -RABBITMQ_SERVICE_SERVICE_HOST=10.0.147.152 -# Your address will vary. -``` - -Next we will verify we can create a queue, and publish and consume messages. +If the kube-dns addon is not set up correctly, the previous step may not work for you. +You can also find the IP address for that Service in an environment variable: ```shell +# run this check inside the Pod +env | grep RABBITMQ_SERVICE | grep HOST +``` +``` +RABBITMQ_SERVICE_SERVICE_HOST=10.0.147.152 +``` +(the IP address will vary) + +Next you will verify that you can create a queue, and publish and consume messages. + +```shell +# Run these commands inside the Pod # In the next line, rabbitmq-service is the hostname where the rabbitmq-service # can be reached. 5672 is the standard port for rabbitmq. 
-
-root@temp-loe07:/# export BROKER_URL=amqp://guest:guest@rabbitmq-service:5672
+export BROKER_URL=amqp://guest:guest@rabbitmq-service:5672

# If you could not resolve "rabbitmq-service" in the previous step,
# then use this command instead:
-# root@temp-loe07:/# BROKER_URL=amqp://guest:guest@$RABBITMQ_SERVICE_SERVICE_HOST:5672
+# BROKER_URL=amqp://guest:guest@$RABBITMQ_SERVICE_SERVICE_HOST:5672

# Now create a queue:

-root@temp-loe07:/# /usr/bin/amqp-declare-queue --url=$BROKER_URL -q foo -d
+/usr/bin/amqp-declare-queue --url=$BROKER_URL -q foo -d
+```
+```
foo
+```

-# Publish one message to it:
-
-root@temp-loe07:/# /usr/bin/amqp-publish --url=$BROKER_URL -r foo -p -b Hello
+Publish one message to the queue:

+```shell
+/usr/bin/amqp-publish --url=$BROKER_URL -r foo -p -b Hello

# And get it back.

-root@temp-loe07:/# /usr/bin/amqp-consume --url=$BROKER_URL -q foo -c 1 cat && echo
+/usr/bin/amqp-consume --url=$BROKER_URL -q foo -c 1 cat && echo 1>&2
+```
+```
Hello
-root@temp-loe07:/#
```

-In the last command, the `amqp-consume` tool takes one message (`-c 1`)
-from the queue, and passes that message to the standard input of an arbitrary command. In this case, the program `cat` prints out the characters read from standard input, and the echo adds a carriage
-return so the example is readable.
+In the last command, the `amqp-consume` tool takes one message (`-c 1`)
+from the queue, and passes that message to the standard input of an arbitrary command.
+In this case, the program `cat` prints out the characters read from standard input, and
+the echo adds a trailing newline so the example is readable.

-## Filling the Queue with tasks
+## Fill the queue with tasks

-Now let's fill the queue with some "tasks". In our example, our tasks are strings to be
+Now, fill the queue with some simulated tasks. In this example, the tasks are strings to be
printed.
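Before filling the real queue, the publish-then-consume behavior you just tested can be sketched locally with no cluster and no RabbitMQ, using an ordinary file as a stand-in queue. This sketch is only an illustration of the semantics; `queue_file` and `msg` are names invented here, not part of the task:

```shell
# Hypothetical local stand-in for a message queue: a plain file.
# This only mirrors the publish/consume round-trip, not real AMQP behavior.
queue_file=$(mktemp)

echo "Hello" >> "$queue_file"    # "publish" one message

msg=$(head -n 1 "$queue_file")   # "consume" one message
echo "$msg"

rm -f "$queue_file"
```

Running it prints `Hello`, mirroring the `amqp-publish` / `amqp-consume` pair above.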
In a practice, the content of the messages might be:
@@ -157,18 +166,22 @@ In a practice, the content of the messages might be:
- configuration parameters to a simulation
- frame numbers of a scene to be rendered

-In practice, if there is large data that is needed in a read-only mode by all pods
-of the Job, you will typically put that in a shared file system like NFS and mount
-that readonly on all the pods, or the program in the pod will natively read data from
-a cluster file system like HDFS.
+If there is large data that is needed in a read-only mode by all pods
+of the Job, you typically put that in a shared file system like NFS and mount
+that readonly on all the pods, or write the program in the pod so that it can natively read
+data from a cluster file system (for example: HDFS).

-For our example, we will create the queue and fill it using the amqp command line tools.
-In practice, you might write a program to fill the queue using an amqp client library.
+For this example, you will create the queue and fill it using the AMQP command line tools.
+In practice, you might write a program to fill the queue using an AMQP client library.

```shell
+# Run this inside the Pod where you installed amqp-tools
/usr/bin/amqp-declare-queue --url=$BROKER_URL -q job1 -d
+```
+```
job1
```

+Add items to the queue:
```shell
for f in apple banana cherry date fig grape lemon melon
do
@@ -176,14 +189,14 @@ do
done
```

-So, we filled the queue with 8 messages.
+You added 8 messages to the queue.

-## Create an Image
+## Create a container image

-Now we are ready to create an image that we will run as a job.
+Now you are ready to create an image that you will run as a Job.

-We will use the `amqp-consume` utility to read the message
-from the queue and run our actual program. Here is a very simple
+The job will use the `amqp-consume` utility to read the message
+from the queue and run the actual work.
Here is a very simple example program: {{% code_sample language="python" file="application/job/rabbitmq/worker.py" %}} @@ -194,9 +207,7 @@ Give the script execution permission: chmod +x worker.py ``` -Now, build an image. If you are working in the source -tree, then change directory to `examples/job/work-queue-1`. -Otherwise, make a temporary directory, change to it, +Now, build an image. Make a temporary directory, change to it, download the [Dockerfile](/examples/application/job/rabbitmq/Dockerfile), and [worker.py](/examples/application/job/rabbitmq/worker.py). In either case, build the image with this command: @@ -214,33 +225,27 @@ docker tag job-wq-1 /job-wq-1 docker push /job-wq-1 ``` -If you are using [Google Container -Registry](https://cloud.google.com/tools/container-registry/), tag -your app image with your project ID, and push to GCR. Replace -`` with your project ID. - -```shell -docker tag job-wq-1 gcr.io//job-wq-1 -gcloud docker -- push gcr.io//job-wq-1 -``` +If you are using an alternative container image registry, tag the +image and push it there instead. ## Defining a Job -Here is a job definition. You'll need to make a copy of the Job and edit the -image to match the name you used, and call it `./job.yaml`. - +Here is a manifest for a Job. You'll need to make a copy of the Job manifest +(call it `./job.yaml`), +and edit the name of the container image to match the name you used. {{% code_sample file="application/job/rabbitmq/job.yaml" %}} In this example, each pod works on one item from the queue and then exits. So, the completion count of the Job corresponds to the number of work items -done. So we set, `.spec.completions: 8` for the example, since we put 8 items in the queue. +done. That is why the example manifest has `.spec.completions` set to `8`. 
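The relationship between the queue length and `.spec.completions` can be sketched in plain shell, using the same eight fruit names as the earlier loop. This is not part of the task itself; it only shows the counting logic behind choosing `completions: 8`:

```shell
# Count the work items before creating the Job,
# so .spec.completions can match the number of messages in the queue.
count=0
for f in apple banana cherry date fig grape lemon melon
do
  count=$((count + 1))
done
echo "Set .spec.completions to ${count}"
```

Running it prints `Set .spec.completions to 8`.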
## Running the Job

-So, now run the Job:
+Now, run the Job:

```shell
+# this assumes you downloaded and then edited the manifest already
kubectl apply -f ./job.yaml
```
@@ -264,14 +269,14 @@ Labels: controller-uid=41d75705-92df-11e7-b85e-fa163ee3c11f
Annotations: 
Parallelism: 2
Completions: 8
-Start Time: Wed, 06 Sep 2017 16:42:02 +0800
+Start Time: Tue, 06 Sep 2022 16:42:02 +0000
Pods Statuses: 0 Running / 8 Succeeded / 0 Failed
Pod Template:
Labels: controller-uid=41d75705-92df-11e7-b85e-fa163ee3c11f
job-name=job-wq-1
Containers:
c:
- Image: gcr.io/causal-jigsaw-637/job-wq-1
+ Image: container-registry.example/causal-jigsaw-637/job-wq-1
Port:
Environment:
BROKER_URL: amqp://guest:guest@rabbitmq-service:5672
@@ -293,30 +298,31 @@ Events:

-All the pods for that Job succeeded. Yay.
-
+All the pods for that Job succeeded! You're done.

## Alternatives

-This approach has the advantage that you
-do not need to modify your "worker" program to be aware that there is a work queue.
+This approach has the advantage that you do not need to modify your "worker" program to be
+aware that there is a work queue. You can include the worker program unmodified in your container
+image.

-It does require that you run a message queue service.
+Using this approach does require that you run a message queue service.
If running a queue service is inconvenient, you may
want to consider one of the other
[job patterns](/docs/concepts/workloads/controllers/job/#job-patterns).

This approach creates a pod for every work item. If your work items only take a few seconds,
though, creating a Pod for every work item may add a lot of overhead. Consider another
-[example](/docs/tasks/job/fine-parallel-processing-work-queue/), that executes multiple work items per Pod.
+design, such as in the [fine parallel work queue example](/docs/tasks/job/fine-parallel-processing-work-queue/),
+that executes multiple work items per Pod.
-In this example, we use the `amqp-consume` utility to read the message
-from the queue and run our actual program. This has the advantage that you
+In this example, you used the `amqp-consume` utility to read the message
+from the queue and run the actual program. This has the advantage that you
do not need to modify your program to be aware of the queue.

-A [different example](/docs/tasks/job/fine-parallel-processing-work-queue/), shows how to
-communicate with the work queue using a client library.
+The [fine parallel work queue example](/docs/tasks/job/fine-parallel-processing-work-queue/)
+shows how to communicate with the work queue using a client library.

## Caveats

@@ -327,11 +333,11 @@ If the number of completions is set to more than the number of items in the queu
then the Job will not appear to be completed, even though all items in the queue
have been processed. It will start additional pods which will block waiting
for a message.
+You would need your own mechanism to detect when there is work to do,
+measure the size of the queue, and set the Job's completion count to match.

There is an unlikely race with this pattern. If the container is killed in between the time
-that the message is acknowledged by the amqp-consume command and the time that the container
+that the message is acknowledged by the `amqp-consume` command and the time that the container
exits with success, or if the node crashes before the kubelet is able to post the success of the pod
-back to the api-server, then the Job will not appear to be complete, even though all items
+back to the API server, then the Job will not appear to be complete, even though all items
in the queue have been processed.
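One hedged sketch of such a sizing mechanism: measure the queue depth, then rewrite `.spec.completions` in the manifest before applying it. Everything here is invented for the sketch (`QUEUE_DEPTH` is a hard-coded stand-in; in a real setup you might query RabbitMQ, for example via its management API, and the manifest fragment is written out inline only to keep the sketch self-contained):

```shell
# Hypothetical sketch: size the Job to the queue before creating it.
QUEUE_DEPTH=5   # stand-in value; a real setup would query the queue service

# Minimal manifest fragment, created here only so the sketch is runnable:
printf 'apiVersion: batch/v1\nkind: Job\nspec:\n  completions: 8\n' > /tmp/job-sketch.yaml

# Rewrite .spec.completions to match the measured depth:
sed "s/completions: .*/completions: ${QUEUE_DEPTH}/" /tmp/job-sketch.yaml > /tmp/job-sized.yaml
grep completions /tmp/job-sized.yaml
```

The final `grep` prints `  completions: 5`; you would then `kubectl apply -f` the rewritten manifest.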
- - diff --git a/content/en/docs/tasks/job/fine-parallel-processing-work-queue.md b/content/en/docs/tasks/job/fine-parallel-processing-work-queue.md index 258e90ecf0..b865f49fd7 100644 --- a/content/en/docs/tasks/job/fine-parallel-processing-work-queue.md +++ b/content/en/docs/tasks/job/fine-parallel-processing-work-queue.md @@ -1,23 +1,23 @@ --- title: Fine Parallel Processing Using a Work Queue content_type: task -min-kubernetes-server-version: v1.8 weight: 30 --- -In this example, we will run a Kubernetes Job with multiple parallel -worker processes in a given pod. +In this example, you will run a Kubernetes Job that runs multiple parallel +tasks as worker processes, each running as a separate Pod. In this example, as each pod is created, it picks up one unit of work from a task queue, processes it, and repeats until the end of the queue is reached. Here is an overview of the steps in this example: -1. **Start a storage service to hold the work queue.** In this example, we use Redis to store - our work items. In the previous example, we used RabbitMQ. In this example, we use Redis and - a custom work-queue client library because AMQP does not provide a good way for clients to +1. **Start a storage service to hold the work queue.** In this example, you will use Redis to store + work items. In the [previous example](/docs/tasks/job/coarse-parallel-processing-work-queue), + you used RabbitMQ. In this example, you will use Redis and a custom work-queue client library; + this is because AMQP does not provide a good way for clients to detect when a finite-length work queue is empty. In practice you would set up a store such as Redis once and reuse it for the work queues of many jobs, and other things. 1. **Create a queue, and fill it with messages.** Each message represents one task to be done. 
In @@ -30,6 +30,13 @@ Here is an overview of the steps in this example: {{< include "task-tutorial-prereqs.md" >}} +You will need a container image registry where you can upload images to run in your cluster. +The example uses [Docker Hub](https://hub.docker.com/), but you could adapt it to a different +container image registry. + +This task example also assumes that you have Docker installed locally. You use Docker to +build container images. + Be familiar with the basic, @@ -39,7 +46,7 @@ non-parallel, use of [Job](/docs/concepts/workloads/controllers/job/). ## Starting Redis -For this example, for simplicity, we will start a single instance of Redis. +For this example, for simplicity, you will start a single instance of Redis. See the [Redis Example](https://github.com/kubernetes/examples/tree/master/guestbook) for an example of deploying Redis scalably and redundantly. @@ -53,23 +60,27 @@ You could also download the following files directly: - [`worker.py`](/examples/application/job/redis/worker.py) -## Filling the Queue with tasks +## Filling the queue with tasks -Now let's fill the queue with some "tasks". In our example, our tasks are strings to be +Now let's fill the queue with some "tasks". In this example, the tasks are strings to be printed. Start a temporary interactive pod for running the Redis CLI. ```shell kubectl run -i --tty temp --image redis --command "/bin/sh" +``` +``` Waiting for pod default/redis2-c7h78 to be running, status is Pending, pod ready: false Hit enter for command prompt ``` -Now hit enter, start the redis CLI, and create a list with some work items in it. +Now hit enter, start the Redis CLI, and create a list with some work items in it. +```shell +redis-cli -h redis ``` -# redis-cli -h redis +```console redis:6379> rpush job2 "apple" (integer) 1 redis:6379> rpush job2 "banana" @@ -100,21 +111,21 @@ redis:6379> lrange job2 0 -1 9) "orange" ``` -So, the list with key `job2` will be our work queue. 
+So, the list with key `job2` will be the work queue. Note: if you do not have Kube DNS setup correctly, you may need to change the first step of the above block to `redis-cli -h $REDIS_SERVICE_HOST`. -## Create an Image +## Create a container image {#create-an-image} -Now we are ready to create an image that we will run. +Now you are ready to create an image that will process the work in that queue. -We will use a python worker program with a redis client to read +You're going to use a Python worker program with a Redis client to read the messages from the message queue. A simple Redis work queue client library is provided, -called rediswq.py ([Download](/examples/application/job/redis/rediswq.py)). +called `rediswq.py` ([Download](/examples/application/job/redis/rediswq.py)). The "worker" program in each Pod of the Job uses the work queue client library to get work. Here it is: @@ -124,7 +135,7 @@ client library to get work. Here it is: You could also download [`worker.py`](/examples/application/job/redis/worker.py), [`rediswq.py`](/examples/application/job/redis/rediswq.py), and [`Dockerfile`](/examples/application/job/redis/Dockerfile) files, then build -the image: +the container image. Here's an example using Docker to do the image build: ```shell docker build -t job-wq-2 . @@ -144,46 +155,40 @@ docker push /job-wq-2 You need to push to a public repository or [configure your cluster to be able to access your private repository](/docs/concepts/containers/images/). -If you are using [Google Container -Registry](https://cloud.google.com/tools/container-registry/), tag -your app image with your project ID, and push to GCR. Replace -`` with your project ID. 
- -```shell -docker tag job-wq-2 gcr.io//job-wq-2 -gcloud docker -- push gcr.io//job-wq-2 -``` - ## Defining a Job -Here is the job definition: +Here is a manifest for the Job you will create: {{% code_sample file="application/job/redis/job.yaml" %}} -Be sure to edit the job template to +{{< note >}} +Be sure to edit the manifest to change `gcr.io/myproject` to your own path. +{{< /note >}} In this example, each pod works on several items from the queue and then exits when there are no more items. Since the workers themselves detect when the workqueue is empty, and the Job controller does not know about the workqueue, it relies on the workers to signal when they are done working. -The workers signal that the queue is empty by exiting with success. So, as soon as any worker -exits with success, the controller knows the work is done, and the Pods will exit soon. -So, we set the completion count of the Job to 1. The job controller will wait for the other pods to complete -too. - +The workers signal that the queue is empty by exiting with success. So, as soon as **any** worker +exits with success, the controller knows the work is done, and that the Pods will exit soon. +So, you need to set the completion count of the Job to 1. The job controller will wait for +the other pods to complete too. ## Running the Job So, now run the Job: ```shell +# this assumes you downloaded and then edited the manifest already kubectl apply -f ./job.yaml ``` -Now wait a bit, then check on the job. 
+Now wait a bit, then check on the Job:

```shell
kubectl describe jobs/job-wq-2
+```
+```
Name: job-wq-2
Namespace: default
Selector: controller-uid=b1c7e4e3-92e1-11e7-b85e-fa163ee3c11f
@@ -192,14 +197,14 @@ Labels: controller-uid=b1c7e4e3-92e1-11e7-b85e-fa163ee3c11f
Annotations: 
Parallelism: 2
Completions: 
-Start Time: Mon, 11 Jan 2016 17:07:59 -0800
+Start Time: Tue, 11 Jan 2022 17:07:59 +0000
Pods Statuses: 1 Running / 0 Succeeded / 0 Failed
Pod Template:
Labels: controller-uid=b1c7e4e3-92e1-11e7-b85e-fa163ee3c11f
job-name=job-wq-2
Containers:
c:
- Image: gcr.io/exampleproject/job-wq-2
+ Image: container-registry.example/exampleproject/job-wq-2
Port:
Environment:
Mounts:
@@ -227,7 +232,7 @@ Working on date
Working on lemon
```

-As you can see, one of our pods worked on several work units.
+As you can see, one of the pods for this Job worked on several work units.

@@ -238,8 +243,7 @@ want to consider one of the other
[job patterns](/docs/concepts/workloads/controllers/job/#job-patterns).

If you have a continuous stream of background processing work to run, then
-consider running your background workers with a `ReplicaSet` instead,
+consider running your background workers with a ReplicaSet instead,
and consider running a background processing library such as
[https://github.com/resque/resque](https://github.com/resque/resque).
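The worker behavior this fine-parallel pattern relies on — keep taking items until the queue is empty, then exit with success — can be sketched as a plain shell loop. This is only a local simulation of the semantics; the item names are the sample fruits from earlier, and nothing here talks to Redis:

```shell
# Minimal local simulation of a fine-parallel worker:
# drain the queue item by item, then exit successfully.
for item in apple banana cherry date
do
  echo "Working on $item"
done
echo "Queue empty, exiting with success"
```

Each line of output corresponds to one work unit handled by the same worker, which is why a single Pod in this pattern can process many items.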
- diff --git a/content/en/examples/application/job/rabbitmq/rabbitmq-service.yaml b/content/en/examples/application/job/rabbitmq/rabbitmq-service.yaml new file mode 100644 index 0000000000..2f7fb06dcf --- /dev/null +++ b/content/en/examples/application/job/rabbitmq/rabbitmq-service.yaml @@ -0,0 +1,12 @@ +apiVersion: v1 +kind: Service +metadata: + labels: + component: rabbitmq + name: rabbitmq-service +spec: + ports: + - port: 5672 + selector: + app.kubernetes.io/name: task-queue + app.kubernetes.io/component: rabbitmq diff --git a/content/en/examples/application/job/rabbitmq/rabbitmq-statefulset.yaml b/content/en/examples/application/job/rabbitmq/rabbitmq-statefulset.yaml new file mode 100644 index 0000000000..502598ddf9 --- /dev/null +++ b/content/en/examples/application/job/rabbitmq/rabbitmq-statefulset.yaml @@ -0,0 +1,36 @@ +apiVersion: apps/v1 +kind: StatefulSet +metadata: + labels: + component: rabbitmq + name: rabbitmq +spec: + replicas: 1 + serviceName: rabbitmq-service + selector: + matchLabels: + app.kubernetes.io/name: task-queue + app.kubernetes.io/component: rabbitmq + template: + metadata: + labels: + app.kubernetes.io/name: task-queue + app.kubernetes.io/component: rabbitmq + spec: + containers: + - image: rabbitmq + name: rabbitmq + ports: + - containerPort: 5672 + resources: + requests: + memory: 16M + limits: + cpu: 250m + memory: 512M + volumeMounts: + - mountPath: /var/lib/rabbitmq + name: rabbitmq-data + volumes: + - name: rabbitmq-data + emptyDir: {}