Kubernetes Patterns - The Batch Job Pattern

 


Pods Common Pattern

A Pod hosts one or more containers that Kubernetes treats as a single unit. There are several ways to create Pods. For example:

  • Naked Pods: These are Pods that you create directly from a definition file. In general, this is not a recommended practice because no controller manages the created Pods. If the Pod is deleted or its node crashes, nothing recreates it or reschedules it on a different node.
  • ReplicaSet: This controller is suitable for Pods that need to run continuously, such as web servers and APIs. When a Pod fails, the ReplicaSet replaces it.
  • DaemonSets: This controller also runs Pods continuously, and it ensures that one runs on every node of the cluster. A typical scenario for using DaemonSets is when you want to collect logs from the cluster nodes and send them to a log aggregator such as Elasticsearch.

As you can see, the typical pattern here is to keep the Pod running at all times. This pattern is a common one: you always want your service to continue responding to requests. However, in some cases, you need the container to run once and then terminate.

Pods for running tasks

As mentioned, Kubernetes uses Pods as its building block. Any task that you need Kubernetes to run is done through a Pod. The difference between one type of Pod and another lies in the controller that manages it. Kubernetes provides the Job controller, which runs one or more Pods and ensures that the required number of them terminate successfully. Let’s demonstrate with an example.

Assume that you have a web application that needs a random number that it reads from a file. To create this number, we create a Job controller that spawns a Pod. The Pod writes the number to the file and exits. A Job definition may look like this:

apiVersion: batch/v1
kind: Job
metadata:
 name: seed-creator
spec:
 completions: 1
 parallelism: 1
 template:
   metadata:
     name: seed-creator
   spec:
     restartPolicy: OnFailure
     containers:
       - image: bash
         name: seed-creator
         command: ["bash","-c","echo $RANDOM > /random.txt"]


A Job definition contains the following unique fields:

  • completions (line 6): the number of Pods that need to be run to completion for the Job to be successful. In our example, we only specified one Pod. However, you can specify more if needed.
  • parallelism (line 7): the number of Pods that can run in parallel. If this number is 1, only one Pod can run at a time. Subsequent Pods, if any, run after this one has finished.
  • restartPolicy (line 12): this line is optional in a ReplicaSet, but it’s required in a Job. Generally, restartPolicy accepts Always, Never, and OnFailure. However, since a Job is meant to run a Pod to completion, you are only allowed to use Never and OnFailure for restartPolicy.
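
To make the restartPolicy choice concrete, here is a minimal sketch of the same Job using Never instead of OnFailure. With OnFailure, the failed container is restarted inside the same Pod; with Never, the failed Pod is left in place and the Job controller creates a new Pod for the retry. The Job name below is hypothetical and only there to avoid clashing with the first example.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: seed-creator-never        # hypothetical name
spec:
  completions: 1
  parallelism: 1
  template:
    spec:
      # Never: a failed Pod is not restarted in place; the Job controller
      # starts a fresh Pod instead, so the failed Pod stays around for inspection.
      restartPolicy: Never
      containers:
        - name: seed-creator
          image: bash
          command: ["bash", "-c", "echo $RANDOM > /random.txt"]
```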

Using a Job vs. using a bare Pod

You may be asking why you should bother with a specific definition for a Job when you could just as well have used a naked (bare) Pod. For example, the following Pod definition achieves the same goal as the Job in the previous example:

apiVersion: v1
kind: Pod
metadata:
 name: seed-creator
spec:
   restartPolicy: OnFailure
   containers:
     - image: bash
       name: seed-creator
       command: ["bash","-c","echo $RANDOM > /random.txt"]

 

Applying this definition creates a Pod that executes the same command with the same options. So why (and when) should you use a Job?

You control the number of times the Pod should run through the completions parameter. With a bare Pod, you have to do this manually.

Using the parallelism parameter, you can scale up the number of Pods running at the same time.

If the node fails while the Job’s Pod is running, the Job controller creates a replacement Pod on a healthy node. A bare Pod stays failed until you manually delete it and start it on another node.

So, as you can see, a Job lifts a lot of the administrative burden by automatically managing the Pods.

The Job patterns

The completions and parallelism parameters allow you to utilize different Job patterns depending on your environment and requirements. Let’s have a look:

Single Job Pattern: This pattern is used when you want to execute only one task. To use it, set both the completions and parallelism values to 1. Alternatively, you can omit both from the definition file, and they automatically take the default value (1). The Job is considered done when the Pod exits with status 0. The first example in this article uses the Single Job Pattern.
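
As a minimal sketch of the defaults at work, the first example can be written without completions and parallelism at all. The Job name is hypothetical; the rest mirrors the seed-creator definition.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: seed-creator-defaults     # hypothetical name
spec:
  # completions and parallelism are omitted, so both default to 1:
  # one Pod runs, and the Job is done once it exits with status 0.
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: seed-creator
          image: bash
          command: ["bash", "-c", "echo $RANDOM > /random.txt"]
```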

 


Fixed-count Job Pattern: Use this pattern when you need the task to be executed a specific number of times. For example, you need to read precisely five files and insert their contents into a database. After each file is read, it gets deleted so that the next iteration reads the following file. For this pattern, you need to set the completions parameter to a value higher than 1 (5 in our example). The parallelism parameter is optional here.
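
The five-file scenario could be sketched roughly as follows. The Job name and the placeholder command are assumptions for illustration; a real container would read one file, insert its contents into the database, and delete the file before exiting.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: file-loader               # hypothetical name
spec:
  completions: 5                  # five Pods must succeed, one per file
  parallelism: 2                  # optional: run at most two Pods at a time
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: file-loader
          image: bash
          # Placeholder command (assumption): stands in for the real
          # read-insert-delete logic described in the text.
          command: ["bash", "-c", "echo 'processing the next file'"]
```

With parallelism omitted, the five Pods would run one after another.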

 

 


 

Work-queue Job Pattern: You should use this pattern when the number of tasks is not known in advance. The typical use case here is message queues. If you need to consume messages from a message queue until it is empty, you create a Job, leave the completions count unset, and set the parallelism value to be greater than 1 for higher throughput. Do not set completions to 1 here: a Job never runs more Pods in parallel than it has completions remaining, so only one worker would start.

A Job is considered successful when at least one Pod terminates successfully, and all other Pods terminate as well. Since more than one Pod is running in parallel, it is the responsibility of each Pod to coordinate with other Pods regarding which items every Pod is working on. The first Pod that detects an empty queue will terminate with an exit status of 0. Other Pods also end as soon as they finish processing the messages they’re consuming.
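
A work-queue Job might look like the sketch below. The Job name, the worker image, and the QUEUE_URL variable are all hypothetical; the worker program is assumed to pull messages from the queue, coordinate with its peers through the queue itself, and exit with status 0 once the queue is empty.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: queue-consumer            # hypothetical name
spec:
  # completions is left unset: the Job succeeds once one worker exits
  # with status 0 and all the other workers have terminated.
  parallelism: 3                  # three workers drain the queue concurrently
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: worker
          image: example.com/queue-worker:1.0     # hypothetical image
          env:
            - name: QUEUE_URL                     # hypothetical setting read by the worker
              value: "amqp://queue.example.svc:5672"
```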

 

 


If the stream of work items never ends (think of Twitter messages, for example), then you should consider other controllers like ReplicaSets. The reason is that such queues are never empty, so they need Pods that are always running and are restarted when they fail.
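
For contrast, an always-running consumer managed by a ReplicaSet could be sketched like this. The names, labels, and image are hypothetical. Note that there is no completions field: the controller simply keeps the requested number of Pods alive.

```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: stream-consumer           # hypothetical name
spec:
  replicas: 3                     # keep three consumers running at all times
  selector:
    matchLabels:
      app: stream-consumer
  template:
    metadata:
      labels:
        app: stream-consumer      # must match the selector above
    spec:
      # restartPolicy is omitted and defaults to Always.
      containers:
        - name: consumer
          image: example.com/stream-consumer:1.0  # hypothetical image
```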

TL;DR

  • In general, Kubernetes controllers are meant to keep their Pods running at all times. If a Pod crashes, it is restarted or replaced automatically.
  • Any task you run on Kubernetes runs in a Pod, because Pods are the Kubernetes building block. For tasks that must run to completion, the Job controller is the better fit.
  • A Job is strongly recommended over a bare Pod: it monitors the Pod’s execution and restarts or replaces it on failure, but only until the task completes.
  • There are three patterns for using the Job controller:
    • Single: the simplest type. Execute one task and that’s it.
    • Fixed count: the task must be executed a specific number of times.
    • Work queue: “worker” Pods keep running until the overall task is complete. The most prominent example of this pattern is message queue consumers.

 

 

Mohamed Ahmed

Sep 30, 2019