Scheduled Jobs Challenges
Cron jobs have been part of UNIX since its early versions. When GNU and Linux came into existence, cron was already part of the system. A cron job is simply a command, program, or shell script that is scheduled to run periodically. For example, a program that performs log rotation must run at regular intervals (daily, weekly, monthly, etc.).
However, as the application grows in scale and high availability becomes a requirement, our cron jobs need to be highly available as well. This raises the following challenges:
- If we have multiple nodes hosting the application for high availability, which node handles cron?
- What happens if multiple identical cron jobs run simultaneously?
One possible solution to these challenges is to create a higher-level “controller” that manages cron jobs. The controller is installed on each node, and a leader node gets elected. The leader is the only node that can execute cron jobs. If it goes down, another node gets elected.
However, you would need to obtain this controller from a third-party vendor or write your own. Fortunately, you can execute periodic tasks by using the Kubernetes CronJob controller, which adds a time dimension to the traditional Job controller. In this article, we demonstrate the CronJob type, its use cases, and the types of problems it solves.
A Cron Job Example
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: sender
spec:
  schedule: "*/15 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - image: bash
            name: sender
            command: ["bash","-c","echo 'Sending information to API/database'"]
          restartPolicy: OnFailure
The purpose of the above definition file is to create a CronJob resource that sends data to an API or a database every fifteen minutes. To keep the example simple, we use the echo command from the bash Docker image to simulate the sending action. Note that batch/v1beta1 was removed in Kubernetes 1.25; on version 1.21 or later, use apiVersion: batch/v1 instead. Let's look at the critical properties in this definition:
.spec.schedule (line 6): the schedule parameter defines how frequently the job should run. It uses the same cron format as Linux. If you're not familiar with the cron format, it's straightforward. There are five slots: minute, hour, day of the month, month, and day of the week. To match every value in a slot (for example, to run in all months), place a star (*) in it.
You can also use the */x notation to denote "every x units." In our example, */15 means every fifteen minutes. The remaining slots contain *, so the job runs on all hours, all days, all months, and all days of the week. For more information about the cron format, you can refer to this document.
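To make the five-slot format concrete, here are a few expressions with their meanings. The key names are hypothetical labels chosen for illustration, not Kubernetes fields; only the quoted strings would go into .spec.schedule.

```yaml
# Illustrative lookup table, not a Kubernetes manifest.
# Slot order: minute hour day-of-month month day-of-week
every_fifteen_minutes: "*/15 * * * *"   # the schedule used in this article
hourly_on_the_hour: "0 * * * *"         # minute 0 of every hour
daily_at_midnight: "0 0 * * *"          # 00:00 every day
sundays_at_3am: "0 3 * * 0"             # 03:00 every Sunday (0 = Sunday)
first_of_month_at_0430: "30 4 1 * *"    # 04:30 on day 1 of every month
```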
Like the Job resource, the CronJob uses a Pod template to define the containers that this Pod hosts and the specs of those containers (the image, the startup command, etc.).
.spec.jobTemplate.spec.template.spec.restartPolicy (line 15) defines whether the Pod's containers are restarted when they fail. For a Job, you can set this value to Never or OnFailure.
My Cron Job Didn't Start On Time. What Should I Do?
In some cases, the CronJob may not get triggered at the specified time. In such an event, there are two scenarios:
- We need to execute the job that didn't start even if it was delayed.
- We need to execute the job that didn't start only if a specific time limit was not crossed.
In our first example, the job sends information to an API that expects this information every fifteen minutes. If the data arrives late, it's useless, and the API automatically discards it. For this case, the CronJob resource offers the .spec.startingDeadlineSeconds parameter:
.spec.startingDeadlineSeconds: if the job misses its scheduled time but the delay has not exceeded this number of seconds, the job still gets executed. Otherwise, that run is skipped and the job runs at the next scheduled time.
Notice that if this parameter is not set, the CronJob controller counts all the schedules missed since the last scheduled time and still starts the job late, as long as no more than 100 schedules were missed. If more than 100 were missed, the controller does not start the job and logs an error.
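As a minimal sketch, the sender example could declare a deadline as follows. The value of 300 seconds is an assumption chosen for illustration, as is apiVersion batch/v1, which assumes a cluster running Kubernetes 1.21 or later.

```yaml
apiVersion: batch/v1            # assumption: Kubernetes 1.21 or later
kind: CronJob
metadata:
  name: sender
spec:
  schedule: "*/15 * * * *"
  # Illustrative value: a run that cannot start within 5 minutes
  # of its scheduled time is skipped instead of being started late.
  startingDeadlineSeconds: 300
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - image: bash
            name: sender
            command: ["bash","-c","echo 'Sending information to API/database'"]
          restartPolicy: OnFailure
```

A deadline shorter than the fifteen-minute interval fits the scenario above, where data that arrives late is discarded anyway.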
My Cron Job is Taking so Long That it'd Span to The Next Execution Time
If a CronJob run takes too long to finish, another instance of the job may kick in at its scheduled time while the previous one is still running. The CronJob resource offers the .spec.concurrencyPolicy parameter, which gives you the following options:
- concurrencyPolicy: Allow permits concurrent instances of the same CronJob to run. This is the default behavior.
- concurrencyPolicy: Replace kills the current job if it hasn't finished yet and starts the newly scheduled one.
- concurrencyPolicy: Forbid skips the newly scheduled run while the previous job is still running. Use it when killing a running job is undesirable and it must complete before a new one starts.
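The following sketch shows where the policy sits in the manifest, using the sender example. Choosing Forbid here is an assumption for illustration; apiVersion batch/v1 assumes Kubernetes 1.21 or later.

```yaml
apiVersion: batch/v1            # assumption: Kubernetes 1.21 or later
kind: CronJob
metadata:
  name: sender
spec:
  schedule: "*/15 * * * *"
  # Skip a new run while the previous one is still in progress.
  # Other accepted values: Allow (the default) and Replace.
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - image: bash
            name: sender
            command: ["bash","-c","echo 'Sending information to API/database'"]
          restartPolicy: OnFailure
```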
I Need to Execute The Cron Job Only Once
In Linux, we have the at command, which allows you to schedule a program to run only once. Similar functionality can be achieved with the CronJob resource on Kubernetes by using the .spec.suspend parameter.
.spec.suspend: when set to true, this parameter suspends all subsequent CronJob executions; jobs that have already started are not affected. However, be aware that you should also use startingDeadlineSeconds with it. The reason is that if you change suspend back to false, Kubernetes examines all the runs that were missed while the CronJob was suspended. If fewer than 100 were missed, they get executed. Setting startingDeadlineSeconds avoids this behavior, as it prevents missed jobs from getting executed once they are past the defined number of seconds.
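A suspended version of the sender example might look like the sketch below. The 60-second deadline is an illustrative assumption, and apiVersion batch/v1 assumes Kubernetes 1.21 or later.

```yaml
apiVersion: batch/v1            # assumption: Kubernetes 1.21 or later
kind: CronJob
metadata:
  name: sender
spec:
  schedule: "*/15 * * * *"
  # No new runs are started while this is true.
  suspend: true
  # Illustrative value: runs missed by more than 60 seconds are not
  # started once suspend is switched back to false.
  startingDeadlineSeconds: 60
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - image: bash
            name: sender
            command: ["bash","-c","echo 'Sending information to API/database'"]
          restartPolicy: OnFailure
```

To resume scheduling, set suspend to false and apply the manifest again.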
Does Cron Job Keep a History of The Jobs That Succeeded and Failed?
Most of the time, you need to know what happened when the cron job last ran. If a database update didn't occur, an API server wasn't updated, or any other action that was supposed to happen as a result of the CronJob running didn't take place, you'd need to know why. By default, CronJob remembers the last three succeeded jobs and the last failed one. However, those values can be changed to your preference by setting the following parameters:
- .spec.successfulJobsHistoryLimit: if not set, it defaults to 3. It specifies the number of successful jobs to keep in history.
- .spec.failedJobsHistoryLimit: if not set, it defaults to 1. It specifies the number of failed jobs to keep in history.
If you don't need to keep any history of execution, you can just set both values to 0.
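As a sketch, both limits are set directly under spec. The values 5 and 3 are assumptions chosen for illustration, and apiVersion batch/v1 assumes Kubernetes 1.21 or later.

```yaml
apiVersion: batch/v1            # assumption: Kubernetes 1.21 or later
kind: CronJob
metadata:
  name: sender
spec:
  schedule: "*/15 * * * *"
  successfulJobsHistoryLimit: 5   # illustrative: keep the last 5 successful jobs
  failedJobsHistoryLimit: 3       # illustrative: keep the last 3 failed jobs
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - image: bash
            name: sender
            command: ["bash","-c","echo 'Sending information to API/database'"]
          restartPolicy: OnFailure
```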
- Cron jobs have a long history in UNIX and Linux. Combined with other Kubernetes technologies like Pods, containers, the scheduler, and the intelligent algorithms for Pod placement and health probes, CronJobs prove to be far more powerful than their traditional OS-level counterparts.
- Since they run in containers, CronJobs provide a lot of flexibility for developers. They need not worry about which platform the cron job runs on or whether the required dependencies are present, as everything is packaged in the container.
- Kubernetes handles CronJob execution, what happens when it misses an execution time, and how many times the job should run. This allows the developers to focus more on writing code and addressing business issues rather than worrying about the internals of code execution.
- The business application is still responsible for handling what happens when the cron job runs, does not run, gets canceled, or runs concurrently.