The Story So Far…

Alex, a seasoned backend engineer with over a decade of experience building and scaling monolithic applications on virtual machines, had been navigating the exciting, yet often-choppy, waters of Kubernetes at the fast-growing startup, NovaCraft. Each day brought new challenges, new learnings, and a growing appreciation for the power of container orchestration. Having mastered the art of deploying and managing stateless services, Alex was about to face a new set of requirements that would push his understanding of Kubernetes even further.

One Monday morning, Sarah, the Head of Platform Engineering, called Alex into her office. “Alex,” she began, “our user base is exploding, which is fantastic news. But it also means we have some new engineering challenges. We need to start running some heavy data processing jobs to analyze user behavior, but these jobs can’t run as part of our main application services. They are too resource-intensive and need to run to completion, then shut down. We also need to generate a daily report of key metrics, which needs to run at the same time every night. And finally, our SRE team wants to deploy a new monitoring agent on every single node in our cluster to get better visibility into system performance. I know you’ve been doing a great job with Deployments and Services, but this is a different kind of beast. I need you to figure out how we can handle these workloads in Kubernetes.”

Alex felt a familiar mix of excitement and trepidation. This was exactly the kind of meaty challenge he loved. He had spent years writing cron jobs and setting up dedicated servers for batch processing in his previous roles. How would he translate those concepts to the world of Kubernetes? It was time to dive back into the Kubernetes documentation and figure out the right tools for the job.

The Right Tool for the Job: An Introduction to Kubernetes Workload Resources

As Alex started his research, he quickly realized that Kubernetes had a rich set of resources for managing different types of workloads, far beyond the Deployments he was already familiar with. For the specific challenges Sarah had laid out, three resources stood out: Jobs, CronJobs, and DaemonSets.

Think of your Kubernetes cluster as a bustling workshop. So far, Alex has been working with Deployments, which are like the permanent workstations on the workshop floor, always running and ready to serve requests. But what about tasks that are not meant to run forever? This is where our new set of tools comes in.

Jobs: The One-Off Task

A Job is like a special order that comes into the workshop. It needs to be completed once, and once it’s done, it’s done. Imagine you need to assemble a single, custom piece of furniture. You set up a temporary workbench, gather your tools, build the furniture, and once it’s finished, you clean up the workbench and it’s gone. You don’t keep the workbench running forever, waiting for another custom order.

In Kubernetes, a Job creates one or more Pods and ensures that a specified number of them terminate successfully. When the task is done, each Pod exits on its own; Kubernetes keeps the completed Pods around until the Job is deleted, so you can still inspect their logs. This is perfect for run-to-completion tasks like:

  • Data migration: Running a script to migrate data from an old database to a new one.
  • Batch processing: Processing a large batch of images or videos.
  • A one-time analysis: Running a complex calculation or analysis on a dataset.

CronJobs: The Scheduled Task

A CronJob is like the workshop’s regular, scheduled maintenance task. Every night at midnight, a robot arm comes out, inspects all the machinery, and generates a report. This happens on a predictable schedule, without any manual intervention.

In Kubernetes, a CronJob creates Jobs on a repeating schedule. It uses the familiar cron syntax from the Linux world to define when the Job should run. This is ideal for tasks like:

  • Generating daily reports: Creating a summary of the previous day’s sales figures.
  • Sending out newsletters: Sending a weekly email to all your users.
  • Backing up data: Taking a snapshot of a database every few hours.

DaemonSets: The Ever-Present Helper

A DaemonSet is like the workshop’s safety inspector. You need one inspector on every single floor of the workshop to ensure that all safety regulations are being followed. If a new floor is added to the workshop, a new inspector is automatically assigned to it.

In Kubernetes, a DaemonSet ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. This is perfect for deploying cluster-wide agents like:

  • Log collectors: Running a log collector agent like Fluentd or Logstash on every node.
  • Monitoring agents: Running a monitoring agent like Prometheus Node Exporter or Datadog Agent on every node.
  • Cluster storage daemons: Running a storage daemon like glusterd or ceph on each node.

With these three new tools in his arsenal, Alex was ready to tackle the challenges Sarah had given him. It was time to get his hands dirty and put these concepts into practice.

Under the Hood: How They Work

To really understand how to use these resources effectively, Alex needed to dig a little deeper into how they worked under the hood.

Jobs: A Closer Look

A Job is a controller that manages Pods. When you create a Job, the Job controller creates one or more Pods and monitors them to ensure they complete successfully. The key to understanding Jobs lies in the restartPolicy of the Pods it creates. While a regular Pod can have a restartPolicy of Always, OnFailure, or Never, the Pods created by a Job can only have a restartPolicy of OnFailure or Never. This is because a Job is designed to complete. If the Pods were to restart forever, the Job would never finish.

Here’s a breakdown of the key fields in a Job manifest:

  • spec.template: This is the template for the Pods that the Job will create. It’s just like the template you would use in a Deployment or a ReplicaSet.
  • spec.completions: This field specifies how many Pods must complete successfully for the Job to be considered complete. If you don’t specify this, it defaults to 1.
  • spec.parallelism: This field specifies how many Pods can run in parallel at any given time. If you don’t specify this, it defaults to 1.
  • spec.backoffLimit: This field specifies the number of times a Job will be retried before it is marked as failed. If you don’t specify this, it defaults to 6.
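To see how these fields interact, here is a minimal sketch of a Job that fans work out across several Pods. The name, image, and command are placeholders chosen for illustration, not NovaCraft's actual workload:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-example        # hypothetical name for illustration
spec:
  completions: 5             # five Pods must succeed for the Job to complete
  parallelism: 2             # at most two Pods run at the same time
  backoffLimit: 4            # mark the Job failed after four retries
  template:
    spec:
      containers:
      - name: worker
        image: busybox       # placeholder image
        command: ["sh", "-c", "echo processing a chunk && sleep 5"]
      restartPolicy: Never
```

With this spec, the controller keeps at most two Pods running at once until five have succeeded in total.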

CronJobs: The Scheduler

A CronJob is a controller that manages Jobs. It’s a simple but powerful concept. The CronJob controller checks every 10 seconds to see if it needs to create a new Job based on the schedule you’ve defined. The schedule is defined using the standard cron syntax, which has five fields:

# ┌───────────── minute (0 - 59)
# │ ┌───────────── hour (0 - 23)
# │ │ ┌───────────── day of the month (1 - 31)
# │ │ │ ┌───────────── month (1 - 12)
# │ │ │ │ ┌───────────── day of the week (0 - 6) (Sunday to Saturday)
# │ │ │ │ │
# │ │ │ │ │
# * * * * *
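To make the syntax concrete, here are a few illustrative schedule strings (these are examples, not NovaCraft's actual configs):

```yaml
# "0 0 * * *"     - every day at midnight
# "*/15 * * * *"  - every 15 minutes
# "0 2 * * 0"     - every Sunday at 02:00
# "30 6 1 * *"    - at 06:30 on the first day of every month
```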

Here are the key fields in a CronJob manifest:

  • spec.schedule: This is where you define the cron schedule.
  • spec.jobTemplate: This is the template for the Jobs that the CronJob will create. It’s just like the template you would use for a regular Job.
  • spec.concurrencyPolicy: This field specifies how to handle concurrent Jobs. It can be one of three values:
    • Allow (default): Allows concurrent Jobs to run.
    • Forbid: Forbids concurrent Jobs. If it’s time for a new Job to run but the previous one hasn’t finished yet, the new Job will be skipped.
    • Replace: Replaces the currently running Job with a new one.
  • spec.successfulJobsHistoryLimit and spec.failedJobsHistoryLimit: These fields specify how many completed and failed Jobs should be kept around. This is useful for debugging.
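As a sketch of how these fields fit together, a nightly CronJob (the name, image, and command are hypothetical) might look like:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup             # hypothetical name for illustration
spec:
  schedule: "0 2 * * *"            # every night at 02:00
  concurrencyPolicy: Forbid        # skip a run if the previous Job is still going
  successfulJobsHistoryLimit: 3    # keep the last three successful Jobs
  failedJobsHistoryLimit: 1        # keep only the most recent failed Job
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: busybox         # placeholder image
            command: ["sh", "-c", "echo backing up..."]
          restartPolicy: OnFailure
```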

DaemonSets: The Node Controller

A DaemonSet is a controller that ensures that a Pod is running on every node in the cluster (or a subset of nodes, if you use a nodeSelector). The DaemonSet controller watches for changes to the nodes in the cluster and ensures that the desired Pods are running. When a new node is added to the cluster, the DaemonSet controller creates a new Pod on that node. When a node is removed, the Pod is garbage collected.

Here are the key fields in a DaemonSet manifest:

  • spec.template: This is the template for the Pods that the DaemonSet will create.
  • spec.selector: This is the label selector that the DaemonSet uses to identify the Pods it manages.
  • spec.updateStrategy: This field specifies how to update the Pods when the DaemonSet template is changed. It can be one of two values:
    • RollingUpdate (default): The Pods are automatically replaced with updated ones, one node at a time by default (controlled by maxUnavailable).
    • OnDelete: The Pods are only updated when they are manually deleted.
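For example, a DaemonSet spec fragment (the node label here is hypothetical) that limits rollout disruption and targets only a subset of nodes might look like:

```yaml
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1      # replace at most one node's Pod at a time (the default)
  template:
    spec:
      nodeSelector:
        disktype: ssd        # hypothetical label: only run on nodes carrying it
```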

Hands-On: Putting It All Together

Now that Alex had a solid theoretical understanding of Jobs, CronJobs, and DaemonSets, it was time to put them into practice. He decided to tackle each of Sarah’s requests one by one.

1. The Data Processing Job

First up was the data processing Job. Alex needed to create a Job that would process a batch of data. For this example, he decided to create a simple Job that would calculate the value of Pi to 2000 decimal places and then print it to the logs.

Here’s the manifest for the Job:

apiVersion: batch/v1
kind: Job
metadata:
  name: pi-calculator
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4

Alex saved this manifest as pi-job.yaml and then created the Job using kubectl:

kubectl apply -f pi-job.yaml

He could then check the status of the Job:

kubectl describe jobs/pi-calculator

And once the Job was complete, he could view the logs of the Pod to see the result:

kubectl logs -l job-name=pi-calculator

2. The Scheduled Report

Next, Alex needed to create a CronJob to generate a daily report. For this example, he decided to create a simple CronJob that would run every minute and print the current date and a message to the logs.

Here’s the manifest for the CronJob:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-report
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: report-generator
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo "Generating daily report..."
          restartPolicy: OnFailure

Alex saved this manifest as daily-report-cronjob.yaml and then created the CronJob:

kubectl apply -f daily-report-cronjob.yaml

He could then watch as the CronJob created a new Job every minute:

kubectl get jobs --watch

3. The Cluster-Wide Logging Agent

Finally, Alex needed to deploy a logging agent to every node in the cluster. For this, he would use a DaemonSet. He decided to use Fluentd, a popular open-source log collector.

Here’s the manifest for the DaemonSet:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
spec:
  selector:
    matchLabels:
      name: fluentd-elasticsearch
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch
    spec:
      tolerations:
      # these tolerations let the agent run on control-plane nodes;
      # remove them if it should not be scheduled there
      - key: node-role.kubernetes.io/control-plane
        effect: NoSchedule
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd-elasticsearch
        image: quay.io/fluentd_elasticsearch/fluentd:v2.5.2
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers

Alex saved this manifest as fluentd-daemonset.yaml and then created the DaemonSet:

kubectl apply -f fluentd-daemonset.yaml

He could then verify that a Fluentd Pod was running on each node in the cluster:

kubectl get pods -n kube-system -l name=fluentd-elasticsearch -o wide

Debugging and Troubleshooting

As with any new technology, Alex knew that things wouldn’t always go smoothly. He anticipated some common issues and made a note of how to troubleshoot them.

Job and CronJob Issues

  • Job not completing: If a Job is not completing, the first thing to check is the logs of the Pods. You can use kubectl logs to view the logs. If the Pods are failing, the restartPolicy will determine what happens next. If it’s OnFailure, the Pod will be restarted. If it’s Never, the Pod will not be restarted and the Job will eventually fail.
  • CronJob not running: If a CronJob is not creating Jobs, the first thing to check is the CronJob’s schedule. Make sure it’s correct. You can also check the CronJob’s events to see if there are any errors. Use kubectl describe cronjob <cronjob-name> to view the events.
  • Too many old Jobs: By default, CronJobs will keep the last three successful Jobs and the last failed Job. If you have a CronJob that runs frequently, this can lead to a lot of old Jobs cluttering up your cluster. You can use the successfulJobsHistoryLimit and failedJobsHistoryLimit fields to control how many old Jobs are kept.
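Besides the CronJob history limits, Jobs themselves support spec.ttlSecondsAfterFinished, which tells Kubernetes to delete a finished Job (and its Pods) automatically after a delay. A minimal sketch, with a hypothetical name and placeholder workload:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: cleanup-example             # hypothetical name for illustration
spec:
  ttlSecondsAfterFinished: 3600     # delete this Job and its Pods one hour after it finishes
  template:
    spec:
      containers:
      - name: main
        image: busybox              # placeholder image
        command: ["sh", "-c", "echo done"]
      restartPolicy: Never
```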

DaemonSet Issues

  • DaemonSet Pod not running on a node: If a DaemonSet Pod is not running on a particular node, the first thing to check is the node’s taints and tolerations. A DaemonSet will only schedule Pods on nodes that have matching tolerations for their taints. You can use kubectl describe node <node-name> to view the node’s taints.
  • DaemonSet Pods not updating: If you’ve updated the DaemonSet’s template but the Pods are not updating, check the updateStrategy. If it’s OnDelete, you will need to manually delete the old Pods before the new ones will be created.

Key Takeaways

After a long day of learning and experimenting, Alex had a much better understanding of how to manage different types of workloads in Kubernetes. Here are the key takeaways:

  • Jobs are for run-to-completion tasks.
  • CronJobs are for scheduled tasks.
  • DaemonSets are for running a Pod on every node in the cluster.
  • Each of these resources is a controller that manages Pods.
  • Understanding the restartPolicy of the Pods is key to understanding how these resources work.

The Story Continues…

Alex felt a sense of accomplishment. He had successfully tackled a new set of challenges and added three powerful new tools to his Kubernetes toolkit. He wrote up his findings and sent them to Sarah, who was thrilled with the results. NovaCraft could now run its batch processing jobs, generate its daily reports, and monitor its cluster with ease.

But Alex knew that his Kubernetes journey was far from over. As he was packing up his things for the day, he received a message from Sarah: “Great work today, Alex! I have another challenge for you tomorrow. We need to figure out how to manage configuration data and secrets for our applications. I’ve heard that Kubernetes has some built-in resources for this, but I need you to dig into the details.”

Alex smiled. Another day, another challenge. He was ready for it. He was starting to feel less like a newcomer and more like a seasoned Kubernetes navigator, charting a course through the vast and exciting world of container orchestration.

Next time on The Container Odyssey: We’ll join Alex as he dives into the world of ConfigMaps and Secrets, learning how to manage application configuration and sensitive data in Kubernetes.