Alex, a senior backend engineer at NovaCraft, stared at the architecture diagram on the whiteboard. His team was building a new microservice, a critical piece of the NovaCraft platform, and it needed a database. In his decade of experience with monolithic applications running on virtual machines, this was a task he could do in his sleep. Provision a VM, attach a persistent disk, install PostgreSQL, and you were golden. But here at NovaCraft, in the brave new world of Kubernetes, the old rules didn’t apply.

“So, let me get this straight,” Alex said to his teammate, a cloud-native enthusiast half his age. “When a Pod dies, all the data written to its container’s filesystem just… vanishes?”

“Yep,” she replied, sipping her cold brew. “Ephemeral by design. It’s what makes them so scalable and resilient.”

Alex sighed. “Resilient for stateless services, maybe. But a database without data is just a… well, it’s useless.” He knew this was a fundamental challenge he had to conquer. How could they build stateful, data-driven services on a platform that seemed to have a case of amnesia? This question kicked off Alex’s deep dive into the world of Kubernetes storage, a journey to give his applications a persistent memory.

The Amnesia of Containers

By default, a container’s filesystem is ephemeral. Any data written inside a container is tied to the lifecycle of that container: when the container is terminated, its writable layer is discarded with it. This is a powerful feature for building scalable, resilient applications, as it allows you to treat your containers as interchangeable cattle, not pets. But for stateful applications like databases, message queues, and key-value stores, this ephemeral nature is a significant hurdle.

To run stateful applications in Kubernetes, we need a way to store data that persists beyond the lifecycle of a single Pod. We need a way to give our Pods a memory.

Volumes: A Pod’s Trusty Backpack

Kubernetes solves this problem with Volumes. A Volume is a directory that is mounted into a Pod’s containers, and its lifecycle is tied to the Pod rather than to any single container. When a container is restarted, the data in the Volume is preserved. You can think of a Volume as a backpack that a Pod carries with it: the Pod can put its important files in the backpack, and they survive container crashes. Whether the backpack also outlives the Pod itself, or follows it to a new node, depends on the type of Volume behind it.

Kubernetes supports many types of Volumes, from simple emptyDir Volumes that are deleted along with the Pod to more durable options like nfs and cloud disks (older manifests reference these through the in-tree gcePersistentDisk and awsElasticBlockStore types, which have since been deprecated in favor of CSI drivers). For our database, we need a Volume that is completely independent of the Pod’s lifecycle.
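
To make the simplest case concrete, here is a minimal sketch of a Pod that uses an emptyDir Volume. The Pod name, container name, image tag, and mount path are all hypothetical and chosen only for illustration. The file written to /scratch survives a restart of the container, but it is removed once the Pod itself is deleted.

```yaml
# Illustrative only: scratch-demo, writer, and /scratch are assumed names.
apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo
spec:
  containers:
    - name: writer
      image: busybox:1.36
      # Append a timestamp on every container start, then idle.
      command: ["sh", "-c", "date >> /scratch/started.log; sleep 3600"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  volumes:
    - name: scratch
      emptyDir: {}   # lives as long as the Pod, not the container
```

That Pod-bound lifetime is exactly why an emptyDir is not enough for a database.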

PersistentVolumes and PersistentVolumeClaims: The Library Analogy

This is where PersistentVolumes (PVs) and PersistentVolumeClaims (PVCs) come into play. Let’s use an analogy to understand these concepts.

Imagine a large public library (our Kubernetes cluster). The library has a collection of books (PersistentVolumes). Each book has a specific size and category (e.g., fiction, non-fiction). The librarians (cluster administrators) are responsible for acquiring new books and managing the existing collection.

A library member (a developer) who wants to read a book doesn’t go and grab a book from the shelf directly. Instead, they go to the front desk and fill out a request form (a PersistentVolumeClaim). On the form, they specify the type of book they want (e.g., a non-fiction book of at least 200 pages). The librarian then finds a book that matches the request and checks it out to the member.

In this analogy:

  • PersistentVolume (PV): A piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned through a StorageClass. It is a cluster-level resource, just as a node is.
  • PersistentVolumeClaim (PVC): A request for storage by a user. It’s like a developer requesting a certain amount of storage with specific characteristics (e.g., access mode, size).

When a developer creates a PVC, Kubernetes acts as the librarian. It finds a suitable PV that meets the requirements of the PVC and “binds” them together. The developer can then mount the PVC into their Pod, and the Pod will have access to the persistent storage.
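
As a rough sketch of how the two sides of this exchange might look, the manifests below define a statically provisioned PV, a PVC that can bind to it, and a Pod that mounts the claim. Every name here (demo-pv, demo-claim, demo-reader, the manual class label, and the /mnt/demo-data path) is an assumption made for illustration, and a hostPath PV like this is only suitable for a single-node test cluster.

```yaml
# The "book": storage an administrator puts on the shelf.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: demo-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual        # assumed label used only for matching
  hostPath:
    path: /mnt/demo-data          # assumed directory on the node
---
# The "request form": asks for at least 500Mi with the same class and mode.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  resources:
    requests:
      storage: 500Mi
---
# The "reader": a Pod that mounts the claim, not the PV directly.
apiVersion: v1
kind: Pod
metadata:
  name: demo-reader
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "ls /data; sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: demo-claim
```

Note that the claim asks for less than the PV offers. As with the library request for a book of "at least 200 pages," a larger PV can still satisfy a smaller claim.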

This separation of concerns is a cornerstone of Kubernetes storage. It allows cluster administrators to manage the underlying storage infrastructure, while developers can consume storage resources without needing to know the implementation details.

StorageClasses: The Self-Service Library

In a large, busy library, it would be inefficient for librarians to manually fulfill every book request. To streamline the process, the library could introduce a self-service system. They could categorize their books into different sections (e.g., “New Releases,” “Science Fiction,” “History”) and provide a catalog that describes each section.

This is what StorageClasses do in Kubernetes. A StorageClass provides a way for administrators to define different “classes” of storage they offer. Each class can have its own set of properties, such as:

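  • Provisioner: The plugin that creates the underlying storage (e.g., the legacy in-tree kubernetes.io/aws-ebs and kubernetes.io/gce-pd, or CSI drivers such as ebs.csi.aws.com and pd.csi.storage.gke.io on current clusters).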
  • Parameters: Parameters specific to the provisioner (e.g., type: gp2 for AWS EBS).
  • Reclaim Policy: What happens to the PV and its underlying storage once the PVC bound to it is deleted (e.g., Delete or Retain).

When a PVC is created, it can specify a StorageClass. Kubernetes will then dynamically provision a PV that belongs to that class. This is called dynamic provisioning, and it eliminates the need for administrators to pre-provision PVs, making the storage management process much more scalable and efficient.
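
A StorageClass and a claim against it might look like the following sketch. It assumes a cluster on AWS with the EBS CSI driver installed. The class name fast-ssd and the claim name app-data are hypothetical, and the parameters would differ for any other provisioner.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd                          # hypothetical class name
provisioner: ebs.csi.aws.com              # assumes the AWS EBS CSI driver
parameters:
  type: gp2                               # provisioner-specific parameter
reclaimPolicy: Delete                     # remove the disk when the claim goes away
volumeBindingMode: WaitForFirstConsumer   # provision once a Pod is scheduled
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data                          # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd              # selects the class above
  resources:
    requests:
      storage: 10Gi
```

No PV appears in this sketch: the provisioner creates one on demand when the claim is used.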

StatefulSets: For the Creatures of Habit

Deployments are the workhorses of Kubernetes, but they are designed for stateless applications. They treat Pods as interchangeable and disposable. If a Pod is deleted, it is replaced by a new Pod with a new name, a new IP address, and a new identity.

For stateful applications like databases, this is a problem. A database needs a stable identity. It needs to know who it is, and it needs to be able to find its data. This is where StatefulSets come in. A StatefulSet is a Kubernetes controller that is specifically designed to manage stateful applications. It provides the following guarantees:

  • Stable, unique network identifiers: Each Pod in a StatefulSet has a stable hostname that is based on the name of the StatefulSet and the ordinal index of the Pod (e.g., postgres-0, postgres-1).
  • Stable, persistent storage: Each Pod in a StatefulSet gets its own PersistentVolumeClaim, and the PVC is not deleted when the Pod is deleted. This ensures that the Pod always has access to its data.
  • Ordered, graceful deployment and scaling: Pods in a StatefulSet are created and deleted in a specific, predictable order.
  • Ordered, automated rolling updates: Rolling updates to a StatefulSet are performed one Pod at a time, in reverse ordinal order.

These guarantees make StatefulSets the ideal choice for running stateful applications like databases, message queues, and other distributed systems in Kubernetes.
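
The ordering guarantees correspond to fields in the StatefulSet spec. The fragment below is a sketch rather than a complete manifest, and it simply spells out the values Kubernetes uses when these fields are omitted.

```yaml
# Fragment of a StatefulSet spec, shown with its default values.
spec:
  podManagementPolicy: OrderedReady   # create 0, 1, 2...; delete in reverse
  updateStrategy:
    type: RollingUpdate               # update Pods one at a time
    rollingUpdate:
      partition: 0                    # update every Pod with ordinal >= 0
```

Raising partition holds back Pods with a lower ordinal during an update, which is one way to stage a rollout across replicas.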

Running a Database on Kubernetes: To Run or Not to Run?

Before we dive into the hands-on section, it’s important to have a brief discussion about whether you should run your database on Kubernetes in the first place. While it is certainly possible, and in many cases, beneficial, it’s not always the right choice.

When to run a database on Kubernetes:

  • Development and testing: Running your database in Kubernetes during development and testing can simplify your workflow and ensure that your application is tested in an environment that is as close to production as possible.
  • Portability: If you need to be able to run your application in different environments (e.g., on-premises, multiple cloud providers), running your database in Kubernetes can provide a consistent and portable solution.
  • Automation: Kubernetes can automate many of the tasks involved in managing a database, such as provisioning, scaling, and failover.

When to use a managed database service:

  • Production workloads: For production workloads, it is often recommended to use a managed database service (e.g., Amazon RDS, Google Cloud SQL). These services are highly available, scalable, and managed by experts, which can save you a lot of time and effort.
  • Lack of expertise: If you don’t have the expertise to manage a database in Kubernetes, it’s better to use a managed service.

In our case, Alex and his team at NovaCraft have decided to run their database on Kubernetes for development and testing purposes, with the option to move to a managed service in production if needed.

Hands-On: Deploying PostgreSQL with Persistent Storage

Now, let’s get our hands dirty. We will deploy a PostgreSQL database on Kubernetes using a StatefulSet and a PersistentVolumeClaim. We will then create some data in the database and verify that the data persists even if the PostgreSQL Pod is restarted.

Prerequisites

  • A running Kubernetes cluster (e.g., minikube, Docker Desktop)
  • kubectl configured to connect to your cluster

1. Create a StorageClass

If you are using a local cluster like minikube, you can use the built-in standard StorageClass. If you are using a cloud provider, you will likely have a default StorageClass available. You can check your available StorageClasses with the following command:

kubectl get storageclass

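For this tutorial, we will assume you have a StorageClass named standard. If yours has a different name (Docker Desktop’s default class, for example, is typically called hostpath), substitute it for standard in the StatefulSet manifest in step 3.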

2. Create a Headless Service

Before we create the StatefulSet, we need to create a headless Service. A headless Service is a Service that doesn’t have a cluster IP. It is used to provide a DNS entry for each Pod in the StatefulSet.

Create a file named postgres-service.yaml with the following content:

apiVersion: v1
kind: Service
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  ports:
    - port: 5432
      name: web
  clusterIP: None
  selector:
    app: postgres

Create the Service:

kubectl apply -f postgres-service.yaml

3. Create the StatefulSet

Now, let’s create the StatefulSet for PostgreSQL. Create a file named postgres-statefulset.yaml with the following content:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: "postgres"
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: postgres
          image: postgres:13
          ports:
            - containerPort: 5432
              name: web
          env:
            - name: POSTGRES_PASSWORD
              value: mysecretpassword
          volumeMounts:
            - name: postgres-storage
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: postgres-storage
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "standard"
        resources:
          requests:
            storage: 1Gi

Let’s break down this manifest:

  • serviceName: "postgres": This tells the StatefulSet to use the postgres Service we created earlier.
  • replicas: 1: We are starting with a single replica of our database.
  • template: This is the Pod template that the StatefulSet will use to create the Pods.
  • volumeMounts: We are mounting a Volume named postgres-storage at /var/lib/postgresql/data, which is the directory where PostgreSQL stores its data.
  • volumeClaimTemplates: This is the template that the StatefulSet will use to create a PersistentVolumeClaim for each Pod. In this case, it will create a PVC named postgres-storage-postgres-0 with 1Gi of storage.

Create the StatefulSet:

kubectl apply -f postgres-statefulset.yaml

4. Verify the Deployment

After a few moments, you should see the StatefulSet, the Pod, and the PVC created:

kubectl get statefulset postgres
kubectl get pod postgres-0
kubectl get pvc postgres-storage-postgres-0
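
If you also want to confirm that the headless Service gives the Pod a stable DNS name, one option is a throwaway Pod like the sketch below. The name pg-check is hypothetical, and the manifest assumes it is created in the same namespace as the StatefulSet. It runs pg_isready, a utility that ships with the postgres image, against the per-Pod hostname postgres-0.postgres.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pg-check            # hypothetical one-off Pod
spec:
  restartPolicy: Never      # run the check once and stop
  containers:
    - name: pg-check
      image: postgres:13
      # <pod-name>.<service-name> resolves through the headless Service.
      command: ["pg_isready", "-h", "postgres-0.postgres", "-p", "5432"]
```

After applying it, kubectl logs pg-check should report whether the server is accepting connections. The Pod can then be removed with kubectl delete pod pg-check.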

5. Connect to the Database and Create Data

Now, let’s connect to the database and create a table. We will use kubectl exec to run the psql command inside the PostgreSQL Pod.

kubectl exec -it postgres-0 -- psql -U postgres

Once you are connected to the database, create a table and insert some data:

CREATE TABLE users (id SERIAL PRIMARY KEY, name VARCHAR(255));
INSERT INTO users (name) VALUES ('Alex');
SELECT * FROM users;

You should see the following output:

 id | name
----+------
  1 | Alex
(1 row)

6. Simulate a Pod Failure

Now for the moment of truth. Let’s delete the PostgreSQL Pod and see if our data is still there.

kubectl delete pod postgres-0

The StatefulSet controller will immediately create a new Pod to replace the one we deleted. Wait for the new Pod to be in the “Running” state:

kubectl get pods -l app=postgres

Now, connect to the new Pod and check if the data is still there:

kubectl exec -it postgres-0 -- psql -U postgres -c "SELECT * FROM users;"

You should see the same output as before! Our data has been persisted across the Pod restart. The postgres-storage-postgres-0 PVC was automatically reattached to the new postgres-0 Pod.

Debugging and Troubleshooting

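  • PVC stuck in Pending state: This is a common issue. It usually means that no existing PV satisfies the PVC and no StorageClass can dynamically provision one. Use kubectl describe pvc <pvc-name> to see the events and find out why it’s not being bound. Common causes are a misspelled or missing storage class, a size or access-mode mismatch, or a StorageClass with volumeBindingMode: WaitForFirstConsumer, which keeps the PVC Pending until a Pod that uses it is scheduled.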
  • Pod in ContainerCreating state: If a Pod is stuck in this state, it might be having trouble mounting the Volume. Use kubectl describe pod <pod-name> to check for any storage-related errors.
  • Permissions issues: If your application is unable to write to the Volume, it could be a permissions issue. You might need to use a securityContext in your Pod spec to set fsGroup, a group ID that Kubernetes applies to the Volume’s files and adds to the container’s processes as a supplemental group.
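
For the permissions case, the change might look like the fragment below, which would be merged into the Pod template of the StatefulSet. The value 999 is an assumption: it is the group ID commonly used by the postgres user in the Debian-based postgres images, so check it against the image you actually run.

```yaml
# Fragment of the StatefulSet's Pod template, not a complete manifest.
spec:
  template:
    spec:
      securityContext:
        fsGroup: 999   # assumed GID of the postgres user in the image
```

Not every volume type supports ownership changes through fsGroup, so if the error persists, the events shown by kubectl describe pod are the next place to look.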

Key Takeaways

  • Kubernetes Volumes provide a way to persist data beyond the lifecycle of a container.
  • PersistentVolumes (PVs) and PersistentVolumeClaims (PVCs) provide a powerful abstraction for managing storage in a cluster.
  • StorageClasses enable dynamic provisioning of PVs, which simplifies storage management.
  • StatefulSets are the ideal choice for running stateful applications like databases in Kubernetes, as they provide stable network identifiers and persistent storage.

Story Closing + Teaser

Alex leaned back in his chair, a smile of satisfaction on his face. He had not only successfully deployed a PostgreSQL database on Kubernetes, but he had also gained a deep understanding of the fundamental concepts of Kubernetes storage. He now knew how to give his applications a persistent memory, a crucial skill for building robust, stateful services.

Just as he was about to call it a day, a notification popped up on his screen. It was a message from his manager. “Great work on the database, Alex! The team is really impressed. Now, we have another challenge for you. We need to figure out how to manage configuration and secrets for our applications in a secure and scalable way. Can you look into that?”

Alex’s smile widened. He was ready for the next chapter in his Kubernetes journey. In the next part of “The Container Odyssey,” we will explore the world of ConfigMaps and Secrets, and learn how to manage application configuration in a cloud-native way.