Part 8: The Night Watch — Health Checks, Resource Management, and Observability
The 3 AM Page
The shrill, insistent wail of his phone shattered the 3 AM silence. Alex fumbled for it, his heart pounding a frantic rhythm against his ribs. The screen glowed with a PagerDuty alert: P1 - API Service Unavailable - OOMKilled.
“OOMKilled? Out of Memory?” he mumbled, already swinging his legs out of bed and reaching for his laptop. As a seasoned engineer with over a decade of experience building and maintaining monolithic applications on virtual machines, Alex was no stranger to late-night emergencies. But this was different. This was NovaCraft, his new home for the past three months, and their entire infrastructure ran on a technology he was still grappling with: Kubernetes.
He logged into the company’s dashboard, his eyes scanning the graphs. The main API service, a critical component for their flagship product, was in a crash loop. The pod restarts were relentless, and the reason was always the same: OOMKilled. The application was consuming more memory than it was allocated, and Kubernetes, in its infinite wisdom, was terminating it. But why now? The service had been running fine for weeks.
He tried to remember what he knew about resource management in Kubernetes. It was a far cry from the static world of VMs where you provisioned a machine with a fixed amount of RAM and CPU and that was that. Here, in this dynamic world of containers and pods, resources were fluid, requested and limited, but the specifics were still hazy to him. He felt a familiar pang of imposter syndrome. He was a senior engineer, yet he felt like a novice in this new containerized world.
He spent the next hour stabilizing the service with a temporary fix: bumping up the memory limits on the deployment. It was a band-aid, not a solution. The sun was beginning to rise, casting a pale light across his home office, and Alex knew this was just the beginning. He had to understand what had happened, why it had happened, and how to prevent it from ever happening again. He had to learn how to tame the beast that was Kubernetes resource management. His journey into the heart of the container orchestrator was about to take a deep, and very practical, turn.
The Guardians of Pod Health: Probes
Alex’s 3 AM adventure was a classic case of a service failing silently until it was too late. In the world of Kubernetes, where applications are expected to be resilient and self-healing, this shouldn’t happen. The key to preventing such scenarios lies in a simple yet powerful mechanism: probes. Probes are diagnostics performed periodically by the kubelet (the primary node agent) on a container to determine its health.
Think of probes as a doctor performing regular check-ups on your application. These check-ups help Kubernetes make intelligent decisions about whether to send traffic to a pod or whether to restart it. There are three types of probes, each serving a distinct purpose:
| Probe Type | Purpose | Analogy |
|---|---|---|
| Liveness | Determines if a container is running. If the probe fails, the kubelet kills and restarts the container. | A heart monitor. If the heart stops beating, immediate intervention (a restart) is required to revive the patient. |
| Readiness | Determines if a container is ready to accept traffic. If the probe fails, the pod’s IP is removed from the Service’s endpoints. | A shop’s “Open” sign. Even if the shop is staffed (alive), it might not be ready for customers (e.g., still stocking shelves). |
| Startup | Determines if a container has started successfully. It disables liveness and readiness checks until it succeeds. | A pre-flight check for a rocket. You don’t want to start the main engines (liveness/readiness) until all systems are go. |
Liveness Probes: Is the Application Alive?
A liveness probe checks if your application is still running. If your application is running but in a broken state—for example, a deadlock—it won’t be able to respond to the liveness probe. The kubelet will then restart the container, attempting to recover the application. This is the self-healing power of Kubernetes in action.
Readiness Probes: Is the Application Ready for Visitors?
A readiness probe, on the other hand, signals whether your application is ready to serve traffic. A pod can be alive but not ready. For instance, an application might need to load a large dataset into memory or warm up a cache before it can start serving requests. During this time, the liveness probe would pass (the process is running), but the readiness probe would fail. Kubernetes will wait for the readiness probe to succeed before allowing traffic to be sent to the pod. This prevents users from experiencing errors or slow responses.
Startup Probes: A Graceful Start
For slow-starting applications, there’s the startup probe. Some applications, especially legacy ones or those with heavy initialization tasks, can take a long time to start. If you set a liveness probe with a short initial delay, Kubernetes might restart the application before it even has a chance to become fully operational. The startup probe addresses this by providing an initial grace period. Liveness and readiness probes are disabled until the startup probe succeeds, ensuring that the application has enough time to start up properly.
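Putting the three together, here is a minimal sketch of how they might be declared on a container, assuming an HTTP service that exposes /healthz and /ready endpoints on port 8080 (the container name, image, and timing values are illustrative, not prescriptive):

```yaml
containers:
- name: my-app            # hypothetical container name
  image: my-app:1.0       # hypothetical image
  startupProbe:
    httpGet:
      path: /healthz
      port: 8080
    failureThreshold: 30  # allow up to 30 * 10s = 300s to start
    periodSeconds: 10
  livenessProbe:          # disabled until the startup probe succeeds
    httpGet:
      path: /healthz
      port: 8080
    periodSeconds: 10
  readinessProbe:
    httpGet:
      path: /ready
      port: 8080
    periodSeconds: 5
```

Note how the startup probe's `failureThreshold * periodSeconds` defines the total grace period before the liveness probe takes over.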
Taming the Beast: Resource Requests and Limits
Alex’s temporary fix of increasing the memory limit highlights a crucial aspect of Kubernetes: resource management. In a shared environment like a Kubernetes cluster, it’s essential to manage how much CPU and memory each container is allowed to consume. This prevents a single misbehaving application from starving other applications of resources and destabilizing the entire cluster.
Kubernetes provides a simple yet powerful mechanism for this: requests and limits.
- Requests: This is the amount of resources that Kubernetes guarantees to a container. The scheduler uses this information to find a suitable node for the pod. If a node has enough available resources to satisfy the container’s request, the pod can be scheduled on that node.
- Limits: This is the maximum amount of resources that a container is allowed to use. If a container tries to exceed its CPU limit, it will be throttled. If it tries to exceed its memory limit, it will be terminated with an OOMKilled error, just like Alex’s API service.
| Resource | Request | Limit |
|---|---|---|
| CPU | The minimum CPU power guaranteed to the container. | The maximum CPU power the container can use. If exceeded, it gets throttled. |
| Memory | The amount of memory guaranteed to the container. | The maximum memory the container can use. If exceeded, it gets terminated. |
CPU is measured in “cores” (or “millicores,” e.g., 100m is 0.1 cores), and memory is measured in bytes (e.g., 128Mi is 128 mebibytes).
By setting requests and limits, you provide Kubernetes with valuable information about your application’s resource needs. This allows the scheduler to make better decisions about where to place your pods and ensures that your applications have the resources they need to run reliably.
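As a sketch, a container guaranteed 0.1 cores and 64Mi of memory, and capped at 0.2 cores and 128Mi, would declare:

```yaml
resources:
  requests:
    cpu: "100m"      # 0.1 cores guaranteed by the scheduler
    memory: "64Mi"   # 64 mebibytes guaranteed
  limits:
    cpu: "200m"      # throttled above 0.2 cores
    memory: "128Mi"  # OOMKilled above 128 mebibytes
```

These are the same values we will use in the hands-on deployment later in this chapter.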
Quality of Service (QoS) Classes
Based on the requests and limits you set, Kubernetes assigns a Quality of Service (QoS) class to your pods. This class determines how Kubernetes prioritizes and handles your pods, especially when a node is under resource pressure.
There are three QoS classes:
- `Guaranteed`: These are the highest-priority pods. They are guaranteed the resources they request and are the last to be killed if the node runs out of resources. A pod is assigned the `Guaranteed` class if every container in the pod has both a memory and a CPU request and limit, and the requests equal the limits.
- `Burstable`: These pods have a lower priority than `Guaranteed` pods. They are allowed to “burst” and use more resources than they requested, up to their limit. A pod is assigned the `Burstable` class if it does not meet the criteria for `Guaranteed` and at least one container in the pod has a memory or CPU request or limit.
- `BestEffort`: These are the lowest-priority pods. They have no resource requests or limits and are the first to be killed if the node runs out of resources. A pod is assigned the `BestEffort` class if no container in the pod has any memory or CPU request or limit.
Understanding QoS classes is crucial for running production workloads on Kubernetes. For critical services like Alex’s API, you would want to use the Guaranteed or Burstable class to ensure they have the resources they need to operate reliably.
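You can see which class Kubernetes assigned to a pod directly from its status; a quick check (the pod name is a placeholder for one of your own):

```shell
kubectl get pod <your-pod-name> -o jsonpath='{.status.qosClass}'
```

This prints one of `Guaranteed`, `Burstable`, or `BestEffort`.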
Scaling on Demand: The Horizontal Pod Autoscaler
While requests and limits help manage resources for individual pods, what happens when the load on your application increases? You could manually scale up the number of pods, but that’s not a very scalable or efficient solution. This is where the Horizontal Pod Autoscaler (HPA) comes in.
The HPA automatically scales the number of pods in a deployment or replica set based on observed CPU utilization or other select metrics. This allows your application to handle increased traffic gracefully and scale back down when the traffic subsides, saving you money and resources.
To use the HPA, you need to have a Metrics Server deployed in your cluster. The Metrics Server collects resource metrics from the kubelet on each node and exposes them to the Kubernetes API. The HPA controller then queries the Metrics Server to get the metrics for the pods it’s managing and scales the number of pods accordingly.
For example, you can configure an HPA to maintain an average CPU utilization of 50% across all pods. If the CPU utilization exceeds this target, the HPA will create new pods. If the utilization drops below the target, it will terminate some of the pods.
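An HPA can be created imperatively with `kubectl autoscale` or declared as a manifest. A sketch of the declarative form for that 50% CPU target, assuming the `autoscaling/v2` API (available since Kubernetes 1.23) and a deployment named `health-check-demo`:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: health-check-demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: health-check-demo
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization          # percentage of the pods' CPU *request*
        averageUtilization: 50
```

Note that utilization is computed against the CPU request, which is why the HPA cannot work on containers that have no requests set.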
Basic Observability: Seeing What’s Inside
Alex’s initial struggle to understand the OOMKilled error highlights the importance of observability. You can’t fix what you can’t see. Kubernetes provides several tools to help you monitor the health and resource consumption of your applications.
- `kubectl top`: This command lets you view the CPU and memory consumption of pods and nodes. It’s a great way to get a quick overview of which pods are using the most resources.
- Metrics Server: As mentioned earlier, the Metrics Server is a crucial component for autoscaling. It also provides the data for the `kubectl top` command.
While kubectl top and the Metrics Server provide basic observability, for production environments, you’ll want to use a more comprehensive monitoring solution like Prometheus and Grafana. These tools allow you to collect, store, and visualize a wide range of metrics from your cluster, giving you deep insights into the performance and health of your applications.
Hands-On: Putting Theory into Practice
Now it’s time to get our hands dirty. In this section, we’ll walk through a practical example of deploying an application with health checks, resource limits, and autoscaling. We’ll use a simple Flask application that allows us to simulate memory consumption.
Prerequisites
Before we begin, make sure you have the following tools installed on your macOS machine:
- Docker Desktop
- kubectl
- A Docker Hub account
Step 1: The Application
First, let’s create our Flask application. Create a new directory for our project and inside it, create the following three files:
app.py

```python
from flask import Flask, request

app = Flask(__name__)

# A simple in-memory store to hold the allocated strings
store = []

@app.route('/')
def hello():
    return 'Hello, World!'

@app.route('/healthz')
def healthz():
    return 'OK'

@app.route('/ready')
def ready():
    # Simulate a readiness check. For example, check if a database connection is available.
    return 'OK'

@app.route('/consume-mem')
def consume_mem():
    size_in_mb = int(request.args.get('mb', 10))
    # Each ASCII character is one byte, so this string is roughly `size_in_mb` megabytes
    s = ' ' * (size_in_mb * 1024 * 1024)
    store.append(s)
    total_mb = sum(len(chunk) for chunk in store) // (1024 * 1024)
    return f'Allocated {size_in_mb} MB of memory. Total memory allocated: {total_mb} MB'

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
```

Dockerfile

```dockerfile
FROM python:3.9-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["python", "app.py"]
```

requirements.txt

```
Flask==2.1.2
```

Step 2: Build and Push the Docker Image
Now, let’s build our Docker image and push it to Docker Hub. Make sure you are logged into your Docker Hub account.
```shell
# Replace 'your-dockerhub-username' with your actual Docker Hub username
export DOCKERHUB_USERNAME=your-dockerhub-username

docker build -t $DOCKERHUB_USERNAME/k8s-health-check-demo .
docker push $DOCKERHUB_USERNAME/k8s-health-check-demo
```

Step 3: Deploy to Kubernetes
Next, we’ll create a Kubernetes deployment to run our application. Create a file named deployment.yaml with the following content. Remember to replace your-dockerhub-username with your Docker Hub username.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: health-check-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: health-check-demo
  template:
    metadata:
      labels:
        app: health-check-demo
    spec:
      containers:
      - name: health-check-demo
        image: your-dockerhub-username/k8s-health-check-demo
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 3
          periodSeconds: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        resources:
          requests:
            memory: "64Mi"
            cpu: "100m"
          limits:
            memory: "128Mi"
            cpu: "200m"
```

Now, apply the deployment to your cluster:
```shell
kubectl apply -f deployment.yaml
```

Let’s also create a service to expose our deployment:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: health-check-demo-svc
spec:
  selector:
    app: health-check-demo
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  type: LoadBalancer
```

Create a file named service.yaml with the content above and apply it:

```shell
kubectl apply -f service.yaml
```

Step 4: Observe the Probes
Let’s see our probes in action. Get the name of your pod:
```shell
kubectl get pods
```

Then, describe the pod to see the events related to the probes:

```shell
kubectl describe pod <your-pod-name>
```

You should see events indicating that the liveness and readiness probes are succeeding.
Step 5: Trigger an OOMKilled Error
Now, let’s simulate the OOMKilled error that Alex encountered. We’ll use port-forwarding to access our service from our local machine:
```shell
kubectl port-forward svc/health-check-demo-svc 8080:80
```

In a new terminal, let’s send requests to our application to consume memory. Our memory limit is 128Mi.

```shell
# Send requests to consume memory until the pod is OOMKilled
for i in {1..15}; do curl "http://localhost:8080/consume-mem?mb=10"; echo; sleep 1; done
```

After a few requests, you’ll see the pod being terminated and restarted. Check the status of the pod:

```shell
kubectl get pods
```

You’ll see the pod has been restarted. Describe the pod again to see the reason for the restart:

```shell
kubectl describe pod <your-new-pod-name>
```

You should see that the pod was terminated because of an OOMKilled error.
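If the relevant events have already rotated out of `kubectl describe`, the termination reason is also recorded in the pod’s status. A one-liner to pull it out (the pod name is a placeholder):

```shell
kubectl get pod <your-new-pod-name> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
```

For our scenario, this should print `OOMKilled`.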
Step 6: Configure the Horizontal Pod Autoscaler
Finally, let’s configure the HPA to automatically scale our application based on CPU utilization. First, you need to have the Metrics Server installed in your cluster. If you are using Docker Desktop, it’s usually enabled by default. You can check with kubectl top pods.
Now, create an HPA for our deployment:
```shell
kubectl autoscale deployment health-check-demo --cpu-percent=50 --min=1 --max=5
```

This command creates an HPA that will maintain an average CPU utilization of 50% across all pods. It will scale the number of pods between 1 and 5.
To generate some load, we can run a simple loop:
```shell
# In a new terminal, run this to generate load
while true; do wget -q -O- http://localhost:8080; done
```

Now, watch the HPA in action:

```shell
kubectl get hpa -w
```

After a few minutes, you should see the number of replicas increase as the CPU utilization goes up. When you stop the load generation, the number of replicas will scale back down to 1.
Debugging and Troubleshooting Tips
Even with the best-laid plans, things can go wrong. Here are some common issues you might encounter and how to troubleshoot them:
- `CrashLoopBackOff`: If you see a pod in a `CrashLoopBackOff` state, the container is starting and then crashing in a loop. This could be due to a failing liveness probe or an application error. Use `kubectl describe pod <pod-name>` and `kubectl logs <pod-name>` to investigate the cause.
- `ImagePullBackOff`: This error means that Kubernetes is unable to pull the Docker image for your container. Double-check that the image name and tag are correct and that you have pushed the image to the registry.
- HPA not scaling: If your HPA is not scaling your deployment, make sure the Metrics Server is running and that you have set resource requests for your containers. The HPA needs CPU requests to be set to calculate the CPU utilization.
- Probes failing: If your probes are failing, use `kubectl describe pod` to see the probe events. You can also use `kubectl exec -it <pod-name> -- /bin/sh` to get a shell into your container and manually test the probe endpoint.
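One caveat when testing a probe endpoint manually: slim base images like `python:3.9-slim` typically ship without `curl` or `wget`. Since our demo image includes Python, a sketch of testing the endpoint from inside the container with only the standard library (the pod name is a placeholder):

```shell
kubectl exec -it <pod-name> -- \
  python -c "import urllib.request; print(urllib.request.urlopen('http://localhost:8080/healthz').read())"
```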
Key Takeaways
This chapter covered a lot of ground. Here are the key takeaways:
- Probes are essential for self-healing: Liveness, readiness, and startup probes help Kubernetes understand the health of your application and make intelligent decisions.
- Resource management is crucial for stability: Setting CPU and memory requests and limits prevents resource contention and ensures your applications run reliably.
- QoS classes determine pod priority: `Guaranteed`, `Burstable`, and `BestEffort` classes tell Kubernetes how to handle your pods under pressure.
- HPA enables autoscaling: The Horizontal Pod Autoscaler automatically scales your application based on metrics like CPU utilization, ensuring you can handle fluctuating traffic.
- Observability is key: Tools like `kubectl top` and the Metrics Server provide basic insights, but for production, you need a more comprehensive monitoring solution.
The Morning After
As the sun streamed into his office, Alex finally leaned back in his chair, a sense of accomplishment washing over him. The API service was stable, the new health checks were in place, and he had a much deeper understanding of how Kubernetes managed resources. He had not only fixed the immediate problem but had also laid the groundwork for a more resilient and scalable system.
He knew his journey with Kubernetes was far from over. There were still many more mysteries to unravel, many more challenges to overcome. But for the first time, he felt a sense of confidence, a feeling that he was starting to master this powerful and complex technology.
His thoughts were interrupted by a Slack notification from his manager: “Great job on the API service, Alex! Can you join a quick call? We need to discuss our strategy for managing secrets and configurations across our services.” Alex smiled. It seemed his next adventure was already beginning.