
Preparing for Production

It’s straightforward to model your apps in Kubernetes and get them running, but there’s more work to do before you get to production.

Kubernetes can fix apps which have temporary failures, automatically scale up apps which are under load and add security controls around containers.

These are the things you’ll add to your application models to get ready for production.

API specs

Container probes are part of the container spec inside the Pod spec:

spec:
  containers:
    - # normal container spec
      readinessProbe:
        httpGet:
          path: /health
          port: 80
        periodSeconds: 5

HorizontalPodAutoscalers (HPAs) are separate objects which interact with a Pod controller and trigger scale events based on CPU usage:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: whoami-cpu
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: whoami
  minReplicas: 2
  maxReplicas: 5
  targetCPUUtilizationPercentage: 50

Self-healing apps with readiness probes

We know Kubernetes restarts Pods when the container exits, but the app inside the container could be running but not responding - like a web app returning 503 - and Kubernetes won’t know.

The whoami app has a nice feature we can use to trigger a failure like that.

📋 Start by deploying the app from labs/productionizing/specs/whoami.

kubectl apply -f labs/productionizing/specs/whoami

You now have two whoami Pods - send a POST request and one of them will switch to a failed state:

# if you're on Windows, run this to use the correct curl:
Set-ExecutionPolicy -ExecutionPolicy Unrestricted -Scope Process -Force; . ./scripts/windows-tools.ps1

curl http://localhost:8010

curl --data '503' http://localhost:8010/health

curl -i http://localhost:8010

Repeat the last curl command and you’ll get some OK responses and some 503s - the Pod with the broken app doesn’t fix itself.

You can tell Kubernetes how to test whether your app is healthy with container probes. You define the action for the probe, and Kubernetes runs it repeatedly to make sure the app stays healthy - the readiness probe shown in the API specs above checks the /health endpoint every five seconds.

📋 Deploy the update in labs/productionizing/specs/whoami/update and wait for the Pods with label update=readiness to be ready.

kubectl apply -f labs/productionizing/specs/whoami/update

kubectl wait --for=condition=Ready pod -l app=whoami,update=readiness

Describe a Pod and you'll see the readiness check listed in the output.
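
This command will show it - the label selector matches the Pods from the update spec, the same one used in the kubectl wait command above:

kubectl describe pod -l app=whoami,update=readiness

Look for the Readiness line in the container details; it lists the HTTP GET action and the timing settings for the probe.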

These are new Pods so the app is healthy in both; trip one Pod into the unhealthy state and you’ll see the status change:

curl --data '503' http://localhost:8010/health

kubectl get po -l app=whoami --watch

One Pod changes in the Ready column - now 0/1 containers are ready.

If a readiness check fails, the Pod is removed from the Service and it won’t receive any traffic.

📋 Confirm the Service has only one Pod IP and test the app.

# Ctrl-C to exit the watch

kubectl get endpoints whoami-np

curl http://localhost:8010

Only the healthy Pod is enlisted in the Service, so you will always get an OK response.

If this were a real app, the 503 might be happening because the app is overloaded. Removing it from the Service could give it time to recover.

Self-repairing apps with liveness probes

Readiness probes isolate failed Pods from the Service load balancer, but they don’t take action to repair the app.

For that you can use a liveness probe, which will restart the Pod with a new container if the probe fails:
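
The probe definition sits alongside the readiness probe in the container spec. This is a sketch along the lines of what the update2 spec uses - the endpoint and port match the readiness probe, but the exact timing settings in the lab spec may differ:

spec:
  containers:
    - # normal container spec
      livenessProbe:
        httpGet:
          path: /health
          port: 80
        periodSeconds: 10
        failureThreshold: 3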

You’ll often have the same tests for readiness and liveness, but the liveness check has more significant consequences, so you may want it to run less frequently and have a higher failure threshold.

📋 Deploy the update in labs/productionizing/specs/whoami/update2 and wait for the Pods with label update=liveness to be ready.

kubectl apply -f labs/productionizing/specs/whoami/update2

kubectl wait --for=condition=Ready pod -l app=whoami,update=liveness

📋 Now trigger a failure in one Pod and watch to make sure it gets restarted.

curl --data '503' http://localhost:8010/health

kubectl get po -l app=whoami --watch

One Pod will become 0/1 in the Ready column - then it will restart and become 1/1 again.

Check the endpoints again and you'll see both Pod IPs are back in the Service list - when the restarted Pod passed its readiness check, it was added back.

Other types of probe exist, so this isn’t just for HTTP apps. This Postgres Pod spec uses a TCP probe and a command probe:
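
The spec isn't included here, but the probes will look something like this sketch - it assumes the standard Postgres port (5432) and the pg_isready tool that ships in the official Postgres image; the actual lab spec may use different settings:

spec:
  containers:
    - name: postgres
      image: postgres:14-alpine
      # TCP probe - Kubernetes checks the port is accepting connections
      readinessProbe:
        tcpSocket:
          port: 5432
        periodSeconds: 5
      # command probe - Kubernetes runs the command inside the container
      livenessProbe:
        exec:
          command: ["pg_isready", "-h", "localhost"]
        periodSeconds: 10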

Autoscaling compute-intensive workloads

A Kubernetes cluster is a pool of CPU and memory resources. If you have workloads with different demand peaks, you can use a HorizontalPodAutoscaler to automatically scale Pods up and down, as long as your cluster has capacity.

The basic autoscaler uses CPU metrics powered by the metrics-server project. Not all clusters have it installed, but it’s easy to set up:

kubectl top nodes

# if you see "error: Metrics API not available" run this:
kubectl apply -f labs/productionizing/specs/metrics-server

kubectl top nodes

The Pi app is compute-intensive, so it's a good target for an HPA:
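
The pi-cpu HPA follows the same pattern as the whoami one in the API specs above. This sketch matches the settings you'll see in the watch output below - scaling the pi-web Deployment between one and five replicas on a 75% CPU target - but check the spec in labs/productionizing/specs/pi for the definitive version:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: pi-cpu
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: pi-web
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 75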

📋 Deploy the app from labs/productionizing/specs/pi, check the metrics for the Pod and print the details for the HPA.

kubectl apply -f labs/productionizing/specs/pi

kubectl top pod -l app=pi-web 

kubectl get hpa pi-cpu --watch

Initially the Pod is at 0% CPU. Open two browser tabs pointing to http://localhost:8020/pi?dp=100000 - that's enough work to max out the Pod and trigger the HPA.

If your cluster honours CPU limits, the HPA will start more Pods. After the requests have been processed the workload falls, the average CPU across the Pods drops below the threshold, and the HPA scales back down.

Docker Desktop currently has an issue reporting the metrics for Pods. If you run kubectl top pod and you see error: Metrics not available for pod… then the HPA won’t trigger. Here is the issue - but I don’t recommend following the procedure to fix it.

The default settings wait a few minutes before scaling up and a few more before scaling down. Here’s my output:

NAME     REFERENCE           TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
pi-cpu   Deployment/pi-web   0%/75%     1         5         1          2m52s
pi-cpu   Deployment/pi-web   198%/75%   1         5         1          4m2s
pi-cpu   Deployment/pi-web   66%/75%    1         5         3          5m2s
pi-cpu   Deployment/pi-web   0%/75%     1         5         3          6m2s
pi-cpu   Deployment/pi-web   0%/75%     1         5         1          11m

Lab

Adding production concerns is often something you’ll do after you’ve done the initial modelling and got your app running.

So your task is to add container probes and security settings to the configurable app. Start by running it with a basic spec:

kubectl apply -f labs/productionizing/specs/configurable

Try the app and you'll see it fails after three refreshes and never comes back online. There's a /healthz endpoint you can use to check for that. Your goals are to add container probes which use that endpoint so failing Pods are taken out of service and repaired, to add an HPA for the app, and to add security settings to the Pod spec.

This app isn’t CPU intensive so you won’t be able to trigger the HPA by making HTTP calls. How else can you test the HPA scales up and down correctly?

Stuck? Try hints or check the solution.


EXTRA Pod security

Restricting what Pod containers can do

Container resource limits are necessary for HPAs, but you should have them in all your Pod specs because they provide a layer of security. Applying CPU and memory limits protects the nodes, and means workloads can’t max out resources and starve other Pods.
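
Limits are set per container in the Pod spec. This is a minimal sketch - the values are only examples, and the right numbers depend on the app:

spec:
  containers:
    - # normal container spec
      resources:
        limits:
          cpu: 250m
          memory: 128Mi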

Security is a very large topic in containers, but there are a few features you should aim to include in all your specs: run as a non-root user, drop the Linux capabilities the app doesn't need, and don't mount the Kubernetes API token unless the app actually uses it.

Kubernetes doesn’t apply these by default, because they can cause breaking changes in your app.
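
In a Pod spec those options look something like this sketch - the field names are standard Kubernetes settings, but which ones your app can tolerate depends on how its image is built:

spec:
  automountServiceAccountToken: false    # don't mount the API server token
  containers:
    - # normal container spec
      securityContext:
        runAsNonRoot: true               # refuse to start if the container user is root
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL                        # remove all Linux capabilities

You can see the default behaviour with the Pi app that's already running: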

kubectl exec deploy/pi-web -- whoami

kubectl exec deploy/pi-web -- cat /var/run/secrets/kubernetes.io/serviceaccount/token

kubectl exec deploy/pi-web -- chown root:root /app/Pi.Web.dll

The app runs as root, has a token to use the Kubernetes API server, and has powerful OS permissions - the chown command succeeds.

This alternative spec fixes those security issues:

kubectl apply -f labs/productionizing/specs/pi-secure/

kubectl get pod -l app=pi-secure-web --watch

The spec is more secure, but the app fails. Check the logs and you’ll see it doesn’t have permission to listen on the port.

Port 80 is privileged inside the container, so apps can’t listen on it as a least-privilege user with no Linux capabilities. This is a .NET app which can use a custom port:
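
The update spec moves the app to a non-privileged port. For an ASP.NET Core app that's typically done with the ASPNETCORE_URLS environment variable, so the change will be something like this sketch - the port 5001 here is just an example, and the lab spec may use a different value:

spec:
  containers:
    - # normal container spec
      env:
        - name: ASPNETCORE_URLS
          value: "http://+:5001"    # listen on a non-privileged port
      ports:
        - containerPort: 5001

If a Service routes to the old port, its targetPort needs to change to match.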

📋 Deploy the update and check it fixes those security holes.

kubectl apply -f labs/productionizing/specs/pi-secure/update

kubectl wait --for=condition=Ready pod -l app=pi-secure-web,update=ports

The Pod container is running, so the app is listening, and now it’s more secure:

kubectl exec deploy/pi-secure-web -- whoami

kubectl exec deploy/pi-secure-web -- cat /var/run/secrets/kubernetes.io/serviceaccount/token

kubectl exec deploy/pi-secure-web -- chown root:root /app/Pi.Web.dll

This is not the end of security - it's only the beginning. Securing containers is a multi-layered approach which starts with securing your images, but this is a good step up from the default Pod security.


Cleanup

kubectl delete all,hpa -l kubernetes.courselabs.co=productionizing