Django on Kubernetes

This is part 2 of our Kubernetes hackday series. You can find Part 1 here, which goes over
how we spent the day and what our goals and motivations were.

For part 2, we’re going to delve into the architecture required for running a Django
application on Kubernetes, as well as some of the tooling we used to assist us.

This post assumes some familiarity with deploying and operating a production
web application. I’m not going to spend much time going over Kubernetes terminology
either. What I hope to do in this post is present enough information
to kickstart your own migration. Kubernetes is big, and knowing what to research now
and what to put off until later is really tricky.

I’m also going to describe this deployment in terms of Amazon EKS rather than Google GKE simply because I was on the EKS team,
and most of our applications are already on AWS.

Disclaimer

We’re not yet running any production apps on Kubernetes, but we’ve done
a lot of the analysis and are comfortable with the high-level design. If you’re
looking to migrate yourself, feel free to use this design as a jump start, but know that there’ll be a lot more detail to deal with.

Current Django Architecture

Before discussing where we’re going, it’s helpful to know where we are. When
deploying a new Django application, we usually have the following components.

  1. Application server(s) (Gunicorn) fronted by Nginx
  2. Worker server(s) (Celery)
  3. Scheduler server (Celery beat + shell, single instance)
  4. Database (Postgres)
  5. Cache (Redis usually, sometimes Memcache)
  6. Message Server for Celery (Redis or RabbitMQ)
  7. Static content pushed to and served from S3
  8. A CDN in front of the static content on S3

The scheduler server is a single instance that we use for our cron tasks, and
for production engineers to ssh to when they need to do production analysis. It’s
incredibly important for our workflows to maintain this capability when moving to
Kubernetes.

The Cluster

EKS requires you to set up the VPC and subnets that’ll be used to host the master
nodes of the cluster. Thankfully, AWS provides CloudFormation templates that do
exactly this.

The Getting Started
section is required reading, and will walk you through the rather complicated
process for setting up a cluster, authentication, the initial worker nodes, and
configuring kubectl so
you can operate the cluster from the command line.
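
To give a flavour of what’s involved: once the control plane is up, the guide has you apply an aws-auth ConfigMap so the worker nodes can authenticate and join the cluster. A rough sketch (the role ARN is a placeholder for the NodeInstanceRole your worker-node stack creates):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    # Placeholder ARN: substitute the NodeInstanceRole output from the
    # worker-node CloudFormation stack.
    - rolearn: arn:aws:iam::111122223333:role/eks-worker-NodeInstanceRole
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
```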

This setup is one area where EKS lags well behind the GKE experience. Creating a
fully operational Kubernetes cluster in the GCP console is literally a few clicks.

Minikube

Minikube is a single-node local
Kubernetes deployment that you can run on your laptop. If you want to experiment
with Kubernetes but don’t want the hassle of spinning up a production-grade cluster
in the cloud, Minikube is an excellent option.

Because EKS took some time to get operational, the rest of the team members worked
exclusively on Minikube until the final hour. Minikube has some limitations, but
was excellent for a development environment.

Kubernetes Design

Our very first task was to come up with an architecture, in Kubernetes terms, that
would support our application and our desired workflows. The final design ended up
somewhat different from the initial one based on what we learned along the way, but
it gave us a shared language and allowed us to partition the tasks properly.

Architecture Diagram

There are actually two designs, because the staging architecture differs from the
production architecture. We can have tens of staging environments running at any
time, so being able to easily bring up and destroy environments is highly desirable.

For this reason, our staging design does not use cloud services like RDS or
ElastiCache, which would require coordinating deployments between Kubernetes
and various cloud provider SDKs. Instead, we deploy Postgres and Redis within
the cluster.

Architecture Diagram - UAT

Note: We love RDS. Properly configuring, backing up, monitoring, and updating
services like Postgres and Redis isn’t trivial. We trust cloud providers to do it
much better than our small team can, especially on Kubernetes, where we have very
little experience. But if our staging database deployment crashes, it’s not a
critical event.

The only real difference between the two environments is that the Services point
either externally to a managed data store or internally to one running in the
cluster. The applications themselves remain blissfully unaware.
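
As a rough sketch of how that swap works (the names and the RDS hostname below are placeholders): production can use an ExternalName Service that resolves to the managed endpoint, while staging uses an ordinary Service that selects the in-cluster Postgres Pods. The application connects to postgres:5432 in both cases.

```yaml
# Production: "postgres" is a CNAME to the managed RDS endpoint.
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  type: ExternalName
  externalName: myapp.abcdefghijkl.ap-southeast-2.rds.amazonaws.com
---
# Staging: "postgres" load-balances to Postgres Pods running in the cluster.
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  selector:
    app: postgres
  ports:
    - port: 5432
      targetPort: 5432
```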

Deployments, Containers, and Services, OH MY

You may have heard of Pods or Services but not really understand how all of these
concepts fit together. The docs are dense, and knowing where to begin learning is
half the battle. Let me give a quick rundown of the Kubernetes objects we’re
using here.

Namespaces let you group together related components so they can communicate. Namespaces aren’t
(on their own) a security mechanism. Use namespaces to segment apps you trust, or
different environments for a single app.

Deployments
are like an auto-scaling group for a Pod. A Pod usually has a
single container, but may have multiple containers if they are co-dependent.
Containers within a Pod can share volumes and communicate with each other directly.
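
For illustration, a minimal Deployment for the Gunicorn app server might look something like this (the image name is a placeholder):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                  # keep three web Pods running
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: django
          image: registry.example.com/myapp:1.0.0   # placeholder image
          ports:
            - containerPort: 8000                   # Gunicorn bind port
```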

Services are
basically TCP load balancers. They’re capable of a lot more, but that’s all we’re
using them for here.
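
A Service fronting the Deployment sketched above is only a few lines:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web            # route to Pods labelled app=web
  ports:
    - port: 80          # port other Pods (or a load balancer) talk to
      targetPort: 8000  # port the Gunicorn container listens on
```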

ConfigMaps
can be environment variables or entire files that are injected into a Pod.
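
For example, non-sensitive Django settings could live in a ConfigMap like this (keys and values are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
data:
  DJANGO_SETTINGS_MODULE: myapp.settings.production
  ALLOWED_HOSTS: "www.example.com"
```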

Secrets like passwords
are injected into the cluster, and then made available to Deployments at runtime.
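
A Secret is just another YAML object (the value here is obviously a placeholder; in practice you’d create it out-of-band rather than commit it):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: myapp-secrets
type: Opaque
stringData:                      # stringData spares you base64-encoding by hand
  DATABASE_PASSWORD: change-me   # placeholder value
```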

I encourage you to read all of the Concepts
documentation in time, but do so after you have a handle on the types above.
There’s a lot you can safely ignore as implementation detail you won’t have to
deal with directly.

Application Design

Unless you’re already following Continuous Delivery,
it’s likely that your application is going to need to change in some way to be
compatible with a Kubernetes deployment. Here are a few things that we needed
to change.

Static Files

Deploy them to S3? Bundle a copy inside each container? Use Nginx? This choice
can be particularly contentious. Here’s my advice.

Use Whitenoise so that Django is in
charge of serving static content, then put a CDN in front of it. If, for some
reason, you don’t trust Whitenoise, then use Nginx within the same Pod to serve
the files, and still put a CDN in front of it.

Shared Storage

Maybe you’re using an NFS mount for uploaded media? Stop doing that. Use an object
storage service (like S3) instead.

That said, you’ll still need to tell Kubernetes what type of storage engine it
should use for local volumes. Before Kubernetes 1.11 (which is where EKS currently
sits), you need to explicitly configure a storage class. See the Storage Classes
documentation for setting this up.
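
On EKS that means creating an EBS-backed StorageClass and marking it as the default, roughly like this sketch:

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp2
  annotations:
    # Make this the default class so PersistentVolumeClaims without an
    # explicit storageClassName get an EBS volume.
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
reclaimPolicy: Delete
```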

Config Files

You might have config files for services like New Relic or similar. Deploy these
with ConfigMaps instead.
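
As a sketch, the whole newrelic.ini can be stored in a ConfigMap and mounted into the container as a file (the contents and mount path are placeholders):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: newrelic-ini
data:
  # The licence key is better kept in a Secret; this file holds the rest.
  newrelic.ini: |
    [newrelic]
    app_name = My Django App
    monitor_mode = true
---
# Fragment of the Deployment's Pod template that mounts the file:
spec:
  volumes:
    - name: newrelic-ini
      configMap:
        name: newrelic-ini
  containers:
    - name: django
      volumeMounts:
        - name: newrelic-ini
          mountPath: /etc/newrelic   # file appears at /etc/newrelic/newrelic.ini
```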

Secrets / Config Variables

Use Secrets for sensitive material and ConfigMaps for everything else. Within the
app, read the values from environment variables.
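
One convenient pattern is envFrom, which injects every key of a ConfigMap or Secret as an environment variable. A sketch, reusing the placeholder names from the earlier examples:

```yaml
# Fragment of a Deployment's container spec
containers:
  - name: django
    image: registry.example.com/myapp:1.0.0   # placeholder
    envFrom:
      - configMapRef:
          name: myapp-config    # every key becomes an environment variable
      - secretRef:
          name: myapp-secrets   # e.g. DATABASE_PASSWORD
```

Django can then pick them up with os.environ as usual.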

Dockerfiles

You might already have Dockerfiles for development or some other environment. My
advice is to have one Dockerfile optimised for development, and another
optimised for production.

Perhaps you need auto-reloading and NPM in development? Great! Don’t ruin your
production builds and artifacts with cruft that you don’t need. Reduce the size
and build time aggressively for production.

In our design, we have an app server, a Celery worker, and Celery beat. We use the
same Dockerfile for all three, but change the entrypoint (or the container’s command)
so the right service is started; we don’t want to be building three slightly
different images.
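
In Kubernetes terms, the container’s command overrides the image’s entrypoint, so the three Deployments can share one image. A sketch (image name and module paths are placeholders):

```yaml
# From the web Deployment's Pod template:
containers:
  - name: web
    image: registry.example.com/myapp:1.0.0   # one shared image
    command: ["gunicorn", "myapp.wsgi:application", "--bind", "0.0.0.0:8000"]
---
# From the worker Deployment's Pod template:
containers:
  - name: worker
    image: registry.example.com/myapp:1.0.0
    command: ["celery", "worker", "-A", "myapp", "--loglevel", "info"]
---
# From the beat Deployment's Pod template:
containers:
  - name: beat
    image: registry.example.com/myapp:1.0.0
    command: ["celery", "beat", "-A", "myapp", "--loglevel", "info"]
```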

Deploying to Kubernetes

We have a good idea what objects we require. Our application is ready. How does
it all come together?

With YAML. Specifically, having Helm Charts produce YAML.

Helm Charts are packages. You can use Helm to deploy Charts onto Kubernetes. A Helm
Chart represents a fully running application, like Postgres, or our own custom
Django application.

Helm Charts can have dependencies. They may also have configuration and runtime
options that change how an application is deployed. For example, you might
deploy a chart with a STAGING flag set that additionally deploys a local Postgres
rather than relying on hosted RDS. Sound familiar?

You will spend a lot of time designing your Helm Chart YAML files. They are basically
Go templates that render Kubernetes-compatible YAML for each object in our
diagram above. We hope to abstract the chart enough that we can deploy
arbitrary Django applications given just an image name.
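
To make that concrete, here’s a hypothetical excerpt (the names and values are invented for illustration): a staging flag in values.yaml, and a template that only renders an in-cluster Postgres when the flag is set.

```yaml
# values.yaml (excerpt)
staging: true
image:
  repository: registry.example.com/myapp
  tag: "1.0.0"
```

```yaml
# templates/postgres.yaml (excerpt)
{{- if .Values.staging }}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: {{ .Release.Name }}-postgres
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}-postgres
    spec:
      containers:
        - name: postgres
          image: postgres:10
          ports:
            - containerPort: 5432
{{- end }}
```

Deploying with helm install ./myapp-chart --set staging=false would then skip the in-cluster Postgres and rely on RDS instead.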

Monitoring and Operations

We’ve covered the concepts required for setting up a cluster and deploying the application, but it’s harder to find information on monitoring your application and operating it in production. I’ll briefly touch on some of those facets now.

Logging

Centralised logging is critical for any large application, but doubly so for one running within Kubernetes. You can’t just hop onto a node and tail log files, and doing any kind of post mortem analysis on dying nodes becomes impossible.

The standard wisdom is to run a logging agent on each Node (or alongside each Deployment) that collects your logs and ships them to the logging service of your choice. There are Helm charts for the major logging services that you can reuse.

Logging daemons typically run as DaemonSets, which is another Kubernetes object, but one you probably won’t interact with directly.
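
For a rough idea of the shape, a log-shipping DaemonSet mounts the node’s log directories into a collector container on every node. A sketch (the image and any backend configuration are placeholders; in practice you’d reuse a chart):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      containers:
        - name: fluentd
          image: fluent/fluentd-kubernetes-daemonset   # pin a tag for your log backend
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: containers
              mountPath: /var/lib/docker/containers
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: containers
          hostPath:
            path: /var/lib/docker/containers
```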

Monitoring

Prometheus is the standard monitoring tool used within Kubernetes clusters. Your application (and services) can export metrics that Prometheus will then gather.

We’ll continue to use New Relic for our applications though, as it gives a lot of great insight into your Django app. Within your web/celery startup script, run it with newrelic-admin as normal. Deploy your newrelic.ini as a ConfigMap.

SSL

If you’re using AWS Certificate Manager with your Application Load Balancer, then SSL is mostly handled for you. There are also Helm charts for deploying LetsEncrypt into your Kubernetes cluster.

DNS

This one is a bit trickier, and I’m not sure we’ve fully landed on a solution at this time. Amazon Route 53 is the managed DNS service, but what is going to provision a new zone for an instance of our application?

There are tools like Pulumi that allow you to program your infrastructure with nice Kubernetes integration. It may make sense to have Pulumi provision the hosted zone, and then deploy your helm chart into Kubernetes.

Auto Scaling

There are now two systems you need to autoscale: the nodes the cluster is running on, and the number of application instances (Pods per Deployment) you want to run.

The cluster-autoscaler Helm chart claims to support scaling the worker nodes, while Kubernetes itself provides the Horizontal Pod Autoscaler for scaling your application.
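
A Horizontal Pod Autoscaler for the web Deployment is short. A sketch, assuming the cluster’s metrics pipeline (metrics-server or heapster) is running:

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70   # add Pods when average CPU exceeds 70%
```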

Summary

I hope the information presented above is useful and answers some questions you might already have had about moving to Kubernetes. We found that working our way through and understanding the concepts behind Kubernetes has led to much more productive conversations about what a migration might look like, and the effort required to make it. A full migration is definitely on the cards for this year, and we’ll be documenting our journey as we go.

If there are any corrections needed above, please let us know in the comments. We’d also love to hear about your experiences with your own migration and what some of the sticking points were.

And, as always, if infrastructure is your thing and you think you can help us with the migration to Kubernetes, We’re Hiring!

Source: Kogan.com