We’ve recently discovered the wonders of a CloudRun stack here at Kogan.
As a fast-moving business, we often need to spin up new projects quickly to demonstrate new ideas. We previously used Heroku, but our low-traffic and internal sites didn’t need to be running constantly: we were looking for a serverless solution!
We came up with a checklist:
Google Cloud Platform (GCP) had most of the answers: CloudRun, a “serverless” container platform that runs Docker images, for the Django requirement; CloudSQL, a fully managed relational database service, for the Postgres requirement; and CloudBuild for CI/CD. We were still missing something to run our offline tasks!
We’ve previously relied on Celery to schedule long-running tasks. However, Celery needs its own infrastructure: a message broker, a pool of workers, and a beat scheduler. That’s a lot to manage when scaling up, and it won’t scale down to zero. Wouldn’t it be nice to scale our workers and our web traffic in the same way?
The solution we came up with was to define tasks as HTTP endpoints. We could then scale both web and task traffic by increasing the number of containers, with everything executing in the same environment.
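As a rough illustration of “tasks as HTTP endpoints”, here is a minimal, framework-agnostic sketch: a registry maps task names to functions, and one handler decodes a JSON request body and runs the named task. The payload shape (`task`, `args`, `kwargs`) and all names here are hypothetical and are not django-cloudtask’s actual wire format; in a Django project the handler would sit behind a view.

```python
import json
from typing import Any, Callable, Dict, Tuple

# Hypothetical registry mapping task names to plain Python callables.
TASKS: Dict[str, Callable[..., Any]] = {}


def task(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so it can be invoked by name over HTTP."""
    TASKS[func.__name__] = func
    return func


def handle_task_request(body: bytes) -> Tuple[int, str]:
    """Run the task described by a JSON request body.

    Returns an (HTTP status, message) pair. A queue that retries
    non-2xx responses will retry anything other than a 200 here.
    """
    try:
        payload = json.loads(body)
        func = TASKS[payload["task"]]
    except (ValueError, KeyError, TypeError):
        return 400, "malformed task payload"
    try:
        result = func(*payload.get("args", []), **payload.get("kwargs", {}))
    except Exception as exc:  # report the failure so the queue can retry
        return 500, f"task failed: {exc!r}"
    return 200, json.dumps(result)


@task
def add(a: int, b: int = 0) -> int:
    return a + b


status, message = handle_task_request(b'{"task": "add", "args": [2], "kwargs": {"b": 3}}')
print(status, message)  # 200 5
```

Because a task is just an HTTP request to the same container image, web and task traffic scale together; a queue such as CloudTask treats any non-2xx status as a failure and retries it.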
Enter CloudTask and CloudScheduler.
CloudTask is a task queueing service that manages retry-on-failure, exponential backoff, and rate limiting. CloudScheduler lets you send tasks on a cron-like schedule. Both services define tasks as HTTP requests: perfect!
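To make “tasks as HTTP requests” concrete, the sketch below builds request bodies in the shape of the Cloud Tasks v2 REST `tasks.create` call (a `task` with an `httpRequest` holding `url`, `httpMethod`, `headers` and a base64 `body`) and a Cloud Scheduler v1 REST `Job` (`schedule`, `timeZone`, and an `httpTarget` with `uri`). It only builds dictionaries and makes no API calls; the service URLs and the JSON body format are hypothetical placeholders.

```python
import base64
import json
from typing import Any, Dict


def _encode_body(data: Dict[str, Any]) -> str:
    # Both REST APIs carry the HTTP body as base64-encoded bytes.
    return base64.b64encode(json.dumps(data).encode("utf-8")).decode("ascii")


def cloud_tasks_create_body(url: str, data: Dict[str, Any]) -> Dict[str, Any]:
    """Request body for a Cloud Tasks tasks.create call (REST, v2)."""
    return {
        "task": {
            "httpRequest": {
                "httpMethod": "POST",
                "url": url,
                "headers": {"Content-Type": "application/json"},
                "body": _encode_body(data),
            }
        }
    }


def cloud_scheduler_job(schedule: str, uri: str, data: Dict[str, Any],
                        time_zone: str = "UTC") -> Dict[str, Any]:
    """A Cloud Scheduler Job resource (REST, v1) with an HTTP target."""
    return {
        "schedule": schedule,  # unix-cron format, e.g. "15 14 * * *"
        "timeZone": time_zone,
        "httpTarget": {
            "uri": uri,
            "httpMethod": "POST",
            "headers": {"Content-Type": "application/json"},
            "body": _encode_body(data),
        },
    }


# Hypothetical CloudRun service URL and task paths.
task_body = cloud_tasks_create_body(
    "https://example-service.a.run.app/tasks/process_data", {"args": [42]}
)
job = cloud_scheduler_job(
    "15 14 * * *", "https://example-service.a.run.app/tasks/schedule_reports", {}
)
```

Either payload ends up as an ordinary POST to the CloudRun service, so the same container that serves web traffic also receives the task.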
We had previously tried Cloud Pub/Sub as a lower-level task queue, but found it difficult to get any visibility into running tasks. CloudTask, by comparison, gives you a dashboard of metrics where you can view tasks in progress, the execution rate, and how many tasks have failed.
We built a library to interface with CloudTask and CloudScheduler: django-cloudtask.
We made the API simple and similar to Celery. Tasks are defined by a decorator and may be called both synchronously and asynchronously. For example:
```python
from django_cloudtask import register_task

# Report and process() are your application's own model and helper.


@register_task
def process_data(report_pk, days_ago=1):  # tasks support both args and kwargs
    for record in Report.objects.filter(pk=report_pk).generate_report(days_ago):
        process(record)


@register_task(schedule="15 14 * * *", should_retry=False)
def schedule_reports():  # tasks can operate on a cron schedule
    for pk in Report.objects.filter(enabled=True).values_list("pk", flat=True):
        process_data.enqueue(pk)
```
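For readers curious how one decorator can make a function callable both synchronously and asynchronously, here is a toy, in-process sketch. It is not django-cloudtask’s implementation: `register_task`, `enqueue`, `schedule` and `should_retry` mirror the names in the example above, but the in-memory `QUEUE` list stands in for a real CloudTask queue and the JSON payload format is an assumption.

```python
import json
from typing import Any, Callable, List, Optional

# Stand-in for a real task queue: enqueued payloads are collected here.
QUEUE: List[str] = []


class Task:
    """Wraps a function so it can be called directly or enqueued."""

    def __init__(self, func: Callable[..., Any], schedule: Optional[str],
                 should_retry: bool):
        self.func = func
        self.name = func.__name__
        self.schedule = schedule  # cron string such as "15 14 * * *", or None
        self.should_retry = should_retry

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Synchronous call: run in the current process.
        return self.func(*args, **kwargs)

    def enqueue(self, *args: Any, **kwargs: Any) -> None:
        # Asynchronous call: serialize the call for a worker to run later.
        QUEUE.append(json.dumps({"task": self.name, "args": args, "kwargs": kwargs}))


def register_task(func: Optional[Callable[..., Any]] = None, *,
                  schedule: Optional[str] = None, should_retry: bool = True):
    """Usable both as @register_task and as @register_task(schedule=...)."""
    def wrap(f: Callable[..., Any]) -> Task:
        return Task(f, schedule, should_retry)

    return wrap(func) if func is not None else wrap


@register_task
def double(x: int) -> int:
    return 2 * x


@register_task(schedule="15 14 * * *", should_retry=False)
def schedule_doubles() -> None:
    for n in range(3):
        double.enqueue(n)


print(double(21))  # 42
schedule_doubles()
print(len(QUEUE))  # 3
```

Keeping the wrapper callable means tests and scripts can run a task inline, while production code calls `.enqueue()` to hand the work to the queue.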
More docs and examples are available in the repo!
We also built a template, django-cloudrun-cookiecutter, which kickstarts a Django project ready to be deployed to CloudRun. We took a lot of inspiration from Google’s demo Django app, Unicodex, which is a recommended read for understanding what each step does.
We’re very happy with our new stack and are already using both repos in our production systems. There are still improvements to make, such as defining more of the task management in code, but we’re getting there.