Speed up CI builds with multi-stage Dockerfiles

Leveraging the Docker build cache has been a great way to speed up Docker builds for years now:

When building an image, Docker steps through the instructions in your Dockerfile, executing each in the order specified. As each instruction is examined, Docker looks for an existing image in its cache that it can reuse, rather than creating a new (duplicate) image.

For the ADD and COPY instructions, the contents of the file(s) in the image are examined and a checksum is calculated for each file.

For the RUN instructions, the command string itself is used to find a match.

Once the cache is invalidated, all subsequent Dockerfile commands generate new images and the cache is not used.

The Dockerfile for a simple Ruby service may look something like:

FROM alpine
WORKDIR /app
RUN apk add --update ruby ruby-bundler
COPY Gemfile* ./
RUN bundle install --path vendor/bundle
COPY . ./

This works great since the RUN bundle install step is only executed if:

  • new alpine:latest changes are pulled in
  • apk system dependencies change (not very often)
  • the contents of Gemfile or Gemfile.lock change

Otherwise the build cache is used and the step completes immediately:

Step 5/6 : RUN bundle install --path vendor/bundle
---> Using cache
---> 17ec830a9b5b
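
If you ever need to bypass the cache entirely (for example, to pick up updated apk packages without changing the Dockerfile), the docker build --no-cache flag forces every step to re-run. A quick sketch (the image tag here is just a placeholder):

```shell
# Rebuild from scratch, ignoring all cached layers
docker build --no-cache -t my-ruby-service .

# A normal build reuses cached layers where possible
docker build -t my-ruby-service .
```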

Nowadays it’s common for Ruby services to use other dependencies as well, e.g. Node for miscellaneous tasks like asset compilation via webpack:

FROM alpine
WORKDIR /app
RUN apk add --update ruby ruby-bundler nodejs nodejs-npm
COPY Gemfile* ./
RUN bundle install --path vendor/bundle
COPY package.json ./
RUN npm install
COPY webpack.config.js ./
COPY app/assets ./app/assets
RUN npm run webpack
COPY . ./

But here’s where things get a little tricky:

  • Now whenever the Gemfile or Gemfile.lock changes, it invalidates all of the cached steps below it
  • That means all NPM packages must be reinstalled and all webpack assets must be recompiled, even though gems have nothing to do with those steps
  • The steps could be reordered so that gems are installed after the Node-related steps, but that just flips the problem around: changes to package.json or the webpack assets now require a full reinstallation of gems
  • It’s pretty common for these Node dependencies to only be used for asset compilation at build time. In these cases Node, NPM, and the entire node_modules directory aren’t actually needed when the service runs
  • For smaller services this usually isn’t a big deal, but for services that rely on a large number of gem or NPM dependencies it results in much slower build times

Docker’s multi-stage builds (available since 17.05) can be used to solve this problem!

This feature allows a Dockerfile to contain multiple FROM steps which can generate intermediate images and utilize the build cache more efficiently.

These intermediate images are eventually combined into one final image:

FROM alpine AS node
WORKDIR /app
RUN apk add --update nodejs nodejs-npm
COPY package.json ./
RUN npm install
COPY webpack.config.js ./
COPY app/assets app/assets
RUN npm run webpack

FROM alpine
WORKDIR /app
RUN apk add --update ruby ruby-bundler
COPY Gemfile* ./
RUN bundle install --path vendor/bundle
COPY --from=node /app/assets assets
COPY . ./
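
A handy side effect of naming a stage is that it can also be built on its own with docker build --target (added in the same 17.05 release), which is useful for debugging a single stage in isolation. A sketch, with placeholder image tags:

```shell
# Build only the "node" stage, e.g. to debug asset compilation
docker build --target node -t my-service-assets .

# A plain build still produces the final Ruby image
docker build -t my-service .
```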

Now gem changes no longer impact NPM or webpack steps and vice-versa!

This was a simple example, but multi-stage builds are very useful for:

  • Services that have many dependencies so reinstallation is slow
  • Services that have many independent build steps for miscellaneous tasks like generating static content, e.g. sitemaps, marketing pages, assets, etc.

Note that even though the Ruby-related and Node-related steps no longer invalidate each other’s build cache, all of the intermediate images generated within the Dockerfile are still built one at a time, step by step! On a fresh build, the Ruby-related steps won’t be evaluated until the Node-related steps complete, even though they do not depend on each other.

Keep an eye on BuildKit – the concurrent, cache-efficient, and Dockerfile-agnostic builder toolkit! One of its experimental features allows the independent stages in a multi-stage build to be built in parallel!
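
If your Docker daemon is recent enough, BuildKit can already be opted into per build via an environment variable (assuming roughly Docker 18.09+; the image tag is a placeholder):

```shell
# Opt in to BuildKit for this build; independent stages
# (like the node and ruby stages above) can run in parallel
DOCKER_BUILDKIT=1 docker build -t my-service .
```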

Our engineering team is still growing! We’re hiring engineers in our San Francisco and Pittsburgh offices. Check out our careers page to learn more. We look forward to hearing from you!


Speed up CI builds with multi-stage Dockerfiles was originally published in LendingHome Tech on Medium, where people are continuing the conversation by highlighting and responding to this story.
