This blog post motivates and describes how Artsy adopted review apps and how
our usage of them has evolved.
The first part of this post covers a couple of common problems where
topic-specific servers (i.e. review apps) are useful.
The rest of the post describes Artsy’s history with review app automation via
incremental problem solving and the composition of a few well-known technologies.
At Artsy, we have a sizable engineering organization: ~40 engineers organized
across many teams. Engineers on those teams work on many codebases: some are
exclusive to a team, but many codebases are worked on by engineers across many
teams. Artsy’s website (www.artsy.net), Force, is a good example of such a shared
These different teams of developers working on a shared service have (mostly
hidden) dependencies upon each other, most visible when a shared service is being
deployed to production.
Let’s work the following example:
Artsy’s production deployment flow is rooted in a GitHub Pull Request, meaning
that the commits that differ between staging (on the staging Git branch) and
production (on the release Git branch) are clear to whoever is managing a
While it’s great that our deploys are easy to understand, ensuring that a deploy
of a service is safe requires communicating with the teams that contributed
code to ensure that their work is “safe to deploy”.
“Safe to deploy” might mean different things depending on the nature of the
For example, Team A’s new list of art auctions might require an API endpoint in
another Service Z to be deployed for their part of the deploy of Service S to
be safe to deploy. Team B’s bugfix might just require a quick visual confirmation.
Suffice it to say, it’s hard to independently confirm that another team’s work is
safe to deploy.
Despite the mitigation strategies discussed next, there’s risk of deploying unsafe
code whenever a single staging environment is used across many teams.
There are a couple of ways Artsy mitigates against the possible pitfalls of a
shared staging environment:
Having a culture of quickly deploying code to production
By building a culture that views frequent deploys positively, each deploy contains,
on average, a smaller diff, which mitigates the risk of unintentionally deploying code
that’s not safe to deploy.
Using automated quality processes geared towards a stable staging environment
We do our best to feel as confident as possible in a change before it is deployed
to staging by creating automated tests for changes, sharing visual changes over
Slack and in PRs, and other strategies relevant to the work at hand.
When deploying a service, Artsy engineers typically send a message to our #dev
Slack channel notifying of their plan to deploy a particular service, cc’ing the
engineers that are involved in other PRs that are part of the deploy. In the example
above, an engineer on Team B would notify relevant stakeholders of Team A, giving
that team the opportunity to flag if their work is not yet safe to deploy.
While these strategies are impactful:
Semi-unstructured communication is prone to breakdown: the notified engineers
on Team A might be pairing or in a meeting, and Team B deploys anyway.
Without a true continuous delivery model, it’s a challenge to operationalize
very frequent production deploys. Moreover, the problem compounds as the team
grows and the velocity of work increases.
Particularly when working in a large distributed system, automated testing at
the individual service level can only provide a certain level of safety for a
change. Visual changes which require correctness on different viewports and devices
are, pragmatically, often best to test manually.
If only there were a way to allow Team A and B to work without risking stepping
on each other’s toes!
We’ll discuss how review apps provide this safety, but first another related
But before it gets better, it’s going to get a bit worse.
While working on any mature distributed system is a challenge, some work is
particularly challenging or risky. Generally, risk is reduced when many
skilled eyes can review the risky thing in an environment that closely mimics
the production environment.
This class of work might include changes to authorization flows, page
redesigns or infrastructural upgrades (e.g. a Node.js upgrade).
For example, to feel safe deploying a major version upgrade of Artsy’s design
system in Force, the most pragmatic way forward was
to deploy that PR to a server where other engineers could collaborate on QA.
If a single shared staging environment is the only non-production server to
deploy to, the chances that work lands on staging in an unsafe state are high. While
staging being unsafe isn’t itself a bad thing, many bad things can result:
[Bad] Blocked Deploys
If staging is unsafe and this dangerous state is discovered, then the top priority
is getting the service back into a safe state. While that work is underway,
new features can’t be delivered.
In aggregate, this can be a sizeable productivity loss.
[Worse] Unsafe Deploys
If staging is unsafe and this dangerous state is not discovered before a production
deploy (for example, the unsafe code might be from another team, as described above),
then end-users might interact with a service that just isn’t working. No good.
Fear is the mind-killer.
Alright, a bit over the top, but the risk of unsafe or blocked deploys can
implicitly or explicitly result in teams shying away from complex work.
This avoidable fear might result in increased technical debt or not taking on
It’s generally bad for business and does not lead to a pleasant work environment!
To recap, there is an increased risk of unsafe or blocked deploys whenever there
is a single staging environment for a shared service. Certain types of
(incredibly useful) changes require interactive review on a live server
before we feel comfortable with those changes, which magnifies the risk of an unsafe or
Review apps are simply other staging environments that are easy to spin up and
are deployed with the version of the service that you are working on.
By QA-ing on a review app instead of a shared staging environment, teams can
take their time ensuring the safety of their changes, removing the risks
Larger infrastructure upgrades (including upgrades to the
underlying K8s configuration, in addition to app-level changes) can sit on a
server for hours or days, allowing any slow-moving issue to show itself in a
lower risk environment.
Artsy has iterated on its review app tooling to the point where Team A and Team B
can each deploy their changes to isolated servers of our main website, Force,
on a single git push to a branch matching a naming convention.
The rest of this post describes Artsy’s evolution of its review app tooling
and areas for continued improvement.
In the beginning, Artsy deployed most services on Heroku. There were fewer engineers
and teams, so the engineering organization was less impacted by the problems described above.
Some teams used Heroku review apps, but only sparingly.
For many reasons outside of the scope of this post, Artsy began migrating
services off of Heroku and onto an AWS-backed Kubernetes (K8s) deployment model
starting in February 2017.
In order to allow application engineers to reasonably interface with K8s-backed
services, Artsy developed a command line interface, hokusai, to provide a
Heroku CLI-like interface for configuring and deploying these services.
About a year after hokusai’s initial release, the tool released its initial
support for review apps.
Via subcommands within the hokusai review_app namespace, developers were able to:
While hokusai’s official review app feature handles much of the core infrastructure
needed to get a service deployed to a new server, additional tasks are required
to have a working review app. These can be categorized into:
Service Agnostic Tasks
hokusai’s review app docs for more
Service Specific Tasks
In addition, certain services have service-specific operational requirements that
need to be met before a review app is fully functional.
For example, in Force, we need to:
Impact: Due to the manual labor required to (re)learn and execute the
commands needed to build a review app, they were used sparingly by a few engineers
who had already invested time in learning them.
While the tasks described above are tedious, they don’t really require a
decision-making human behind the computer and can be automated.
In August 2019, we automated these tasks via a Bash script.
Impact: A developer is able to take a Force commit and get it deployed to K8s
by running a single script on their laptop. Folks became excited about review
apps and started to use them more for Force development.
The increased excitement and usage of review apps in Force revealed a new problem:
Building and pushing a >2 GB Docker image over a home WiFi network can be incredibly
slow, which decreased the usefulness and adoption of the Bash script.
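To put rough numbers on that, here is a back-of-the-envelope sketch. The upload speeds are illustrative assumptions, not measurements from Artsy, and the calculation ignores protocol overhead, Docker layer caching, and compression.

```python
def upload_minutes(image_gb: float, upload_mbps: float) -> float:
    """Estimate minutes needed to push an image of image_gb gigabytes
    over a link with upload_mbps megabits per second of upload bandwidth.

    Uses decimal units (1 GB = 8000 megabits) and assumes the link is
    fully saturated, so real pushes will usually be slower.
    """
    if upload_mbps <= 0:
        raise ValueError("upload_mbps must be positive")
    megabits = image_gb * 8 * 1000
    return megabits / upload_mbps / 60


if __name__ == "__main__":
    # Hypothetical home upload speeds, chosen only for illustration.
    for mbps in (5, 10, 20, 100):
        print(f"{mbps:>4} Mbps: {upload_minutes(2, mbps):.1f} min")
```

Under these assumptions, a 2 GB image takes roughly 27 minutes to push at 10 Mbps, and that cost is paid again whenever the large layers change.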
After discussions within Artsy’s Platform Practice, a possible solution
emerged: build the review app by running this Bash script on CircleCI upon push
to a branch starting with
This means that a developer’s laptop is then only responsible for pushing a
commit to a branch and CircleCI does all the heavy lifting.
Moreover, the process of centralizing the review app creation into CI helped us realize
the subsequent requirement: updating, not creating, review apps when a review app
already exists for a given branch.
Check out the pull request for the nitty gritty on how
we leveraged CircleCI branch filtering and more Bash to move this workload into
CircleCI and intelligently determine when to update or create a review app.
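As a rough illustration of that create-or-update decision (not the actual Bash from the pull request), the logic can be sketched in a few lines of Python. The branch prefix `review-app-`, the function names, and the idea of passing in the existing review apps as a list are all hypothetical; the real naming convention and lookup live in the CircleCI config and the script.

```python
from typing import Iterable, Optional

# Hypothetical prefix, used only for this sketch. The real convention
# is defined in the project's CircleCI branch filter.
REVIEW_BRANCH_PREFIX = "review-app-"


def review_app_name(branch: str, prefix: str = REVIEW_BRANCH_PREFIX) -> Optional[str]:
    """Return the review app name encoded in a branch name, or None
    if the branch does not follow the naming convention."""
    if not branch.startswith(prefix):
        return None
    name = branch[len(prefix):]
    return name or None  # a bare prefix with no name does not count


def plan_action(
    branch: str,
    existing_apps: Iterable[str],
    prefix: str = REVIEW_BRANCH_PREFIX,
) -> str:
    """Decide what CI should do for a pushed branch.

    Returns "skip" for branches outside the convention, "update" when a
    review app with the derived name already exists, and "create" otherwise.
    """
    name = review_app_name(branch, prefix)
    if name is None:
        return "skip"
    return "update" if name in set(existing_apps) else "create"
```

For example, with an existing app named `new-auctions`, pushing to `review-app-new-auctions` would plan an update, while pushing to `review-app-design-upgrade` would plan a create. In a real pipeline the "skip" case would typically be handled by the CI branch filter itself, so the script only ever runs for matching branches.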
Impact: Any developer can spin up a Force review app in ~15 minutes on a
Review apps are being used often for major and minor changes alike.
Artsy has come far with its tooling for review applications, but, as always,
there are areas for us to grow in, including:
Automating the de-provisioning of review apps that no
Automating the creation of DNS CNAME records within Cloudflare, removing one
final manual step.
While the improvements to review app infrastructure have sparked similar
investments in other codebases, there’s a lot of work we could do to bring
this Git-CircleCI-Bash based approach to other shared services we deploy at
One of Artsy’s Engineering Principles is “Incremental
Revolution”, which begins with:
Introduce new technologies slowly and incrementally.
I think Artsy’s approach to review apps is a great example of this principle
As opposed to finding a silver bullet technology or strategy, our approach has
been to build off of a working current state, layering on a new component to
solve the next problem.
At each point along our solution journey, we’re left with a working and more valuable
solution to the problem at hand.
Thanks for reading!