Fsq.io: Open Sourcing Foursquare’s Production Codebase


At Foursquare we have always been very keen about open source: many of our engineers are avid contributors to the community, many of our projects are open source (including some essential tools for location intelligence and big data), and we are regular open source collaborators with other leading tech companies.

As a company we are very proud to support our engineers who want to open source their projects. But as a business we have to consider the workload and maintenance cost associated with every additional engineering project.

Juggling open source infrastructure code and our internal codebase was taking its toll. Our engineers had to maintain two different versions of the same code. Some of our critical infrastructure code was open source and lived in separate GitHub repositories. Open source releases might lag behind what we used internally, or the reverse. If an open source project lost its maintainer, it could end up abandoned. Foursquare was wasting engineering resources solving these problems on a per-project basis.

Spindle was our motivation to change things.

Spindle, based on Apache Thrift, is a critical piece of custom infrastructure that began as an independent open source project. So, instead of living inside our monorepo, it lived in a completely separate GitHub repo. We treated it as a third-party tool, and as a result it was becoming increasingly difficult to work on Spindle itself. The development cycle was dominated by the overhead of publishing snapshots in order to run internal tests.

The result was that changes to Spindle were often tested less thoroughly, because testing was so time consuming. New Spindle features could be fully written and still take a very long time to become available, because updating the version used by our monorepo was tedious and unrewarding. One of our best engineers maintained the project. He estimated that managing that cycle took nearly a quarter of his engineering hours, and he was frustrated by the effort it took to land even small changes.

The solution was to move the Spindle code into our codebase. But the harder question was what to do with the open source project. We wanted our lives to be easier, but we didn’t want to withdraw projects that we had previously been enthusiastic about sharing.

What’s Fsq.io?

Internally, all our code lives in a single monorepo. Fsq.io is the subset that contains all of our open source code, around 1/30th of all the Scala and Python library code at Foursquare. It was first released at the end of 2015 and we add new projects regularly.

The Fsq.io name represents the new namespace — our business logic and internal only projects are under “com.foursquare”, so we created “io.fsq” for the open source code. The new namespace and file paths are the first of several indicators to engineers that this code is scheduled for regular deploys to open source.

Fsq.io is itself a fully functioning monorepo and has greatly lowered the cost of releasing and maintaining open source at Foursquare. It contains all of the latest code we use internally and all of the tools for using that code: it’s a collection of infrastructure and geocoding utilities ready to be plugged into existing codebases or to form the basis for a new company.

Fsq.io requires almost no extra maintenance from our engineers — the cost is close to zero.

Now, anyone can clone Fsq.io and spin up one of our open source projects, perhaps Twofishes, a Scala geocoder, or Spindle, our custom Scala code generator for Thrift. Alternatively, it is entirely possible to use Fsq.io as a foundation for a new startup. We know several companies that bootstrapped their codebase by checking out Fsq.io and adding their feature code on top.

A monorepo?

A monorepo means that all of our projects are in a single git repository, as opposed to coordinating among separate repos for every individual project. Monorepos are increasingly useful as contributor and project counts grow. Some of the biggest tech companies in the world use monorepos to limit complexity and maintenance cost.

For some background, here is a detailed look into why all of Google’s code is stored in a single shared repository, and an excellent explanatory piece by Dan Luu.

Monorepos are quite useful internally, where all the code can function as a unit. But they can complicate releasing subprojects to open source, since the project must function independently as well as it does within the monorepo.

Other companies have released related projects: Twitter released a collection of their libraries as twitter-commons, and Google open sourced MOE, a tool designed to split projects out of monorepos.

How do we make it work?

Fsq.io is a strict subset of our monorepo and commits may contain both open source and internal code. We use Sapling (an open source git porcelain tool) to analyze every commit and split out changes to open source file paths. These changes are pushed to a branch, sanity checked, and then released as Fsq.io.
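As a rough illustration of the splitting step, the sketch below partitions the files touched by one commit into open source and internal paths. It is not how Sapling is implemented. The function name and the prefix list are hypothetical, and only the "io/fsq" versus "com/foursquare" distinction comes from the namespaces described in this post.

```python
from typing import Iterable, List, Tuple

# Hypothetical open source path prefixes. The io.fsq namespace is the one
# described in this post; the exact directory layout is an assumption.
OPEN_SOURCE_PREFIXES = ("src/jvm/io/fsq/", "src/python/fsqio/")


def split_changed_paths(
    changed_paths: Iterable[str],
    open_prefixes: Tuple[str, ...] = OPEN_SOURCE_PREFIXES,
) -> Tuple[List[str], List[str]]:
    """Partition one commit's changed paths into (open_source, internal)."""
    open_source: List[str] = []
    internal: List[str] = []
    for path in changed_paths:
        # str.startswith accepts a tuple and matches any of its prefixes.
        if path.startswith(open_prefixes):
            open_source.append(path)
        else:
            internal.append(path)
    return open_source, internal


# A mixed commit: one open source file and one internal file.
commit = [
    "src/jvm/io/fsq/exceptionator/Main.scala",
    "src/jvm/com/foursquare/venues/Venue.scala",
]
open_source, internal = split_changed_paths(commit)
print("released:", open_source)
print("kept internal:", internal)
```

Only the first list would be carried over to the release branch. A real tool also has to rewrite history so that the released commits contain no internal changes, which this sketch does not attempt.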

It has become much easier to open source projects and we tend to add a new major project to Fsq.io every couple of months. On average, 1 out of every 40 of our commits is open source, or about 1 commit per day.

We no longer rely on engineers to maintain each individual project and instead programmatically maintain them as a whole. As long as the code is useful internally, we will continue to faithfully deploy every change.

What builds it?

Fsq.io is built using Pants, an open source build system. Pants is a true community project, representing a long-standing collaboration between Foursquare, Twitter, Square, individual contributors, and more.

Pants was designed with monorepos in mind and works particularly well for Fsq.io, providing:

  • Common API for any project: Pants supports multiple languages and frameworks, and is highly pluggable. Pants serves as a common way to interact with anything in the repo, regardless of language.
  • Cross-language invalidation: Pants easily encodes dependency relationships across languages that otherwise depend on human diligence, essential for a codebase with the breadth of Fsq.io.
  • Cache: Foursquare is primarily a Scala shop, and Scala is notoriously slow to compile. Pants populates and distributes artifacts with a distributed artifact cache. Developers don’t recompile code that they didn’t change. Instead, it is read from the cache that was populated by CI. This saves an incredible amount of engineering time.
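The invalidation and caching ideas in the list above can be sketched in a few lines. This is a simplified illustration and not Pants's actual implementation. The `fingerprint` and `build` helpers and the in-memory dictionary standing in for a distributed cache are all hypothetical. The key idea is that a target's cache key covers its own source plus the keys of its dependencies, so editing a Thrift file also invalidates the Scala code that depends on it.

```python
import hashlib
from typing import Callable, Dict, List, Sequence


def fingerprint(source: str, dep_fingerprints: Sequence[str]) -> str:
    """Hash a target's source together with its dependencies' fingerprints."""
    digest = hashlib.sha256(source.encode("utf-8"))
    for dep in sorted(dep_fingerprints):
        digest.update(dep.encode("utf-8"))
    return digest.hexdigest()


def build(key: str, cache: Dict[str, str], compile_fn: Callable[[], str]) -> str:
    """Return the cached artifact for key, compiling only on a cache miss."""
    if key not in cache:
        cache[key] = compile_fn()
    return cache[key]


cache: Dict[str, str] = {}  # stand-in for a shared artifact cache
compiles: List[str] = []    # records every real compile


def fake_compile() -> str:
    compiles.append("compiled")
    return "artifact"


thrift_key = fingerprint("struct Venue {}", [])
scala_key = fingerprint("class VenueService", [thrift_key])

build(scala_key, cache, fake_compile)  # miss: compiles and stores the artifact
build(scala_key, cache, fake_compile)  # hit: served from the cache

# Changing the Thrift source changes the dependent Scala target's key too.
new_thrift_key = fingerprint("struct Venue { 1: string id }", [])
new_scala_key = fingerprint("class VenueService", [new_thrift_key])
print(len(compiles), new_scala_key != scala_key)
```

In this toy version the second `build` call does no work, which is the behavior developers get when CI has already populated the cache for unchanged code.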

How do we maintain this monorepo?

At Foursquare, at any given moment one of our engineers is working on code that lives in Fsq.io, be that adding a new project, a new feature, or fixing a bug. But the cost of maintaining and releasing the code is essentially invisible. I tend to the Fsq.io tooling and cut releases, which takes maybe an hour or so a month. The larger maintenance that all code needs has already happened as a matter of course.

In a nutshell, Fsq.io has been great for both Foursquare and the open source community:

  • It’s cheaper to release and maintain open source projects, so it happens more readily.
  • Open source code is identical to what we deploy internally.
  • Bugs in Fsq.io are high priority because they represent production issues.
  • Those projects move faster because they are easier to discover.
  • Fsq.io gets continuous integration from our internal CI pipeline.
  • A fully functioning Pants repo is released as open source, useful as a reference or for direct consumption.
  • Foursquare’s open source projects now outlive their original authors, by default.

Do you want to contribute?

We have had contributions from engineers at Sigma, Twitter and ActionIQ (amongst others) and look forward to more. Pull requests are accepted through GitHub, but the commit itself must land in our internal repo, the source of every Fsq.io commit.

Foursquare engineers will review the pull request, apply the patch internally (preserving authorship) and send it to Fsq.io like any other commit.

Better yet, consider joining our talented team of engineers who contribute to Fsq.io daily.

What’s the quickest project for you to try out?

Exceptionator, an exception aggregator, is one of our open source projects and it’s easy to boot. If you have an existing codebase (and you have a JVM fleet) you can see Exceptionator in all of its exception-catching graphical glory. It’s most useful for live monitoring of your production traffic, but a demo can be spun up on most OS X or Linux machines.

You can start it up in these steps:

  • Install MongoDB and have a local mongo running:
    mkdir mongo_data
    path_to_mongo_download/bin/mongod --dbpath mongo_data
  • Use Pants to build and boot an Exceptionator instance from the root of your Fsq.io checkout:
    ./pants run src/jvm/io/fsq/exceptionator

In most cases, you will see nothing of great interest. Fsq.io is a useful and interesting collection of infrastructure and geo tooling — but generating a user base is up to you!

Do you want to find out even more?

If you want to keep up with our open source projects make sure to follow Foursquare Engineering on Medium and sign up to receive our upcoming newsletters.

Fsq.io: Open Sourcing Foursquare’s Production Codebase was originally published in Foursquare on Medium, where people are continuing the conversation by highlighting and responding to this story.

Source: Foursquare