Norbert Potocki, Software Engineer @ Yahoo Inc. Warm up: Why configuration management? When working with large-scale software systems, configuration management becomes crucial; supporting non-uniform environments gets greatly simplified if you decouple code from configuration. While building complex software/products such as Flickr, we had to come up with a simple, yet powerful, way to manage configuration. […]
Norbert Potocki, Software Engineer @ Yahoo Inc.
When working with large-scale software systems, configuration management becomes crucial; supporting non-uniform environments gets greatly simplified if you decouple code from configuration. While building complex software/products such as Flickr, we had to come up with a simple, yet powerful, way to manage configuration. Popular approaches to solving this problem include using configuration files or having a dedicated configuration service. Our new solution combines the extremely popular GitHub and cfg4j library, giving you a very flexible approach that will work with applications of any size.
Let’s see what configuration-specific components we’ll be working with today:
Push cache : Intermediary store that we use to improve fetch speed and to ease load on GitHub servers.
CD pipeline: Continuous deployment pipeline pushing changes from repository to push cache, and validating config correctness.
Configuration library: Fetches configs from push cache and exposing them to your business logic.
Bootstrap configuration : Initial configuration specifying where your push cache is (so that library knows where to get configuration from).
All these players work as a team to provide an end-to-end configuration management solution.
The first thing you might expect from the configuration repository and editor is ease of use. Let’s enumerate what that means:
So what options are out there that may satisfy those requirements? The three very popular formats for storing configuration are YAML, Java Property files, and XML files. We use YAML – it is widely supported by multiple programming languages and IDEs, and it’s very readable and easy to understand, even by a non-engineer.
We could use a dedicated configuration store; however, the great thing about files is that they can be easily versioned by version control tools like Git, which we decided to use as it’s widely known and proven.
Git provides us with a history of changes and an easy way to branch off configuration. It also has great support in the form of GitHub which we use both as an editor (built-in support for YAML files) and collaboration tool (pull requests, forks, review tool). Both are nicely glued together by following the Git flow branching model. Here’s an example of a configuration file that we use:
One of the goals was to make managing multiple configuration sets (execution environments) a breeze. We need the ability to add and remove environments quickly. If you look at the screenshot below, you’ll notice a “prod-us-east” directory in the path. For every environment, we store a separate directory with config files in Git. All of them have the exact same structure and only differ in YAML file contents.
This solution makes working with environments simple and comes in very handy during local development or new production fleet rollout (see use cases at the end of this article). Here’s a sample config repo for a project that has only one “feature”:
Some of the products that we work with at Yahoo have a very granular architecture with hundreds of micro-services working together. For scenarios like this, it’s convenient to store configurations for all services in a single repository. It greatly reduces the overhead of maintaining multiple repositories. We support this use case by having multiple top-level directories, each holding configurations for one service only.
The main role of push cache is to decrease the load put on the GitHub server and improve configuration fetch time. Since speed is the only concern here, we decided to keep the push cache simple: it’s just a key-value store. Consul was our choice, in part because it’s fully distributed.
You can install Consul clients on the edge nodes and they will keep being synchronized across the fleet. This greatly improves both the reliability and the performance of the system. If performance is not a concern, any key-value store will do. You can skip using push cache altogether and connect directly to Github, which comes in handy during development (see use cases to learn more about this).
When the configuration repository is updated, a CD pipeline kicks in. This fetches configuration, converts it into a more optimized format, and pushes it to cache. Additionally, the CD pipeline validates the configuration (once at pull-request stage and again after being merged to master) and controls multi-phase deployment by deploying config change to only 20% of production hosts at one time.
Before we can connect to the push cache to fetch configuration, we need to know where it is. That’s where bootstrap configuration comes into play. It’s very simple. The config contains the hostname, port to connect to, and the name of the environment to use. You need to put this config with your code or as part of the CD pipeline. This simple yaml file binding Spring profiles to different Consul hosts suffices for our needs:
The configuration library takes care of fetching the configuration from push cache and exposing it to your business logic. We use the library called cfg4j (“configuration for java”). This library re-loads configurations from the push cache every few seconds and injects them into configuration objects that our code uses. It also takes care of local caching, merging properties from different repositories, and falling back to user-provided defaults when necessary (read more at http://www.cfg4j.org/).
Briefly summarizing how we use cfg4j’s features:
The most important piece is business logic. To best make use of a configuration service, the business logic has to be able to re-configure itself in runtime. Here are a few rules of thumb and code samples:
Direct configuration injection (won’t reload as config changes)
Configuration injection via “interface binding” (will reload as config changes):
When you develop a feature, a main concern is the ability to evolve your code quickly. A full configuration-management pipeline is not conducive to this. We use the following approaches when doing local development:
When you work with multiple environments, some of them may share a configuration. That’s when using configuration defaults may be convenient. You can do this by creating a “default” environment and using cfg4j’s MergeConfigurationSource for reading config first from the original environment and then (as a fallback) from “default” environment.
Configuration repository, push cache, and configuration CD pipeline can experience outages. To minimize the impact of such events, it’s good practice to cache configuration locally (in-memory) after each fetch. cfg4j does that automatically.
Tests can’t always detect all problems. Bugs leak to the production environment and at times it’s important to make a config change as fast as possible to stop the fire. If you’re using push cache, the fastest way to modify config values is to make changes directly within the cache. Consul offers a rich REST API and web ui for updating configuration in the key-value store.
Verifying that code and configuration are kept in sync happens at the configuration CD pipeline level. One part of the continuous deployment process deploys the code into a temporary execution environment, and points it to the branch that contains the configuration changes. Once the service is up, we execute a batch of functional tests to verify configuration correctness.
The presented solution is the result of work that we put into building huge-scale photo-serving services. We needed a simple, yet flexible, configuration management system. Combining Git, Github, Consul and cfg4j provided a very satisfactory solution that we encourage you to try.
I want to thank the following people for reviewing this article: Bhautik Joshi, Elanna Belanger, Archie Russell.