How Airbnb Standardized Metric Computation at Scale

Metric Infrastructure with Minerva @ Airbnb

Part II: The six design principles of Minerva compute infrastructure

By: Amit Pahwa, Cristian Figueroa, Donghan Zhang, Haim Grosman, John Bodley, Jonathan Parks, Maggie Zhu, Philip Weiss, Robert Chang, Shao Xie, Sylvia Tomiyama, Xiaohui Sun

Introduction

As described in the first post of this series, Airbnb invested significantly in building Minerva, a single source of truth metric platform that standardizes the way business metrics are created, computed, served, and consumed. We spent years iterating toward the right metric infrastructure and designing the right user experience. Because of this multi-year investment, when Airbnb’s business was severely disrupted by COVID-19 last year, we were able to quickly turn data into actionable insights and strategies.

In this second post, we will deep dive into our compute infrastructure. Specifically, we will showcase how we standardize dataset definitions through declarative configurations, explain how data versioning enables us to ensure cross-dataset consistency, and illustrate how we backfill data efficiently with zero downtime. By the end of this post, readers will have a clear understanding of how we manage datasets at scale and create a strong foundation for metric computation.

Minerva’s Design Principles

As shared in Part I, Minerva was born from years of growing pains and related metric inconsistencies found within and across teams. Drawing on our experience managing a metric repository specific to experimentation, we aligned on six design principles for Minerva.

We built Minerva to be:

  • Standardized: Data is defined unambiguously in a single place. Anyone can look up definitions without confusion.
  • Declarative: Users define the “what” and not the “how”. The processes by which the metrics are calculated, stored, or served are entirely abstracted away from end users.
  • Scalable: Minerva must be both computationally and operationally scalable.
  • Consistent: Data is always consistent. If definition or business logic is changed, backfills occur automatically and data remains up-to-date.
  • Highly available: Existing datasets are replaced by new datasets with zero downtime and minimal interruption to data consumption.
  • Well tested: Users can prototype and validate their changes extensively well before they are merged into production.

In the following sections, we will expand on each of the principles described here and highlight some of the infrastructure components that enable these principles. Finally, we will walk through the user experience as we bring it all together.

Minerva is Standardized

You may recall from Part I that the immense popularity of the core_data schema at Airbnb was actually a double-edged sword. On the one hand, core_data standardized table consumption and allowed users to quickly identify which tables to build upon. On the other hand, it burdened the centralized data engineering team with the impossible task of gatekeeping and onboarding an endless stream of new datasets into new and existing core tables. Furthermore, pipelines built downstream of core_data created a proliferation of duplicative and diverging metrics. We learned from this experience that table standardization was not enough and that standardization at the metrics level is key to enabling trustworthy consumption. After all, users do not consume tables; they consume metrics, dimensions, and reports.

Minerva is focused on metrics and dimensions as opposed to tables and columns. When a metric is defined in Minerva, authors are required to provide important self-describing metadata: ownership, lineage, and a metric description are all required in the configuration files. Prior to Minerva, such metadata often existed only as undocumented institutional knowledge or in chart definitions scattered across various business intelligence tools. In Minerva, all definitions are treated as version-controlled code. Modification of these configuration files must go through a rigorous review process, just like any other code review.

At the heart of Minerva’s configuration system are event sources and dimension sources, which correspond to fact tables and dimension tables in a Star Schema design, respectively.

Figure 1: Event sources and dimension sources are the fundamental building blocks of Minerva.

Event sources define the atomic events from which metrics are constructed, and dimension sources contain attributes and cuts that can be used in conjunction with the metrics. Together, event sources and dimension sources are used to define, track, and document metrics and dimensions at Airbnb.
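
To make the idea of required, self-describing metadata concrete, here is a minimal Python sketch. It is not Minerva's actual schema: the class names, field names, and the `core_data.bookings` table are illustrative assumptions. It only shows how an event source or dimension source could carry ownership, description, and lineage, and how a review-time check could reject a definition that leaves any of them blank.

```python
from dataclasses import dataclass


# Hypothetical config objects; the field names are illustrative assumptions,
# not Minerva's actual schema.
@dataclass(frozen=True)
class EventSource:
    name: str
    owner: str
    description: str
    upstream_table: str  # lineage: the fact table the events come from


@dataclass(frozen=True)
class DimensionSource:
    name: str
    owner: str
    description: str
    upstream_table: str  # lineage: the dimension table the attributes come from


REQUIRED_METADATA = ("owner", "description", "upstream_table")


def missing_metadata(source) -> list:
    """Return the required metadata fields that are blank on a source."""
    return [f for f in REQUIRED_METADATA if not getattr(source, f).strip()]


bookings = EventSource(
    name="bookings",
    owner="",  # left blank on purpose to show the check failing
    description="One row per confirmed booking.",
    upstream_table="core_data.bookings",  # hypothetical table name
)
print(missing_metadata(bookings))  # ['owner']
```

A check like this would naturally run as part of code review, so an incomplete definition never reaches the shared repository.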

Minerva is Declarative

Prior to Minerva, the road to creating an insightful piece of analysis or a high-fidelity and responsive dashboard was a long one. Managing datasets to keep up with product changes, meet query performance requirements, and avoid metric divergence quickly became a significant operational burden for teams. One of Minerva’s key value propositions is to dramatically simplify this tedious and time-consuming workflow so that users can quickly turn data into actionable insights.

Figure 2: Data Science workflow improvement.

With Minerva, users can simply define a dimension set, an analysis-friendly dataset that is joined from Minerva metrics and dimensions. Unlike datasets created in an ad-hoc manner, dimension sets have several desirable properties:

  • Users only define the “what” and need not concern themselves with the “how”. All the implementation details and complexity are abstracted from users.
  • Datasets created this way are guaranteed to follow our best data engineering practices. From data quality checks to joins to backfills, everything is done efficiently and cost-effectively.
  • Data is stored efficiently and is optimized to reduce query times and improve the responsiveness of downstream dashboards.
  • Because datasets are defined transparently in Minerva, we encourage metric reuse and reduce dataset duplication.

Figure 3: Programmatic denormalization generates dimension sets which users can easily configure.

By focusing on the “what” and not on the “how”, Minerva improves user productivity and maximizes time spent on their primary objectives: studying trends, uncovering insights, and performing experiment deep dives. This value proposition has been the driving force behind Minerva’s steady and continual adoption.
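
The following Python sketch illustrates what "declarative" could mean in practice: the user supplies only a description of the desired dataset, and a generator turns it into a join. The post does not show how Minerva renders its queries, so the function `build_join_sql`, the `ds` date column, the table aliases, and the example table names are all assumptions made for illustration.

```python
def build_join_sql(event_table: str, measure: str, join_key: str,
                   dimensions: dict) -> str:
    """Render a denormalizing query from a declarative description.

    `dimensions` maps each dimension name to the (hypothetical) dimension
    table that provides it. `ds` is an assumed date-partition column.
    """
    tables = sorted(set(dimensions.values()))
    alias = {table: f"d{i}" for i, table in enumerate(tables)}
    group_cols = ["e.ds"] + [
        f"{alias[table]}.{dim}" for dim, table in sorted(dimensions.items())
    ]
    joins = [
        f"LEFT JOIN {table} {alias[table]} "
        f"ON e.{join_key} = {alias[table]}.{join_key}"
        for table in tables
    ]
    select = ", ".join(group_cols + [f"SUM(e.{measure}) AS {measure}"])
    lines = [f"SELECT {select}", f"FROM {event_table} e", *joins,
             "GROUP BY " + ", ".join(group_cols)]
    return "\n".join(lines)


# The user states *what* they want; the join logic is generated for them.
sql = build_join_sql(
    event_table="bookings",
    measure="nights_booked",
    join_key="id_listing",
    dimensions={"dim_listing_urban_category": "listings"},
)
print(sql)
```

Because every dimension set passes through the same generator, two datasets that reference the same metric and dimension end up with the same join logic, which is the property the list above describes.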

Minerva is Scalable

Minerva was designed with scalability in mind from the outset. With Minerva now serving 5,000+ datasets across hundreds of users and 80+ teams, keeping cost and maintenance overhead low is a top priority.

At its core, Minerva’s computation is built with the DRY (Don’t Repeat Yourself) principle in mind. This means that we attempt to re-use materialized data as much as possible in order to reduce wasted compute and ensure consistency. The computational flow can be broken down into several distinct stages:

  • Ingestion Stage: Partition sensors wait for upstream data and data is ingested into Minerva.
  • Data Check Stage: Data quality checks are run to ensure that upstream data is not malformed.
  • Join Stage: Data is joined programmatically based on join keys to generate dimension sets.
  • Post-processing and Serving Stage: Joined outputs are further aggregated and derived data is virtualized for downstream use cases.

Figure 4: High-level Minerva computation flow.

In the ingestion stage, Minerva waits for upstream tables to land and ingests the data into the static Minerva tables. This ingested data becomes the source-of-truth and requires modification to the Minerva configs in order to be changed.

The data check stage ensures the source data is formatted correctly before further processing is performed and dimension sets are created. Here are some typical checks that Minerva runs:

  • sources should not be empty
  • timestamps should not be NULL and should meet ISO standards
  • primary keys should be unique
  • dimension values are consistent with what is expected
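
The checks listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than Minerva's implementation: rows are modeled as plain dictionaries, the function name `run_checks` and its parameters are hypothetical, and `datetime.fromisoformat` stands in for "meets ISO standards" (depending on the Python version it accepts only a subset of ISO 8601).

```python
from datetime import datetime


def run_checks(rows, primary_key, timestamp_field, allowed_values):
    """Return a list of human-readable failures; an empty list means pass.

    `allowed_values` maps a dimension column to the set of values expected
    for it. All parameter names here are illustrative assumptions.
    """
    if not rows:
        return ["source is empty"]

    failures = []
    seen_keys = set()
    for i, row in enumerate(rows):
        ts = row.get(timestamp_field)
        if ts is None:
            failures.append(f"row {i}: NULL timestamp")
        else:
            try:
                datetime.fromisoformat(ts)
            except (TypeError, ValueError):
                failures.append(f"row {i}: timestamp {ts!r} is not ISO formatted")

        key = row.get(primary_key)
        if key in seen_keys:
            failures.append(f"row {i}: duplicate primary key {key!r}")
        seen_keys.add(key)

        for column, allowed in allowed_values.items():
            if row.get(column) not in allowed:
                failures.append(
                    f"row {i}: unexpected value {row.get(column)!r} for {column}"
                )
    return failures


rows = [
    {"id": 1, "ts": "2021-05-01T00:00:00", "category": "urban"},
    {"id": 1, "ts": None, "category": "mars"},
]
for failure in run_checks(rows, "id", "ts", {"category": {"urban", "rural"}}):
    print(failure)
```

Returning every failure rather than stopping at the first one keeps the report useful to the dataset owner, who can then fix all upstream problems in one pass.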

For the join stage, the same data referenced in disparate dimension sets is sourced, computed, and joined from the same upstream tables with the same transformation logic. Intermediate datasets are also added in cases where they improve the overall efficiency of the system. This centralized computation, as illustrated in the diagram above, is the key to how we ensure consistent and efficient dataset computation at scale.

Finally, in the post-processing and serving stage, data is optionally further optimized for end-user query performance and derived data is virtualized. We will dive into more details on this stage in the third post of this series.

In addition to being computationally scalable, we need our platform to be operationally efficient. We have included a few critical features that allow us to do this. Specifically, we will highlight smart self-healing, automated backfilling with batched backfills, and intelligent alerting.

Self-healing allows Minerva to gracefully and automatically recover from various transient issues such as:

  • A bug in the pipeline or platform code
  • Infrastructure instability such as cluster or scheduler outages
  • Timeouts due to upstream data that has missed its SLA

In order to achieve self-healing, Minerva must be data-aware. Each time a job starts, Minerva checks the existing data for any missing data. If missing data is identified, it automatically includes it in the current run. This means a single run can decide computation windows dynamically and backfill data. Users are not required to manually reset tasks when they fail due to transient issues.

Figure 5: Missing data from failed runs are identified and computed as part of future runs.
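
The data-aware behavior described above amounts to comparing the partitions that should exist with the partitions that do. A minimal Python sketch, assuming daily partitions and a hypothetical helper named `missing_partitions` (the post does not describe Minerva's actual partition bookkeeping):

```python
from datetime import date, timedelta


def missing_partitions(start: date, end: date, existing: set) -> list:
    """Daily partitions in [start, end] that have not been computed yet."""
    total_days = (end - start).days + 1
    expected = (start + timedelta(days=i) for i in range(total_days))
    return [day for day in expected if day not in existing]


# Suppose the runs for May 3 and May 5 failed for transient reasons.
landed = {date(2021, 5, 1), date(2021, 5, 2), date(2021, 5, 4)}
to_compute = missing_partitions(date(2021, 5, 1), date(2021, 5, 5), landed)
print(to_compute)  # [datetime.date(2021, 5, 3), datetime.date(2021, 5, 5)]
```

A job that calls a function like this at startup computes whatever is missing, regardless of which earlier run was supposed to produce it, so nobody has to reset failed tasks by hand.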

This self-healing logic also leads to automated backfills. If Minerva identifies that no data exists for the relevant data version, it automatically begins generating the data from its upstream datasets. If the backfill window is very long (e.g., several years), it may generate a long-running query. While our underlying computation engine should be scalable enough to handle heavy queries, having one query running for a long time is still risky: long-running queries are sensitive to transient infrastructure failures, costly to recover if they fail, and create spikes in resource usage. At the other end of the spectrum, using a small backfilling window such as one day is too slow to work well over a longer time period. In order to improve scalability, reduce runtimes, and improve recoverability, we implemented batched backfills.

Figure 6: A single job is broken up into several parallel monthly batches within the 2021-05-01 task.

With batched backfills, Minerva splits the job into several date ranges based on the scalability of that specific dataset. For example, Minerva can split a backfill of two years of data into 24 one-month batches, which run in parallel. Failed batches are retried automatically in the next run.
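
The two-year example can be worked through in Python. This sketch assumes calendar-month batches and a hypothetical function name `monthly_batches`; the post says the batch size depends on the dataset, so a fixed monthly split is a simplification.

```python
from datetime import date, timedelta


def monthly_batches(start: date, end: date) -> list:
    """Split the inclusive range [start, end] into calendar-month batches."""
    batches = []
    cursor = start
    while cursor <= end:
        # First day of the month after `cursor`.
        if cursor.month == 12:
            next_month = date(cursor.year + 1, 1, 1)
        else:
            next_month = date(cursor.year, cursor.month + 1, 1)
        batch_end = min(next_month - timedelta(days=1), end)
        batches.append((cursor, batch_end))
        cursor = next_month
    return batches


batches = monthly_batches(date(2019, 5, 1), date(2021, 4, 30))
print(len(batches))  # 24
print(batches[0])    # (datetime.date(2019, 5, 1), datetime.date(2019, 5, 31))
```

Each tuple is an independent unit of work, so the batches can be submitted in parallel, and a batch that fails can simply be picked up again by the next run without redoing the other 23.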

This automated dataset management also leads to split responsibilities across teams. Infrastructure issues are owned by the platform team, whereas data issues are owned by the respective product or data science teams. Different datasets require different levels of escalation. Minerva intelligently alerts the appropriate team based on the type of error and notifies downstream consumers of data delays. This effectively distributes operational load across the company while assigning responsibility to the team best suited to resolve the root cause. Through this alerting system design we avoid parallel triaging by multiple teams.
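
The routing rule described above (infrastructure issues go to the platform team, data issues go to the owning team, and consumers hear about delays) can be sketched as a small function. The error categories, the team name `minerva-platform`, and the return shape are assumptions for illustration; the post does not describe how Minerva classifies errors.

```python
from enum import Enum


class ErrorKind(Enum):
    INFRASTRUCTURE = "infrastructure"  # e.g. cluster or scheduler outage
    DATA = "data"                      # e.g. a failed data quality check


PLATFORM_TEAM = "minerva-platform"  # hypothetical team name


def route_alert(kind: ErrorKind, dataset_owner: str, consumers: list) -> dict:
    """Pick exactly one responsible team and list consumers to notify."""
    responsible = PLATFORM_TEAM if kind is ErrorKind.INFRASTRUCTURE else dataset_owner
    return {"alert": responsible, "notify_of_delay": sorted(consumers)}


print(route_alert(ErrorKind.DATA, "central-insights", ["finance", "ops"]))
```

Alerting a single responsible team per failure is what prevents several teams from triaging the same incident in parallel.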

Self-healing, automated batched backfills, and intelligent alerting are three features that together enable Minerva to be a low-maintenance, operationally efficient, and resilient system.

Minerva is Consistent

Minerva’s metric repository is altered frequently by many users and evolves very rapidly. If we do not carefully coordinate these changes, it is very likely that metrics and dimensions will diverge. How can we ensure that datasets produced by Minerva are always consistent and up-to-date?

Our solution lies in what we call a data version, which is simply a hash of all the important fields that are specified in a configuration file. When we change any field that impacts what data is generated, the data version gets updated automatically. Each dataset has a unique data version, so when the version is updated a new dataset gets created and backfilled automatically. The example below illustrates how this mechanism works.

Figure 7: An update on a single dimension can trigger backfills across all datasets that use this dimension.

In the scenario seen in Figure 7, a certain dimension in dimension source 1 is updated. Given that this dimension is being used by two dimension sets (i.e., A123 and B123), data versions associated with these two dimension sets will get updated accordingly. Since the data versions are now updated, backfills for these two dimension sets will kick in automatically. In Minerva, the cycle of new changes resulting in new data versions, which in turn trigger new backfills, is what allows us to maintain data consistency across datasets. This mechanism ensures that upstream changes are propagated to all downstream datasets in a controlled manner and that no Minerva dataset will ever diverge from the single source of truth.
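
A data version of this kind can be sketched with a content hash. The Python below is an illustration under assumptions: the post does not say which fields are hashed or which hash function is used, so `VERSIONED_FIELDS`, SHA-256, the 12-character truncation, and the config layout are all hypothetical choices.

```python
import hashlib
import json

# Hypothetical: only fields that change the generated data affect the version.
VERSIONED_FIELDS = ("sql", "join_key")


def fingerprint(obj) -> str:
    """Stable short hash of a JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def source_version(source: dict) -> str:
    return fingerprint({k: source[k] for k in VERSIONED_FIELDS if k in source})


def dimension_set_version(dim_set: dict, sources: dict) -> str:
    """A dimension set's version also depends on the sources it uses."""
    upstream = sorted(source_version(sources[name]) for name in dim_set["sources"])
    return fingerprint({"dimensions": sorted(dim_set["dimensions"]),
                        "upstream": upstream})


sources = {"dimension_source_1": {"sql": "SELECT id, city FROM listings",
                                  "join_key": "id",
                                  "description": "Listing attributes"}}
set_a = {"sources": ["dimension_source_1"], "dimensions": ["city"]}
before = dimension_set_version(set_a, sources)

# Editing documentation does not change the version ...
sources["dimension_source_1"]["description"] = "Listing attributes (v2)"
assert dimension_set_version(set_a, sources) == before

# ... but editing the logic does, which would trigger a backfill.
sources["dimension_source_1"]["sql"] = "SELECT id, UPPER(city) AS city FROM listings"
assert dimension_set_version(set_a, sources) != before
```

Folding upstream versions into the downstream hash is what makes a single change to a dimension source ripple out to every dimension set that uses it, as in Figure 7.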

Minerva is Highly Available

Now that we have explained how Minerva uses data versioning to maintain data consistency, a keen user might already observe a dilemma: the rate of backfills competes with the rate of user changes. In practice, backfills often could not catch up with user changes, especially when updates affect many datasets. Given that Minerva only surfaces data that is consistent and up-to-date, a rapidly changing dataset could end up in backfill mode forever and cause significant data downtime.

To address this challenge, we created a parallel computation environment called the Staging environment. The Staging environment is a replica of the Production environment built from pending user configuration modifications. By performing the backfills automatically within a shared environment prior to replacing their Production counterparts, Minerva applies multiple unreleased changes to a single set of backfills. This has at least two advantages: 1) Users no longer need to coordinate changes and backfills across teams, and 2) Data consumers no longer experience data downtime.

The data flow for the Staging environment is as follows:

  1. Users create and test new changes in their local environment.
  2. Users merge changes to the Staging environment.
  3. The Staging environment loads the Staging configurations, supplements them with any necessary Production configurations, and begins to backfill any modified datasets.
  4. After the backfills are complete, the Staging configurations are merged into Production.
  5. The Production environment immediately picks up the new definitions and utilizes them for serving data to consumers.

Figure 8: A configuration change is first loaded into Staging and then merged to Production when release-ready.
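
Steps 3 and 4 of the flow above can be sketched as an overlay of Staging configurations on top of Production ones. This is a simplified model, not Minerva's code: configurations are plain dictionaries keyed by dataset name, and `version_of` is a caller-supplied stand-in for whatever computes a data version.

```python
def effective_staging_configs(production: dict, staging: dict) -> dict:
    """Staging configs, supplemented with Production for untouched datasets."""
    return {**production, **staging}


def datasets_to_backfill(production: dict, staging: dict, version_of) -> list:
    """Datasets that are new, or whose data version differs from Production."""
    merged = effective_staging_configs(production, staging)
    return sorted(
        name for name, config in merged.items()
        if name not in production
        or version_of(config) != version_of(production[name])
    )


def promote(production: dict, staging: dict) -> dict:
    """After the backfills finish, Staging definitions replace Production's."""
    return {**production, **staging}


production = {"A123": {"sql": "SELECT 1"}, "B123": {"sql": "SELECT 2"}}
staging = {"B123": {"sql": "SELECT 2 AS x"}, "C123": {"sql": "SELECT 3"}}

# Assumption for the demo: the SQL text itself serves as the version.
print(datasets_to_backfill(production, staging, lambda c: c["sql"]))
# ['B123', 'C123']
```

Production keeps serving the old `B123` until `promote` runs, which is why consumers see no downtime while the backfill is in progress.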

The Staging environment allows us to have both consistency and availability for critical business metrics, even when users update definitions frequently. This has been critical for the success of many mass data migration projects within the company, and it has aided efforts to revamp our data warehouse as we focused on data quality.

Minerva is Well Tested

Defining metrics and dimensions is a highly iterative process. Users often uncover raw data irregularities or need to dig deeper to understand how their source data was generated. As the source of truth for metrics and dimensions built on top of automatically generated datasets, Minerva must help users validate data correctness, clearly explain what is happening, and speed up the iteration cycle.

Figure 9: A user’s development flow using the Minerva prototyping tool.

To do this, we created a guided prototyping tool that reads from Production but writes to an isolated sandbox. Similar to the Staging Environment, this tool leverages the Minerva pipeline execution logic to generate sample data quickly on top of the user’s local modifications. This allows users to leverage new and existing data quality checks while also providing sample data to validate the outputs against their assumptions and/or existing data.

The tool clearly shows the step-by-step computation the Minerva pipeline will follow to generate the output. This peek behind the curtain provides visibility into Minerva computation logic, helps users debug issues independently, and also serves as an excellent testing environment for the Minerva platform development team.

Finally, the tool uses user-configured date ranges and sampling to limit the size of the data being tested. This dramatically speeds up execution time, reducing iteration from days to minutes, while allowing the datasets to retain many of the statistical properties needed for validation.
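
One way to limit test data by date range and sampling is sketched below in Python. The post does not say how the prototyping tool samples, so the hash-based approach, the function names, and the `ds` and `id` fields are assumptions. Hashing the key (rather than drawing random numbers) keeps the sample deterministic, so the same entities appear in every prototype run.

```python
import hashlib
from datetime import date


def in_sample(key: str, rate: float) -> bool:
    """Deterministically keep roughly `rate` of all keys (0.0 to 1.0)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return bucket < rate


def prototype_rows(rows, start: date, end: date, rate: float, key_field="id"):
    """Rows inside the date range whose key falls in the sample."""
    return [
        row for row in rows
        if start <= row["ds"] <= end and in_sample(str(row[key_field]), rate)
    ]


rows = [{"id": i, "ds": date(2021, 5, 1 + i % 10)} for i in range(1000)]
subset = prototype_rows(rows, date(2021, 5, 1), date(2021, 5, 3), rate=0.1)
print(len(subset), "of", len(rows), "rows kept for prototyping")
```

Sampling on a stable key such as a listing or user identifier also means that joins across sampled tables still line up, since both sides keep the same entities.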

Putting It Together: A COVID-19 Case Study

To illustrate how everything fits together, let’s walk through an example of how Alice, an analyst, was able to turn data into shared company insights with Minerva. As described in our first post, COVID-19 has completely changed the way people travel on Airbnb. Historically, Airbnb has been pretty evenly split between demand for urban and non-urban destinations. At the onset of the pandemic, Alice hypothesized that travelers would avoid large cities in favor of destinations where they could keep social distance from other travelers.

To confirm this hypothesis, Alice decided to analyze the nights_booked metric, cut by the dim_listing_urban_category dimension. She knew that the nights_booked metric is already defined in Minerva because it is a top-line metric at the company. The listing dimension she cared about, however, was not readily available in Minerva. Alice worked with her team to leverage the Global Rural-Urban Mapping Project and GPW v4 World Population Density Data¹ created by NASA to tag all listings with this new metadata. She then began to prototype a new Minerva dimension using this new dataset.

Figure 10: Alice configures the new dimension in a dimension source.

Alice also included the new dimension definition in several dimension sets used across the company for tracking the impact of COVID-19 on business operations.

Figure 11: Alice adds the new dimension to the COVID SLA dimension set owned by the Central Insights team.

To validate this new dimension in Minerva, Alice used the prototyping tool described above to compute a sample of data with this new dimension. Within minutes, she was able to confirm that her configuration was valid and that the data was being combined accurately.

Figure 12: Alice was able to share sample data with her teammate within a few minutes.

After validating the data, Alice submitted a pull request for code review from the Core Host team, which owns the definition of all Listing metadata. This pull request included execution logs, computation cost estimates, and links to sample data for easy review. After receiving approvals, Alice merged the change into the shared Staging environment where, within a few hours, the entire history of the modified datasets was automatically backfilled and eventually merged into Production.

Figure 13: With Alice’s change, anyone in the company could clearly see the shift in guest demands as travel rebounds.

Using the newly created datasets, teams and leaders across the company began to highlight and track these shifts in user behavior in their dashboards. This change to our key performance indicators also led to new plans to revamp key product pages to suit users’ new travel patterns.

Figure 14: Adoption of the new dimension source (red) across event sources (y-axis).

Through this process, Alice was able to define a new dimension, attach it to pre-existing metrics, obtain approvals from domain owners, and update numerous critical datasets across several teams within days. All of this was done with just a few dozen lines of YAML configuration.

Closing

In this post, we explored Minerva’s compute infrastructure and its design principles. We highlighted how users are able to define standardized data in Minerva with detailed metadata. Because the user interface is declarative, users need only focus on the “what” and not the “how”. Because Minerva is well-tested and consistent, we were able to foster trust with our users. Finally, Minerva’s scalable design enabled us to expand our footprint to drive adoption and data standardization within the company.

As Minerva has become ubiquitous within the company, users have found it much easier to do their data work. We are dedicated to making Minerva even better in the future. Some items from our near-term roadmap include leveraging Apache Iceberg as our next-generation table format for storage, expanding to real-time data, and supporting more complex definitions, to name just a few!

In our next post in this series, we will switch gears to discuss how Minerva leveraged the above consistency and availability guarantees to abstract users away from the underlying datasets. We will outline the challenges, highlight our investment in the Minerva API, and explain how we integrated Minerva with the rest of the Airbnb data stack to provide a unified, metric-centric consumption experience in our metric layer. Stay tuned for our next post!

Acknowledgements

Minerva is made possible only because of the care and dedication from those who worked on it. We would like to thank Clint Kelly and Lauren Chircus for helping us build and maintain Minerva. We would also like to thank Aaron Keys, Mike Lin, Adrian Kuhn, Krishna Bhupatiraju, Michelle Thomas, Erik Ritter, Serena Jiang, Krist Wongsuphasawat, Chris Williams, Ken Chen, Guang Yang, Jinyang Li, Clark Wright, Vaughn Quoss, Jerry Chu, Pala Muthiah, Kevin Yang, Ellen Huynh, and many more who partnered with us to make Minerva more accessible across the company. Finally, thank you Bill Ulammandakh for providing a great case study to walk through!

¹ All data associated with the Global Rural-Urban Mapping Project and GPW v4 World Population Density Data datasets are the property of NASA’s Earth Science Data Systems program; Airbnb claims no ownership of that data and complies fully with all legal use restrictions associated with it.

All product names, logos, and brands are property of their respective owners. All company, product and service names used in this website are for identification purposes only. Use of these names, logos, and brands does not imply endorsement.


How Airbnb Standardized Metric Computation at Scale was originally published in Airbnb Engineering & Data Science on Medium.

Source: Airbnb