Part-III: Building a coherent consumption experience
By: Amit Pahwa, Cristian Figueroa, Donghan Zhang, Haim Grosman, John Bodley, Jonathan Parks, Jenny Liu, Krishna Bhupatiraju, Maggie Zhu, Mike Lin, Philip Weiss, Robert Chang, Shao Xie, Sylvia Tomiyama, Toby Mao, Xiaohui Sun
In the first post of this series, we highlighted the role Minerva plays in transforming how Analytics works at Airbnb. In the second post, we dove into Minerva’s core compute infrastructure and explained how we enforce data consistency across datasets and teams. In this third and final post, we will focus our story on how Minerva drastically simplifies and improves the data consumption experience for our users. Specifically, we will showcase how a unified metric layer, which we call the Minerva API, helps us build versatile data consumption experiences tailored to users with a wide range of backgrounds and varying levels of data expertise.
When data consumers use data to frame a business question, they typically think in terms of metrics and dimensions. For example, a business leader may wonder what percentage of bookings (a metric) is made up of long-term stays (a dimension). To answer this question, she needs to find the right set of tables from which to query (where), apply the necessary joins or filters (how), and then finally aggregate the events (how) to arrive at an answer that is, hopefully, correct.
While many traditional BI tools attempt to abstract this work away on behalf of their users, most of their data-serving logic still requires users to figure out the “where” and the “how”. At Airbnb, we aspired to build a better user experience — one in which users simply ask for metrics and dimension cuts, and receive the answers without having to worry about the “where” or the “how”. This vision, which we call a “metric-centric approach”, turned out to be a difficult engineering challenge.
In most traditional data warehouses, data is organized in tables. This means that to answer an inquiry, a BI tool needs to associate the metrics and dimensions in question to the physical tables that contain the relevant answers. However, for a given metric and dimension combination, there might be many datasets from which to source the answers. These tables often have varying degrees of data quality and correctness guarantees, so picking the right tables to serve the data is nontrivial.
Moving beyond the “where”, the data-serving logic responsible for the “how” also has many nuances. To start, there are different metric types: simple metrics are composed of single materialized events (e.g., bookings); filtered metrics are composed of simple metrics filtered on a dimension value (e.g., bookings in China); and derived metrics are composed of one or more non-derived metrics (e.g. search-to-book rate). Furthermore, while many metrics are additive (e.g., bookings), many other metrics are not: count distincts, percentiles, and point-in-time snapshots cannot simply be calculated by summing individual events. Consistently calculating these various metric types correctly, across all scenarios, is a big challenge.
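To make the distinction between metric types concrete, they might be sketched as configuration entries like the following. This is a hypothetical illustration only; the shapes and field names are assumptions, not Minerva’s actual schema.

```python
# Hypothetical illustration of the three metric types described above.
# Config shapes and field names are assumptions, not Minerva's actual schema.

SIMPLE = {"name": "bookings", "expression": "SUM(booking)"}

FILTERED = {
    "name": "bookings_china",
    "base_metric": "bookings",          # a simple metric...
    "filter": 'dim_market = "China"',   # ...restricted to a dimension value
}

DERIVED = {
    "name": "search_to_book_rate",
    "numerator": "bookings",            # composed of non-derived metrics
    "denominator": "searches",
}

def is_additive(aggregation: str) -> bool:
    """SUM- and COUNT-style metrics can be rolled up by summing individual
    events; count distincts, percentiles, and snapshots cannot."""
    return aggregation in {"sum", "count"}
```

The additivity check matters because a serving layer can freely re-aggregate additive metrics across time grains, but must recompute non-additive ones from finer-grained data.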
Finally, to make data-informed decisions, data must be usable in a wide variety of contexts, applications, and tools. The more prevalent and important a metric is, the more likely it is to be used in a wide variety of settings. For example, gross booking value (GBV), nights booked, and revenue are among the most frequently used metrics at Airbnb. They are used to track business performance, to calculate guardrail metrics for randomized controlled experiments, and as features for machine learning models. Serving these metrics across different use cases, while providing contextual information that helps users apply them correctly, is yet another core challenge for us.
We have addressed these challenges by building the Minerva API, a metric-serving layer that acts as an interface between upstream data models and downstream applications. With Minerva API, any downstream application is able to serve data consistently and correctly without knowing where the data is stored or how metrics should be computed. In essence, the Minerva API serves as the “how” by connecting the “what” with the “where”.
The Minerva API consists of the API web server, a metadata fetcher application, and several clients that integrate with Apache Superset, Tableau, Python, and R. These components serve native NoSQL and SQL metric queries to the downstream applications.
We mentioned previously that users simply ask Minerva for metrics and dimension cuts without having to figure out the “where”. When a data request is issued, Minerva spends a great deal of effort figuring out which of its datasets should be used to honor that request.
Under the hood, Minerva takes into account several factors before picking an optimal data source — one of the most important factors being data completeness. This means that any data source chosen to serve the query should contain all the columns needed for a given user’s query request and must cover the time range required from the query request.
To accomplish this, we built a service called Metadata Fetcher that fetches data source metadata every 15 minutes and caches it in a MySQL database. Specifically, it fetches the latest copy of the Minerva configuration (stored in Thrift binary format) from S3 to get the list of every valid Minerva data source in Druid. For each data source, it queries the Druid broker for the source’s name and its associated metrics and dimensions. It also fetches the min date, max date, and count of distinct dates from the broker to determine whether any data is missing. Each time new information is fetched, the MySQL database is updated to maintain the source of truth. With this metadata, we are able to serve each data request from the best available data source at any given time.
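A minimal sketch of the completeness check this metadata enables, assuming the cached record is represented as a plain dict (field names are assumptions; the real service reads from MySQL rather than passing dicts around):

```python
from datetime import date

def covers_request(source_meta: dict, needed_columns: set,
                   start: date, end: date) -> bool:
    """Return True if a cached data source can serve the request: it must
    contain every requested column, span the requested date range, and have
    no missing days within its own range. A sketch of the data-completeness
    check described above."""
    has_columns = needed_columns <= set(source_meta["columns"])
    covers_range = (source_meta["min_date"] <= start
                    and source_meta["max_date"] >= end)
    # A gap-free source has one distinct date per day in its span.
    expected_days = (source_meta["max_date"] - source_meta["min_date"]).days + 1
    no_gaps = source_meta["distinct_dates"] == expected_days
    return has_columns and covers_range and no_gaps
```

With several candidate sources passing this check, the fetcher’s cache lets the API pick an optimal one without probing the warehouse at query time.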
Imagine a scenario in which a user is interested in the trend of average daily rate (ADR), cut by destination region, excluding private rooms, over the four weeks of August 2021. The full spec of the example query might look like the following:
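The published post shows the full spec as an image; a hypothetical reconstruction of such a request might look like this (the field names are assumptions for illustration):

```python
# Hypothetical reconstruction of the example query spec.
# Field names are assumptions, not Minerva's actual request schema.
query = {
    "metric": "price_per_night",
    "groupby_dimension": "dim_destination_region",
    "global_filter": 'dim_room_type != "private-room"',
    "aggregation_granularity": "weekly",
    "start_date": "2021-08-01",
    "end_date": "2021-09-01",
}
```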
When Minerva receives such a request, it needs to figure out not only where to fetch the data, but also how to filter, combine, and aggregate it to create the final result. It does so via the Split-Apply-Combine paradigm commonly used in data analysis.
When the Minerva API receives a query request such as the one above, the first thing it does is break up any derived metrics into what we call Minerva “atomic” metrics by creating a set of associated subqueries. If a user query only specifies an atomic Minerva metric, this first step is essentially a no-op.
In the example above, given that the `price_per_night` metric is a ratio metric (a special case of derived metric) that contains a numerator (`gross_booking_value_stays`) and a denominator (`nights_booked`), Minerva API breaks up this request into two sub-requests.
With the atomic metrics identified in step 1, Minerva uses the metric configurations stored in S3 to resolve the associated metric expressions and metadata and generate the subqueries. Sticking with the same example: the Minerva API looks up the metric definition of `gross_booking_value_stays` and sees that it is a SUM aggregation, and similarly for the `nights_booked` metric. In both requests, a global filter `dim_room_type != "private-room"` is applied to ensure that private rooms are excluded from the calculation.
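The two generated subqueries might therefore look roughly like the following (a sketch; the table name and exact SQL shape are assumptions):

```python
# Sketch of the subqueries generated for the two atomic metrics.
# The source table name and SQL shape are assumptions for illustration.
GLOBAL_FILTER = "dim_room_type != 'private-room'"

subquery_gbv = f"""
SELECT ds, dim_destination_region,
       SUM(gross_booking_value_stays) AS gross_booking_value_stays
FROM minerva.bookings_source
WHERE {GLOBAL_FILTER}
GROUP BY ds, dim_destination_region
"""

subquery_nights = f"""
SELECT ds, dim_destination_region,
       SUM(nights_booked) AS nights_booked
FROM minerva.bookings_source
WHERE {GLOBAL_FILTER}
GROUP BY ds, dim_destination_region
"""
```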
Once the associated subqueries are generated for each atomic metric, the Minerva API sends the queries to Druid or Presto. If a resource limit would otherwise be reached, it chops the query into several “slices” that each span a smaller time range and then combines the results into a single dataframe. The API also truncates any incomplete leading or trailing data before rolling up the dataframe based on the aggregation granularity.
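The slicing step can be sketched as follows; this is a simplified illustration, as the post does not describe the actual slicing heuristics or resource thresholds:

```python
from datetime import date, timedelta

def slice_date_range(start: date, end: date, max_days: int):
    """Split the inclusive range [start, end] into consecutive slices of at
    most max_days days each, so each slice can be queried separately and the
    results combined afterward. A sketch of the slicing described above."""
    slices = []
    cursor = start
    while cursor <= end:
        stop = min(cursor + timedelta(days=max_days - 1), end)
        slices.append((cursor, stop))
        cursor = stop + timedelta(days=1)
    return slices
```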
Once Minerva rolls up the dataframes for each atomic metric, it then combines them into a single dataframe by joining the dataframes on the timestamp column. As a final step, Minerva API performs any necessary post-aggregation calculations, applies ordering, and limits before returning the final result to the client in serialized JSON format.
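As a rough illustration of this combine step, with plain dicts standing in for the per-metric dataframes (an assumption made to keep the sketch self-contained):

```python
def combine_atomic_metrics(gbv_rows, nights_rows):
    """Join the two atomic-metric results on their timestamp key and compute
    the derived ratio metric, mirroring the combine step described above.
    Plain dicts stand in for dataframes; an inner join on timestamp is assumed."""
    nights_by_ts = {row["ds"]: row["nights_booked"] for row in nights_rows}
    combined = []
    for row in gbv_rows:
        nights = nights_by_ts.get(row["ds"])
        if nights:  # skip unmatched timestamps and zero denominators
            combined.append({
                "ds": row["ds"],
                "price_per_night": row["gross_booking_value_stays"] / nights,
            })
    return combined
```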
To recapitulate, with Minerva’s data source API and data API, we are able to abstract away the process of identifying “where” to fetch the data and “how” to return the data. This API serves as the single layer of abstraction for Minerva to honor any request coming from downstream applications. However, our story does not simply end here: many of our engineering challenges involve how to integrate different applications with this API. We will explore these challenges in the next section.
Bearing in mind the diverse set of data consumers within Airbnb, we set out to build tools tailored to different personas and use cases. With the Minerva API, we built a wide range of user interfaces that provide a consistent and coherent data consumption experience. As we mentioned briefly in the first post, there are four major integration endpoints, each supporting a different set of tools and audiences.
When building out these features, we were constantly trading off between consistency, flexibility, and accessibility. For example, Metric Explorer is built mostly for non-technical users who are not data experts. This means that it needs to optimize consistency and accessibility over flexibility. Metric Explorer enforces strict guardrails that prevent users from doing the wrong thing, and there is very little opportunity to go off the “paved path”.
At the other extreme, the R and Python clients that are typically favored by data scientists are much more flexible. Users have full controls on how to leverage the clients’ API to perform custom analysis or visualization. In the next few sections, we will explain how some of these consumption experiences are created behind the scenes.
Metric Explorer was created at Airbnb so anyone, regardless of their level of data expertise, can leverage data to make informed decisions. Because of its broad target audience, Metric Explorer optimizes accessibility and data consistency over flexibility.
Under the hood, all of Metric Explorer’s metrics, dimensions, and relevant metadata are sourced from Minerva’s metric repository and ingested into Elasticsearch. This metadata is presented in the right sidebar as context before users perform any operations on the data.
When a user chooses to perform data operations such as Group By and Filter, Metric Explorer presents dimensions in ranked order so that users with little or no business context can easily drill down, without needing to know the dimension values ahead of time.
As users slice and dice the data, the Minerva API automatically determines which combination is valid and only surfaces cuts that exist. Nowhere in the experience does a user need to know anything about the underlying physical table from which the metric in question is sourced.
While Metric Explorer provides high-level information about metrics, more adventurous users who wish to slice and dice the data further can do so in Superset. Apache Superset is a homegrown tool at the core of Airbnb’s self-serve BI solutions. Given the ubiquity of Superset inside the company, we knew that we needed to provide a functional SQL-like integration with Superset in order for Minerva to be widely adopted.
While many applications can be built on top of the Minerva API by talking to its RESTful endpoints directly, the client interfaces for BI tools such as Apache Superset and Tableau are more complex. Commonly, these BI tools speak SQL (via a client), not HTTP requests. This meant that Minerva API needed to support a SQL-like interface that adheres to the OLAP type query structure. To build such an interface, we added to Minerva API a SQL parser — leveraging sqlparse — to parse the SQL statement into an AST which is then validated and transformed into native HTTP requests.
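The production parser is built on sqlparse; as a stdlib-only stand-in, the SQL-to-request translation might be sketched like this. It is a toy that handles only one narrow query shape, and the request fields are assumptions:

```python
import re

def sql_to_request(sql: str) -> dict:
    """Translate a simple OLAP-style SELECT into a native Minerva HTTP
    request payload. A toy stand-in for the sqlparse-based parser described
    above: parse, validate the shape, then emit the native request."""
    pattern = (r"SELECT\s+(?P<dim>\w+),\s*(?P<metric>\w+)\s+"
               r"FROM\s+(?P<source>\w+)\s+GROUP BY\s+\w+")
    match = re.match(pattern, sql.strip(), re.IGNORECASE)
    if not match:  # validation step: reject unsupported query shapes
        raise ValueError("unsupported query shape")
    return {
        "metric": match.group("metric"),
        "groupby_dimension": match.group("dim"),
        "source": match.group("source"),
    }
```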
Adhering to the DRY principle, we leveraged Apache Calcite Avatica, which defines a generic database wire API between a client and a server. The Minerva API serves as the Avatica HTTP server, and the client is either a custom Python DB-API driver with a SQLAlchemy dialect (for Superset) or the Avatica-provided JDBC connector (for Tableau).
Unlike traditional BI tools, in which custom business logic is implemented in the tools themselves, Minerva consolidates and abstracts away all of this logic via pseudo SQL-like AGG metric expressions: where a traditional BI tool requires users to write the full aggregation against a physical table, a Superset query simply references the metric by name.
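A hypothetical reconstruction of that comparison (the table and expression names are illustrative assumptions):

```python
# Hypothetical reconstruction of the traditional-vs-Minerva comparison.
# Table and column names are assumptions for illustration.

traditional_bi_query = """
SELECT dim_destination_region,
       SUM(gross_booking_value_stays) / SUM(nights_booked) AS price_per_night
FROM core_data.fct_bookings  -- user must know the right table and formula
GROUP BY dim_destination_region
"""

minerva_superset_query = """
SELECT dim_destination_region,
       AGG(price_per_night)  -- pseudo SQL-like AGG metric expression
FROM minerva                 -- no physical table to pick
GROUP BY dim_destination_region
"""
```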
In the Minerva query, a user need not specify where the metric should be computed from, nor do they need to specify the correct aggregation function — these details are abstracted away by Minerva.
Finally, given that there are 12,000 metrics and 5,000 dimensions in Minerva, not all metric-dimension combinations are valid. For example, active listings can be cut by where the host is located, but not by where the guest is from (this guest attribute can differ for each booking reservation). We added event listeners to the chart controls to ensure that only eligible metric and dimension combinations are surfaced in the left pane. This design reduces cognitive load and simplifies the data exploration process.
As presented in Part-I, XRF is a framework for producing succinct, high-fidelity, business critical reports that are consumed by executives and leadership teams. This framework is configured via the Minerva configs and powered entirely by Minerva API.
To curate an XRF report, users first define the reporting config, specifying the desired business metrics, dimensional cuts, and global filters to apply. In addition, users can configure other controls, such as whether a metric should be calculated as a running aggregation (e.g., MTD, QTD, or YTD) and the appropriate time unit for growth rate comparisons (e.g., YoY, MoM, or WoW). Once these settings are specified, the Minerva API performs the necessary aggregations and final pivots to produce the final report.
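Such a reporting config might be sketched as follows; the actual XRF configuration schema lives in the Minerva configs and is not shown in the post, so these keys are assumptions:

```python
# Hypothetical XRF reporting config; keys are assumptions for illustration.
xrf_report = {
    "name": "executive_weekly_report",
    "metrics": ["gross_booking_value_stays", "nights_booked"],
    "dimensions": ["dim_destination_region"],
    "global_filters": ['dim_room_type != "private-room"'],
    "running_aggregation": "YTD",  # one of MTD, QTD, YTD
    "growth_rate_unit": "YoY",     # one of YoY, MoM, WoW
}
```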
The data output by XRF can be rendered in a Google Sheet via a custom GoogleSheetHook, as well as in Tableau via a Presto connection. By leveraging Minerva’s metric definitions and aggregation logic, we enforce consistency regardless of the users’ choice of presentation layer.
Unlike the analytics or reporting use cases, the experimentation use case is unique in that the metrics used for reporting are only a starting point. To make proper causal inferences, metrics must be joined with experiment assignment data before transforming them into summary statistics that can be used for valid statistical comparisons.
Typically, Minerva supplies the “raw events” to ERF. Depending on the unit of randomization and unit of analysis, we join the Minerva data to the assignment logs using different subject keys so that each event will have the associated subject, as well as the experiment group attached to it. Summary statistics such as means, percent changes, and p-values are then calculated and surfaced in the ERF scorecard.
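A minimal sketch of that join, using plain dicts in place of warehouse tables and a hypothetical `guest_id` subject key (the real subject keys depend on the experiment’s unit of randomization):

```python
def join_events_to_assignments(events, assignments, subject_key):
    """Attach the experiment group to each raw Minerva event by joining on
    the experiment's subject key. A sketch of the ERF join described above;
    events for subjects not enrolled in the experiment are dropped."""
    group_by_subject = {a[subject_key]: a["experiment_group"]
                        for a in assignments}
    joined = []
    for event in events:
        group = group_by_subject.get(event[subject_key])
        if group is not None:  # inner join: keep only enrolled subjects
            joined.append({**event, "experiment_group": group})
    return joined
```

From the joined events, summary statistics such as means, percent changes, and p-values can then be computed per experiment group.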
The Experimentation UI also exposes relevant Minerva metadata directly in the tool. Users can view the description and ownership information of the underlying Minerva events. A lineage view, overlaid with ETA information, allows users to track the progress of ERF metrics and helps them contact the relevant Minerva metric owners in case of delays.
In summary, Minerva and its various integrations enable users to easily track metrics within their scheduled reporting, measure movements due to experimentation, and explore unexpected changes — all with the confidence that the data is correct and consistent. This confidence drastically reduces the time spent deriving insights, increases trust in data, and helps to support data-driven decision making.
Minerva introduced a novel way of thinking about data: not only is it centered around a business- and metric-centric user interface, but it also required us to adapt traditional BI tools (which mostly speak SQL) to the interface of the Minerva API. In some sense, it is akin to fitting a new square peg (Minerva) into an existing round hole (BI tools).
As more organizations embrace the concept of a metric layer similar to Minerva, we believe a new set of challenges awaits us. That said, some of this pioneering work will surely bring analytics to the next level, and we are grateful to be contributing to the leading edge of this landscape. We hope that more companies will soon follow suit.
Thanks to everyone who contributed to the work and outcomes represented in this blog post. In addition to our previous acknowledgements, we would also like to thank those who have partnered with us to adopt Minerva across our consumption landscape.
All trademarks are the properties of their respective owners. Any use of these are for identification purposes only and do not imply sponsorship or endorsement.
How Airbnb Enables Consistent Data Consumption at Scale was originally published in The Airbnb Tech Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.