Benchling is a data platform to help scientists do research. Hundreds of thousands of scientists across academic labs and enterprise companies use Benchling to store and analyze scientific data, record notes, and share findings with each other.
But not everyone should be allowed to access everything. Benchling’s platform should allow users to configure exactly who can take which actions on what data. This authorization is crucial in the life sciences, where companies often worry about regulatory compliance and IP protection. Scientists also often play specialized roles: one person’s role may be to design DNA, while another’s may be to run analyses and collect results. Our platform should allow these scientists to collaborate without accidentally modifying each other’s data.
We originally built Benchling with simple permission levels (read, write, admin) and recently moved to data policies for fine-grained configuration. This post focuses on two key pieces of the project: the details of how we migrated between permission systems without errors or downtime, and a process called granularization — how we can easily make data policies more and more fine-grained based on new configuration needs.
(See Lessons learned for the summary)
Benchling started as a free-to-use DNA platform marketed to academic users and their labs. Users had very basic authorization needs: allow other scientists to view or edit their DNA sequences.
Benchling’s product expanded, and so did our customers. Users store data in Benchling ranging from DNA sequences to lab notebook entries. Enterprise customers often organize the data to match how their own research divisions are organized.
Consider a Cancer Research Project within Benchling, with a DNA Design Team designing DNA and a team of Research Scientists experimenting with that DNA.
Note how the configuration needs are more complex than before — different teams are only granted certain actions to specific types of data based on various conditions.
But these rules are not the same for all customers. They cause too much overhead for smaller companies, including biotech startups and academic labs with only a handful of scientists. These companies want looser configurations, like permitting all scientists to edit DNA sequences, until they grow to a scale where they need to more closely manage their data.
Over the past few years, we’ve learned new needs for configuring authorization, both because we’ve partnered with more mature customers and because each customer’s processes and roles are different. We need to support customer authorization needs becoming more and more granular — for example, wanting to configure who can archive old DNA sequences, or who can edit DNA sequence bases vs. only metadata. We call this process granularization. As we continue to learn new roles, our system will need to easily support migrating to a new authorization model.
Users organize their Benchling data, like DNA sequences, into projects. Users also organize themselves into teams and organizations. We started with three rudimentary permission levels: read (viewing project contents), write (editing contents), and admin (configuring project permissions).
To configure authorization, the creator of a project would add users, teams, and organizations as project collaborators and assign each a permission level.
Larger enterprise customers had more complex needs for permissions, but authorization configuration was limited to the three levels. At the core, levels had two issues: a single level lumped together many distinct actions, and the mapping from each action to its required level was fixed rather than configurable.
We first tried to solve the first issue by introducing more levels, like “append”. We then tried to solve the second issue by introducing flags: lots of configurable flags for product needs, like a project.enable_edit_dna_sequence_bases flag (on the project) to lock down a project's DNA sequence bases, or an ENABLE_EDIT_SETTINGS_ADMIN_ONLY flag (site-wide) to configure whether the edit settings action mapped to write or admin, and so on.
While these flags unblocked some product needs, they were inconsistently added in different places — site-wide, by organization, team, project, etc. — and made the authorization code hard to understand. Authorization code is the last thing you want to be hard to understand. More branches meant a higher likelihood of untested branches and data security bugs. The combination of levels and flags also made the product difficult to configure, and still didn't map cleanly to customers' needs.
So how should we model enterprise authorization needs?
First, we researched existing enterprise authorization models to help us design the new system for Benchling. These models include ones from products we use, like AWS and Salesforce. We saw recurring concepts like user groups, configurable authorization rules, and fine-grained permissions that convinced us that we were on the right track with our own design.
Our guiding principle was “what do customer admins need to be able to configure?” Benchling data is already organized into projects. Users are already organized into teams inside organizations. The missing piece is to configure those projects to allow those users to only access and manipulate data pertinent to their function. We model this with user-configurable data policies. Each has a list of statements that captures the authorization rules of that policy.
Policy statements govern data along three axes:

- the item type the statement applies to (e.g. DNA_SEQUENCE or NOTEBOOK_ENTRY)
- the action being taken (e.g. READ, EDIT_METADATA, or EDIT_BASES)
- the condition under which the statement applies (e.g. ALWAYS or IF_AUTHOR)
We can now configure the Cancer Research Project from before with two data policies:
```
# Policy for the Research Scientists
PolicyStatement(DNA_SEQUENCE, EDIT_METADATA, ALWAYS)
PolicyStatement(NOTEBOOK_ENTRY, EDIT, IF_AUTHOR)

# Policy for the DNA Design Team
PolicyStatement(DNA_SEQUENCE, EDIT_BASES, ALWAYS)
PolicyStatement(DNA_SEQUENCE, EDIT_METADATA, ALWAYS)
PolicyStatement(NOTEBOOK_ENTRY, READ, ALWAYS)
```
Now, the customer admin can configure a project by associating the users who have access to that project with policies. They can assign policies to users, teams, and organizations.
A user is authorized for a piece of data as long as any policy statement in any policy assigned to the user matches the item type of the data, the action the user intends to take, and the condition under which they’re operating.
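This check can be sketched in Python. This is a simplified model with illustrative names, not Benchling's actual implementation; the real statement and policy classes are more involved:

```python
from dataclasses import dataclass
from enum import Enum


class Condition(Enum):
    ALWAYS = "ALWAYS"
    IF_AUTHOR = "IF_AUTHOR"


@dataclass(frozen=True)
class PolicyStatement:
    item_type: str   # e.g. "DNA_SEQUENCE", "NOTEBOOK_ENTRY"
    action: str      # e.g. "READ", "EDIT_METADATA", "EDIT_BASES"
    condition: Condition


def statement_matches(stmt: PolicyStatement, item_type: str, action: str,
                      user_is_author: bool) -> bool:
    if stmt.item_type != item_type or stmt.action != action:
        return False
    # A conditional statement only matches when its condition holds
    if stmt.condition is Condition.IF_AUTHOR:
        return user_is_author
    return True  # Condition.ALWAYS


def is_authorized(assigned_policies, item_type, action, user_is_author=False):
    # Additive model: a user is authorized if ANY statement in ANY
    # assigned policy matches; there are no "deny" statements.
    return any(
        statement_matches(stmt, item_type, action, user_is_author)
        for policy in assigned_policies
        for stmt in policy
    )
```

For example, a policy containing `PolicyStatement("NOTEBOOK_ENTRY", "EDIT", Condition.IF_AUTHOR)` authorizes a user to edit only the notebook entries they authored.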
We explicitly decided to use only additive permissions. Some of our users wanted the ability for policies to deny authorization to users. This was especially common when the customer worked with contractors and wanted to restrict the contractors’ access. However, subtractive permissions become harder to reason about and harder to implement correctly. If a user’s team is denied a policy but the user is granted the same policy, is the user authorized? Pushing back on product complexity allowed us to implement a simpler, easier-to-reason-about system for Benchling developers and users alike. Instead, we recommend that customers create a separate team for contractors and assign it a more restrictive policy.
How do we decide which item types, actions, and conditions to support? For all the things you can do in Benchling, are there 10 actions? 100 actions? Too few actions and customers can’t configure permissions against their processes and roles. Too many actions and the system becomes hard to understand.
The guiding principle is that it’s easy to fragment an existing item type, action, or condition, but very hard — nearly impossible, in fact — to defragment, since customers may already be taking advantage of that granularity. So, we started with the minimum number of actions needed to support our existing product needs, and will carefully granularize as needed, which we’ll dive into a bit later.
Correctness is crucial when implementing permissions — after all, it’s at the core of data access. When we migrated from levels to policies, our top priority was to make sure the new system was still correct, that each user had access to exactly the data they were supposed to have access to.
To migrate from levels to policies without user impact, we started by creating a policy for each of our previous levels: Read, Append, Write, Admin. We then took a number of steps so that we were fully confident that the new policies were behaving the same as levels and the smattering of configuration flags.
For a while, we ran authorization checks through both systems: checking via both policies and levels. To do this, we mapped each policy to a level:
```
> SELECT * FROM policy;
 id | name     | legacy_permission_level
----+----------+------------------------
  1 | "Read"   | "READ"
  2 | "Append" | "APPEND"
  3 | "Write"  | "WRITE"
  4 | "Admin"  | "ADMIN"
```
Previously, our permissions were modeled by who (a user, organization, or team), what data (e.g. a project), and how much access (the read, append, write, and admin levels). In the new system, we transitioned from levels to data policies. To keep levels and data policies in sync during the migration, we added foreign keys from the old permission models to the corresponding data policy, and used transitive foreign key constraints to ensure that the old level and the new data policy couldn’t fall out of sync:
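As a rough illustration of the constraint trick, here is a SQLite sketch with hypothetical table and column names (the real schema differs). A composite foreign key on (policy_id, permission_level) means the database itself rejects any row whose stored level disagrees with its policy's legacy level:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs per-connection
conn.executescript("""
CREATE TABLE policy (
    id INTEGER PRIMARY KEY,
    name TEXT,
    legacy_permission_level TEXT,
    -- A unique index over both columns lets children reference the pair
    UNIQUE (id, legacy_permission_level)
);
CREATE TABLE project_collaborator (
    id INTEGER PRIMARY KEY,
    permission_level TEXT,   -- old model
    policy_id INTEGER,       -- new model
    -- The composite FK forces the stored level to match the policy's level
    FOREIGN KEY (policy_id, permission_level)
        REFERENCES policy (id, legacy_permission_level)
);
""")
conn.execute("INSERT INTO policy VALUES (1, 'Read', 'READ')")

# Consistent row: level matches the policy's legacy level, so it succeeds
conn.execute("INSERT INTO project_collaborator VALUES (1, 'READ', 1)")

# Inconsistent row: the level disagrees with the policy, so the DB rejects it
try:
    conn.execute("INSERT INTO project_collaborator VALUES (2, 'WRITE', 1)")
except sqlite3.IntegrityError:
    print("rejected out-of-sync row")
```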
The DB constraints ensured we were updating policies correctly and synchronously. When we later updated our permissions API to configure policies instead of levels, we also redundantly set the levels, to help ensure our system was still working as expected.
This also allowed us to run redundant authorization checks, both with the old level and with the new data policy. In production, we compared the outputs of the two checks and logged an error if they mismatched.
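The comparison logic looked roughly like this. This is a hedged sketch: `check_via_levels` and `check_via_policies` are stand-ins for the real implementations, stubbed here so the example is self-contained:

```python
import logging

logger = logging.getLogger(__name__)


# Stand-ins for the real checks; the actual implementations consult the
# project's collaborators and their assigned levels/policies.
def check_via_levels(user, action, item) -> bool:
    return user["level"] in ("WRITE", "ADMIN")


def check_via_policies(user, action, item) -> bool:
    return (item["type"], action) in user["granted"]


def authorize_with_shadow_check(user, action, item) -> bool:
    old_result = check_via_levels(user, action, item)
    new_result = check_via_policies(user, action, item)
    if old_result != new_result:
        # Log rather than raise: a bug in the new path must not change
        # what users can access while we build confidence in it.
        logger.error(
            "authz mismatch: user=%s action=%s item=%s levels=%s policies=%s",
            user.get("id"), action, item, old_result, new_result,
        )
    # The old system stays authoritative until the migration completes
    return old_result
```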
Redundant checks helped us catch ~5 bugs. We found bugs in the new code where we didn’t port over a configuration flag or didn’t backfill it properly. We also found bugs in the old code, a result of the old system having a complex imperative implementation. It was especially helpful that we ran the comparison across all production environments, which are all configured differently.
We focused heavily on correctness, and were too lenient on performance. Our authorization checks are extremely hot code paths, since they run in nearly every endpoint. We tried to keep the redundant checks relatively short-lived (~a few weeks), since they had a non-trivial effect on performance. Our listing endpoints that query for all readable items weren’t all performant, and running the checks twice made the slowness more noticeable. In retrospect, it would have been nice to add monitoring to authorization checks to measure how we were affecting performance more precisely and address any performance regressions that surfaced.
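Even a lightweight timing wrapper would have surfaced the regression early. A minimal sketch, where `record_metric` stands in for whatever metrics client is in use (statsd, Datadog, etc.):

```python
import time
from contextlib import contextmanager


def record_metric(name: str, value_ms: float) -> None:
    # Stand-in for a real metrics client
    print(f"{name}: {value_ms:.2f}ms")


@contextmanager
def timed(metric_name: str):
    start = time.monotonic()
    try:
        yield
    finally:
        record_metric(metric_name, (time.monotonic() - start) * 1000)


# Wrapping each authorization path separately makes regressions attributable:
# with timed("authz.check_via_policies"):
#     result = check_via_policies(user, action, item)
```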
As part of this project, we wanted to empower all engineers at Benchling to easily modify policies to support new customer needs. In a granularization, we’re breaking down axes of policy statements, and allowing customers to customize policies at a more granular level.
The most common granularization is to break off a specific action from the WRITE action. For example, previously editing bases and editing metadata were both WRITE actions. To support configuring each separately, we introduced the EDIT_BASES action, which is granularized, or split off from, the old WRITE action.
We had a few system requirements during a granularization:
Naively, we could easily granularize WRITE to EDIT_BASES and WRITE (which still includes editing metadata) in two steps:
But a few things would break:
To handle this, we versioned policies: we track policy API versions in the code and versions for policy statements in the database. This version allows us to seamlessly upgrade during our deploy cycle:
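One way to picture the version mechanism, with hypothetical version numbers and a hypothetical helper (the real system tracks a version per policy statement in the database): code at a newer version expands an older statement's action into the finer-grained actions it now expects, so old rows keep granting the same access until they're backfilled.

```python
# Hypothetical registry: assume the EDIT_BASES action was split out of
# WRITE when the policy API moved to version 2.
GRANULARIZATIONS = {
    # version introduced: (parent_action, new_action)
    2: ("WRITE", "EDIT_BASES"),
}


def effective_actions(statement_action: str, statement_version: int,
                      code_version: int) -> set:
    """Expand an older statement's action into the finer-grained actions
    newer code expects, so un-migrated rows keep granting the same access."""
    actions = {statement_action}
    # Replay every granularization between the statement's version and
    # the version the running code understands.
    for version in range(statement_version + 1, code_version + 1):
        if version in GRANULARIZATIONS:
            parent, new = GRANULARIZATIONS[version]
            if parent in actions:
                actions.add(new)
    return actions
```

A version-1 WRITE statement read by version-2 code yields both WRITE and EDIT_BASES, while a statement already written at version 2 is taken literally.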
Granularization became a multi-step process:
Given that different product teams within Benchling will want to perform granularizations in the future, we needed to make the process more developer-friendly. To do this, we added a few things:
Granularizations will always be customer-driven, and we made an effort to make them easy for developers to execute without incurring correctness issues during the transition.
At Benchling, we’re building a platform for scientists to streamline their research. Scientists across academic labs, small biotech startups, and large pharmaceutical companies rely on Benchling to store their most important data, but they all have different collaboration models. Our permissions system has to support each organization’s data access needs and internal processes. This led us to evolve from a simple, read/write/admin level-based system to a highly configurable, yet easy-to-understand, policy-based system. We know these needs will change as the way our customers do research evolves, and we’ll continue growing our permissions system alongside it.
And if you’re interested in working on problems like this, we’re hiring!