Reducing costs on infrastructure can often feel like a chore when done in isolation. In this post, we’ll
discuss the coordinated process that NextRoll is using to make the effort feel like a full-team
project with the satisfaction of having a job well done.
Costs have a nasty habit of accumulating over time, usually because of a wide variety of decisions
made in the past by disparate teams and never revisited. It’s an ancient problem.
For that which is common to the greatest number has the least care bestowed upon it. Every one thinks chiefly
of his own, hardly at all of the common interest; and only when he is himself concerned as an individual.
For besides other considerations, everybody is more inclined to neglect the duty which he expects another
to fulfil. (Aristotle, Politics)
Of course, the difficulty is that many small problems demand many small solutions. As such, significantly
reducing your costs can seem like an exceptionally burdensome task.
Here are our tips if you’re looking to save a chunk out of your budget.
Often, teams are asked to track costs on their own and apply some common sense rules to keep
them down. This works okay in practice, but it is far from optimal. One of the main issues is that
it’s very easy for a team to look at a relatively small cost-saving opportunity and push it to
the bottom of its priority list. I mean, in the grand scheme of things, does $1000 per month
seem like that big of a deal?
Metaphor for a non-right-sized instance.
Well, every engineering team is probably making similar calls. As a moderately sized company,
NextRoll Engineering has dozens of teams. Forty-two teams each saving $1000 every month adds up
to an extra half million dollars a year in the company coffers. And the thing is, $1000 per month is
not an onerous amount to be saving when you deal with data at the scale that we do; more on that
later, but really, this represents a lower bound on what can be saved.
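To make the arithmetic explicit, here is a tiny sketch. The $1000 per team is the illustrative figure from the paragraph above, not a measured result, and `annual_savings` is a hypothetical helper.

```python
def annual_savings(num_teams: int, monthly_savings_per_team: float) -> float:
    """Total yearly savings if every team trims the same amount each month."""
    return num_teams * monthly_savings_per_team * 12

total = annual_savings(42, 1000)
print(f"${total:,.0f} per year")  # $504,000 per year
```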
Beyond this, operating costs affect the bottom line of any company. Ideally, this money would
be invested where it can generate returns, such as hiring more people or spinning up new products
that meaningfully drive top-line revenue. Instead, in the best case, these costs are simply
wasted; in the worst case, they actually compound over time, because money not invested elsewhere
carries an opportunity cost. And as the company scales, these wasted costs only grow.
Getting your unit economics in line is critical.
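One simple way to see how that opportunity cost compounds is to ask what the wasted spend would have grown to had it been invested instead. This is a minimal sketch: the 10% annual return is an illustrative assumption (not a NextRoll figure), compounded monthly, and `forgone_value` is a hypothetical helper.

```python
def forgone_value(monthly_waste: float, annual_return: float, months: int) -> float:
    """Value the wasted spend would have reached if invested instead.

    Assumes each month's waste could have earned `annual_return`
    (an illustrative rate), compounded monthly, until the end of the horizon.
    """
    monthly_rate = annual_return / 12
    balance = 0.0
    for _ in range(months):
        # Grow what was already "invested" by one month, then add this month's waste.
        balance = balance * (1 + monthly_rate) + monthly_waste
    return balance

raw = forgone_value(1000, 0.0, 36)     # no return: just the raw spend
grown = forgone_value(1000, 0.10, 36)  # larger, because the forgone returns compound
print(f"raw waste ${raw:,.0f}, including forgone returns ${grown:,.0f}")
```

With a zero return this reduces to the plain sum of the waste; any positive return makes the true cost larger than the number on the bill.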
To this end, we recommend creating a task force, with representatives across the engineering
organization, that is responsible for talking through all the cost-saving initiatives. This
team doesn’t need to meet frequently, but tracking and accountability are key. It also helps
everyone get on the same page about how systems are built and interoperate, revealing more
opportunities.
No one knows the code better than the engineers who work on their systems every day. Some in
management may be tempted to set arbitrary cost saving goals, but this is extremely
counterproductive. It’s also not enough to demand some vague sense of “cost savings” as a
project. In reality, each individual cost-saving initiative is unique in its own right and has
its own set of tradeoffs that the engineers are best equipped to evaluate.
With that in mind, ask every engineering team to submit their ideas. Each team can meet amongst
themselves, spend an hour or two just brainstorming, and write everything down. At
NextRoll, we take an attitude of every idea being valid in this stage, no matter how difficult
to accomplish. In some ways, this is similar to a design sprint: the ideas must be on the table
before further discussions around feasibility and prioritization can happen. There are side
benefits as well: small ideas can be enhanced or compounded; new product ideas can emerge; wild
ideas can inspire other, more feasible ones.
One of NextRoll’s core values is, “Do more with less.” We try to consistently uphold this,
but these cost-saving efforts provide a good opportunity for us to reflect on that value. It’s
important during brainstorming that engineers ask themselves not, “What do we want,” or “What
is convenient,” but rather, “What do we need?”
Everyone is contributing. This creates a shared sense of ownership and engineers
become invested in their own ideas. Finger-pointing at the teams with the largest cost centers
melts away, because, hey, we already know the little stuff adds up and we’re all in this
together. Every team is trying to reduce their footprint to only what’s required.
Every team lists out some potential objectives, and it’s clear how these relate to
the overarching theme and goal of the project.
Once all of the ideas are generated, they need to be submitted to the aforementioned task force.
It’s up to the task force to meet and decide which tasks will be tackled. Ideally, the task
force has enough representation across engineering to have the context to discuss the
proposals intelligently. But just in case, these ideas should contain documentation that includes:
Estimates do not need to be super accurate. The idea is to get a rough cut of what is possible.
If you have enough ideas submitted, it is likely that some estimates will be over and some will
be under. Statistically, across all submissions, these errors tend to partially cancel out, so
you’ll probably arrive at a reasonable estimate of how much money is on the table for the taking.
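A quick way to see why rough estimates are good enough in aggregate is to compare the typical per-item error with the error on the grand total. The numbers below are made up for illustration, and the cancellation only holds when errors are not systematically biased in one direction.

```python
def individual_and_total_error(estimates, actuals):
    """Compare per-item estimate error with the error on the grand total.

    Returns (mean absolute relative error per item, relative error of the total).
    """
    per_item = [abs(a - e) / e for e, a in zip(estimates, actuals)]
    mean_item_error = sum(per_item) / len(per_item)
    total_error = abs(sum(actuals) - sum(estimates)) / sum(estimates)
    return mean_item_error, total_error

# Hypothetical monthly-savings estimates vs. what was later measured (USD).
estimates = [1000, 2000, 1500, 3000]
actuals = [1300, 1600, 1500, 2700]
item_err, total_err = individual_and_total_error(estimates, actuals)
print(f"{item_err:.1%} per item, {total_err:.1%} on the total")  # 15.0% per item, 5.3% on the total
```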
From here, the task force can compile the list of items in a centralized location, assess the
total opportunity for each team and across the organization, and be prepared for their first meeting.
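As a sketch of that roll-up, assuming each submission carries at least a team name and a rough monthly estimate: the `Submission` fields, team names, and ideas here are hypothetical and are not the documentation template the task force used.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Submission:
    team: str
    idea: str
    est_monthly_savings: float  # rough estimate, USD per month

def opportunity_by_team(submissions):
    """Sum estimated monthly savings per team."""
    totals = defaultdict(float)
    for s in submissions:
        totals[s.team] += s.est_monthly_savings
    return dict(totals)

subs = [
    Submission("data-science", "Add TTLs to model-output buckets", 3000),
    Submission("data-science", "Delete unused feature tables", 1200),
    Submission("web", "Right-size API instances", 2500),
]
by_team = opportunity_by_team(subs)
org_total = sum(by_team.values())
print(by_team)    # {'data-science': 4200.0, 'web': 2500.0}
print(org_total)  # 6700.0
```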
Another good tip for soliciting ideas is to set a reasonable minimum on how much a submission
can save. This will keep brainstorming meetings focused and reduce the number of items the
task force has to prioritize. What that minimum should be is going to depend on any number of
variables, such as revenue, current total costs, number of teams, desired savings goals, etc.
Pick something that fits your company’s situation.
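Applying such a floor is a one-line filter. In this sketch the $500 minimum and the idea names are placeholders; pick the threshold that fits your own situation.

```python
def above_minimum(ideas, min_monthly_savings):
    """Keep only (name, estimated monthly savings) pairs that meet the floor."""
    return [(name, est) for name, est in ideas if est >= min_monthly_savings]

ideas = [
    ("Shorten log retention", 1800.0),
    ("Trim a dev sandbox", 150.0),
    ("Right-size batch cluster", 4000.0),
]
kept = above_minimum(ideas, 500)
print([name for name, _ in kept])  # ['Shorten log retention', 'Right-size batch cluster']
```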
Finally, the task force needs to meet and discuss all items. This will likely be a
time-consuming meeting, but it will be valuable. We’ll talk about side benefits later, but
for now, let’s talk about how to prioritize.
One thing that’s critical to keep in mind is that individual teams still have product roadmaps
to execute on. Cost reduction doesn’t happen in a vacuum. Representatives should be able to
explain to the group what the current workload for each individual team is, and how much
engineering time is available to spare. But time isn’t free, so product management needs to
be on board with the overall effort, and recognize that some time is going to be spent on
getting unit costs down.
At this point, it’s common sense to tackle the biggest bang-for-the-buck items. As mentioned
earlier, NextRoll is fundamentally a data company. A really easy place for costs to build up
is S3. My background is data science; my personal inclination is to keep plenty of data
around so I can look at historical models, their performance, and so on. But how often do we
really need to go back, say, 90 days? Is 45 enough? These are the questions we were asking
ourselves on my teams, and we realized there were a lot of five-minute tasks to set some TTLs
that slashed our budgets pretty significantly.
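For illustration, a TTL like this can be expressed as an S3 lifecycle rule. This sketch builds the configuration and, optionally, applies it with boto3's `put_bucket_lifecycle_configuration`. The bucket name and prefix are hypothetical. Note that this call replaces the bucket's entire existing lifecycle configuration, so merge in any existing rules first, and that on versioned buckets a plain `Expiration` only adds delete markers unless noncurrent versions are also expired. The `steady_state_gb` helper captures the intuition that, with a constant daily ingest, stored volume levels off at roughly ingest times retention, so going from 90 to 45 days roughly halves it.

```python
def lifecycle_rule(prefix: str, days: int) -> dict:
    """Build an S3 lifecycle configuration that expires objects under `prefix`."""
    return {
        "Rules": [
            {
                "ID": f"expire-{prefix.strip('/') or 'all'}-after-{days}d",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }

def apply_lifecycle(bucket: str, config: dict) -> None:
    """Apply the configuration with boto3 (needs AWS credentials).

    Caution: this REPLACES any lifecycle rules already on the bucket.
    """
    import boto3  # imported lazily so the rest of the sketch runs without it

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )

def steady_state_gb(daily_ingest_gb: float, retention_days: int) -> float:
    """With constant daily ingest, stored volume levels off at ingest x retention."""
    return daily_ingest_gb * retention_days

config = lifecycle_rule("model-outputs/", 45)  # hypothetical prefix
print(config["Rules"][0]["ID"])  # expire-model-outputs-after-45d
print(steady_state_gb(100, 90), steady_state_gb(100, 45))  # 9000 4500
# apply_lifecycle("example-bucket", config)  # hypothetical bucket name
```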
Metaphor for a typical S3 bucket.
Right-sizing instances is another big opportunity; as products develop over time, the needs
for their servers can shift. It’s also possible to identify less-used features in the product
that are supported by data and infrastructure and to just delete and shut off those services
entirely. For high-volume systems, even a 10% improvement in efficiency can make a meaningful difference.
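A right-sizing check can be as simple as picking the cheapest instance type that still covers observed peak usage plus some headroom. This is a minimal sketch under stated assumptions: the catalog, names, and hourly costs are placeholders (not real AWS pricing), the 20% headroom is arbitrary, and real decisions should also weigh network, disk, and burst behavior.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gib: int
    hourly_cost: float  # placeholder, not real pricing

def right_size(catalog, peak_vcpus, peak_memory_gib, headroom=0.2):
    """Return the cheapest instance type covering peak usage plus headroom, or None."""
    need_cpu = peak_vcpus * (1 + headroom)
    need_mem = peak_memory_gib * (1 + headroom)
    fits = [t for t in catalog if t.vcpus >= need_cpu and t.memory_gib >= need_mem]
    return min(fits, key=lambda t: t.hourly_cost) if fits else None

catalog = [
    InstanceType("small", 2, 8, 0.10),
    InstanceType("medium", 4, 16, 0.20),
    InstanceType("large", 8, 32, 0.40),
]
# Peaks observed on a box currently running as "large" (hypothetical numbers).
choice = right_size(catalog, peak_vcpus=3, peak_memory_gib=10)
print(choice.name)  # medium
# Roughly 730 hours in a month.
print(f"saves ${(0.40 - choice.hourly_cost) * 730:.2f}/month per instance")
```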
A common question at NextRoll was, “What about AWS reservations? If we’ve already paid for
servers, why should we shut them down?” Sure, but this is something that should largely
factor into prioritization. The reservations buy you time to reduce the number of servers
you need. Try to avoid falling for a sunk cost fallacy: once the money is spent it’s gone.
Those reservations could be used for other services that need them more, or could be used to
spin up new product features. Also, reservations will eventually expire. Even though you’ve
bought some time, try to get ahead of the problem and don’t lose focus on the end goal. Chances
are, that expiration date will sneak up on you, and you’ll be in a new product development
cycle where it may be hard to find the time. The point is to strike while the iron’s hot.
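The bang-for-the-buck rule, constrained by each team’s spare engineering time, can be sketched as a greedy pick by savings per engineer-week. The items, estimates, and capacities below are hypothetical. Greedy selection is a heuristic, not an optimal solution to this knapsack-style problem, and items it skips (like the large rewrite here) are exactly the harder, bigger opportunities worth revisiting in a later round.

```python
from dataclasses import dataclass

@dataclass
class Item:
    name: str
    team: str
    monthly_savings: float  # estimated USD per month
    effort_weeks: float     # estimated engineer-weeks

def prioritize(items, capacity_weeks):
    """Greedy pick: best savings per engineer-week first, skipping anything
    that would exceed the owning team's remaining spare capacity."""
    remaining = dict(capacity_weeks)
    chosen = []
    ranked = sorted(items, key=lambda i: i.monthly_savings / i.effort_weeks, reverse=True)
    for item in ranked:
        if item.effort_weeks <= remaining.get(item.team, 0):
            remaining[item.team] -= item.effort_weeks
            chosen.append(item)
    return chosen

items = [
    Item("S3 TTLs", "data", 4000, 0.5),
    Item("Rewrite ETL", "data", 6000, 6),
    Item("Right-size API", "web", 2500, 1),
    Item("Drop unused feature", "web", 1500, 2),
]
picked = prioritize(items, {"data": 2, "web": 3})
print([i.name for i in picked])  # ['S3 TTLs', 'Right-size API', 'Drop unused feature']
```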
Once the task force has balanced every team’s roadmaps and selected a set of appropriate items,
they organize their selections into a centralized document that all teams can reference. The
selected items get pushed down to the individual teams for implementation with some expectation
of when the items will be completed. At NextRoll, we’ve opted for a quarterly deadline. Like any
engineering project, deadlines can slip, but what’s important is to be on top of the tracking.
Some of the side benefits of this task force meeting are that participants gain a better sense
of how other systems they aren’t responsible for work and interact with each other. They also
get a better sense of other teams’ product roadmaps. If they’re technical leaders for their
respective teams, they can bring that knowledge back. On top of all this, the task force
meeting is yet another opportunity to brainstorm and share notes. As the team goes through all
the items, new opportunities and solutions may present themselves that can be discussed with
the implementing teams.
I’m a big fan of this quote, probably apocryphally attributed to Karl Pearson, and likely
expressed by others before him:
“That which is measured improves. That which is measured and reported improves exponentially.”
After each team implements an item, it should spend some time measuring the realized
savings and report back. The task force can log these data as they come in, and track how
well estimates line up with reality. This feedback loop will hopefully lead to better
estimates in the future.
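One simple form that feedback loop could take is a calibration factor: the ratio of measured to estimated savings across completed items, used to temper new estimates. This sketch assumes past bias persists, and the logged numbers are made up for illustration.

```python
def calibration_factor(log):
    """Ratio of measured to estimated savings across completed items."""
    estimated = sum(e for e, _ in log)
    actual = sum(a for _, a in log)
    return actual / estimated

# Hypothetical (estimated, measured) monthly savings in USD.
completed = [(1000, 800), (3000, 3300), (2000, 1500)]
factor = calibration_factor(completed)
print(round(factor, 3))       # 0.933
print(round(5000 * factor))   # 4667: a new $5000 estimate, adjusted for past bias
```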
The most obvious benefit here is that these numbers can help FP&A teams with their jobs.
Understanding the overall impact on the company justifies the effort spent achieving the
results. Measuring the return on investment will also help in the future when it’s time
to tackle further initiatives.
And, of course, measuring the results provides engineers with some actual satisfaction by
seeing the results of their labors. Since we approached this collectively, every engineer
contributed to the outcome. The flip side of “death by a thousand cuts” is that the
thousand bandages add up to something big and meaningful. The task force should take some
time to communicate the summarized results to not only the whole engineering team, but the
whole company. Whatever medium works for you is fine; at NextRoll we report on our results
at our All Hands meetings.
At NextRoll, the task force meets monthly to quickly go over updates. We chat about progress,
blockers, and celebrate successes. It’s not particularly different from a scrum of scrums.
Because every engineer contributes and the task force receives the information it needs ahead
of time, these meetings aren’t particularly onerous.
Once the original deadline has passed, the task force reviews unfinished items, reevaluates
items that weren’t prioritized in the first round, and continues to prioritize and push
things down to the teams for the next period. This ensures that harder, but perhaps bigger,
opportunities don’t become forgotten and dropped. Recall that the first pass was about the
best bang-for-the-buck projects. Things will get harder, but with an appropriate balance,
the payoffs should be higher in the next round.
Momentum shouldn’t be lost. Cost saving should not be a one-off initiative because, as
stated, costs tend to accumulate over time. In some sense, it’s easier to
tackle things as they come up; on the other hand, it’s not easy to change culture.
Explicitly maintaining cost reduction as a priority sends a strong message that the value
of “Do more with less” is not empty.
One great thing I’ve noticed at NextRoll is that individual engineers
continue to ping me about new opportunities they’ve managed to uncover, even outside of
brainstorming sessions. We add these items to our overall tracking and make sure these
ideas don’t go unnoticed or unprioritized. Sometimes I even get pinged with cost reduction
measures that have already been completed, and I make sure even those things are
documented and tracked to recognize what these engineers have accomplished.
And this is really the attitude that we’re looking for. Yes, it’s a decent amount of
upfront effort to document and coordinate everything. But once the process is in place
to solicit creative ideas from all sides, the tracking mechanisms are in place to measure
accomplishments, and recognition is provided, cost saving becomes less of a chore and
hopefully a rewarding process unto itself.
I hope this provides a glimpse into our cost saving process at NextRoll. Obviously, not
all of these processes are necessarily applicable to your own company or situation. Adapt
as you see fit, but whatever process you decide to implement, I strongly recommend
adhering to the following principles:
Thanks for reading!