QuickData CostBuddy: Taming AWS cost proactively

Objective

As organizations move to the cloud, budgeting, tracking, and optimizing cloud spend is becoming a critical capability. This is true for all teams, and especially for Data Platform teams that support multiple Analysts and Data Scientists as tenants. This blog post describes the challenges we faced with cost accountability and budgeting as we transitioned to operating 100% in AWS, and the methodical mechanism we developed and implemented to manage cost.

Pain Points

  1. A single view that provides AWS cost details for multiple accounts (Dev/Prod) to management, so that we can proactively forecast and manage costs
  2. The trend of AWS cost: forecasts versus actuals
  3. AWS cost mapped to accountable tenant leaders (either owning accounts or tenants within an account)
  4. A cost view per account
  5. Alerts sent to individual leaders based on their spend-to-budget ratio and daily trajectory
  6. A cost roll-up of managed services such as AWS EMR and Athena, based on the accountable analyst
  7. Tracking of untagged and underutilized resources
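
To make items 6 and 7 above more concrete, here is a minimal sketch of rolling up cost by an accountable owner and surfacing untagged spend. It is not CostBuddy's actual code: the line-item shape, the `owner` tag key, and the `UNTAGGED` bucket name are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical bucket name for resources that carry no owner tag.
UNTAGGED = "UNTAGGED"


def roll_up_by_owner(line_items, tag_key="owner"):
    """Sum cost per accountable owner; spend without the tag goes to UNTAGGED.

    Each line item is assumed to be a dict with a "tags" dict and a
    "cost_usd" float. The "owner" tag key is an illustrative assumption.
    """
    totals = defaultdict(float)
    for item in line_items:
        owner = item["tags"].get(tag_key) or UNTAGGED
        totals[owner] += item["cost_usd"]
    return dict(totals)


# Made-up sample data, not real billing figures.
sample = [
    {"account": "prod", "service": "EMR", "tags": {"owner": "alice"}, "cost_usd": 120.0},
    {"account": "prod", "service": "Athena", "tags": {"owner": "alice"}, "cost_usd": 30.0},
    {"account": "dev", "service": "EMR", "tags": {"owner": "bob"}, "cost_usd": 45.5},
    {"account": "dev", "service": "EC2", "tags": {}, "cost_usd": 12.25},
]

if __name__ == "__main__":
    for owner, cost in sorted(roll_up_by_owner(sample).items()):
        print(f"{owner}: ${cost:.2f}")
```

Keeping untagged spend in its own bucket, rather than dropping it, is what makes the gap visible to whoever has to chase down missing tags.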

QuickData CostBuddy

We explored several alternatives. To address all the needs of the QuickData Platform, we implemented QuickData CostBuddy:

  1. CostBuddy gathers the raw data collected from AWS on the spend of all accounts and stores it in Redshift. The data represents the dollar spend of each account along with the tags associated with the resources.
  2. CostBuddy combines this data with budgets and leader org structures. It raises alerts on spending anomalies based on the spend-to-budget ratio as well as several other patterns.
  3. The AWS spend is further broken down by tags, so each leader can further diagnose spend on AWS resources. In a multi-tenant environment, alerts can be defined for individual deployments (such as an AWS EMR cluster for an individual team, or AWS Athena spend by an individual team).
  4. Each accountable leader can look at the cost for an AWS account, a group of accounts, tenants, or tags to manage the cost of their organization. This provides both a high-level and a drill-down view of the infrastructure cost incurred in their respective organization.
  5. AWS Trusted Advisor resource usage data is collected for the accounts based on the accountable leader/tenant and displayed on the dashboard per accountable leader, so that leaders can manage resources and take appropriate action on underutilized ones.
  6. A similar pattern will be leveraged as QuickData Platform teams onboard other services, such as AWS QuickSight, to support the analyst community.
  7. Consolidate or close unneeded accounts (for example Logging, CloudTrail, AWS Support Cost).
  8. If cost increases drastically, the alerts enable leaders to take proactive measures.
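
The spend-to-budget ratio and daily trajectory alerts mentioned above can be sketched as follows. This is an illustrative approximation, not CostBuddy's actual logic: the function names, the 0.8 threshold, and the simple linear projection are assumptions chosen to keep the example short.

```python
import calendar
from datetime import date


def projected_month_end_spend(spend_to_date, today):
    """Linearly extrapolate month-to-date spend to the end of the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    daily_rate = spend_to_date / today.day
    return daily_rate * days_in_month


def evaluate_budget(spend_to_date, monthly_budget, today, ratio_threshold=0.8):
    """Return the spend-to-budget ratio, the projection, and any alerts.

    ratio_threshold is a hypothetical default; a real system would likely
    configure it per leader or per tenant.
    """
    if monthly_budget <= 0:
        raise ValueError("monthly_budget must be positive")
    ratio = spend_to_date / monthly_budget
    projected = projected_month_end_spend(spend_to_date, today)
    alerts = []
    if ratio >= ratio_threshold:
        alerts.append("spend_to_budget")  # already consumed most of the budget
    if projected > monthly_budget:
        alerts.append("trajectory")  # on course to overshoot by month end
    return {"ratio": ratio, "projected": projected, "alerts": alerts}


if __name__ == "__main__":
    # Made-up numbers: $6,000 spent by June 15 against a $10,000 budget.
    print(evaluate_budget(6000.0, 10000.0, date(2021, 6, 15)))
```

Checking the trajectory separately from the ratio matters because a team can sit well under the ratio threshold mid-month and still be on course to overshoot; the second check is what makes the alert proactive.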

Architecture Diagram

Figure 1: Aggregating AWS spend across accounts

AWS spend by Accountable/Leader

AWS spend by resource tags

Trusted Advisor (Potential Cost Saving opportunity)

Dashboards as Code

The traditional way of managing dashboards through the UI stopped working for us as we started to scale, for the following reasons:

  1. Consistency: Dashboards managed through the UI have consistency issues.
  2. Automation: We could not automate the creation or modification of dashboards and alerts.
  3. History: Rolling back unintended changes was impossible.

However, managing and reviewing changes in huge JSON files was also close to impossible. We wanted a solution that performs basic validation rather than just pushing faulty JSON, manages the state of the current system, is easily adaptable and flexible, and supports rollbacks.

Hence we decided to move forward with Terraform. With the Terraform Wavefront provider, we were able to achieve all of these goals without any tech debt:

  1. Terraform with Git takes care of consistency.
  2. The declarative nature of Terraform gives us more power: we always define the desired state, and Terraform takes care of how to achieve it.
  3. Test and production environments are easy to maintain with Terraform workspaces.
  4. Code is more readable than nested JSON, which makes changes easier to review.
  5. Rollbacks become easier with Terraform and Git.

We have a short educational blog series on how we created dashboards as code: Part I, Part II and Part III.

We use Wavefront for our dashboards, hence we use terraform-wavefront-provider, to which we have also contributed based on what we learned building dashboards as code. Similar providers exist for other vendors as well, including Grafana.
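
The "define the desired state and let the tool work out how to get there" idea can be illustrated with a toy reconciliation function. This is only a conceptual sketch in Python, not how Terraform or the Wavefront provider is implemented; the `plan` function and the dashboard dictionaries are hypothetical.

```python
def plan(desired, current):
    """Compare desired and current dashboard definitions, keyed by name.

    Returns the names to create, update, and delete so that the current
    state would match the desired state. Purely illustrative.
    """
    to_create = sorted(name for name in desired if name not in current)
    to_delete = sorted(name for name in current if name not in desired)
    to_update = sorted(
        name
        for name in desired
        if name in current and desired[name] != current[name]
    )
    return {"create": to_create, "update": to_update, "delete": to_delete}


# Made-up dashboard definitions for the example.
desired_state = {
    "spend-by-leader": {"charts": 4},
    "spend-by-tag": {"charts": 3},
}
current_state = {
    "spend-by-leader": {"charts": 2},  # differs from desired, so it is updated
    "old-dashboard": {"charts": 1},  # no longer desired, so it is deleted
}

if __name__ == "__main__":
    print(plan(desired_state, current_state))
```

Because the change set is derived from two states rather than written by hand, reverting a commit in Git and re-applying is enough to roll back, which is the property the list above relies on.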

Summary

In summary, CostBuddy provides a one-stop view for teams to get visibility into their spend and to track daily spend against the allocated budget. Moving forward, CostBuddy will be integrated with alerting solutions such as PagerDuty. This will allow account leaders as well as tenant leaders to be notified of different types of alerts.


QuickData CostBuddy: Taming AWS cost proactively was originally published in QuickBooks Engineering on Medium.

Source: Intuit