The Problem
Clever’s infrastructure is complex and relies heavily on autoscaling services, and because Clever is an EdTech platform, its usage fluctuates significantly across days, weeks, and months. That makes it difficult to trace cost changes back to individual engineering teams, especially when it can be tricky to isolate whether a cost change is due to seasonal usage fluctuations or to new applications being launched.
Cloud cost efficiency is a bottom-up demand and a top-down mandate at Clever. Bottom up, engineering owners want to take more ownership of their cloud cost footprint and improve their cost efficiency to help the business. Clever has a “you build it, you run it” engineering culture and they want this to extend to cloud cost ownership. Top down, Finance has infrastructure optimization targets that it works with Engineering to meet.
The Solution
Clever had been using AWS’s Cost Explorer and AWS’s CUDOS dashboard, and had also started writing custom queries against the raw CUR (Cost and Usage Report) files. These tools fell short for three main reasons:
- They don’t proactively tell you what you should pay attention to, i.e. they won’t ping you when something needs attention
- When you do see an unexpected change, finding the root cause of the change is inconvenient
- They lack context: they’re useful for telling you high-level absolute numbers, but not for understanding whether it’s a reasonable amount to be spending over time
Using Cloudthread’s CloudFormation Stack integration, Clever’s team could immediately create Custom Cost Views that map cloud spend to their engineering ownership, along with Reports that engage these teams with weekly overviews of spend, weekly/daily anomalies, and drill down, making root cause analysis for engineering owners quick and easy. Using Cloudthread’s Unit Metric Library, Clever adopted the normalized engineering unit metric “Team $ / ELB Request” to track and benchmark each team’s application cost efficiency.
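To make the arithmetic behind a unit metric like “Team $ / ELB Request” concrete, here is a minimal sketch in Python. It is not Cloudthread’s implementation: the function name, the row format, the team name, and all of the numbers are hypothetical and chosen only for illustration. It assumes cost and ELB request counts have already been attributed to teams.

```python
from collections import defaultdict


def cost_per_request(cost_rows, request_rows):
    """Return {team: dollars per ELB request} for one reporting period.

    cost_rows:    iterable of (team, dollars) pairs (hypothetical format)
    request_rows: iterable of (team, elb_request_count) pairs (hypothetical format)
    Teams with no recorded requests are skipped to avoid dividing by zero.
    """
    cost = defaultdict(float)
    for team, dollars in cost_rows:
        cost[team] += dollars

    requests = defaultdict(int)
    for team, count in request_rows:
        requests[team] += count

    return {
        team: cost[team] / requests[team]
        for team in cost
        if requests.get(team, 0) > 0
    }


# Hypothetical numbers: in week 2 traffic doubles and spend doubles with it.
week1 = cost_per_request([("rostering", 700.0), ("rostering", 500.0)],
                         [("rostering", 4_000_000)])
week2 = cost_per_request([("rostering", 2400.0)],
                         [("rostering", 8_000_000)])

print(week1["rostering"], week2["rostering"])
```

In this made-up example absolute spend doubles from week 1 to week 2, yet the cost per request stays the same, which is the sense in which a normalized metric separates efficiency changes from usage fluctuations.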
The Results
- Proactively surface weekly and daily anomalies to quickly eliminate inefficiencies. Each application has a dedicated Report that delivers weekly overviews of spend and any weekly/daily “Movers and Shakers”.
- Simple, efficient root cause analysis with drill down. Reports have a one-click breakdown and Cloudthread’s Drill Down feature makes root cause analysis for engineering owners quick and easy.
- Uncovering ECS savings insights. Cloudthread monitors CPU and memory to surface containers with low utilization, automatically suggests an appropriate replacement, and shows what the monthly savings would be.
- The engineering unit metric “Team $ / ELB Request” to track and benchmark each team’s application cost efficiency. Regardless of new applications launched and usage fluctuations, Clever has a normalized metric to track cloud cost efficiency.
- Coming soon: Clever will track $ / login and $ / daily active user using Cloudthread’s newly launched Custom Denominator API. These will be used to track infrastructure cost reduction KPIs and forecast infrastructure costs based on forecasted business growth.