Binocs - EKS infra and 60% cloud savings
Saved 1.8K to 2K USD per month at Binocs (about 60 percent on the relevant line items) via EKS right-sizing, spot instances, RDS tier review, and killing an orphan cluster.
A senior at Binocs asked me to "look at the AWS bill". A week later I had cut 1.8K to 2K USD per month off it, about 60 percent on the line items I touched. Here is the order I did it in, because the order is the lesson.
Step one - Cost Explorer, grouped by service, last 90 days. Not EC2, EKS, or RDS one at a time: all of them, ranked by cost. Our top three were EC2 for the EKS nodes (40 percent of the bill), RDS (25 percent), and data transfer (10 percent). Everything else was noise.
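A minimal Python sketch of step one, assuming boto3 and credentials allowed to call Cost Explorer. `get_cost_and_usage` and the response fields it reads are the real Cost Explorer API; the helper names `fetch_costs_by_service` and `rank_services` are hypothetical, and pagination via `NextPageToken` is skipped for brevity.

```python
"""Rank AWS services by cost over the last N days (sketch)."""
from datetime import date, timedelta


def fetch_costs_by_service(days=90):
    """Call Cost Explorer GetCostAndUsage, grouped by the SERVICE dimension.

    boto3 is imported lazily so rank_services() also works on saved responses.
    NextPageToken pagination is ignored to keep the sketch short.
    """
    import boto3  # only needed for the live API call

    end = date.today()  # Cost Explorer treats End as exclusive
    start = end - timedelta(days=days)
    ce = boto3.client("ce")
    return ce.get_cost_and_usage(
        TimePeriod={"Start": start.isoformat(), "End": end.isoformat()},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )


def rank_services(response):
    """Sum UnblendedCost per service across all periods.

    Returns (service, usd, share_of_total) tuples, most expensive first.
    """
    totals = {}
    for period in response["ResultsByTime"]:
        for group in period["Groups"]:
            service = group["Keys"][0]
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[service] = totals.get(service, 0.0) + amount
    grand_total = sum(totals.values()) or 1.0  # avoid dividing by zero
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(svc, usd, usd / grand_total) for svc, usd in ranked]
```

Typical use is `rank_services(fetch_costs_by_service())` and then reading only the first few rows; anything below the top three or so is the "noise" the step describes.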
Step two - the orphan cluster. Cost Explorer grouped by cluster tag showed a dev cluster running 3 m5.xlarge nodes that nobody had touched in 90 days. Someone had spun it up for a POC and forgotten about it. Killed it. 280 USD per month gone with one terraform destroy.
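The orphan check boils down to one rule: a cluster that still costs money but shows no activity for 90 days is a candidate for deletion. A sketch under stated assumptions: per-cluster monthly cost comes from Cost Explorer grouped by a cost-allocation tag (the tag key `cluster` is hypothetical, and the tag must be activated for cost allocation), and `last_activity` is whatever signal you trust, such as the last deploy or API call. `ClusterSpend` and `orphan_candidates` are illustrative names.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ClusterSpend:
    name: str            # value of the (hypothetical) "cluster" cost-allocation tag
    monthly_usd: float   # from Cost Explorer grouped by that tag
    last_activity: date  # assumed input: last deploy, API call, etc.


def orphan_candidates(clusters, today, idle_days=90):
    """Return clusters idle for at least idle_days, most expensive first."""
    cutoff = today - timedelta(days=idle_days)
    idle = [c for c in clusters if c.last_activity <= cutoff]
    return sorted(idle, key=lambda c: c.monthly_usd, reverse=True)
```

Sorting by cost keeps the review focused on the clusters whose removal actually moves the bill.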
Step three - right-sizing the prod node groups. CloudWatch Container Insights showed our m5.2xlarge prod nodes running at 18 percent CPU and 35 percent memory over a 7-day window. Right-sized to m5.large for the general workload and m5.xlarge for the LLM workers. Container resource requests went from "whatever the Helm chart set" to measured values based on VPA (Vertical Pod Autoscaler) recommendations. About 700 USD per month saved.
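The right-sizing arithmetic can be sketched as: turn observed utilization into absolute demand (vCPUs and GiB actually used), add headroom, then count how many nodes of each m5 size cover it. The vCPU and memory figures below are the published m5 sizes, and m5 on-demand pricing scales linearly with size, so cost is expressed in relative units rather than dollars. The node count, the 75 percent target utilization, and the function names are assumptions for illustration; per-node overhead (DaemonSets, max pods) is ignored.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpu: int
    mem_gib: int
    price_units: int  # on-demand price relative to m5.large (m5 scales linearly)


M5 = [
    InstanceType("m5.large", 2, 8, 1),
    InstanceType("m5.xlarge", 4, 16, 2),
    InstanceType("m5.2xlarge", 8, 32, 4),
]


def nodes_needed(t, cpu_demand, mem_demand, target_util):
    """Smallest node count keeping both CPU and memory under target_util."""
    by_cpu = math.ceil(cpu_demand / (t.vcpu * target_util))
    by_mem = math.ceil(mem_demand / (t.mem_gib * target_util))
    return max(by_cpu, by_mem, 1)


def right_size(current, node_count, cpu_util, mem_util, target_util=0.75):
    """Price each m5 size against the measured demand of the current pool.

    Returns (relative_cost, instance_name, nodes) tuples, cheapest first.
    """
    cpu_demand = node_count * current.vcpu * cpu_util     # vCPUs actually used
    mem_demand = node_count * current.mem_gib * mem_util  # GiB actually used
    plans = []
    for t in M5:
        n = nodes_needed(t, cpu_demand, mem_demand, target_util)
        plans.append((n * t.price_units, t.name, n))
    return sorted(plans)
```

For a hypothetical pool of 4 m5.2xlarge nodes (16 units today) at 18 percent CPU and 35 percent memory, every size comes out at 8 units, with memory as the binding constraint. The saving comes from paying only for measured demand plus headroom; choosing between sizes is then a bin-packing question this sketch does not model.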
Step four - spot instances for batch. The CIM processing pipeline ran on a separate node group. These are batch jobs, interruption-tolerant and perfect for spot. Moved that group to a mixed instances policy with 80 percent spot and 20 percent on-demand as a fallback. Saved about 60 percent on that node group, roughly 450 USD per month.
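A sketch of the 80/20 split, assuming the batch node group is backed by an EC2 Auto Scaling group you configure directly. The dict matches the shape of the `MixedInstancesPolicy` argument to boto3's `autoscaling.create_auto_scaling_group` / `update_auto_scaling_group`; the launch template name and instance types are hypothetical, since the write-up does not name them. The second function shows why 80 percent spot lands near a 60 percent saving, assuming spot runs at about a quarter of the on-demand price (an assumption; spot prices vary by type, zone, and time).

```python
def mixed_instances_policy(launch_template_name, instance_types, on_demand_pct=20):
    """Build a MixedInstancesPolicy dict for an EC2 Auto Scaling group."""
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            # Several interchangeable types widen the spot pools the group can use.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": 0,
            "OnDemandPercentageAboveBaseCapacity": on_demand_pct,
            "SpotAllocationStrategy": "capacity-optimized",
        },
    }


def blended_cost_ratio(on_demand_pct, spot_price_ratio):
    """Cost of the mixed group relative to the same group run fully on-demand."""
    od = on_demand_pct / 100
    return od + (1 - od) * spot_price_ratio
```

With `blended_cost_ratio(20, 0.25)` the group costs 0.2 + 0.8 × 0.25 = 0.4 of its on-demand price, i.e. about a 60 percent saving, in line with the figure above.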
Step five - RDS. We were on db.m5.2xlarge with provisioned IOPS. CloudWatch showed CPU peaking at 22 percent. Dropped to db.m5.xlarge with gp3 storage after a soak test in staging. About 380 USD per month saved.
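Before the staging soak test, a quick sanity check is to project the CPU peak onto the smaller class. db.m5.2xlarge has 8 vCPUs and db.m5.xlarge has 4, so a naive projection (same absolute CPU work on half the cores) turns a 22 percent peak into 44 percent. The sketch below makes that check and, if it passes, builds keyword arguments for boto3's real `rds.modify_db_instance`. The 60 percent ceiling, the instance identifier, and the helper names are assumptions; the check ignores memory, which also halves (32 to 16 GiB), which is exactly what a soak test is for.

```python
# vCPU counts for the two RDS classes discussed above.
RDS_VCPU = {"db.m5.2xlarge": 8, "db.m5.xlarge": 4}


def projected_peak_cpu(peak_pct, current_class, target_class):
    """Naive projection: the same absolute CPU work spread over fewer vCPUs."""
    return peak_pct * RDS_VCPU[current_class] / RDS_VCPU[target_class]


def downsize_request(instance_id, current_class, target_class,
                     observed_peak_pct, max_peak_pct=60.0):
    """Return kwargs for rds.modify_db_instance, or None if the projection runs too hot."""
    projected = projected_peak_cpu(observed_peak_pct, current_class, target_class)
    if projected > max_peak_pct:
        return None
    return {
        "DBInstanceIdentifier": instance_id,
        "DBInstanceClass": target_class,
        "StorageType": "gp3",
        "ApplyImmediately": False,  # apply in the next maintenance window
    }
```

Passing the result as `rds.modify_db_instance(**req)` keeps the change reviewable; deferring to the maintenance window avoids an unplanned restart in prod.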
The thing nobody tells you - cloud cost is the same as any other optimization. Measure first, attack the biggest line item, change one thing, measure again. The temptation is to start with whatever is interesting (Karpenter, ARM migrations, Savings Plans). Start with what is expensive.
The whole audit took a week. The savings compound monthly. Best ROI work I did all year.
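Tallying the per-step figures quoted above shows where the headline number comes from. The amounts are the rounded ones from each step; the pre-audit baseline is a back-of-envelope inference from the "about 60 percent" figure, not a measured number, and `tally` is an illustrative helper.

```python
# Monthly savings per step in USD, as reported in the steps above.
SAVINGS = {
    "orphan dev cluster": 280,
    "prod node right-sizing": 700,
    "spot for batch": 450,
    "RDS downsize + gp3": 380,
}


def tally(savings, share_saved=0.60):
    """Return (monthly, annual, implied pre-audit cost of the touched line items)."""
    monthly = sum(savings.values())
    annual = monthly * 12
    baseline = monthly / share_saved  # rough: assumes savings = share_saved of the base
    return monthly, annual, baseline
```

The four steps sum to 1,810 USD per month (about 21.7K USD per year), the low end of the 1.8K to 2K range, and imply the touched line items cost roughly 3K USD per month before the audit.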