The average organization wastes 32% of its cloud spend. That number comes from the FinOps Foundation's 2025 State of FinOps report, and it has barely improved in three years. The problem is not that engineering teams do not care about costs — it is that cost feedback comes too late, in a monthly bill that nobody can trace back to a specific commit or architecture decision.
FinOps fixes this by shifting cost awareness left, into the development workflow. Not as a gate that slows you down, but as a signal — like type checking or linting — that helps you make informed decisions before they hit production.
This guide is written for developers, not finance teams. We will cover the patterns that waste the most money, the tools that make costs visible at development time, and concrete actions that can cut your cloud bill by 25-30%.
What Is FinOps?
FinOps (a portmanteau of "Finance" and "DevOps") is a cultural practice that brings financial accountability to cloud spending. The FinOps Foundation defines three phases:
Inform → Optimize → Operate
Inform: Make costs visible to the people who generate them
Optimize: Identify and implement savings opportunities
Operate: Continuously track, govern, and improve

For developers, the key insight is this: the people who write the code are the people best positioned to optimize costs. A finance team can tell you that your AWS bill went up 40% last month. Only the engineering team knows that the spike was caused by a missing database index that forced a scale-up event.
The FinOps Lifecycle
| Phase | Developer Role | Tools |
|---|---|---|
| Inform | Tag resources, review cost dashboards | AWS Cost Explorer, Infracost, Kubecost |
| Optimize | Right-size, eliminate waste, use spot/preemptible | Terraform, auto-scaling policies, Karpenter |
| Operate | Set budgets, automate alerts, review in sprint retros | AWS Budgets, Datadog Cost Management, Slack alerts |
The 7 Most Common Cloud Waste Patterns
Before optimizing, you need to know where the money goes. These seven patterns account for the vast majority of cloud waste.
1. Idle Compute Instances
The classic waste pattern. Development and staging environments running 24/7 when they are only used during business hours.
The math:
1 x m5.xlarge (on-demand) = $0.192/hour
Running 24/7 = $140/month
Used 10 hours/day, 5 days/week = ~$42/month of actual usage
Waste: $98/month per instance

Multiply by 20 dev environments and you are burning $23,520/year on idle compute.
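That arithmetic is easy to sanity-check in code. A minimal sketch, assuming 730 hours per calendar month and roughly 21.7 weekdays per month (calendar averages; the rate is the on-demand price quoted above):

```typescript
// Compare always-on cost to business-hours-only usage for one instance.
const HOURS_PER_MONTH = 730          // average hours in a month
const WEEKDAYS_PER_MONTH = 260 / 12  // ~21.7 business days per month

function idleWastePerMonth(hourlyRate: number, usedHoursPerWeekday: number): number {
  const alwaysOn = hourlyRate * HOURS_PER_MONTH
  const actualUse = hourlyRate * usedHoursPerWeekday * WEEKDAYS_PER_MONTH
  return alwaysOn - actualUse
}

// m5.xlarge at $0.192/hr, used 10 hours on weekdays only:
const waste = idleWastePerMonth(0.192, 10)
console.log(waste.toFixed(2)) // ≈ 98.56, matching the ~$98/month figure above
```

Multiply by the number of idle environments to get fleet-level waste.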
Fix it:
# Auto-shutdown dev environments on a schedule
resource "aws_autoscaling_schedule" "scale_down_evening" {
scheduled_action_name = "scale-down-evening"
min_size = 0
max_size = 0
desired_capacity = 0
recurrence = "0 19 * * MON-FRI" # 7 PM weekdays
autoscaling_group_name = aws_autoscaling_group.dev.name
}
resource "aws_autoscaling_schedule" "scale_up_morning" {
scheduled_action_name = "scale-up-morning"
min_size = 1
max_size = 3
desired_capacity = 1
recurrence = "0 7 * * MON-FRI" # 7 AM weekdays
autoscaling_group_name = aws_autoscaling_group.dev.name
}

2. Oversized Databases
Teams provision RDS instances based on peak load estimates, then never revisit. A db.r6g.2xlarge ($1.50/hr) running at 15% average CPU utilization is 85% waste.
Fix it: Enable Performance Insights and review actual utilization monthly. Most databases can drop 1-2 instance sizes without impact.
-- Check average CPU over the last 7 days (PostgreSQL)
-- Note: pg_stat_activity_history is not a built-in view; this assumes a
-- metrics table populated by your monitoring pipeline. On RDS, you can get
-- the same numbers directly from Performance Insights.
SELECT
avg(cpu_percent) as avg_cpu,
max(cpu_percent) as peak_cpu,
avg(connections) as avg_connections
FROM pg_stat_activity_history
WHERE timestamp > now() - interval '7 days';

If your average CPU is under 30% and peak is under 70%, you can safely downsize.
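That rule of thumb can be captured in a tiny helper. A sketch only; the 30%/70% thresholds are the guidelines from the text, not hard limits, so validate against your own workload:

```typescript
// Decide whether a database looks safe to downsize, per the rule of thumb:
// average CPU under 30% and peak CPU under 70%.
interface DbUtilization {
  avgCpu: number  // average CPU %, e.g. from Performance Insights
  peakCpu: number // peak CPU % over the same window
}

function looksDownsizable(u: DbUtilization): boolean {
  return u.avgCpu < 30 && u.peakCpu < 70
}

console.log(looksDownsizable({ avgCpu: 15, peakCpu: 55 })) // true:  the 15%-utilized db.r6g.2xlarge
console.log(looksDownsizable({ avgCpu: 25, peakCpu: 85 })) // false: peaks too close to capacity
```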
3. Orphaned Storage
EBS volumes from terminated instances, old snapshots, unused S3 buckets with lifecycle policies that never got set up. This is the "forgotten closet" of cloud spending.
# Find unattached EBS volumes
aws ec2 describe-volumes \
--filters "Name=status,Values=available" \
--query "Volumes[*].{ID:VolumeId,Size:Size,Created:CreateTime}" \
--output table
# Find snapshots older than 90 days
aws ec2 describe-snapshots \
--owner-ids self \
--query "Snapshots[?StartTime<='$(date -d '-90 days' -u +%Y-%m-%dT%H:%M:%S)'].{ID:SnapshotId,Size:VolumeSize,Date:StartTime}" \
--output table

4. Data Transfer Costs
The silent killer. AWS charges $0.09/GB for data leaving AWS to the internet (cross-region and NAT Gateway transfers are cheaper per GB but add up fast). A service making 100M API calls per month with 10KB average responses generates ~$90/month in transfer costs alone, and that is for a single service.
Common traps:
| Pattern | Cost Impact | Fix |
|---|---|---|
| Cross-region API calls | $0.02/GB | Co-locate services in the same region |
| NAT Gateway data processing | $0.045/GB | Use VPC endpoints for AWS services |
| Direct internet egress (no CDN) | $0.09/GB | Put CloudFront in front of public traffic (origin-to-CloudFront transfer is free) |
| S3 cross-region replication | $0.02/GB | Question whether you actually need it |
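A quick sketch of the egress arithmetic above. The $0.09/GB internet-egress rate is the headline figure; real AWS pricing is tiered and varies by destination:

```typescript
// Estimate monthly transfer cost from call volume and response size.
// Assumes decimal GB (1e9 bytes) and a flat per-GB rate -- illustrative only.
function transferCostPerMonth(
  calls: number,
  bytesPerResponse: number,
  ratePerGB: number
): number {
  const gb = (calls * bytesPerResponse) / 1e9
  return gb * ratePerGB
}

// 100M calls/month with 10 KB responses at $0.09/GB internet egress:
console.log(transferCostPerMonth(100_000_000, 10_000, 0.09).toFixed(2)) // "90.00"
```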
5. Over-Provisioned Kubernetes
Kubernetes resource requests are promises, not usage. Teams set requests.cpu: "2" and requests.memory: "4Gi" as defaults, and the scheduler reserves that capacity whether it is used or not.
# Over-provisioned (common default)
resources:
requests:
cpu: "2"
memory: "4Gi"
limits:
cpu: "4"
memory: "8Gi"
# Right-sized (based on actual P95 usage)
resources:
requests:
cpu: "250m"
memory: "512Mi"
limits:
cpu: "1"
memory: "1Gi"

Use the Vertical Pod Autoscaler (VPA) in recommendation mode to see what your pods actually need:
# Install VPA
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
# Check recommendations
kubectl get vpa -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.recommendation.containerRecommendations[*].target}{"\n"}{end}'

6. Uncompressed Logs and Metrics
CloudWatch Logs ingestion costs $0.50/GB. A chatty microservice logging every request body at DEBUG level can generate 50+ GB/month — that is $25/month per service just for log ingestion, not counting storage.
Fix it:
// Bad: Logging full request/response bodies
app.use((req, res, next) => {
logger.info('Request received', {
body: req.body, // Could be megabytes
headers: req.headers, // Includes auth tokens
query: req.query,
})
next()
})
// Good: Log what matters, sample the rest
app.use((req, res, next) => {
logger.info('Request received', {
method: req.method,
path: req.path,
contentLength: req.headers['content-length'],
requestId: req.headers['x-request-id'],
})
// Sample detailed logging at 1%
if (Math.random() < 0.01) {
logger.debug('Request detail (sampled)', {
body: truncate(JSON.stringify(req.body), 1000), // truncate: your helper that caps string length
query: req.query,
})
}
next()
})

7. No Spot/Preemptible Usage
On-demand instances are the most expensive option. Spot instances are 60-90% cheaper for fault-tolerant workloads.
| Workload Type | Spot Suitable? | Typical Savings |
|---|---|---|
| Stateless web servers (behind LB) | Yes | 70-80% |
| Batch processing / data pipelines | Yes | 80-90% |
| CI/CD runners | Yes | 70-80% |
| Databases | No | Use Reserved Instances |
| Single-replica stateful services | No | Use Savings Plans |
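To reason about what a spot migration is worth, here is a rough blended-cost sketch. The discount is an input you would take from the table above; real spot prices fluctuate by instance type and availability zone:

```typescript
// Blend on-demand and spot cost for a fleet where a fraction moves to spot.
function blendedMonthlyCost(
  onDemandMonthly: number, // current all-on-demand monthly spend
  spotFraction: number,    // share of the fleet moved to spot, 0..1
  spotDiscount: number     // e.g. 0.7 for a 70% discount vs on-demand
): number {
  const onDemandPart = onDemandMonthly * (1 - spotFraction)
  const spotPart = onDemandMonthly * spotFraction * (1 - spotDiscount)
  return onDemandPart + spotPart
}

// Moving 60% of a $10,000/month stateless fleet to spot at a 70% discount:
console.log(blendedMonthlyCost(10_000, 0.6, 0.7).toFixed(0)) // "5800" -- a 42% reduction
```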
Integrating Cost Awareness Into Your Workflow
Infracost: Cost Estimates in Pull Requests
Infracost analyzes Terraform changes and posts cost estimates directly in your pull requests. This is the single highest-impact FinOps tool for developers.
# Install
brew install infracost
# Authenticate
infracost auth login
# Generate a cost breakdown
infracost breakdown --path .
# Compare against the current state
infracost diff --path .

The output looks like this:
Project: my-terraform-project
+ aws_instance.api
+$140/month
+ Instance usage (Linux/UNIX, on-demand, m5.xlarge)
+$140/month
+ aws_rds_instance.main
+$380/month
+ Database instance (db.r6g.xlarge, Multi-AZ)
+$380/month
+ aws_s3_bucket.assets
~$0.023/GB/month (usage-based)
Monthly cost will increase by $520 (+23%)

GitHub Actions Integration
# .github/workflows/infracost.yml
name: Infracost
on:
pull_request:
paths:
- 'terraform/**'
jobs:
infracost:
runs-on: ubuntu-latest
permissions:
contents: read
pull-requests: write
steps:
- uses: actions/checkout@v4
- name: Setup Infracost
uses: infracost/actions/setup@v3
with:
api-key: ${{ secrets.INFRACOST_API_KEY }}
- name: Generate cost diff
run: |
infracost diff \
--path=terraform/ \
--format=json \
--out-file=/tmp/infracost.json
- name: Post PR comment
uses: infracost/actions/comment@v3
with:
path: /tmp/infracost.json
behavior: update

Now every Terraform PR gets an automatic cost estimate. Engineers see the financial impact before merging.
Kubecost for Kubernetes
If you run Kubernetes, Kubecost gives you per-namespace, per-deployment, and per-pod cost allocation.
# Install via Helm
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
--namespace kubecost \
--create-namespace \
--set kubecostToken="your-token"

Kubecost exposes an API you can integrate into your CI pipeline:
# Get cost by namespace for the last 7 days
curl -s "http://kubecost.internal/model/allocation?window=7d&aggregate=namespace" | \
jq '.data[0] | to_entries[] | {namespace: .key, cost: .value.totalCost}'Output:
{"namespace": "production", "cost": 1247.83}
{"namespace": "staging", "cost": 892.14}
{"namespace": "dev-team-alpha", "cost": 456.21}
{"namespace": "monitoring", "cost": 234.67}

Managing AI and LLM API Costs
This is the fastest-growing cost category in 2026. Teams adopt LLM APIs, usage grows organically, and suddenly there is a $15,000 line item on the monthly bill.
Cost Per Token Comparison (March 2026)
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Speed |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | Fast |
| GPT-4o mini | $0.15 | $0.60 | Fast |
| Claude Sonnet 4 | $3.00 | $15.00 | Fast |
| Claude Haiku 3.5 | $0.80 | $4.00 | Very fast |
| Gemini 2.0 Flash | $0.10 | $0.40 | Very fast |
| Local Qwen 3.5 9B | $0.00 | $0.00 | Moderate |
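To see what those per-token prices mean per request, here is a small calculator. The price map is copied from the table above and is illustrative; check current provider pricing before relying on it:

```typescript
// Dollar cost of one request, given prices in dollars per million tokens.
interface ModelPrice { inputPerM: number; outputPerM: number }

const PRICES: Record<string, ModelPrice> = {
  'gpt-4o':      { inputPerM: 2.50, outputPerM: 10.00 },
  'gpt-4o-mini': { inputPerM: 0.15, outputPerM: 0.60 },
}

function requestCost(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICES[model]
  if (!p) throw new Error(`unknown model: ${model}`)
  return (inputTokens / 1e6) * p.inputPerM + (outputTokens / 1e6) * p.outputPerM
}

// A 2,000-token prompt with a 500-token reply:
console.log(requestCost('gpt-4o', 2_000, 500))      // about $0.01
console.log(requestCost('gpt-4o-mini', 2_000, 500)) // about $0.0006 -- ~17x cheaper
```

At a few requests the difference is noise; at millions of requests per month it is the $15,000 line item.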
The Model Routing Pattern
Do not send every request to your most expensive model. Route based on task complexity:
type TaskComplexity = 'simple' | 'moderate' | 'complex'
interface CostAwareRouter {
selectModel(task: {
complexity: TaskComplexity
inputTokens: number
requiresAccuracy: boolean
}): string
}
const router: CostAwareRouter = {
selectModel(task) {
// Simple tasks: classification, extraction, formatting
// Use the cheapest model that meets quality requirements
if (task.complexity === 'simple') {
return task.inputTokens > 10000
? 'gemini-2.0-flash' // Cheapest for high volume
: 'gpt-4o-mini'
}
// Moderate tasks: summarization, code review, Q&A
if (task.complexity === 'moderate') {
return task.requiresAccuracy
? 'claude-sonnet-4' // Best accuracy/cost ratio
: 'gpt-4o-mini'
}
// Complex tasks: multi-step reasoning, creative writing, analysis
return 'claude-sonnet-4'
}
}

Caching LLM Responses
Many LLM calls are repeated with identical or near-identical inputs, and caching responses can save 30-50% on API costs. The sketch below is an exact-match cache keyed on a request hash; a true semantic cache would also catch near-duplicates by comparing embeddings.
import { createHash } from 'crypto'

// Minimal message shape; substitute your SDK's type if it provides one.
type Message = { role: string; content: string }
class LLMCache {
private cache: Map<string, { response: string; timestamp: number }>
constructor(private ttlMs: number = 3600000) { // 1 hour default
this.cache = new Map()
}
private getKey(model: string, messages: Message[]): string {
const content = JSON.stringify({ model, messages })
return createHash('sha256').update(content).digest('hex')
}
async query(
model: string,
messages: Message[],
callApi: () => Promise<string>
): Promise<{ response: string; cached: boolean }> {
const key = this.getKey(model, messages)
const cached = this.cache.get(key)
if (cached && Date.now() - cached.timestamp < this.ttlMs) {
return { response: cached.response, cached: true }
}
const response = await callApi()
this.cache.set(key, { response, timestamp: Date.now() })
return { response, cached: false }
}
}
// Usage
const cache = new LLMCache()
const { response, cached } = await cache.query(
'gpt-4o-mini',
messages,
() => openai.chat.completions.create({ model: 'gpt-4o-mini', messages })
.then(r => r.choices[0].message.content!)
)
console.log(`Response ${cached ? '(cached)' : '(live)'}: ${response}`)

Token Budget Controls
Set hard limits to prevent runaway costs:
class TokenBudget {
private usage: Map<string, number> = new Map()
constructor(
private dailyLimit: number,
private alertThreshold: number = 0.8
) {}
async checkBudget(team: string, estimatedTokens: number): Promise<boolean> {
const today = new Date().toISOString().split('T')[0]
const key = `${team}:${today}`
const currentUsage = this.usage.get(key) || 0
if (currentUsage + estimatedTokens > this.dailyLimit) {
await this.notifyOverBudget(team, currentUsage)
return false
}
if (currentUsage + estimatedTokens > this.dailyLimit * this.alertThreshold) {
await this.notifyApproachingLimit(team, currentUsage)
}
this.usage.set(key, currentUsage + estimatedTokens)
return true
}
private async notifyOverBudget(team: string, usage: number) {
// Send Slack alert, log to monitoring, etc.
console.warn(`Team ${team} exceeded daily token budget: ${usage}/${this.dailyLimit}`)
}
private async notifyApproachingLimit(team: string, usage: number) {
console.warn(`Team ${team} at ${Math.round(usage/this.dailyLimit*100)}% of daily budget`)
}
}
// 1M tokens/day budget per team
const budget = new TokenBudget(1_000_000)
Quick Wins: The 25-30% Savings Playbook
Here are the highest-impact actions ranked by effort vs savings.
Tier 1: Do This Week (5-15% savings)
| Action | Effort | Savings | How |
|---|---|---|---|
| Delete orphaned resources | 1 hour | 3-5% | Run AWS Trusted Advisor or aws ec2 describe-volumes --filters "Name=status,Values=available" |
| Schedule dev/staging shutdowns | 2 hours | 5-8% | Auto-scaling schedules or Lambda functions |
| Enable S3 Intelligent-Tiering | 30 min | 1-2% | One bucket policy change |
| Set CloudWatch log retention | 1 hour | 1-2% | Set retention policies per log group (most teams keep logs forever by default) |
Tier 2: Do This Month (10-15% savings)
| Action | Effort | Savings | How |
|---|---|---|---|
| Right-size compute instances | 1-2 days | 5-8% | Review AWS Compute Optimizer recommendations |
| Right-size RDS instances | 1 day | 3-5% | Check Performance Insights, downsize underutilized instances |
| Buy Reserved Instances / Savings Plans | 2 hours | 5-10% | Commit to 1-year terms for stable workloads |
| Implement LLM model routing | 1-2 days | 2-5% | Route simple tasks to cheaper models |
Tier 3: Do This Quarter (sustained savings)
| Action | Effort | Savings | How |
|---|---|---|---|
| Adopt Spot instances for stateless workloads | 1 week | 5-10% | Karpenter for K8s, mixed ASGs for EC2 |
| Install Infracost in CI | 2 hours | Prevents future waste | GitHub Action + Terraform |
| Deploy Kubecost | 1 day | 5-10% (K8s only) | Helm install + team dashboards |
| Set up cost anomaly alerts | 2 hours | Prevents surprises | AWS Cost Anomaly Detection |
The Savings Waterfall
Current monthly spend: $50,000
───────
Tier 1: Orphaned resources + scheduling -$4,000
Tier 2: Right-sizing + reservations -$6,000
Tier 3: Spot instances + Kubecost optimization -$4,000
───────
Optimized monthly spend: $36,000
───────
Annual savings: $168,000

That is real money. For a startup, it could mean 2-3 extra months of runway. For an enterprise, it frees budget for actual product development.
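The waterfall is simple arithmetic, which makes it easy to rerun with your own numbers; the tier amounts below are the illustrative figures from this example:

```typescript
// Reproduce the savings waterfall above.
const currentMonthly = 50_000
const tierSavings = [4_000, 6_000, 4_000] // Tier 1, 2, 3 monthly savings

const optimizedMonthly = tierSavings.reduce((spend, s) => spend - s, currentMonthly)
const annualSavings = (currentMonthly - optimizedMonthly) * 12

console.log(optimizedMonthly) // 36000
console.log(annualSavings)    // 168000
```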
Building a Cost-Aware Culture
Tools alone do not fix the problem. You need cultural change.
1. Make Costs Visible
Post a weekly cost dashboard in your team's Slack channel. When people see the numbers, they start caring.
#!/bin/bash
# Simple weekly cost report script
COST=$(aws ce get-cost-and-usage \
--time-period Start=$(date -d '7 days ago' +%Y-%m-%d),End=$(date +%Y-%m-%d) \
--granularity DAILY \
--metrics BlendedCost \
--query "ResultsByTime[*].Total.BlendedCost.Amount" \
--output text | awk '{sum+=$1} END {printf "%.2f", sum}')
curl -X POST "$SLACK_WEBHOOK" \
-H 'Content-type: application/json' \
-d "{\"text\":\"Cloud spend this week: \$${COST}\"}"

2. Add Cost to Your Definition of Done
A feature is not "done" when it passes code review. It is done when:
- It works correctly
- It has tests
- It has monitoring
- It has appropriate resource sizing and cost tags
3. Review Costs in Sprint Retros
Add a 5-minute cost review to your sprint retrospective. Show the cost trend, highlight any spikes, and celebrate wins.
4. Tag Everything
Without tags, you cannot attribute costs to teams, projects, or features. Enforce tagging via Terraform policies:
# Require cost tags on all resources
variable "required_tags" {
type = map(string)
default = {
team = ""
project = ""
environment = ""
cost-center = ""
}
}
resource "aws_instance" "example" {
# ... instance config ...
tags = merge(var.required_tags, {
team = "platform"
project = "api-gateway"
environment = "production"
cost-center = "engineering"
Name = "api-gateway-prod"
})
}

Use an OPA (Open Policy Agent) or Sentinel policy to reject Terraform plans that are missing required tags.
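As a sketch of the rule such a policy enforces (plain TypeScript for illustration, not OPA's Rego language), a pre-plan tag check might look like:

```typescript
// Report which required cost tags are missing or empty on a resource.
const REQUIRED_TAGS = ['team', 'project', 'environment', 'cost-center']

function missingTags(tags: Record<string, string>): string[] {
  return REQUIRED_TAGS.filter((t) => !tags[t] || tags[t].trim() === '')
}

const resourceTags: Record<string, string> = {
  team: 'platform',
  project: 'api-gateway',
  environment: 'production',
  Name: 'api-gateway-prod',
  // note: cost-center is missing
}

console.log(missingTags(resourceTags)) // [ 'cost-center' ]
```

The real enforcement belongs in CI, where a non-empty result fails the plan before it ever reaches apply.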
The Cost of Ignoring Costs
Here is the uncomfortable truth: most engineering teams treat cloud costs as someone else's problem until there is a crisis. Then it is a fire drill — "cut 30% by end of quarter" — which leads to rushed decisions, broken services, and frustrated engineers.
FinOps is the alternative. Small, continuous improvements. Cost as a first-class engineering metric, like latency or error rate. Not a constraint that slows you down, but a signal that makes you better.
The tools exist. The playbook is proven. The hardest part is starting.
Pick one action from Tier 1. Do it today. Then pick another one next week. In three months, you will wonder why you did not start sooner.