AWS Cost Optimization: How We Cut Our Bill by 60%
Practical strategies and real-world tactics for reducing AWS costs without sacrificing performance or reliability.
Last year, our AWS bill was $500K/month. After 6 months of focused optimization, we're down to $200K/month with better performance and reliability.
Here's exactly how we did it.
The Problem
Our monthly AWS bill looked like this:
- EC2: $280K (56%)
- RDS: $120K (24%)
- Data Transfer: $60K (12%)
- S3 & Others: $40K (8%)
Total: $500K/month
If you're not tracking your cloud costs weekly, you're already overspending. Set up billing alerts TODAY.
Step 1: Visibility First
You can't optimize what you can't measure. We implemented:
Cost Allocation Tags
```bash
# Tag everything with owner, environment, and project
aws ec2 create-tags \
  --resources i-1234567890abcdef0 \
  --tags \
    Key=Environment,Value=production \
    Key=Project,Value=web-app \
    Key=Owner,Value=team-platform
```
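Tagging only pays off if it is enforced. As a minimal sketch, assuming the three tag keys from the command above are the required set, a check like the following could flag resources with missing tags. The function name `missing_tags` is hypothetical; the input shape mirrors the `Tags` list that the EC2 API returns.

```python
# Required keys taken from the tagging command above.
REQUIRED_TAG_KEYS = {"Environment", "Project", "Owner"}


def missing_tags(tags, required=REQUIRED_TAG_KEYS):
    """Return the required tag keys that are absent or empty on a resource.

    `tags` uses the EC2 API shape: a list of {"Key": ..., "Value": ...} dicts.
    """
    present = {tag["Key"] for tag in tags if tag.get("Value")}
    return sorted(set(required) - present)


example = [
    {"Key": "Environment", "Value": "production"},
    {"Key": "Project", "Value": "web-app"},
]
print(missing_tags(example))  # ['Owner']
```

Running a check like this on a schedule, and reporting the gaps to each team, keeps the cost dashboards trustworthy.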
AWS Cost Explorer + Custom Dashboards
We built a custom dashboard showing:
- Cost per service
- Cost per team
- Cost per environment
- Cost trends
Result: Identified that 40% of our spend was on unused development resources.
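The per-team and per-environment views can be pulled from the Cost Explorer API by grouping on a cost allocation tag. The sketch below is one way to do that, not our exact dashboard code: `cost_by_tag` is a hypothetical helper, and it expects a boto3 Cost Explorer client (`boto3.client("ce")`) to be passed in.

```python
from collections import defaultdict


def cost_by_tag(ce_client, tag_key, start, end):
    """Sum unblended cost per value of `tag_key` between two ISO dates.

    `ce_client` is assumed to be a boto3 Cost Explorer client.
    """
    totals = defaultdict(float)
    kwargs = dict(
        TimePeriod={"Start": start, "End": end},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "TAG", "Key": tag_key}],
    )
    while True:
        page = ce_client.get_cost_and_usage(**kwargs)
        for period in page["ResultsByTime"]:
            for group in period["Groups"]:
                # Tag groups come back as "<key>$<value>"; untagged spend
                # has an empty value after the "$".
                value = group["Keys"][0].split("$", 1)[-1] or "untagged"
                amount = group["Metrics"]["UnblendedCost"]["Amount"]
                totals[value] += float(amount)
        token = page.get("NextPageToken")
        if not token:
            return dict(totals)
        kwargs["NextPageToken"] = token


# Usage (requires AWS credentials and an activated cost allocation tag):
#   import boto3
#   cost_by_tag(boto3.client("ce"), "Owner", "2024-01-01", "2024-02-01")
```

A large "untagged" bucket in the result is usually the first thing worth chasing.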
Step 2: Right-Sizing EC2
We were massively over-provisioned.
Before
- 50x m5.4xlarge instances
- Average CPU: 15%
- Average Memory: 25%
- Cost: $280K/month
Actions Taken
- Analyzed actual usage with CloudWatch for 30 days
- Switched to smaller instances based on real metrics
- Implemented auto-scaling for variable workloads
```yaml
# Auto Scaling Policy
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 10
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
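To get a feel for how this policy behaves, it helps to work through the scaling rule the HorizontalPodAutoscaler applies: desired replicas = ceil(current replicas × current utilization / target utilization), clamped to the min and max. The sketch below reproduces that arithmetic with the values from the policy above. It deliberately ignores the autoscaler's tolerance band and stabilization windows, so treat it as an approximation.

```python
import math


def desired_replicas(current_replicas, current_utilization,
                     target_utilization=70, min_replicas=10, max_replicas=50):
    """Approximate the HPA scaling decision for a CPU utilization target."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


print(desired_replicas(10, 90))  # 13: scale out when CPU runs hot
print(desired_replicas(20, 35))  # 10: scale in, but never below minReplicas
```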
After
- 30x m5.xlarge instances
- Average CPU: 65%
- Average Memory: 70%
- Cost: $70K/month
Savings: $210K/month (75% reduction)
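The sizing decision itself is simple arithmetic once the utilization data is in hand. Here is a minimal sketch, assuming CPU is the constraining resource and using the published vCPU counts for part of the m5 family. The `right_size` helper and the 65% target are illustrative; memory should be checked the same way before committing to a size.

```python
# vCPU counts for part of the m5 family (published instance specs).
M5_VCPUS = {"m5.large": 2, "m5.xlarge": 4, "m5.2xlarge": 8, "m5.4xlarge": 16}


def right_size(current_type, observed_cpu_pct, target_cpu_pct=65):
    """Return the smallest m5 size that keeps CPU at or below the target."""
    used_vcpus = M5_VCPUS[current_type] * observed_cpu_pct / 100
    needed_vcpus = used_vcpus / (target_cpu_pct / 100)
    for name, vcpus in sorted(M5_VCPUS.items(), key=lambda item: item[1]):
        if vcpus >= needed_vcpus:
            return name
    return current_type  # nothing in the table is big enough; stay put


print(right_size("m5.4xlarge", 15))  # m5.xlarge
```

Feeding in a high percentile of the 30-day CloudWatch data, rather than the average, leaves headroom for spikes.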
Step 3: Reserved Instances & Savings Plans
For predictable workloads, we switched to:
Compute Savings Plans
- 1-year partial upfront: 20% discount
- 3-year all upfront: 54% discount
```bash
# Calculate potential savings
aws ce get-savings-plans-purchase-recommendation \
  --savings-plans-type COMPUTE_SP \
  --lookback-period-in-days SIXTY_DAYS \
  --term-in-years ONE_YEAR \
  --payment-option PARTIAL_UPFRONT
```
We committed to 1-year Savings Plans for baseline workload.
Savings: Additional $15K/month
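Committing only to the baseline matters because a Savings Plan is paid every hour whether or not the capacity is used. The following sketch illustrates the trade-off under simplifying assumptions: spend is expressed in on-demand dollars per hour, the discount is a flat 20% (the 1-year figure quoted above), and `savings_plan_outcome` is a hypothetical helper rather than an AWS calculation.

```python
def savings_plan_outcome(hourly_on_demand_spend, covered_baseline, discount=0.20):
    """Return total savings (positive) or loss (negative) over the given hours.

    covered_baseline: on-demand $/hour that the commitment is sized to cover.
    discount: assumed flat plan discount.
    """
    commitment = covered_baseline * (1 - discount)  # paid every hour, used or not
    without_plan = sum(hourly_on_demand_spend)
    with_plan = sum(
        commitment + max(0.0, spend - covered_baseline)  # overflow billed on demand
        for spend in hourly_on_demand_spend
    )
    return without_plan - with_plan


steady = [100.0] * 24                 # usage never drops below the baseline
bursty = [100.0] * 12 + [40.0] * 12   # usage falls below it half the day

print(round(savings_plan_outcome(steady, covered_baseline=100.0), 2))
print(round(savings_plan_outcome(bursty, covered_baseline=100.0), 2))
```

With the steady profile the plan saves money every hour; with the bursty one, the hours spent below the commitment wipe out the gain. That is the reason to size the commitment to the floor of usage, not the average.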
Step 4: RDS Optimization
Our RDS costs were out of control.
What We Found
- Multiple read replicas barely being used
- Over-provisioned instance types
- Inefficient queries causing high I/O
Optimizations
1. Instance Right-Sizing
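```sql
-- Identify the slowest queries (requires the pg_stat_statements extension).
-- On PostgreSQL 13 and later, the column is total_exec_time instead of total_time.
SELECT * FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 20;
```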
Fixed inefficient queries, then:
Before: db.r5.8xlarge ($4,800/month)
After: db.r5.2xlarge ($1,200/month)
2. Read Replica Consolidation
Before: 5 read replicas
After: 2 read replicas + RDS Proxy
3. Aurora Serverless v2
For non-critical databases:
```hcl
resource "aws_rds_cluster" "staging" {
  engine      = "aurora-postgresql"
  engine_mode = "provisioned" # Serverless v2 uses the "provisioned" engine mode

  serverlessv2_scaling_configuration {
    min_capacity = 0.5 # in Aurora capacity units (ACUs)
    max_capacity = 1.0
  }
}
```
Savings: $70K/month (58% reduction)
Step 5: Data Transfer Costs
Data transfer was costing us $60K/month. Most of it was avoidable.
Issues Found
- Cross-AZ traffic: Services in different AZs
- Public internet transfer: Not using VPC endpoints
- Inefficient data sync: Full backups instead of incremental
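To decide which of these issues to tackle first, a rough estimate per traffic category is usually enough. The sketch below is illustrative only: the per-GB rates are assumptions based on typical us-east-1 list prices (cross-AZ traffic billed in each direction, plus a NAT gateway data-processing fee), so check current AWS pricing before relying on them. The `monthly_transfer_cost` helper is hypothetical.

```python
def monthly_transfer_cost(cross_az_gb, nat_gb,
                          cross_az_rate_each_way=0.01,  # assumed $/GB
                          nat_processing_rate=0.045):   # assumed $/GB
    """Estimate monthly cost of cross-AZ traffic and NAT-processed traffic."""
    # Cross-AZ traffic is billed on both the sending and the receiving side.
    cross_az = cross_az_gb * cross_az_rate_each_way * 2
    nat = nat_gb * nat_processing_rate
    return {"cross_az": cross_az, "nat_processing": nat, "total": cross_az + nat}


estimate = monthly_transfer_cost(cross_az_gb=500_000, nat_gb=200_000)
for category, dollars in estimate.items():
    print(f"{category}: ${dollars:,.0f}")
```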
Solutions
1. VPC Endpoints
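```hcl
resource "aws_vpc_endpoint" "s3" {
  vpc_id       = aws_vpc.main.id
  service_name = "com.amazonaws.us-east-1.s3"

  # Gateway endpoints only take effect once they are associated with route
  # tables (route_table_ids, or aws_vpc_endpoint_route_table_association).

  tags = {
    Name = "s3-endpoint"
  }
}
```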
Eliminated NAT gateway data-processing charges for S3 access.
2. Same-AZ Placement
```yaml
# Ensure pods are scheduled in the same AZ
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                  - us-east-1a
```
3. CloudFront for Static Assets
```javascript
// Before: Serving from S3 directly
// const imageUrl = "https://s3.amazonaws.com/bucket/image.jpg";

// After: Serving via CloudFront
const imageUrl = "https://d111111abcdef8.cloudfront.net/image.jpg";
```
Savings: $35K/month (58% reduction)
Step 6: S3 Storage Optimization
Intelligent Tiering
```bash
aws s3api put-bucket-intelligent-tiering-configuration \
  --bucket my-bucket \
  --id my-tiering-config \
  --intelligent-tiering-configuration '{
    "Id": "my-tiering-config",
    "Status": "Enabled",
    "Tierings": [
      {
        "Days": 90,
        "AccessTier": "ARCHIVE_ACCESS"
      },
      {
        "Days": 180,
        "AccessTier": "DEEP_ARCHIVE_ACCESS"
      }
    ]
  }'
```
Lifecycle Policies
```json
{
  "Rules": [
    {
      "ID": "archive-old-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 90,
          "StorageClass": "GLACIER"
        }
      ],
      "Expiration": {
        "Days": 365
      }
    }
  ]
}
```
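Read as a timeline, the policy above moves an object through three storage classes and then deletes it. The sketch below encodes the same thresholds (30, 90, and 365 days) so the effect on an object of a given age is easy to check. The `storage_class_at` helper is hypothetical, and real lifecycle actions run asynchronously, so actual transitions can lag these boundaries.

```python
# (minimum age in days, storage class), taken from the lifecycle policy above.
TRANSITIONS = [(30, "STANDARD_IA"), (90, "GLACIER")]
EXPIRATION_DAYS = 365


def storage_class_at(age_days):
    """Return the storage class for an object of this age, or None if expired."""
    if age_days >= EXPIRATION_DAYS:
        return None
    current = "STANDARD"
    for min_age, storage_class in TRANSITIONS:
        if age_days >= min_age:
            current = storage_class
    return current


for age in (7, 45, 200, 400):
    print(age, storage_class_at(age))
```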
Savings: $8K/month
Step 7: Automation & Enforcement
We built automation to prevent cost creep:
1. Auto-Stop Non-Production Resources
```python
# Lambda function to stop dev/staging instances at night
import boto3


def lambda_handler(event, context):
    ec2 = boto3.client('ec2')

    # Find running instances tagged as non-production.
    # A paginator is used because describe_instances returns results in pages.
    paginator = ec2.get_paginator('describe_instances')
    pages = paginator.paginate(
        Filters=[
            {'Name': 'tag:Environment', 'Values': ['dev', 'staging']},
            {'Name': 'instance-state-name', 'Values': ['running']}
        ]
    )

    instance_ids = [
        instance['InstanceId']
        for page in pages
        for reservation in page['Reservations']
        for instance in reservation['Instances']
    ]

    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)
        print(f'Stopped {len(instance_ids)} instances')
```
Schedule: Monday-Friday 7 PM to 7 AM + weekends
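The schedule can also be expressed as a small predicate, which is handy as a guard inside the function or in tests. This is a minimal sketch: `is_off_hours` is a hypothetical helper, and it assumes the timestamp passed in is already in the team's local time zone.

```python
from datetime import datetime


def is_off_hours(now):
    """True when non-production instances should be stopped.

    Encodes the schedule above: weekends, plus 7 PM to 7 AM on weekdays.
    """
    if now.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return True
    return now.hour >= 19 or now.hour < 7


print(is_off_hours(datetime(2024, 1, 1, 10, 0)))  # Monday 10:00 -> False
print(is_off_hours(datetime(2024, 1, 6, 12, 0)))  # Saturday noon -> True
```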
Savings: $25K/month
2. Budget Alerts
```bash
aws budgets create-budget \
  --account-id 123456789012 \
  --budget '{
    "BudgetName": "Monthly-Budget",
    "BudgetLimit": {
      "Amount": "250000",
      "Unit": "USD"
    },
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST"
  }' \
  --notifications-with-subscribers '[
    {
      "Notification": {
        "NotificationType": "ACTUAL",
        "ComparisonOperator": "GREATER_THAN",
        "Threshold": 80,
        "ThresholdType": "PERCENTAGE"
      },
      "Subscribers": [
        {
          "SubscriptionType": "EMAIL",
          "Address": "team@company.com"
        }
      ]
    }
  ]'
```
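If you want more than one alert level, the notification JSON gets repetitive to write by hand. The sketch below generates the same structure as the `--notifications-with-subscribers` argument above for a list of thresholds. The extra thresholds (50 and 100) and the `build_notifications` helper are illustrative assumptions, not part of our setup.

```python
import json


def build_notifications(thresholds, email):
    """Build one ACTUAL-spend email notification per percentage threshold."""
    return [
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": pct,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
        }
        for pct in thresholds
    ]


budget_limit = 250_000  # USD, from the budget above
notifications = build_notifications([50, 80, 100], "team@company.com")

for item in notifications:
    pct = item["Notification"]["Threshold"]
    print(f"{pct}% alert fires above ${budget_limit * pct // 100:,}")

# Pass this string to --notifications-with-subscribers.
payload = json.dumps(notifications)
```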
Final Results
| Service       | Before | After | Savings     |
| ------------- | ------ | ----- | ----------- |
| EC2           | $280K  | $70K  | $210K (75%) |
| RDS           | $120K  | $50K  | $70K (58%)  |
| Data Transfer | $60K   | $25K  | $35K (58%)  |
| S3            | $20K   | $12K  | $8K (40%)   |
| Other         | $20K   | $43K* | -$23K       |
| Total         | $500K  | $200K | $300K (60%) |
*Increased monitoring and tooling costs
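As a quick sanity check, the totals and the headline 60% figure follow directly from the per-service rows. The snippet below just redoes that arithmetic, with amounts in thousands of dollars per month copied from the table.

```python
# (before, after) in $K per month, copied from the results table.
rows = {
    "EC2": (280, 70),
    "RDS": (120, 50),
    "Data Transfer": (60, 25),
    "S3": (20, 12),
    "Other": (20, 43),
}

before = sum(b for b, _ in rows.values())
after = sum(a for _, a in rows.values())
saved = before - after

print(before, after, saved, round(100 * saved / before))  # 500 200 300 60
```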
Key Takeaways
- Visibility is everything - You can't optimize what you don't measure
- Right-sizing is the biggest win - Most resources are over-provisioned
- Commit where predictable - Savings Plans for baseline workload
- Automate cost controls - Don't rely on manual processes
- Review quarterly - Usage patterns change, optimization is ongoing
Pro tip: Assign a "cost owner" for each major service. When someone owns the budget, they optimize it.
Tools We Used
- AWS Cost Explorer: Built-in cost analysis
- Kubecost: Kubernetes cost visibility
- CloudHealth: Multi-cloud cost management
- Custom dashboards: Grafana + CloudWatch metrics
Next Steps for Your Team
- Set up cost allocation tags (Day 1)
- Enable billing alerts (Day 1)
- Analyze usage for 30 days
- Right-size top 5 biggest spenders
- Implement auto-scaling
- Review and repeat monthly
Want help optimizing your AWS costs? Reach out at hello@yourdomain.com
About Sarah Martinez
Cloud Architect specializing in cost optimization. Helped 20+ companies reduce cloud spend while improving performance.