AWS Cost Optimization: How We Cut Our Bill by 60%

Practical strategies and real-world tactics for reducing AWS costs without sacrificing performance or reliability.

Last year, our AWS bill was $500K/month. After 6 months of focused optimization, we're down to $200K/month with better performance and reliability.

Here's exactly how we did it.

The Problem

Our monthly AWS bill looked like this:

  • EC2: $280K (56%)
  • RDS: $120K (24%)
  • Data Transfer: $60K (12%)
  • S3 & Others: $40K (8%)

Total: $500K/month

⚠️ If you're not tracking your cloud costs weekly, you're already overspending. Set up billing alerts TODAY.

Step 1: Visibility First

You can't optimize what you can't measure. We implemented:

Cost Allocation Tags

# Tag everything with owner, environment, and project
aws ec2 create-tags \
  --resources i-1234567890abcdef0 \
  --tags \
    Key=Environment,Value=production \
    Key=Project,Value=web-app \
    Key=Owner,Value=team-platform
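
Tags only help if they are applied consistently. As a minimal sketch, the check below takes a tag list in the `[{'Key': ..., 'Value': ...}]` shape that the EC2 API returns and reports which required keys are missing. The names `REQUIRED_TAGS` and `missing_tags` are hypothetical and chosen for illustration; this is not our production tooling.

```python
# Hypothetical helper: report which required cost-allocation tags are absent.
REQUIRED_TAGS = {"Environment", "Project", "Owner"}  # the three keys used in this post

def missing_tags(tags):
    """Return the sorted list of required tag keys not present in `tags`.

    `tags` is a list of {'Key': str, 'Value': str} dicts, the shape EC2 uses.
    A tag with an empty value is treated as missing.
    """
    present = {t["Key"] for t in tags if t.get("Value")}
    return sorted(REQUIRED_TAGS - present)

example = [
    {"Key": "Environment", "Value": "production"},
    {"Key": "Project", "Value": "web-app"},
]
print(missing_tags(example))  # ['Owner']
```

A check like this could run in CI or on a schedule and flag untagged resources before they show up as unallocated spend.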

AWS Cost Explorer + Custom Dashboards

We built a custom dashboard showing:

  • Cost per service
  • Cost per team
  • Cost per environment
  • Cost trends

Result: Identified that 40% of our spend was on unused development resources.
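
To illustrate the "cost per team" view, here is a rough sketch that sums costs per group from a response shaped like Cost Explorer's `GetCostAndUsage` output. The sample data is made up, the function name `cost_by_group` is hypothetical, and the `Owner$team-platform` key format assumes the query was grouped by the `Owner` tag.

```python
from collections import defaultdict

def cost_by_group(response, metric="UnblendedCost"):
    """Sum cost per group key across all time periods in the response."""
    totals = defaultdict(float)
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            key = group["Keys"][0]
            totals[key] += float(group["Metrics"][metric]["Amount"])
    return dict(totals)

# Hand-written sample in the GetCostAndUsage response shape (made-up numbers).
sample = {
    "ResultsByTime": [
        {"Groups": [
            {"Keys": ["Owner$team-platform"],
             "Metrics": {"UnblendedCost": {"Amount": "100.50", "Unit": "USD"}}},
            {"Keys": ["Owner$team-data"],
             "Metrics": {"UnblendedCost": {"Amount": "20.00", "Unit": "USD"}}},
        ]},
        {"Groups": [
            {"Keys": ["Owner$team-platform"],
             "Metrics": {"UnblendedCost": {"Amount": "50.25", "Unit": "USD"}}},
        ]},
    ]
}

for team, cost in sorted(cost_by_group(sample).items()):
    print(f"{team}: ${cost:,.2f}")
```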

Step 2: Right-Sizing EC2

We were massively over-provisioned.

Before

- 50x m5.4xlarge instances
- Average CPU: 15%
- Average Memory: 25%
- Cost: $280K/month

Actions Taken

  1. Analyzed actual usage with CloudWatch for 30 days
  2. Switched to smaller instances based on real metrics
  3. Implemented auto-scaling for variable workloads

# Auto Scaling Policy (Kubernetes HorizontalPodAutoscaler)
# Scales the web-app Deployment between 10 and 50 replicas,
# targeting 70% average CPU utilization across pods.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 10
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
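
To make the policy concrete, the sketch below applies the scaling rule from the Kubernetes documentation, `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`, with the same 10 to 50 replica bounds and 70% target. It is a simplification: the real controller also handles stabilization windows, missing metrics, and pod readiness. The function name and the 10% tolerance default are assumptions for illustration.

```python
import math

def desired_replicas(current_replicas, current_utilization, target_utilization=70,
                     min_replicas=10, max_replicas=50, tolerance=0.1):
    """Approximate the HPA replica calculation for a single CPU metric."""
    ratio = current_utilization / target_utilization
    # The controller skips scaling when the ratio is close enough to 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))

print(desired_replicas(10, 90))   # load above target: scale out
print(desired_replicas(20, 35))   # load below target: scale in
print(desired_replicas(40, 140))  # capped at max_replicas
```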

After

- 30x m5.xlarge instances
- Average CPU: 65%
- Average Memory: 70%
- Cost: $70K/month

Savings: $210K/month (75% reduction)

Step 3: Reserved Instances & Savings Plans

For predictable workloads, we switched to:

Compute Savings Plans

  • 1-year partial upfront: 20% discount
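  • 3-year all upfront: 54% discount

# Calculate potential savings
# --savings-plans-type is required; the lookback period takes an enum value
aws ce get-savings-plans-purchase-recommendation \
  --savings-plans-type COMPUTE_SP \
  --lookback-period-in-days SIXTY_DAYS \
  --term-in-years ONE_YEAR \
  --payment-option PARTIAL_UPFRONT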

We committed to 1-year Savings Plans for our baseline workload.

Savings: Additional $15K/month
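
Why commit only to the baseline? A simplified model helps: a Savings Plan is an hourly spend commitment that you pay whether or not you use it, and usage beyond what the commitment covers is billed at on-demand rates. The sketch below assumes a single flat discount rate, which real plans do not have (rates vary by instance family and region), and uses made-up numbers and a hypothetical function name.

```python
def hourly_cost(on_demand_usage, commitment, discount):
    """Effective cost for one hour under a simplified Savings Plan model.

    on_demand_usage: eligible usage that hour, priced at on-demand rates (USD)
    commitment:      committed spend per hour (USD), paid even if unused
    discount:        fractional discount versus on-demand, e.g. 0.20
    """
    # On-demand-equivalent usage that the commitment absorbs.
    covered = commitment / (1.0 - discount)
    # Anything above that is billed at the normal on-demand rate.
    overflow = max(0.0, on_demand_usage - covered)
    return commitment + overflow

# A $8/hour commitment at a 20% discount covers $10/hour of on-demand usage.
print(hourly_cost(12, 8, 0.20))  # busy hour: commitment plus $2 of overflow
print(hourly_cost(6, 8, 0.20))   # quiet hour: the full commitment is still paid
```

In the quiet hour the commitment costs more than on-demand would have, which is why over-committing erases the discount.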

Step 4: RDS Optimization

Our RDS costs were out of control.

What We Found

  • Multiple read replicas barely being used
  • Over-provisioned instance types
  • Inefficient queries causing high I/O

Optimizations

1. Instance Right-Sizing

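-- Identify the most expensive queries (requires the pg_stat_statements extension)
-- PostgreSQL 13+ names the column total_exec_time; older versions use total_time
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;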

Fixed inefficient queries, then:

Before: db.r5.8xlarge ($4,800/month)
After:  db.r5.2xlarge ($1,200/month)

2. Read Replica Consolidation

Before: 5 read replicas
After:  2 read replicas + RDS Proxy

3. Aurora Serverless v2

For non-critical databases:

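# Serverless v2 uses engine_mode "provisioned" plus a scaling block;
# other required cluster arguments are omitted here.
resource "aws_rds_cluster" "staging" {
  engine         = "aurora-postgresql"
  engine_mode    = "provisioned"
  serverlessv2_scaling_configuration {
    min_capacity = 0.5 # in Aurora capacity units (ACUs)
    max_capacity = 1.0
  }
}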

Savings: $70K/month (58% reduction)

Step 5: Data Transfer Costs

Data transfer was costing us $60K/month. Most of it was avoidable.

Issues Found

  1. Cross-AZ traffic: Services in different AZs
  2. Public internet transfer: Not using VPC endpoints
  3. Inefficient data sync: Full backups instead of incremental

Solutions

1. VPC Endpoints

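# Gateway endpoint (the default endpoint type) for S3.
# Route tables must be associated via route_table_ids for traffic to use it.
resource "aws_vpc_endpoint" "s3" {
  vpc_id       = aws_vpc.main.id
  service_name = "com.amazonaws.us-east-1.s3"

  tags = {
    Name = "s3-endpoint"
  }
}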

S3 traffic now stays inside the VPC instead of leaving through a NAT or internet gateway, which removes the associated per-GB charges for S3 access.

2. Same-AZ Placement

# Ensure pods are scheduled in the same AZ
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                  - us-east-1a
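
To see why cross-AZ chatter adds up, here is a back-of-the-envelope estimate. It assumes a rate of $0.01 per GB charged in each direction, so every GB that crosses an AZ boundary is billed twice. Treat the rate as an assumption and check current pricing for your region; the traffic volume below is a made-up example.

```python
def cross_az_monthly_cost(gb_per_month, rate_per_gb_each_direction=0.01):
    """Estimate monthly cross-AZ transfer cost.

    Cross-AZ traffic is billed on both the sending and the receiving side,
    so the effective rate is twice the per-direction rate.
    """
    return gb_per_month * rate_per_gb_each_direction * 2

# Hypothetical example: 100 TB of service-to-service traffic crossing AZs.
print(f"${cross_az_monthly_cost(100_000):,.0f} per month")
```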

3. CloudFront for Static Assets

// Before: Serving from S3 directly
// const imageUrl = "https://s3.amazonaws.com/bucket/image.jpg";

// After: Serving via CloudFront
const imageUrl = "https://d111111abcdef8.cloudfront.net/image.jpg";

Savings: $35K/month (58% reduction)

Step 6: S3 Storage Optimization

Intelligent Tiering

aws s3api put-bucket-intelligent-tiering-configuration \
  --bucket my-bucket \
  --id my-tiering-config \
  --intelligent-tiering-configuration '{
    "Id": "my-tiering-config",
    "Status": "Enabled",
    "Tierings": [
      {
        "Days": 90,
        "AccessTier": "ARCHIVE_ACCESS"
      },
      {
        "Days": 180,
        "AccessTier": "DEEP_ARCHIVE_ACCESS"
      }
    ]
  }'

Lifecycle Policies

{
  "Rules": [
    {
      "ID": "archive-old-logs",
      "Filter": { "Prefix": "" },
      "Status": "Enabled",
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 90,
          "StorageClass": "GLACIER"
        }
      ],
      "Expiration": {
        "Days": 365
      }
    }
  ]
}
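
To show what this policy does to an object over its lifetime, the sketch below maps an object's age to the storage class the rule would place it in. It is only a model of the rule's intent: S3 applies lifecycle actions asynchronously, so real transitions can lag the day boundaries. The function name is hypothetical.

```python
def storage_class_at(age_days, transitions, expiration_days=None):
    """Return the storage class for an object of the given age, or None once expired."""
    if expiration_days is not None and age_days >= expiration_days:
        return None
    current = "STANDARD"
    # Apply transitions in order of age; the last one reached wins.
    for transition in sorted(transitions, key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current

# The transitions and expiration from the lifecycle rule in this section.
TRANSITIONS = [
    {"Days": 30, "StorageClass": "STANDARD_IA"},
    {"Days": 90, "StorageClass": "GLACIER"},
]

for age in (0, 30, 90, 365):
    print(age, storage_class_at(age, TRANSITIONS, expiration_days=365))
```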

Savings: $8K/month

Step 7: Automation & Enforcement

We built automation to prevent cost creep:

1. Auto-Stop Non-Production Resources

# Lambda function to stop dev/staging instances at night
import boto3

# Create the client once so warm invocations reuse it
ec2 = boto3.client('ec2')

def lambda_handler(event, context):
    # Page through all running instances tagged as non-production
    # (a single describe_instances call may not return every instance)
    paginator = ec2.get_paginator('describe_instances')
    pages = paginator.paginate(
        Filters=[
            {'Name': 'tag:Environment', 'Values': ['dev', 'staging']},
            {'Name': 'instance-state-name', 'Values': ['running']}
        ]
    )

    instance_ids = [
        instance['InstanceId']
        for page in pages
        for reservation in page['Reservations']
        for instance in reservation['Instances']
    ]

    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)
        print(f'Stopped {len(instance_ids)} instances')

Schedule: Monday-Friday 7 PM to 7 AM + weekends
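
As a sketch of how that schedule could be expressed in code, the function below treats weekday nights (7 PM to 7 AM) and all of Saturday and Sunday as off-hours. It assumes the timestamp is already in the team's local time zone, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

def is_off_hours(moment):
    """True if non-production instances should be stopped at this local time."""
    if moment.weekday() >= 5:  # Saturday (5) or Sunday (6)
        return True
    return moment.hour >= 19 or moment.hour < 7

def off_hours_per_week():
    """Count how many of the 168 hours in a week fall in the off-hours window."""
    start = datetime(2024, 1, 1)  # a Monday; any full week gives the same count
    return sum(is_off_hours(start + timedelta(hours=h)) for h in range(168))

print(is_off_hours(datetime(2024, 1, 1, 10, 0)))  # Monday 10 AM
print(is_off_hours(datetime(2024, 1, 6, 12, 0)))  # Saturday noon
print(off_hours_per_week())
```

Under this reading of the schedule, instances are stopped for 108 of the 168 hours in a week, or about 64% of the time.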

Savings: $25K/month

2. Budget Alerts

aws budgets create-budget \
  --account-id 123456789012 \
  --budget '{
    "BudgetName": "Monthly-Budget",
    "BudgetLimit": {
      "Amount": "250000",
      "Unit": "USD"
    },
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST"
  }' \
  --notifications-with-subscribers '[
    {
      "Notification": {
        "NotificationType": "ACTUAL",
        "ComparisonOperator": "GREATER_THAN",
        "Threshold": 80,
        "ThresholdType": "PERCENTAGE"
      },
      "Subscribers": [
        {
          "SubscriptionType": "EMAIL",
          "Address": "team@company.com"
        }
      ]
    }
  ]'
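
In dollar terms, an 80% threshold on a $250,000 budget means the email goes out once actual spend for the month passes $200,000. The small sketch below mirrors that comparison; the function name is hypothetical and the defaults come from the budget definition in this section.

```python
def alert_fires(actual_spend, budget_limit=250_000, threshold_pct=80):
    """Mirror a GREATER_THAN percentage threshold on actual spend."""
    return actual_spend > budget_limit * threshold_pct / 100

print(alert_fires(180_000))  # below the threshold
print(alert_fires(210_000))  # above the threshold
```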

Final Results

| Service       | Before | After  | Savings     |
| ------------- | ------ | ------ | ----------- |
| EC2           | $280K  | $70K   | $210K (75%) |
| RDS           | $120K  | $50K   | $70K (58%)  |
| Data Transfer | $60K   | $25K   | $35K (58%)  |
| S3            | $20K   | $12K   | $8K (40%)   |
| Other         | $20K   | $43K*  | -$23K       |
| Total         | $500K  | $200K  | $300K (60%) |

*Increased monitoring and tooling costs
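
As a quick sanity check on the table, the sketch below recomputes the totals from the per-service rows (figures in thousands of USD per month, copied from the table). The variable and function names are illustrative.

```python
# (before, after) in $K per month, from the results table.
ROWS = {
    "EC2": (280, 70),
    "RDS": (120, 50),
    "Data Transfer": (60, 25),
    "S3": (20, 12),
    "Other": (20, 43),
}

def summarize(rows):
    """Return (total before, total after, savings, savings as a whole percent)."""
    before = sum(b for b, _ in rows.values())
    after = sum(a for _, a in rows.values())
    savings = before - after
    return before, after, savings, round(100 * savings / before)

before, after, savings, pct = summarize(ROWS)
print(f"${before}K -> ${after}K, saving ${savings}K ({pct}%)")
```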

Key Takeaways

  1. Visibility is everything - You can't optimize what you don't measure
  2. Right-sizing is the biggest win - Most resources are over-provisioned
  3. Commit where predictable - Savings Plans for baseline workload
  4. Automate cost controls - Don't rely on manual processes
  5. Review quarterly - Usage patterns change, optimization is ongoing

Pro tip: Assign a "cost owner" for each major service. When someone owns the budget, they optimize it.

Tools We Used

  • AWS Cost Explorer: Built-in cost analysis
  • Kubecost: Kubernetes cost visibility
  • CloudHealth: Multi-cloud cost management
  • Custom dashboards: Grafana + CloudWatch metrics

Next Steps for Your Team

  1. Set up cost allocation tags (Day 1)
  2. Enable billing alerts (Day 1)
  3. Analyze usage for 30 days
  4. Right-size top 5 biggest spenders
  5. Implement auto-scaling
  6. Review and repeat monthly

Want help optimizing your AWS costs? Reach out at hello@yourdomain.com

About Sarah Martinez

Cloud Architect specializing in cost optimization. Helped 20+ companies reduce cloud spend while improving performance.