Part 3 of 5 in Cloud Cost Cutting

AWS EC2 Right-Sizing: Stop Overpaying for Compute

Dev Patel · 8 min read

Let Me Show You What This Actually Costs

The average company wastes 35% of its EC2 spend on oversized instances. Let me put that in dollars.

| Monthly EC2 Spend | Typical Waste (35%) | Annual Waste |
|---|---|---|
| $5,000 | $1,750 | $21,000 |
| $20,000 | $7,000 | $84,000 |
| $100,000 | $35,000 | $420,000 |

That's money you're burning every month because someone chose m5.2xlarge when m5.large would've been fine. Let's fix that.
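If you want to run the same arithmetic against your own bill, the table above reduces to a few lines. A minimal sketch; the 35% waste rate is the average quoted above, not a measurement of your account:

```python
# Waste calculator matching the table above.
WASTE_RATE = 0.35  # industry-average waste rate cited in the article

def annual_waste(monthly_spend: float) -> tuple[float, float]:
    """Return (monthly waste, annual waste) for a given EC2 bill."""
    monthly = monthly_spend * WASTE_RATE
    return monthly, monthly * 12

for spend in (5_000, 20_000, 100_000):
    m, a = annual_waste(spend)
    print(f"${spend:>7,}/mo -> ${m:>8,.0f} wasted/mo, ${a:>10,.0f}/yr")
```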

Step 1: Find the Waste

AWS Cost Explorer Right-Sizing Recommendations

The easiest starting point. AWS already knows which instances are oversized.

aws ce get-rightsizing-recommendation \
  --service "AmazonEC2" \
  --configuration '{
    "RecommendationTarget": "SAME_INSTANCE_FAMILY",
    "BenefitsConsidered": true
  }'

This returns recommendations like: "Your m5.2xlarge averages 12% CPU utilization. Downsize to m5.large and save $156/month."

CloudWatch Metrics Deep Dive

Don't trust recommendations blindly. Check the actual utilization:

aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0abc123def456 \
  --start-time 2026-03-01T00:00:00Z \
  --end-time 2026-03-20T00:00:00Z \
  --period 3600 \
  --statistics Average Maximum

# p99 needs a separate call: get-metric-statistics accepts either
# --statistics or --extended-statistics, not both
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0abc123def456 \
  --start-time 2026-03-01T00:00:00Z --end-time 2026-03-20T00:00:00Z \
  --period 3600 --extended-statistics p99

Key metrics to check:

  • CPU Average < 20% → Almost certainly oversized
  • CPU p99 < 60% → Safe to downsize
  • Memory < 40% (requires CloudWatch Agent) → Consider smaller instance
  • Network < 30% of baseline → Smaller instance handles the traffic

Step 2: Build Your Right-Sizing Plan

Here's the decision framework I use:

| Current Utilization | Action | Expected Savings |
|---|---|---|
| CPU avg < 10% | Downsize 2 levels (e.g., 2xlarge → large) | 60-75% |
| CPU avg 10-25% | Downsize 1 level | 40-50% |
| CPU avg 25-50% | Consider ARM (Graviton) | 20-30% |
| CPU avg 50-70% | Right-sized, look at Savings Plans | 10-20% |
| CPU avg > 70% | Monitor for headroom issues | 0% |
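The framework is easy to encode, which helps when you script it over a fleet. A minimal sketch (the function name is mine; thresholds come straight from the table):

```python
def rightsizing_action(cpu_avg: float) -> tuple[str, str]:
    """Map average CPU utilization (%) to the framework's action
    and expected savings band."""
    if cpu_avg < 10:
        return "Downsize 2 levels", "60-75%"
    if cpu_avg < 25:
        return "Downsize 1 level", "40-50%"
    if cpu_avg < 50:
        return "Consider ARM (Graviton)", "20-30%"
    if cpu_avg <= 70:
        return "Right-sized, look at Savings Plans", "10-20%"
    return "Monitor for headroom issues", "0%"

print(rightsizing_action(12))  # -> ('Downsize 1 level', '40-50%')
```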

The Graviton Play

This one change saved me $14,000/month at my last job.

# Before: x86 instance
resource "aws_instance" "app" {
  instance_type = "m5.xlarge"   # $0.192/hr = $140/month
  ami           = "ami-x86-app"
}

# After: ARM Graviton instance
resource "aws_instance" "app" {
  instance_type = "m7g.xlarge"  # $0.1632/hr = $119/month
  ami           = "ami-arm-app" # ARM-compatible AMI required
}

Savings: ~15% per instance. Graviton instances also deliver 20-30% better performance per dollar. It's not just cheaper — it's faster AND cheaper.
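The savings math is plain hourly-rate arithmetic, using AWS's usual 730-hours-per-month convention:

```python
HOURS_PER_MONTH = 730  # AWS's standard monthly-hours assumption

def monthly_cost(hourly_rate: float) -> float:
    return hourly_rate * HOURS_PER_MONTH

m5 = monthly_cost(0.192)     # m5.xlarge on-demand rate from above
m7g = monthly_cost(0.1632)   # m7g.xlarge on-demand rate from above
savings_pct = (m5 - m7g) / m5 * 100
print(f"m5.xlarge ${m5:.0f}/mo vs m7g.xlarge ${m7g:.0f}/mo: {savings_pct:.0f}% cheaper")
```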

Step 3: Implement Safely

Never right-size in production without a safety net.

Terraform Module for Gradual Right-Sizing

variable "instance_type" {
  description = "EC2 instance type — change this for right-sizing"
  type        = string
  default     = "m5.xlarge"
}

variable "min_healthy_percentage" {
  description = "Minimum healthy instances during resize"
  type        = number
  default     = 90
}

resource "aws_autoscaling_group" "app" {
  name                = "app-asg"
  min_size            = 2
  max_size            = 6
  desired_capacity    = 3

  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = var.min_healthy_percentage
    }
  }

  launch_template {
    id      = aws_launch_template.app.id
    version = "$Latest"
  }
}

resource "aws_launch_template" "app" {
  instance_type = var.instance_type
  # ... other config
}

Change instance_type, run terraform apply, and the ASG rolls instances one at a time while maintaining 90% capacity.

Step 4: Enable the CloudWatch Agent for Memory Metrics

CPU is only half the story. AWS doesn't expose memory utilization by default. You need the CloudWatch Agent.

{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "cwagent"
  },
  "metrics": {
    "namespace": "CWAgent",
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}",
      "InstanceType": "${aws:InstanceType}",
      "AutoScalingGroupName": "${aws:AutoScalingGroupName}"
    },
    "metrics_collected": {
      "mem": {
        "measurement": ["mem_used_percent", "mem_available_percent"],
        "metrics_collection_interval": 60
      },
      "disk": {
        "measurement": ["disk_used_percent"],
        "metrics_collection_interval": 300,
        "resources": ["*"]
      },
      "net": {
        "measurement": ["bytes_sent", "bytes_recv"],
        "metrics_collection_interval": 60
      }
    }
  }
}

Deploy the agent via SSM for your fleet:

aws ssm send-command \
  --document-name "AWS-ConfigureAWSPackage" \
  --targets '[{"Key":"tag:Environment","Values":["production"]}]' \
  --parameters '{"action":["Install"],"name":["AmazonCloudWatchAgent"]}'

Once memory data flows in, query it alongside CPU:

aws cloudwatch get-metric-statistics \
  --namespace CWAgent \
  --metric-name mem_used_percent \
  --dimensions Name=InstanceId,Value=i-0abc123def456 \
  --start-time 2026-03-01T00:00:00Z \
  --end-time 2026-03-20T00:00:00Z \
  --period 3600 \
  --statistics Average Maximum

Instances running at 15% CPU and 20% memory are wasting 70-80% of their capacity. Without memory data, you're guessing.
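One rough way to quantify that waste: treat the busier of CPU and memory as the binding resource and count everything above it as unused. This is a simplification (it ignores disk and network), but it is a useful first pass:

```python
def capacity_waste(cpu_pct: float, mem_pct: float) -> float:
    """Estimate unused capacity (%) assuming the busier of CPU and
    memory is the binding resource. Simplified: ignores disk/network."""
    used = max(cpu_pct, mem_pct)
    return 100 - used

# The example from the text: 15% CPU, 20% memory
print(capacity_waste(15, 20))  # -> 80
```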

Step 5: Right-Size by Workload Type

Different workloads have different right-sizing strategies. Don't apply the same rule everywhere.

Compute-Bound (CI Runners, Batch Jobs)

These spike to 100% CPU during builds and sit idle otherwise. Look at the p99 CPU over a week, not the average.

# Get p99 CPU for a build server over 7 days
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0abc123def456 \
  --start-time $(date -d '7 days ago' -u +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 86400 \
  --extended-statistics p99   # percentiles aren't valid in --statistics

If p99 is below 70%, downsize. If it's above 90%, the instance is properly sized — leave it alone.

Memory-Bound (Caches, JVM Apps)

Java apps preallocate heap. The memory usage graph looks flat. Use the r family (memory-optimized) instead of m (general purpose). Moving from m5.xlarge ($0.192/hr) to r5.large ($0.126/hr) gives you the same 16 GiB RAM at 34% less cost.

# Memory-optimized right-sizing
resource "aws_instance" "cache" {
  # Before: general purpose with 16 GiB
  # instance_type = "m5.xlarge"   # 4 vCPU, 16 GiB, $0.192/hr

  # After: memory-optimized with 16 GiB
  instance_type = "r7g.large"   # 2 vCPU, 16 GiB, $0.1008/hr (Graviton)
}
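The per-GiB arithmetic behind that swap, as a quick sketch:

```python
def dollars_per_gib_hour(hourly_rate: float, ram_gib: int) -> float:
    """Cost per GiB of RAM per hour for a memory-bound workload."""
    return hourly_rate / ram_gib

m5x = dollars_per_gib_hour(0.192, 16)   # m5.xlarge: 4 vCPU, 16 GiB
r5l = dollars_per_gib_hour(0.126, 16)   # r5.large: 2 vCPU, 16 GiB
cheaper_pct = (1 - r5l / m5x) * 100
print(f"m5.xlarge ${m5x:.4f}/GiB-hr vs r5.large ${r5l:.4f}/GiB-hr "
      f"({cheaper_pct:.0f}% cheaper per GiB)")
```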

Network-Bound (API Gateways, Proxies)

Check network bandwidth utilization. Each instance type has a baseline network performance. An m5.large provides "Up to 10 Gbps" but the sustained baseline is much lower.

aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name NetworkIn \
  --dimensions Name=InstanceId,Value=i-0abc123def456 \
  --start-time $(date -d '7 days ago' -u +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 3600 \
  --statistics Maximum

If the maximum network throughput is under 30% of the instance's baseline, a smaller instance handles it fine.
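To compare against the baseline, convert the CloudWatch datapoint (bytes during the sample interval) to Gbps first. The 0.75 Gbps baseline below is an assumed figure for illustration; look up the actual baseline for your instance type:

```python
def network_utilization_pct(max_bytes: float,
                            interval_seconds: int,
                            baseline_gbps: float) -> float:
    """Convert a CloudWatch NetworkIn Maximum datapoint (bytes during
    the sample interval; 300s for standard 5-minute EC2 metrics)
    into a percentage of the instance's baseline bandwidth."""
    gbps = max_bytes * 8 / interval_seconds / 1e9
    return gbps / baseline_gbps * 100

# 7.5 GB received in the busiest 5-minute window, assumed 0.75 Gbps baseline
print(f"{network_utilization_pct(7.5e9, 300, 0.75):.0f}% of baseline")
```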

Step 6: Automate Ongoing Right-Sizing

One-time right-sizing is good. Continuous right-sizing is better.

#!/bin/bash
# Monthly right-sizing report script
# Save as right-sizing-report.sh and run via cron

REPORT_DATE=$(date +%Y-%m-%d)
OUTPUT_FILE="right-sizing-report-${REPORT_DATE}.csv"

echo "Instance ID,Current Type,Recommended Type,Monthly Savings" > "$OUTPUT_FILE"

# RightsizingType is Modify or Terminate; Modify covers downsizing
aws ce get-rightsizing-recommendation \
  --service "AmazonEC2" \
  --configuration '{"RecommendationTarget":"SAME_INSTANCE_FAMILY","BenefitsConsidered":true}' \
  --query 'RightsizingRecommendations[?RightsizingType==`Modify`].[CurrentInstance.ResourceId,CurrentInstance.ResourceDetails.EC2ResourceDetails.InstanceType,ModifyRecommendationDetail.TargetInstances[0].ResourceDetails.EC2ResourceDetails.InstanceType,ModifyRecommendationDetail.TargetInstances[0].EstimatedMonthlySavings]' \
  --output text | tr '\t' ',' >> "$OUTPUT_FILE"

# Send a summary to Slack
RECOMMENDATION_COUNT=$(aws ce get-rightsizing-recommendation \
  --service "AmazonEC2" \
  --configuration '{"RecommendationTarget":"SAME_INSTANCE_FAMILY","BenefitsConsidered":true}' \
  --query 'Summary.TotalRecommendationCount' \
  --output text)

curl -X POST "$SLACK_WEBHOOK_URL" \
  -H 'Content-Type: application/json' \
  -d "{\"text\":\"Monthly Right-Sizing Report: ${RECOMMENDATION_COUNT} instances have right-sizing recommendations. Check #cloud-cost for details.\"}"

Schedule it with cron or a Lambda function:

# Crontab entry — first Monday of every month at 9 AM.
# Note: cron ORs day-of-month and day-of-week, so "0 9 1-7 * 1" would fire
# on days 1-7 AND every Monday. Restrict to Monday inside the command:
0 9 1-7 * * [ "$(date +\%u)" = 1 ] && /opt/scripts/right-sizing-report.sh

Step 7: Set Up Automated Tagging for Cost Attribution

Right-sizing without cost attribution is flying blind. You need to know which team owns which instances.

# Enforce tagging with AWS Organizations SCP
data "aws_iam_policy_document" "require_tags" {
  statement {
    sid    = "DenyEC2WithoutTags"
    effect = "Deny"
    actions = [
      "ec2:RunInstances"
    ]
    resources = ["arn:aws:ec2:*:*:instance/*"]
    condition {
      test     = "Null"
      variable = "aws:RequestTag/Team"
      values   = ["true"]
    }
    condition {
      test     = "Null"
      variable = "aws:RequestTag/Environment"
      values   = ["true"]
    }
  }
}

No tag, no instance. Teams can't spin up resources without ownership attribution.

Common Pitfalls

Pitfall 1: Right-sizing production without a canary. Never downsize your entire fleet at once. Start with one instance in the ASG. Monitor for 48 hours. Check response times, error rates, and queue depth. Then roll to the rest.

Pitfall 2: Ignoring burst workloads. A batch job that runs for 2 hours at 95% CPU and idles for 22 hours shows 8% average CPU. The average lies. Check the maximum and p99 before downsizing.
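The arithmetic behind that misleading average, as a sketch:

```python
def daily_average_cpu(busy_hours: float, busy_pct: float,
                      idle_pct: float = 0.0) -> float:
    """Average CPU over 24h for a bursty job: busy_hours at busy_pct,
    the remaining hours at idle_pct."""
    return (busy_hours * busy_pct + (24 - busy_hours) * idle_pct) / 24

# The batch job from the text: 2 hours at 95% CPU, idle otherwise
avg = daily_average_cpu(2, 95)
print(f"average: {avg:.0f}%, maximum: 95%")  # average: 8%, maximum: 95%
```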

Pitfall 3: Forgetting about Savings Plans after right-sizing. If you right-size FROM an instance covered by a Reserved Instance or Savings Plan, you might not save anything until the commitment expires. Check your RI/SP coverage before making changes.

Pitfall 4: Not accounting for headroom. Target 60-70% peak utilization after right-sizing, not 90%. Auto Scaling needs time to react, and your application needs room for traffic spikes. A perfectly right-sized instance with zero headroom is one spike away from degraded performance.
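A quick way to size with headroom built in: compute the vCPUs of actual work at peak, then divide by your target utilization. The 65% default below is just the midpoint of the 60-70% band:

```python
import math

def vcpus_needed(current_vcpus: int, peak_util_pct: float,
                 target_util_pct: float = 65.0) -> int:
    """Smallest vCPU count that keeps the observed peak at or below
    the target utilization (65% = midpoint of the 60-70% band)."""
    peak_work = current_vcpus * peak_util_pct / 100  # vCPUs busy at peak
    return math.ceil(peak_work / (target_util_pct / 100))

# 8 vCPUs peaking at 30% utilization: 2.4 vCPUs of work at peak
print(vcpus_needed(8, 30))  # -> 4
```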

Cost Impact Summary

For a typical 20-instance fleet averaging 25% CPU utilization:

| Action | Per-Instance Savings | Fleet Savings/Month | Annual |
|---|---|---|---|
| Downsize 1 level | $50-$80 | $1,000-$1,600 | $12,000-$19,200 |
| Switch to Graviton | $20-$40 | $400-$800 | $4,800-$9,600 |
| Combined | $70-$120 | $1,400-$2,400 | $16,800-$28,800 |

That's $17K-$29K/year in savings from a single afternoon's work. The ROI on right-sizing is the highest of any cloud optimization activity.

Tools Worth Knowing

Beyond AWS native tools, these help with right-sizing at scale:

  • AWS Compute Optimizer — ML-based recommendations considering CPU, memory, disk, and network. More accurate than Cost Explorer for complex workloads.
  • Spot.io (now Spot by NetApp) — Automatic instance selection and right-sizing with spot instance management.
  • Kubecost — For Kubernetes workloads, shows per-pod resource waste and recommends request/limit changes.

# Enable Compute Optimizer (one-time setup)
aws compute-optimizer update-enrollment-status \
  --status Active \
  --include-member-accounts

# Get recommendations
aws compute-optimizer get-ec2-instance-recommendations \
  --query 'instanceRecommendations[].{InstanceArn:instanceArn,Current:currentInstanceType,Recommended:recommendationOptions[0].instanceType,ProjectedUtilization:recommendationOptions[0].projectedUtilizationMetrics}' \
  --output table

Compute Optimizer uses 14 days of CloudWatch data by default. For more accurate results, enable enhanced infrastructure metrics (3-month lookback) for $0.0003 per resource per hour.
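At that rate, enhanced infrastructure metrics cost very little relative to the savings they inform. Back-of-envelope, again using 730 hours per month:

```python
HOURS_PER_MONTH = 730

def enhanced_metrics_monthly_cost(resources: int,
                                  rate_per_resource_hour: float = 0.0003) -> float:
    """Monthly cost of Compute Optimizer enhanced infrastructure
    metrics at the per-resource-hour rate quoted above."""
    return resources * rate_per_resource_hour * HOURS_PER_MONTH

# The 20-instance fleet from the cost summary
print(f"${enhanced_metrics_monthly_cost(20):.2f}/month")
```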
