The Complete AWS Cost Optimization Playbook: Compute, Storage, Networking, and Reserved Capacity
The Number That Should Scare You
The average AWS customer wastes 32% of their cloud spend. Not my opinion — that's data from multiple FinOps Foundation studies. For a company spending $50,000/month, that's $192,000 per year set on fire.
I've run cost optimization engagements across dozens of organizations, from startups burning through runway to enterprises with seven-figure monthly bills. The savings are always there. Every single time. And they're usually larger than anyone expected.
This playbook is the complete system I use. We're covering every major cost category, from the obvious wins to the optimizations that require real engineering effort. Every recommendation includes the expected savings range so you can prioritize.
Before You Optimize: Build Visibility
You can't optimize what you can't see. Before touching anything, set up cost allocation.
Tagging Strategy
Every resource needs, at minimum, these four tags: `Environment`, `Team`, `Service`, and `CostCenter`. Enforce them with an AWS Config rule:
# Enforce required tags with AWS Config
aws configservice put-config-rule --config-rule '{
"ConfigRuleName": "required-tags",
"Source": {
"Owner": "AWS",
"SourceIdentifier": "REQUIRED_TAGS"
},
"InputParameters": "{\"tag1Key\":\"Environment\",\"tag2Key\":\"Team\",\"tag3Key\":\"Service\",\"tag4Key\":\"CostCenter\"}"
}'
Cost and Usage Report
Enable CUR with hourly granularity. This is your single source of truth.
aws cur put-report-definition --report-definition '{
"ReportName": "hourly-cost-report",
"TimeUnit": "HOURLY",
"Format": "Parquet",
"Compression": "Parquet",
"AdditionalSchemaElements": ["RESOURCES", "SPLIT_COST_ALLOCATION_DATA"],
"S3Bucket": "your-cur-bucket",
"S3Region": "us-east-1",
"S3Prefix": "cur",
"RefreshClosedReports": true,
"ReportVersioning": "OVERWRITE_REPORT"
}'
Query your CUR data with Athena to find waste:
-- Top 20 most expensive resources last 30 days
SELECT
line_item_resource_id,
product_product_name,
SUM(line_item_unblended_cost) AS total_cost,
MAX(resource_tags_user_team) AS team
FROM cur_database.cur_table
WHERE line_item_usage_start_date >= date_add('day', -30, current_date)
AND line_item_line_item_type = 'Usage'
GROUP BY 1, 2
ORDER BY total_cost DESC
LIMIT 20;
Category 1: Compute (Typically 50-60% of Spend)
EC2 Right-Sizing — Expected Savings: 20-40%
Most instances are oversized. Here's how to find them systematically.
# Get right-sizing recommendations
aws ce get-rightsizing-recommendation \
--service "AmazonEC2" \
--configuration '{
"RecommendationTarget": "SAME_INSTANCE_FAMILY",
"BenefitsConsidered": true
}' \
--query 'RightsizingRecommendations[*].{
Instance: CurrentInstance.ResourceId,
Current: CurrentInstance.ResourceDetails.EC2ResourceDetails.InstanceType,
Recommended: ModifyRecommendationDetail.TargetInstances[0].ResourceDetails.EC2ResourceDetails.InstanceType,
Savings: ModifyRecommendationDetail.TargetInstances[0].EstimatedMonthlySavings
}' \
--output table
For deeper analysis, pull CloudWatch metrics:
# Pull 14 days of hourly CPU stats for one instance (loop over your fleet; avg < 10% flags it)
aws cloudwatch get-metric-statistics \
--namespace AWS/EC2 \
--metric-name CPUUtilization \
--dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
--start-time $(date -d '14 days ago' -u +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 3600 \
--statistics Average Maximum \
--query 'Datapoints[*].[Timestamp,Average,Maximum]' \
--output table
Rules I follow:
- Average CPU < 10% for 14 days: downsize by 50%.
- Average CPU 10-30%: downsize one instance size.
- Memory utilization requires the CloudWatch agent — install it everywhere.
- Peak utilization matters. Check the p99, not just the average.
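The rules above can be sketched as a small classifier. This is a hedged sketch of the heuristics in this article, not an AWS API; the thresholds and the p99 guard are the assumptions from the list:

```python
# Classify one instance from ~14 days of hourly CPU datapoints (percent).
# Thresholds are this article's heuristics, not AWS-provided values.
def rightsizing_action(cpu_samples: list[float]) -> str:
    if not cpu_samples:
        return "no-data"
    avg = sum(cpu_samples) / len(cpu_samples)
    # Peak utilization matters: gate on the p99 before trusting the average.
    s = sorted(cpu_samples)
    p99 = s[min(len(s) - 1, int(0.99 * len(s)))]
    if p99 > 80:
        return "keep"  # real peaks exist; don't shrink
    if avg < 10:
        return "downsize-50%"
    if avg < 30:
        return "downsize-one-size"
    return "keep"
```

Feed it the `Average` datapoints from the CloudWatch call above, per instance, and act only on instances that stay in a downsize bucket across the full window.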
Graviton Migration — Expected Savings: 20%
AWS Graviton (ARM) instances are 20% cheaper and often faster than x86 equivalents. The migration is straightforward for most workloads.
| x86 Instance | Graviton Equivalent | Monthly Savings (on-demand) |
|---|---|---|
| m5.xlarge ($140) | m7g.xlarge ($113) | $27 (19%) |
| c5.2xlarge ($248) | c7g.2xlarge ($199) | $49 (20%) |
| r5.4xlarge ($731) | r7g.4xlarge ($590) | $141 (19%) |
# Identify instances eligible for Graviton migration
aws ec2 describe-instances \
--filters "Name=instance-type,Values=m5.*,m6i.*,c5.*,c6i.*,r5.*,r6i.*" \
--query 'Reservations[*].Instances[*].{
ID: InstanceId,
Type: InstanceType,
Name: Tags[?Key==`Name`].Value | [0]
}' --output table
Spot Instances for Fault-Tolerant Workloads — Expected Savings: 60-90%
Spot gives you 60-90% off on-demand prices. Use it for anything that can handle interruptions.
# EKS managed node group with Spot
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: production
  region: us-east-1
managedNodeGroups:
  - name: spot-workers
    # Keep one CPU architecture per node group (all x86_64 here)
    instanceTypes:
      - m5.large
      - m5a.large
      - m5d.large
      - m5n.large
      - m6i.large
    spot: true
    desiredCapacity: 5
    minSize: 2
    maxSize: 20
    labels:
      workload-type: fault-tolerant
    taints:
      - key: spot
        value: "true"
        effect: NoSchedule
Golden rule: never run Spot with a single instance type. Use at least 4-6 types across multiple sizes and families. Diversification reduces interruption rates dramatically.
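A back-of-the-envelope way to see why diversification works: if each capacity pool (instance type times AZ) independently has some chance of a capacity crunch in a given window, the chance that every pool is crunched at once shrinks exponentially with pool count. This is a toy model; the 5% per-pool rate is made up purely for illustration and is not an AWS statistic:

```python
# Toy model: probability that ALL Spot capacity pools are unavailable at
# once, assuming independent per-pool crunch rates. Illustrative only.
def p_all_pools_crunched(pools: int, per_pool_rate: float = 0.05) -> float:
    return per_pool_rate ** pools

for n in (1, 2, 4, 6):
    print(f"{n} pools: {p_all_pools_crunched(n):.0e}")
```

Real interruption events are correlated, so the true benefit is smaller than this model suggests, but the direction holds: more pools, fewer simultaneous interruptions.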
Lambda Optimization — Expected Savings: 30-50%
# Find over-provisioned Lambda functions using AWS Cost Optimization Hub
aws cost-optimization-hub list-recommendations \
--filter '{
"resourceTypes": ["Lambda"],
"actionTypes": ["Rightsize"]
}' \
--query 'items[*].{
Function: resourceId,
CurrentCost: currentResourceSummary.monthlyCost,
RecommendedCost: recommendedResourceSummary.monthlyCost,
Savings: estimatedMonthlySavings.value
}' --output table
Power-tune every function with the AWS Lambda Power Tuning tool:
# Deploy the power tuning Step Function
aws serverlessrepo create-cloud-formation-change-set \
--application-id arn:aws:serverlessrepo:us-east-1:451282441545:applications/aws-lambda-power-tuning \
--stack-name lambda-power-tuning \
--capabilities CAPABILITY_IAM
Category 2: Storage (Typically 15-25% of Spend)
S3 Lifecycle Policies — Expected Savings: 40-70%
Most S3 data is accessed once and then sits in Standard tier forever. Fix this with lifecycle rules.
{
"Rules": [
{
"ID": "intelligent-tiering-and-archive",
"Status": "Enabled",
"Filter": { "Prefix": "" },
"Transitions": [
{
"Days": 30,
"StorageClass": "INTELLIGENT_TIERING"
},
{
"Days": 90,
"StorageClass": "GLACIER_INSTANT_RETRIEVAL"
},
{
"Days": 365,
"StorageClass": "DEEP_ARCHIVE"
}
],
"NoncurrentVersionTransitions": [
{
"NoncurrentDays": 30,
"StorageClass": "GLACIER_INSTANT_RETRIEVAL"
}
],
"NoncurrentVersionExpiration": {
"NoncurrentDays": 90
},
"AbortIncompleteMultipartUpload": {
"DaysAfterInitiation": 7
}
}
]
}
The AbortIncompleteMultipartUpload rule is money you're throwing away right now: failed multipart uploads accumulate silently, and you pay full storage rates for parts that will never become objects.
# Find incomplete multipart uploads across all buckets
for bucket in $(aws s3api list-buckets --query 'Buckets[*].Name' --output text); do
count=$(aws s3api list-multipart-uploads --bucket "$bucket" \
--query 'length(Uploads)' --output text 2>/dev/null)
if [ "$count" != "None" ] && [ "$count" -gt 0 ]; then
echo "$bucket: $count incomplete uploads"
fi
done
EBS Optimization — Expected Savings: 20-40%
# Find unattached EBS volumes (you're paying for these right now)
aws ec2 describe-volumes \
--filters "Name=status,Values=available" \
--query 'Volumes[*].{
ID: VolumeId,
Size: Size,
Type: VolumeType,
Created: CreateTime
}' --output table
# Find volumes with low IOPS utilization (candidates for gp3 migration)
# gp3's baseline price is roughly 20% lower than gp2's
aws ec2 describe-volumes \
--filters "Name=volume-type,Values=gp2" \
--query 'Volumes[*].{
ID: VolumeId,
Size: Size,
Cost: "Migrate to gp3 for 20% savings"
}' --output table
Nearly every gp2 volume should be gp3. gp3 gives you 3,000 IOPS and 125 MB/s baseline for roughly 20% less money; for volumes over 1 TB, provision extra IOPS on gp3 to match gp2's 3 IOPS/GB baseline (usually still cheaper). The migration is online and zero-downtime:
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --volume-type gp3
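The savings are easy to quantify. A quick sketch, assuming us-east-1 list prices of $0.10/GB-month for gp2 and $0.08/GB-month for gp3 (check your region's current pricing):

```python
# Fleet-wide gp2 -> gp3 savings estimate. Prices are assumed us-east-1
# list rates -- verify against current AWS pricing before quoting.
GP2_PER_GB_MONTH = 0.10
GP3_PER_GB_MONTH = 0.08

def gp3_monthly_savings(total_gp2_gb: float) -> float:
    return total_gp2_gb * (GP2_PER_GB_MONTH - GP3_PER_GB_MONTH)

# 50 TB of gp2 across the fleet:
print(f"${gp3_monthly_savings(50_000):,.0f}/month")  # → $1,000/month
```

Sum the `Size` column from the describe-volumes output above to get your fleet total.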
Category 3: Networking (The Hidden Cost Monster)
NAT Gateway — Expected Savings: 50-80%
NAT Gateway charges $0.045/GB for data processing plus $0.045/hour. For a cluster doing heavy pulls from the internet, this adds up fast.
# Find NAT Gateway data-processing costs
# (usage types carry a region prefix outside us-east-1, e.g. USE2-NatGateway-Bytes)
aws ce get-cost-and-usage \
--time-period Start=2026-02-01,End=2026-03-01 \
--granularity MONTHLY \
--filter '{
"Dimensions": {
"Key": "USAGE_TYPE",
"Values": ["NatGateway-Bytes"]
}
}' \
--metrics "UnblendedCost" \
--query 'ResultsByTime[0].Total.UnblendedCost'
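At those rates, the monthly bill for a typical multi-AZ setup looks like this (using the us-east-1 rates quoted above; verify for your region):

```python
# Monthly NAT Gateway bill: hourly charge per gateway plus per-GB processing.
HOURLY_RATE = 0.045       # $/hour per gateway (us-east-1, from the text)
PER_GB_RATE = 0.045       # $/GB processed
HOURS_PER_MONTH = 730

def nat_monthly_cost(gb_processed: float, gateways: int = 1) -> float:
    return gateways * HOURLY_RATE * HOURS_PER_MONTH + gb_processed * PER_GB_RATE

# Three gateways (one per AZ) processing 10 TB/month in total:
print(f"${nat_monthly_cost(10_240, gateways=3):,.2f}")  # → $559.35
```

Note that the data-processing term dominates as traffic grows, which is why VPC endpoints (below) pay off so quickly.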
Optimizations:
- Use VPC endpoints for S3, DynamoDB, ECR, and other AWS services. This removes NAT Gateway from the path entirely.
- Consolidate non-critical traffic through a single NAT Gateway in one AZ (weigh the added cross-AZ transfer charges and the single-AZ failure risk).
- Consider NAT instances (Fck-NAT or a t4g.nano) for dev/staging environments.
# Create VPC endpoints for common services (free for Gateway endpoints)
aws ec2 create-vpc-endpoint \
--vpc-id vpc-12345678 \
--service-name com.amazonaws.us-east-1.s3 \
--route-table-ids rtb-12345678
aws ec2 create-vpc-endpoint \
--vpc-id vpc-12345678 \
--service-name com.amazonaws.us-east-1.dynamodb \
--route-table-ids rtb-12345678
Cross-AZ Data Transfer — Expected Savings: 10-20%
Every byte that crosses an AZ boundary costs $0.01/GB in each direction. For services communicating heavily across AZs, this adds up.
# Check cross-AZ transfer costs
aws ce get-cost-and-usage \
--time-period Start=2026-02-01,End=2026-03-01 \
--granularity MONTHLY \
--filter '{
"Dimensions": {
"Key": "USAGE_TYPE",
"Values": ["DataTransfer-Regional-Bytes"]
}
}' \
--metrics "UnblendedCost"
Use topology-aware routing in Kubernetes to keep traffic within AZs:
apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    service.kubernetes.io/topology-mode: Auto
spec:
  selector:
    app: my-app
  ports:
    - port: 80
Category 4: Reserved Capacity — Expected Savings: 30-72%
Savings Plans vs Reserved Instances
| Commitment Type | Flexibility | Discount | Best For |
|---|---|---|---|
| Compute Savings Plans | Any instance, any region | Up to 66% | Most teams |
| EC2 Instance Savings Plans | Specific instance family, any size | Up to 72% | Stable workloads |
| Reserved Instances (Standard) | Specific instance type and AZ | Up to 72% | Very predictable usage |
| Reserved Instances (Convertible) | Can change instance type | Up to 66% | Evolving workloads |
My recommendation: Start with Compute Savings Plans. They cover EC2, Fargate, and Lambda, and you can change instance types freely. Only go to EC2-specific RIs when you have 6+ months of stable usage data.
# Analyze your commitment coverage
aws ce get-savings-plans-coverage \
--time-period Start=2026-02-01,End=2026-03-01 \
--granularity MONTHLY \
--query 'SavingsPlansCoverages[0].{
OnDemandCost: Coverage.OnDemandCost,
CoveredCost: Coverage.SpendCoveredBySavingsPlans,
CoveragePercent: Coverage.CoveragePercentage
}'
# Get purchase recommendations
aws ce get-savings-plans-purchase-recommendation \
--savings-plans-type "COMPUTE_SP" \
--term-in-years "ONE_YEAR" \
--payment-option "NO_UPFRONT" \
--lookback-period-in-days "THIRTY_DAYS"
The 80/20 Commitment Rule
Never commit to 100% of your usage. Here's the rule I follow:
- 80% of baseline: Covered by 1-year Savings Plans (No Upfront).
- Next 15%: On-demand, evaluated quarterly for additional commitments.
- Top 5% (peaks): Spot or on-demand.
This gives you the bulk of the savings without locking yourself into capacity you might not need after a re-architecture.
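Translating the split into numbers, as a sketch: the 30% effective Savings Plan discount below is a conservative assumption for a 1-year no-upfront commitment, not a quoted AWS rate.

```python
# Apply the 80/15/5 rule above to a steady hourly compute spend, plus a
# rough savings estimate. The 30% discount figure is an assumption.
HOURS_PER_MONTH = 730

def commitment_split(baseline_hourly: float) -> dict[str, float]:
    return {
        "savings_plan_commit": round(baseline_hourly * 0.80, 2),
        "on_demand_buffer": round(baseline_hourly * 0.15, 2),
        "spot_or_peak": round(baseline_hourly * 0.05, 2),
    }

def estimated_monthly_savings(baseline_hourly: float, sp_discount: float = 0.30) -> float:
    # Only the committed 80% earns the discount.
    return baseline_hourly * 0.80 * sp_discount * HOURS_PER_MONTH

split = commitment_split(100.0)  # $100/hour baseline
print(split, f"~${estimated_monthly_savings(100.0):,.0f}/month saved")
```

Use your actual hourly on-demand baseline from the coverage report above as the input.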
Category 5: Database Optimization — Expected Savings: 20-50%
RDS Right-Sizing
Database instances are the most commonly oversized resources I encounter. Teams provision for peak load and never revisit.
# Check RDS instance utilization
aws cloudwatch get-metric-statistics \
--namespace AWS/RDS \
--metric-name CPUUtilization \
--dimensions Name=DBInstanceIdentifier,Value=production-db \
--start-time $(date -d '14 days ago' -u +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 3600 \
--statistics Average Maximum \
--output table
# Check freeable memory (if consistently > 50% of total, downsize)
aws cloudwatch get-metric-statistics \
--namespace AWS/RDS \
--metric-name FreeableMemory \
--dimensions Name=DBInstanceIdentifier,Value=production-db \
--start-time $(date -d '14 days ago' -u +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 3600 \
--statistics Average Minimum \
--output table
Aurora Serverless v2 for Variable Workloads
If your database usage swings significantly between peak and off-peak, Aurora Serverless v2 can reduce costs by 40-60% compared to provisioned instances:
# Convert an existing Aurora cluster to Serverless v2:
# set the capacity range on the cluster FIRST...
aws rds modify-db-cluster \
--db-cluster-identifier production-cluster \
--serverless-v2-scaling-configuration MinCapacity=2,MaxCapacity=64
# ...then switch the instance to the serverless class
aws rds modify-db-instance \
--db-instance-identifier production-db-instance-1 \
--db-instance-class db.serverless \
--apply-immediately
The MinCapacity is your floor — you always pay for at least this many ACUs. Set it to handle your baseline traffic, and let the scaling handle peaks. I've seen teams save $3,000-$5,000/month per cluster by switching from a db.r6g.4xlarge to Aurora Serverless v2 with a 4-32 ACU range.
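Here is the rough math behind that kind of comparison. Both prices are illustrative assumptions, not quoted AWS rates; check current Aurora pricing for your region and engine:

```python
# Rough provisioned-vs-Serverless-v2 monthly cost comparison.
ACU_HOURLY = 0.12          # $/ACU-hour (assumed)
PROVISIONED_HOURLY = 2.00  # $/hour for a large provisioned instance (assumed)
HOURS_PER_MONTH = 730

def serverless_monthly(avg_acus: float) -> float:
    return avg_acus * ACU_HOURLY * HOURS_PER_MONTH

def provisioned_monthly() -> float:
    return PROVISIONED_HOURLY * HOURS_PER_MONTH

# A workload averaging 8 ACUs (bursting higher, idling near a 4-ACU floor):
print(round(serverless_monthly(8)), "vs", round(provisioned_monthly()))  # → 701 vs 1460
```

The comparison flips if your average ACU consumption approaches the provisioned instance's equivalent capacity, so measure average load before switching.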
DynamoDB On-Demand vs Provisioned
# Check your DynamoDB table's consumed capacity
aws dynamodb describe-table --table-name UserSessions \
--query 'Table.{
BillingMode: BillingModeSummary.BillingMode,
ReadCapacity: ProvisionedThroughput.ReadCapacityUnits,
WriteCapacity: ProvisionedThroughput.WriteCapacityUnits,
ItemCount: ItemCount,
TableSize: TableSizeBytes
}'
Rules I follow:
- Consistent traffic (less than 2x variance peak to trough): Use provisioned with auto-scaling. Add reserved capacity for the baseline.
- Spiky traffic (more than 4x variance): Use on-demand. The per-request price is higher but you don't pay for unused capacity.
- New tables with unknown traffic: Start on-demand, switch to provisioned once you have 2 weeks of data.
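The break-even behind these rules can be worked out directly. The prices below are illustrative us-east-1 assumptions (on-demand write pricing in particular has changed over time), so verify current DynamoDB rates before deciding:

```python
# Utilization at which provisioned write capacity beats on-demand.
OD_PER_MILLION_WRITES = 1.25  # $ per million on-demand write units (assumed)
WCU_HOURLY = 0.00065          # $ per provisioned WCU-hour (assumed)

def breakeven_utilization() -> float:
    # A fully utilized WCU performs 3,600 writes per hour.
    provisioned_per_million = WCU_HOURLY / 3600 * 1_000_000
    return provisioned_per_million / OD_PER_MILLION_WRITES

print(f"{breakeven_utilization():.1%}")  # → 14.4%
```

Under these assumed prices, provisioned capacity wins whenever you can keep it more than about 14% utilized, which is why steady traffic favors provisioned and spiky traffic favors on-demand.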
Category 6: Container and Serverless Optimization — Expected Savings: 25-40%
EKS Node Right-Sizing with Karpenter
Kubernetes clusters are often running nodes far larger than needed. Karpenter provides right-sized, just-in-time node provisioning:
# Karpenter NodePool for cost-optimized provisioning
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: cost-optimized
spec:
  template:
    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["6"]
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "200"
    memory: 800Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
Karpenter's consolidation feature automatically replaces underutilized nodes with smaller ones. I've seen this reduce node costs by 30-40% compared to static node groups.
Lambda Right-Sizing with Power Tuning
Most Lambda functions are either over-provisioned (wasting money) or under-provisioned (slow and still wasting money because they take longer to execute). The AWS Lambda Power Tuning tool runs your function at different memory sizes and finds the optimal cost/performance balance:
# Deploy the power tuning state machine
aws serverlessrepo create-cloud-formation-change-set \
--application-id arn:aws:serverlessrepo:us-east-1:451282441545:applications/aws-lambda-power-tuning \
--stack-name lambda-power-tuning \
--capabilities CAPABILITY_IAM
# Run it against a function
aws stepfunctions start-execution \
--state-machine-arn arn:aws:states:us-east-1:123456789012:stateMachine:powerTuningStateMachine \
--input '{
"lambdaARN": "arn:aws:lambda:us-east-1:123456789012:function:my-function",
"powerValues": [128, 256, 512, 1024, 2048, 3072],
"num": 50,
"payload": "{}",
"parallelInvocation": true,
"strategy": "cost"
}'
The tool outputs a visualization showing cost vs execution time at each memory level. I've seen functions running at 1024MB that performed identically at 256MB — that's a 75% cost reduction for zero performance loss.
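The arithmetic behind that claim: Lambda bills GB-seconds, so memory scales cost linearly when duration is unchanged. The per-GB-second rate below is an assumption (a ballpark x86 rate; check current Lambda pricing):

```python
# Lambda cost model: cost = (memory in GB) * (duration in s) * rate.
GB_SECOND_RATE = 0.0000166667  # $/GB-second (assumed; verify current pricing)

def invocation_cost(memory_mb: int, duration_ms: float) -> float:
    return (memory_mb / 1024) * (duration_ms / 1000) * GB_SECOND_RATE

# Same 200 ms duration at 1024 MB vs 256 MB:
ratio = invocation_cost(1024, 200) / invocation_cost(256, 200)
print(ratio)  # → 4.0, i.e. dropping to 256 MB cuts cost by 75%
```

The catch is that lower memory also means lower CPU allocation, so duration often grows as memory shrinks; that trade-off is exactly what the power tuning tool measures for you.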
ECR Image Lifecycle Policies
Container images accumulate in ECR and cost $0.10/GB/month. Most teams never clean up old images:
# Apply lifecycle policy to expire untagged images older than 7 days
aws ecr put-lifecycle-policy \
--repository-name my-app \
--lifecycle-policy-text '{
"rules": [
{
"rulePriority": 1,
"description": "Expire untagged images after 7 days",
"selection": {
"tagStatus": "untagged",
"countType": "sinceImagePushed",
"countUnit": "days",
"countNumber": 7
},
"action": { "type": "expire" }
},
{
"rulePriority": 2,
"description": "Keep only last 20 tagged images",
"selection": {
"tagStatus": "tagged",
"tagPatternList": ["*"],
"countType": "imageCountMoreThan",
"countNumber": 20
},
"action": { "type": "expire" }
}
]
}'
Automated Cleanup for Non-Production
For dev and staging environments, schedule regular cleanup of abandoned resources:
# Find idle EKS node groups in dev
for ng in $(aws eks list-nodegroups --cluster-name dev-cluster \
--query 'nodegroups[]' --output text); do
DESIRED=$(aws eks describe-nodegroup \
--cluster-name dev-cluster \
--nodegroup-name "$ng" \
--query 'nodegroup.scalingConfig.desiredSize' --output text)
echo "$ng: desired=$DESIRED"
done
# Scale down dev cluster outside business hours (cron job)
aws eks update-nodegroup-config \
--cluster-name dev-cluster \
--nodegroup-name general \
--scaling-config minSize=0,maxSize=5,desiredSize=0
Building a Cost Culture: FinOps Practices
Technical optimizations only stick if the organization supports them. Here's what I've seen work.
Weekly Cost Review Meeting
Set up a 30-minute weekly meeting with one dashboard and three questions:
- What changed this week? Look at the cost delta from the previous week.
- What's the top-growing service? Identify the fastest cost increase.
- What's the next action item? Pick one optimization to implement before next week.
Team Cost Accountability
# Generate per-team cost report using tags
aws ce get-cost-and-usage \
--time-period Start=2026-03-01,End=2026-03-23 \
--granularity MONTHLY \
--group-by Type=TAG,Key=Team \
--metrics "UnblendedCost" \
--query 'ResultsByTime[0].Groups[*].{
Team: Keys[0],
Cost: Metrics.UnblendedCost.Amount
}' --output table
Send this to team leads monthly. When teams see their own costs, behavior changes. I've watched a team cut 40% of their spend within a month of getting their first cost report — they didn't even know they had 15 unused RDS snapshots.
Budget Alerts
# Create a budget with alerts at 80% and 100%
aws budgets create-budget --account-id 123456789012 --budget '{
"BudgetName": "monthly-infrastructure",
"BudgetLimit": {"Amount": "50000", "Unit": "USD"},
"BudgetType": "COST",
"TimeUnit": "MONTHLY",
"CostFilters": {}
}' --notifications-with-subscribers '[
{
"Notification": {
"NotificationType": "ACTUAL",
"ComparisonOperator": "GREATER_THAN",
"Threshold": 80,
"ThresholdType": "PERCENTAGE"
},
"Subscribers": [
{"SubscriptionType": "EMAIL", "Address": "devops-team@company.com"}
]
},
{
"Notification": {
"NotificationType": "ACTUAL",
"ComparisonOperator": "GREATER_THAN",
"Threshold": 100,
"ThresholdType": "PERCENTAGE"
},
"Subscribers": [
{"SubscriptionType": "EMAIL", "Address": "engineering-leads@company.com"}
]
}
]'
The Optimization Checklist
Run through this quarterly. Every item has a dollar amount attached.
| Priority | Action | Expected Savings | Effort |
|---|---|---|---|
| 1 | Delete unattached EBS volumes | 100% of the volume cost | 30 min |
| 2 | Delete unused Elastic IPs | 100% of the EIP cost | 10 min |
| 3 | Migrate gp2 to gp3 | 20% on EBS | 1 hour |
| 4 | Add S3 lifecycle policies | 40-70% on S3 | 2 hours |
| 5 | Right-size EC2 instances | 20-40% on EC2 | 1 week |
| 6 | Add VPC endpoints for S3/DynamoDB | 50%+ on NAT | 1 hour |
| 7 | Purchase Savings Plans | 30-66% on compute | 2 hours |
| 8 | Migrate to Graviton | 20% on EC2 | 2-4 weeks |
| 9 | Implement Spot for fault-tolerant | 60-90% on batch | 1-2 weeks |
| 10 | Optimize cross-AZ traffic | 10-20% on networking | 1-2 weeks |
| 11 | Right-size RDS instances | 20-40% on databases | 1 week |
| 12 | Evaluate Aurora Serverless v2 | 40-60% on Aurora | 1-2 weeks |
| 13 | Implement Karpenter for EKS | 30-40% on nodes | 2 weeks |
| 14 | Schedule dev/staging shutdowns | 60%+ on non-prod | 1-2 days |
The Bottom Line
AWS cost optimization isn't a one-time project. It's a continuous practice. The companies that save the most money are the ones that review costs weekly, tag everything, and treat cloud spend as an engineering metric — not just a finance problem.
Start with the quick wins at the top of the checklist. They take hours, not weeks, and they'll fund the engineering time for the bigger optimizations. I've never run this playbook and found less than 25% savings. Usually it's north of 35%.
The hardest part isn't the technical implementation — it's building the organizational habit. Set up the dashboards, send the reports, celebrate the wins publicly. When an engineer saves $2,000/month by right-sizing a database, make sure the whole team knows about it. Cost consciousness is a culture, and cultures are built one visible success at a time.
Your CFO will thank you. Your runway will thank you. And the next time someone spins up an m5.4xlarge for a cron job, you'll have the dashboards to catch it.
One last point: cost optimization never ends because AWS never stops adding services, and your infrastructure never stops growing. Build the review cadence into your team's rhythm. Make it a weekly habit, not an annual crisis. The teams that treat cost as a first-class engineering concern — right alongside performance, reliability, and security — are the ones that sustain their optimizations long term.