S3 Storage Class Optimization: Stop Paying Hot Prices for Cold Data

Dev Patel · 8 min read

You're Probably Paying 10x Too Much for Storage

Here's a stat that still blows my mind: the average company stores 70-80% of their S3 data in S3 Standard, but only 20-30% of that data gets accessed regularly. You're paying $0.023/GB/month for data that nobody's touched in six months.

Let me show you what that looks like in dollars.

| S3 Data Volume | % in Standard (Typical) | Monthly Overpay | Annual Waste |
|---|---|---|---|
| 1 TB | 80% | $12.80 | $154 |
| 10 TB | 80% | $128 | $1,536 |
| 100 TB | 80% | $1,280 | $15,360 |
| 1 PB | 80% | $12,800 | $153,600 |

At the petabyte scale, you're leaving $150K+ on the table every year. And the fix takes about an hour.

The S3 Storage Class Cheat Sheet

Let's cut through the marketing. Here's what each class actually costs and when to use it.

| Storage Class | $/GB/Month | Min Duration | Retrieval Cost | Best For |
|---|---|---|---|---|
| S3 Standard | $0.023 | None | Free | Active data, accessed weekly+ |
| S3 Intelligent-Tiering | $0.023 + $0.0025/1K objects | None | Free | Unpredictable access patterns |
| S3 Standard-IA | $0.0125 | 30 days | $0.01/GB | Accessed < 1x/month |
| S3 One Zone-IA | $0.01 | 30 days | $0.01/GB | Reproducible infrequent data |
| S3 Glacier Instant | $0.004 | 90 days | $0.03/GB | Quarterly access, millisecond retrieval |
| S3 Glacier Flexible | $0.0036 | 90 days | $0.01/GB (hours) | Annual access, can wait hours |
| S3 Glacier Deep Archive | $0.00099 | 180 days | $0.02/GB (12 hrs) | Compliance, rarely if ever accessed |

The spread from Standard to Deep Archive is 23x. That's the kind of number that gets you a raise in FinOps.
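The cheat sheet boils down to a lookup table. A rough sketch in Python, using the per-GB rates quoted above (us-east-1; verify against current AWS pricing before relying on them):

```python
# S3 per-GB monthly storage rates from the cheat sheet above.
RATES_PER_GB_MONTH = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "ONEZONE_IA": 0.01,
    "GLACIER_IR": 0.004,
    "GLACIER": 0.0036,
    "DEEP_ARCHIVE": 0.00099,
}

def monthly_storage_cost(gb: float, storage_class: str) -> float:
    """Storage cost only: excludes request, retrieval, and monitoring fees."""
    return gb * RATES_PER_GB_MONTH[storage_class]

# 100 TB (102,400 GB) at the two extremes:
print(f"{monthly_storage_cost(102_400, 'STANDARD'):.2f}")      # 2355.20
print(f"{monthly_storage_cost(102_400, 'DEEP_ARCHIVE'):.2f}")  # 101.38
```

Dividing the two rates is where the 23x figure comes from: 0.023 / 0.00099 ≈ 23.2.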

Step 1: Analyze Your Access Patterns

Before you move anything, understand what you actually have. S3 Storage Lens gives you the overview, but for granular bucket-level analysis:

# Get storage breakdown by class for a specific bucket
aws s3api list-objects-v2 \
  --bucket my-app-data \
  --query "Contents[].{Key:Key,Size:Size,LastModified:LastModified,StorageClass:StorageClass}" \
  --output json | jq '
    group_by(.StorageClass) |
    map({
      class: .[0].StorageClass,
      count: length,
      total_gb: (map(.Size) | add / 1073741824 | . * 100 | round / 100)
    })'

For access pattern analysis, enable S3 Server Access Logging or use CloudTrail data events:

# Enable S3 access logging
aws s3api put-bucket-logging \
  --bucket my-app-data \
  --bucket-logging-status '{
    "LoggingEnabled": {
      "TargetBucket": "my-access-logs",
      "TargetPrefix": "s3-access/my-app-data/"
    }
  }'

Run logging for at least 30 days (90 if you can wait), then query it with Athena to find objects that haven't been accessed within the window.

-- Athena query: find objects not accessed in 90+ days
-- Note: requestdatetime in S3 access logs is a string like
-- "06/Feb/2024:00:00:38 +0000", so parse it before comparing
SELECT key,
       MAX(parse_datetime(requestdatetime, 'dd/MMM/yyyy:HH:mm:ss Z')) AS last_access
FROM s3_access_logs
WHERE bucket = 'my-app-data'
  AND operation LIKE 'REST.GET%'
GROUP BY key
HAVING MAX(parse_datetime(requestdatetime, 'dd/MMM/yyyy:HH:mm:ss Z'))
       < date_add('day', -90, now())
ORDER BY last_access ASC;

Step 2: Build Lifecycle Policies

This is where the savings happen. A well-designed lifecycle policy automates the entire tiering process.

Terraform Lifecycle Configuration

resource "aws_s3_bucket_lifecycle_configuration" "cost_optimized" {
  bucket = aws_s3_bucket.app_data.id

  # Rule 1: Application logs — aggressive tiering
  rule {
    id     = "logs-lifecycle"
    status = "Enabled"

    filter {
      prefix = "logs/"
    }

    transition {
      days          = 30
      storage_class = "STANDARD_IA"
    }

    transition {
      days          = 90
      storage_class = "GLACIER"
    }

    expiration {
      days = 365
    }
  }

  # Rule 2: User uploads — moderate tiering
  rule {
    id     = "uploads-lifecycle"
    status = "Enabled"

    filter {
      prefix = "uploads/"
    }

    transition {
      days          = 60
      storage_class = "STANDARD_IA"
    }

    transition {
      days          = 180
      storage_class = "GLACIER_IR"
    }
  }

  # Rule 3: Backups — deep archive fast
  rule {
    id     = "backups-lifecycle"
    status = "Enabled"

    filter {
      prefix = "backups/"
    }

    transition {
      days          = 1
      storage_class = "GLACIER"
    }

    transition {
      days          = 90
      storage_class = "DEEP_ARCHIVE"
    }

    expiration {
      days = 2555  # 7 years for compliance
    }
  }

  # Rule 4: Clean up incomplete multipart uploads
  rule {
    id     = "abort-multipart"
    status = "Enabled"

    filter {
      prefix = ""
    }

    abort_incomplete_multipart_upload {
      days_after_initiation = 7
    }
  }
}

That last rule — aborting incomplete multipart uploads — is free money. I've seen buckets with hundreds of GBs of orphaned multipart fragments. You're paying Standard rates for literal garbage.

Step 3: Use Intelligent-Tiering for the Unpredictable Stuff

When you genuinely don't know the access pattern, Intelligent-Tiering is the move. It automatically shifts objects between tiers and you never pay retrieval fees.

# Configure archive tiers for objects stored in Intelligent-Tiering.
# (This does not change the bucket's default storage class: objects must be
# uploaded as INTELLIGENT_TIERING or moved there by a lifecycle rule.)
aws s3api put-bucket-intelligent-tiering-configuration \
  --bucket my-app-data \
  --id "full-tiering" \
  --intelligent-tiering-configuration '{
    "Id": "full-tiering",
    "Status": "Enabled",
    "Tierings": [
      {
        "AccessTier": "ARCHIVE_ACCESS",
        "Days": 90
      },
      {
        "AccessTier": "DEEP_ARCHIVE_ACCESS",
        "Days": 180
      }
    ]
  }'

The monitoring fee is $0.0025 per 1,000 objects/month. For objects under 128 KB, there's no monitoring fee and they always stay in the frequent access tier. At scale, the monitoring cost is negligible compared to the savings.

| Scenario (10 TB, 50M objects) | Standard (all hot) | Intelligent-Tiering |
|---|---|---|
| Storage cost | $235/mo | $47-$140/mo |
| Monitoring fee | $0 | $125/mo |
| Retrieval cost | $0 | $0 |
| Total | $235/mo | $172-$265/mo |

For data with unpredictable access patterns, Intelligent-Tiering saves 25-40% on average.
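The monitoring-fee math above is worth sanity-checking yourself. A quick sketch using the $0.0025 per 1,000 objects/month rate quoted earlier:

```python
# Intelligent-Tiering monitoring fee: $0.0025 per 1,000 monitored objects/month.
def monitoring_fee(num_objects: int) -> float:
    return num_objects / 1_000 * 0.0025

# 50M objects, as in the scenario table:
fee = monitoring_fee(50_000_000)
print(fee)  # 125.0

# Best case from the table: $47 storage + $125 monitoring = $172/mo,
# vs. $235/mo all-Standard: roughly a 27% saving.
```

Note the fee scales with object count, not size, which is why Intelligent-Tiering pencils out best for buckets with fewer, larger objects.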

Step 4: Watch for the Gotchas

Minimum Storage Duration Charges

Move an object to Glacier and delete it after 30 days? You still pay for 90 days. Factor this into your lifecycle rules.

# BAD: Transition to Glacier at 60 days, expire at 80 days
# You pay for 90 days of Glacier storage even though object is deleted at day 80

# GOOD: Transition to Glacier at 60 days, expire at 150+ days
# Object lives past the 90-day minimum, no wasted spend

Minimum Object Size

Objects smaller than 128 KB in Standard-IA or One Zone-IA get billed as 128 KB. If you have millions of tiny files, Standard or Intelligent-Tiering is cheaper.

| Actual Object Size | Standard Cost (1M objects) | Standard-IA Cost (1M objects) |
|---|---|---|
| 1 KB | $0.023 | $1.60 (billed as 128 KB each!) |
| 10 KB | $0.23 | $1.60 |
| 128 KB | $2.94 | $1.60 |
| 1 MB | $23.00 | $12.50 |

See that? For 1 KB objects, Standard-IA is 70x more expensive than Standard. Size matters.
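The table's numbers fall out of a one-line billing rule. A sketch using decimal KB-to-GB conversion, matching the table's rounding:

```python
# Standard-IA bills objects below 128 KB as if they were 128 KB.
IA_RATE, STANDARD_RATE = 0.0125, 0.023

def ia_cost(num_objects: int, size_kb: float) -> float:
    billed_kb = max(size_kb, 128)  # the minimum billable size
    return num_objects * billed_kb / 1_000_000 * IA_RATE

def standard_cost(num_objects: int, size_kb: float) -> float:
    return num_objects * size_kb / 1_000_000 * STANDARD_RATE

print(standard_cost(1_000_000, 1))  # 0.023
print(ia_cost(1_000_000, 1))        # 1.6 -- roughly 70x more for 1 KB objects
```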

Retrieval Costs Add Up

Before moving everything to Glacier, model the retrieval costs:

# Estimate monthly cost for 100 GB in Glacier Flexible, retrieving all of it:
# Storage:   100 GB * $0.0036/GB = $0.36
# Retrieval: 100 GB * $0.01/GB   = $1.00
# Total: $1.36/month, still less than Standard's $2.30/month
# (internet data transfer costs $0.09/GB from every storage class,
# so it cancels out of the comparison)

# Glacier Instant Retrieval is where retrieval fees bite:
# Storage:   100 GB * $0.004/GB = $0.40
# Retrieval: 100 GB * $0.03/GB  = $3.00
# Total: $3.40/month, more than keeping 100 GB hot in Standard ($2.30/month)

The break-even point: with Glacier Instant's $0.03/GB retrieval fee, it only beats Standard if you retrieve less than ~63% of your data per month, and only beats Standard-IA below ~42%. Glacier Flexible and Deep Archive charge less to retrieve but make you wait hours, so model latency alongside the fees.
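That comparison can be sketched as a one-line cost model, using the cheat-sheet rates (request fees and retrieval latency ignored; transfer-out pricing is the same for every class, so it cancels out):

```python
# Per-GB monthly cost as a function of the fraction of data retrieved.
def cost_per_gb(storage_rate: float, retrieval_rate: float, frac: float) -> float:
    return storage_rate + retrieval_rate * frac

# Glacier Instant ($0.004 storage + $0.03/GB retrieval) vs Standard-IA
# ($0.0125 storage + $0.01/GB retrieval), retrieving 50% of data monthly:
print(f"{cost_per_gb(0.004, 0.03, 0.5):.4f}")   # 0.0190  Glacier Instant
print(f"{cost_per_gb(0.0125, 0.01, 0.5):.4f}")  # 0.0175  Standard-IA wins
```

Setting the two expressions equal gives the crossover fraction: (0.0125 − 0.004) / (0.03 − 0.01) ≈ 0.42.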

Real-World Savings Breakdown

Here's what a lifecycle optimization project looked like for a 50 TB bucket I worked on last quarter:

| Data Category | Volume | Before (Standard) | After (Optimized) | Monthly Savings |
|---|---|---|---|---|
| Active app data | 10 TB | $235 | $235 (Standard) | $0 |
| Logs (30-90 days) | 15 TB | $353 | $191 (Standard-IA) | $162 |
| Logs (90+ days) | 10 TB | $235 | $36 (Glacier) | $199 |
| Old backups | 12 TB | $282 | $12 (Deep Archive) | $270 |
| Multipart fragments | 3 TB | $71 | $0 (deleted) | $71 |
| **Totals** | **50 TB** | **$1,176/mo** | **$474/mo** | **$702/mo** |

That's $8,424/year saved from a single bucket. Multiply that across your org and the numbers get serious fast.
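The annual figure is just the per-category savings summed and multiplied out, which is easy to verify:

```python
# Per-category monthly savings from the table above, summed and annualized.
savings_per_month = {
    "logs_30_90d": 162,
    "logs_90d_plus": 199,
    "old_backups": 270,
    "multipart_fragments": 71,
}
monthly = sum(savings_per_month.values())
print(monthly, monthly * 12)  # 702 8424
```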

The Action Plan

  1. Today: Enable S3 Storage Lens across all accounts. It's free for the dashboard-level metrics.
  2. This week: Add the abort-incomplete-multipart-upload rule to every bucket. Zero risk, immediate savings.
  3. Next two weeks: Analyze access patterns and deploy lifecycle policies for your top 5 buckets by size.
  4. Ongoing: Review S3 costs monthly. Access patterns change, and your lifecycle policies should evolve with them.

Storage costs are the silent killer of cloud budgets. They grow linearly with your data, and most teams just accept the number on the bill. Don't be that team.
