Cloud Cost Tagging Strategy: Make Every Dollar Traceable
The $400K Question Nobody Could Answer
Last year, a VP asked a simple question at my previous company: "Which product team is responsible for the $400K/month AWS bill?" Nobody could answer it. Not engineering leadership, not the platform team, not finance. The bill was one giant blob.
That's what happens without a tagging strategy. You can't optimize what you can't attribute. You can't hold teams accountable for costs they can't see. And you definitely can't forecast when your cost data is a black box.
Here's the breakdown of untagged resources I typically see at companies without enforcement:
| Company Stage | Untagged Resources | Monthly Unattributable Spend |
|---|---|---|
| Startup (< $50K/mo) | 40-60% | $20K-$30K |
| Growth ($50K-$200K/mo) | 50-70% | $25K-$140K |
| Enterprise ($200K+/mo) | 30-50% | $60K-$100K+ |
If you can't attribute it, you can't reduce it. Let's fix that.
The Minimum Viable Tag Set
I've seen companies with 30+ required tags. Nobody fills them in. Nobody enforces them. They're garbage data. Start with the minimum that gives you real cost visibility.
Required Tags (enforce these)
| Tag Key | Example Values | Purpose |
|---|---|---|
| team | platform, payments, search | Cost allocation by team |
| service | api-gateway, user-service | Cost allocation by service |
| environment | production, staging, dev | Separate prod from non-prod costs |
| cost-center | CC-1001, CC-2045 | Maps to finance cost centers |
| managed-by | terraform, manual, cdk | Track IaC coverage |
Recommended Tags (encourage but don't block on)
| Tag Key | Example Values | Purpose |
|---|---|---|
| project | q1-migration, perf-improvements | Temporary project tracking |
| data-classification | public, internal, pii | Security + compliance |
| backup-policy | daily, weekly, none | Backup automation |
| expiry-date | 2026-06-01 | Auto-cleanup for temporary resources |
Five required tags. That's it. You can add more later, but start with five that people will actually use consistently.
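To make the schema concrete, here is a minimal Python sketch of a validator for the five required tags. The team, environment, and cost-center rules mirror the Terraform variable validations used in this article; the patterns for service and managed-by are my assumptions for illustration, and validate_tags is a hypothetical helper, not part of any AWS SDK.

```python
import re

# team, environment, and cost-center follow the article's Terraform rules.
# The service and managed-by patterns are assumptions for illustration.
REQUIRED_TAG_RULES = {
    "team": re.compile(r"^[a-z][a-z0-9-]+$"),
    "service": re.compile(r"^[a-z][a-z0-9-]+$"),
    "environment": re.compile(r"^(production|staging|dev|sandbox)$"),
    "cost-center": re.compile(r"^CC-[0-9]{4}$"),
    "managed-by": re.compile(r"^(terraform|manual|cdk)$"),
}


def validate_tags(tags):
    """Return a sorted list of problems; an empty list means compliant."""
    problems = []
    for key, pattern in REQUIRED_TAG_RULES.items():
        value = tags.get(key)
        if value is None:
            problems.append(f"missing required tag: {key}")
        elif not pattern.fullmatch(value):
            problems.append(f"invalid value for {key}: {value!r}")
    return sorted(problems)
```

Running the same check in CI and in audit scripts keeps one definition of "compliant" instead of several that drift apart.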
Step 1: Enforce Tags in Terraform
This is your first line of defense. If a resource can't be created without tags, the problem solves itself.
Terraform Provider Default Tags
Set org-wide defaults that apply to every resource:
provider "aws" {
  region = "us-east-1"

  default_tags {
    tags = {
      managed-by  = "terraform"
      environment = var.environment
      team        = var.team
      cost-center = var.cost_center
    }
  }
}

variable "environment" {
  type = string

  validation {
    condition     = contains(["production", "staging", "dev", "sandbox"], var.environment)
    error_message = "Environment must be one of: production, staging, dev, sandbox."
  }
}

variable "team" {
  type = string

  validation {
    condition     = can(regex("^[a-z][a-z0-9-]+$", var.team))
    error_message = "Team name must be lowercase alphanumeric with hyphens."
  }
}

variable "cost_center" {
  type = string

  validation {
    condition     = can(regex("^CC-[0-9]{4}$", var.cost_center))
    error_message = "Cost center must match format CC-XXXX."
  }
}
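Now every taggable aws_* resource created through this provider gets these tags automatically. No developer action required. Two caveats: a tag set directly on a resource overrides a default with the same key, and aws_autoscaling_group does not pick up default_tags, so it still needs explicit tags.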
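The precedence between provider defaults and resource-level tags is easy to get backwards, so here is a small Python sketch of the merge. It models the behavior rather than calling Terraform, and effective_tags is a hypothetical name.

```python
def effective_tags(default_tags, resource_tags):
    """Merge provider-level defaults with resource-level tags.

    Resource-level tags win on key collisions, which is how the AWS
    provider resolves default_tags against a resource's own tags.
    """
    merged = dict(default_tags)
    merged.update(resource_tags)
    return merged
```

In practice this means a resource can add its own service tag on top of the defaults, and also that a stray team tag on one resource silently replaces the org-wide value.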
Custom Validation Module
For the service tag (which is resource-specific), enforce it at the module level:
# modules/tagged-resource/variables.tf
variable "required_tags" {
  type = object({
    service = string
  })

  validation {
    condition     = length(var.required_tags.service) > 0
    error_message = "The 'service' tag is required and cannot be empty."
  }
}

# modules/tagged-resource/main.tf
locals {
  enforced_tags = merge(var.required_tags, {
    terraform-module = "tagged-resource"
  })
}
Pre-Commit Hook with tflint
Catch missing tags before code even hits CI:
# .tflint.hcl
plugin "aws" {
  enabled = true
  version = "0.30.0"
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}

rule "aws_resource_missing_tags" {
  enabled = true
  tags    = ["team", "service", "environment", "cost-center", "managed-by"]
}
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.88.0
    hooks:
      - id: terraform_tflint
        args: ['--args=--config=__GIT_WORKING_DIR__/.tflint.hcl']
Step 2: AWS Tag Policies for Organization-Wide Enforcement
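Terraform catches IaC resources. AWS Tag Policies cover everything else: console-created resources, CLI-created resources, SDK-created resources. One caveat: a tag policy checks the key's capitalization and the allowed values on resources that carry the tag, and enforced_for blocks noncompliant tagging operations on the listed resource types. On its own it does not stop someone from creating a resource with no tags at all, which is the gap the SCP later in this step closes.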
{
  "tags": {
    "team": {
      "tag_key": {
        "@@assign": "team"
      },
      "tag_value": {
        "@@assign": [
          "platform",
          "payments",
          "search",
          "data-engineering",
          "mobile",
          "infrastructure"
        ]
      },
      "enforced_for": {
        "@@assign": [
          "ec2:instance",
          "ec2:volume",
          "rds:db",
          "s3:bucket",
          "lambda:function",
          "elasticloadbalancing:loadbalancer"
        ]
      }
    },
    "environment": {
      "tag_key": {
        "@@assign": "environment"
      },
      "tag_value": {
        "@@assign": [
          "production",
          "staging",
          "dev",
          "sandbox"
        ]
      },
      "enforced_for": {
        "@@assign": [
          "ec2:instance",
          "ec2:volume",
          "rds:db",
          "s3:bucket"
        ]
      }
    }
  }
}
Apply this at the AWS Organization root, and every account in the org must comply.
SCP: Deny Untagged Resource Creation
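For hard enforcement, use a Service Control Policy that blocks resource creation without the required tags. Keys listed together under one condition operator are ANDed, so a single Null block containing all three keys would only deny requests that are missing every tag. Use one Deny statement per required tag instead:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyEC2WithoutTeamTag",
      "Effect": "Deny",
      "Action": ["ec2:RunInstances", "ec2:CreateVolume"],
      "Resource": ["arn:aws:ec2:*:*:instance/*", "arn:aws:ec2:*:*:volume/*"],
      "Condition": {
        "Null": { "aws:RequestTag/team": "true" }
      }
    },
    {
      "Sid": "DenyEC2WithoutEnvironmentTag",
      "Effect": "Deny",
      "Action": ["ec2:RunInstances", "ec2:CreateVolume"],
      "Resource": ["arn:aws:ec2:*:*:instance/*", "arn:aws:ec2:*:*:volume/*"],
      "Condition": {
        "Null": { "aws:RequestTag/environment": "true" }
      }
    },
    {
      "Sid": "DenyEC2WithoutServiceTag",
      "Effect": "Deny",
      "Action": ["ec2:RunInstances", "ec2:CreateVolume"],
      "Resource": ["arn:aws:ec2:*:*:instance/*", "arn:aws:ec2:*:*:volume/*"],
      "Condition": {
        "Null": { "aws:RequestTag/service": "true" }
      }
    }
  ]
}
This is the nuclear option. Nobody creates an EC2 instance or EBS volume unless the request carries team, environment, and service tags. Period.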
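The way IAM combines keys inside a Null condition is worth checking before you trust a policy like this. The Python sketch below is a simplified model of that evaluation (it only understands aws:RequestTag/ keys and is not an IAM policy simulator). It compares a single condition block listing all three keys with one block per key.

```python
def null_condition_matches(request_tags, null_block):
    """Evaluate an IAM "Null" condition block against request tags.

    null_block maps context keys such as "aws:RequestTag/team" to "true"
    (key must be absent) or "false" (key must be present). Every entry in
    the block has to hold for the block to match (logical AND).
    """
    prefix = "aws:RequestTag/"
    for context_key, expected in null_block.items():
        tag_key = context_key[len(prefix):]
        is_absent = tag_key not in request_tags
        if is_absent != (expected == "true"):
            return False
    return True


def is_denied(request_tags, deny_conditions):
    """A request is denied if any Deny statement's condition matches."""
    return any(null_condition_matches(request_tags, c) for c in deny_conditions)


REQUIRED = ["team", "environment", "service"]
# One statement listing all three keys: matches only if all are absent.
combined = [{f"aws:RequestTag/{k}": "true" for k in REQUIRED}]
# One statement per key: matches if any single key is absent.
per_tag = [{f"aws:RequestTag/{k}": "true"} for k in REQUIRED]
```

A request tagged only with team slips past the combined form but is denied by the per-tag form, which is why required-tag SCPs are usually written as one statement per tag.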
Step 3: Activate Tags in Cost Explorer
Tags don't show up in cost reports unless you activate them:
# Activate cost allocation tags
aws ce update-cost-allocation-tags-status \
--cost-allocation-tags-status \
'[{"TagKey":"team","Status":"Active"},
{"TagKey":"service","Status":"Active"},
{"TagKey":"environment","Status":"Active"},
{"TagKey":"cost-center","Status":"Active"},
{"TagKey":"managed-by","Status":"Active"}]'
Important: tags can take up to 24 hours to appear in Cost Explorer after activation, and activation is not retroactive. Future costs will be attributed; historical data from before activation won't have tag breakdowns.
Step 4: Build Cost Visibility Dashboards
Once tags are flowing, build the reports that drive accountability.
Cost by Team (AWS CLI)
# Monthly cost per team for the last 3 months
aws ce get-cost-and-usage \
--time-period Start=2026-01-01,End=2026-03-20 \
--granularity MONTHLY \
--metrics "UnblendedCost" \
--group-by Type=TAG,Key=team \
--query 'ResultsByTime[].Groups[].{
Team:Keys[0],
Amount:Metrics.UnblendedCost.Amount
}' \
--output table
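If you want the same numbers in a script instead of a CLI table, the response can be rolled up in a few lines. This Python sketch assumes a dict shaped like the get-cost-and-usage response (for example, what boto3's Cost Explorer client returns); cost_by_team is a hypothetical helper name.

```python
from collections import defaultdict
from decimal import Decimal


def cost_by_team(response, tag_key="team"):
    """Sum UnblendedCost per tag value across all returned periods.

    Cost Explorer returns group keys as the tag key, a "$", then the value.
    Resources without the tag come back with nothing after the "$".
    """
    prefix = f"{tag_key}$"
    totals = defaultdict(Decimal)
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            raw_key = group["Keys"][0]
            value = raw_key[len(prefix):] if raw_key.startswith(prefix) else raw_key
            team = value or "(untagged)"
            totals[team] += Decimal(group["Metrics"]["UnblendedCost"]["Amount"])
    return dict(totals)
```

Decimal is used because the API returns amounts as strings, and summing them as floats introduces rounding noise that finance will notice.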
Untagged Cost Report
This is the accountability lever. Show leadership how much spend can't be attributed:
# Find untagged costs: the "nobody owns this" bucket
aws ce get-cost-and-usage \
--time-period Start=2026-03-01,End=2026-03-20 \
--granularity MONTHLY \
--metrics "UnblendedCost" \
--group-by Type=TAG,Key=team \
--query 'ResultsByTime[].Groups[?Keys[0]==`"team$"`].{
Untagged:Metrics.UnblendedCost.Amount
}' \
--output text
The group key team$ (the tag key followed by an empty value) captures all spend from resources without a team tag. Make this number visible and track it trending toward zero.
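A raw dollar figure is less persuasive than a percentage of the bill. As a rough sketch, the function below computes the untagged share from a response shaped like the get-cost-and-usage output grouped by the team tag; untagged_share is a hypothetical helper name.

```python
from decimal import Decimal


def untagged_share(response, tag_key="team"):
    """Return untagged spend as a percentage of total spend (0-100)."""
    untagged_key = f"{tag_key}$"  # tag key followed by an empty value
    total = Decimal("0")
    untagged = Decimal("0")
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            amount = Decimal(group["Metrics"]["UnblendedCost"]["Amount"])
            total += amount
            if group["Keys"][0] == untagged_key:
                untagged += amount
    if total == 0:
        return Decimal("0")
    return (untagged / total * 100).quantize(Decimal("0.1"))
```

Tracking this single percentage week over week gives leadership one number to watch.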
Step 5: Tag Compliance Scoring
Track compliance by team and make it competitive:
| Team | Total Resources | Tagged | Compliance | Untagged Monthly Cost |
|---|---|---|---|---|
| Platform | 342 | 338 | 98.8% | $47 |
| Payments | 215 | 201 | 93.5% | $312 |
| Search | 187 | 155 | 82.9% | $1,240 |
| Data Engineering | 524 | 311 | 59.4% | $8,750 |
| Mobile (backend) | 98 | 42 | 42.9% | $3,200 |
Publish this weekly. Nobody wants to be the team at the bottom. Peer pressure is more effective than any policy document.
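# Generate a tag compliance report using the Resource Groups Tagging API
# Tagged: resources that carry a team tag
aws resourcegroupstaggingapi get-resources \
--tag-filters Key=team \
--query 'ResourceTagMappingList | length(@)'
# Total: every resource the Tagging API returns in the current region.
# Note: the API only lists resources that are tagged or were tagged before,
# so never-tagged resources are missing from this total.
aws resourcegroupstaggingapi get-resources \
--query 'ResourceTagMappingList | length(@)'
# Compliance % = (tagged / total) * 100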
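The compliance formula is simple enough to script once you have the resource list. The Python sketch below assumes input shaped like the ResourceTagMappingList returned by get-resources; compliance_percent is a hypothetical helper, and it treats a resource as compliant only if every required key is present.

```python
def compliance_percent(resources, required_keys=("team",)):
    """Percentage of resources carrying every required tag key.

    Each item in `resources` has a "Tags" list of
    {"Key": ..., "Value": ...} dicts, as in ResourceTagMappingList.
    """
    if not resources:
        return 0.0
    required = set(required_keys)
    tagged = sum(
        1
        for resource in resources
        if required.issubset({tag["Key"] for tag in resource.get("Tags", [])})
    )
    return round(tagged / len(resources) * 100, 1)
```

With 338 of 342 resources tagged, this returns 98.8, matching the Platform row in the table above.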
Common Mistakes to Avoid
Mistake 1: Too many required tags. Five is the sweet spot. More than seven and compliance drops off a cliff.
Mistake 2: Free-form tag values. "prod", "Prod", "production", "PRODUCTION" — these all become separate line items in Cost Explorer. Enforce allowed values.
Mistake 3: Tagging only at creation. Resources get re-purposed. Run a monthly audit to catch tag drift:
# Find resources missing the team tag (run per region; repeat for each required key)
aws resourcegroupstaggingapi get-resources \
--tags-per-page 100 \
--query 'ResourceTagMappingList[?!contains(Tags[].Key, `team`)].[ResourceARN]' \
--output text
Mistake 4: Not tagging shared resources. Load balancers, VPCs, and NAT gateways should be tagged with the platform team's cost center, not left empty.
Mistake 5: Ignoring data transfer costs. Tagging doesn't help with data transfer attribution. Use VPC Flow Logs + Athena for that. But start with the 80% of costs that tags do cover.
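For Mistake 2, a cleanup pass usually needs a mapping from the variants found in the wild to the canonical values. Here is a minimal Python sketch; the alias map is hypothetical and should be extended with whatever your own audit turns up.

```python
# Hypothetical alias map; extend it with the variants your audit finds.
ENVIRONMENT_ALIASES = {
    "prod": "production",
    "production": "production",
    "stage": "staging",
    "staging": "staging",
    "dev": "dev",
    "development": "dev",
    "sandbox": "sandbox",
}


def normalize_environment(raw_value):
    """Map a free-form environment value to its canonical form, or None."""
    return ENVIRONMENT_ALIASES.get(raw_value.strip().lower())
```

Values that return None are the ones to send back to the owning team rather than guess at.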
The Rollout Timeline
| Week | Action | Expected Compliance |
|---|---|---|
| 1 | Define tag schema, activate cost allocation tags | Baseline: 30-40% |
| 2 | Add default_tags to Terraform providers | 50-60% |
| 3 | Deploy tag policies and SCPs to sandbox accounts | 55-65% |
| 4 | Roll SCPs to dev/staging accounts | 65-75% |
| 5-6 | Backfill existing resources with a tagging script | 80-85% |
| 7-8 | Roll SCPs to production accounts | 85-92% |
| Ongoing | Weekly compliance reports, monthly tag audits | 95%+ target |
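The backfill in weeks 5-6 is mostly a batching problem, because the Resource Groups Tagging API accepts at most 20 ARNs per TagResources request. The Python sketch below only plans the batches and makes no AWS calls; plan_backfill is a hypothetical helper, and passing each batch to boto3's tag_resources is my assumption about how you would apply it.

```python
def plan_backfill(resource_arns, tags, batch_size=20):
    """Split ARNs into batches sized for a single TagResources call.

    Each returned dict uses the TagResources parameter names, so it can be
    passed as keyword arguments to a tagging client (assumed: boto3's
    resourcegroupstaggingapi client and its tag_resources method).
    """
    if batch_size not in range(1, 21):
        raise ValueError("batch_size must be between 1 and 20")
    return [
        {"ResourceARNList": resource_arns[i:i + batch_size], "Tags": dict(tags)}
        for i in range(0, len(resource_arns), batch_size)
    ]
```

Planning first and applying second makes it easy to review the batches with the owning team before anything is written.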
Getting from 30% to 90% takes about two months. Getting from 90% to 98% takes another two. But at 90%, you already have the visibility to make informed cost decisions. Don't let perfect be the enemy of good — start enforcing today.