Terraform Module Design Patterns for Large Teams
If It's Not in Code, It Doesn't Exist
When you're a solo developer, Terraform modules are nice. When you're a team of 20+ engineers deploying to multiple environments, modules are survival. Without structured, versioned, composable modules, you get snowflake infrastructure that nobody can reproduce.
Here are the patterns I've used across platform teams managing 500+ Terraform resources.
Pattern 1: Standard Module Structure
Every module follows this file layout:
modules/
└── vpc/
    ├── main.tf           # Primary resources
    ├── variables.tf      # Input variables with descriptions
    ├── outputs.tf        # Output values
    ├── versions.tf       # Provider and Terraform version constraints
    ├── locals.tf         # Computed values and naming conventions
    ├── README.md         # Auto-generated with terraform-docs
    └── examples/
        └── basic/
            ├── main.tf
            └── terraform.tfvars
versions.tf — Pin Everything
terraform {
  required_version = ">= 1.7.0, < 2.0.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.40"
    }
  }
}
Never use >= without an upper bound. A provider major version bump will break your module and every team using it.
variables.tf — Type Everything
variable "name" {
  description = "Name prefix for all resources"
  type        = string

  validation {
    condition     = can(regex("^[a-z][a-z0-9-]{2,24}$", var.name))
    error_message = "Name must be lowercase, start with a letter, 3-25 chars."
  }
}

variable "vpc_cidr" {
  description = "CIDR block for the VPC"
  type        = string
  default     = "10.0.0.0/16"

  validation {
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "Must be a valid CIDR block."
  }
}

variable "environment" {
  description = "Environment name (dev, staging, prod)"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Must be dev, staging, or prod."
  }
}

variable "tags" {
  description = "Additional tags for all resources"
  type        = map(string)
  default     = {}
}
Validations catch errors at plan time instead of apply time. Your platform should make the right thing the easy thing — and the wrong thing impossible.
Pattern 2: Composition Over Inheritance
Don't build God modules. Build small, focused modules and compose them.
# environments/prod/main.tf
module "vpc" {
  source = "../../modules/vpc"

  name        = "prod"
  vpc_cidr    = "10.0.0.0/16"
  environment = "prod"
}

module "eks" {
  source = "../../modules/eks"

  name        = "prod"
  vpc_id      = module.vpc.vpc_id
  subnet_ids  = module.vpc.private_subnet_ids
  environment = "prod"
}

module "rds" {
  source = "../../modules/rds"

  name        = "prod"
  vpc_id      = module.vpc.vpc_id
  subnet_ids  = module.vpc.database_subnet_ids
  environment = "prod"
}
Each module does ONE thing. They connect through outputs and variables. When you need to change networking, you touch the VPC module. When you need to change the database, you touch the RDS module. No blast radius overlap.
Pattern 3: The Locals Pattern for Naming
# locals.tf
locals {
  name_prefix = "${var.environment}-${var.name}"

  common_tags = merge(var.tags, {
    Environment = var.environment
    ManagedBy   = "terraform"
    Module      = "vpc"
    Team        = "platform"
  })
}
Every resource uses local.name_prefix and local.common_tags. Consistent naming across your entire infrastructure, enforced by code.
# main.tf
resource "aws_vpc" "this" {
  cidr_block = var.vpc_cidr

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-vpc"
  })
}
Pattern 4: Remote State for Team Collaboration
# backend.tf
terraform {
  backend "s3" {
    bucket         = "devopsil-terraform-state"
    key            = "prod/vpc/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}
State management is the hardest problem in IaC. Here's how I solve it:
- One state file per module per environment. Never share state across modules.
- S3 + DynamoDB locking. Prevents two engineers from applying simultaneously.
- State file naming convention: ENV/MODULE/terraform.tfstate (e.g., prod/vpc/terraform.tfstate)
Accessing Other Module State
data "terraform_remote_state" "vpc" {
  backend = "s3"

  config = {
    bucket = "devopsil-terraform-state"
    key    = "prod/vpc/terraform.tfstate"
    region = "us-east-1"
  }
}

# Use outputs from VPC module
resource "aws_instance" "app" {
  subnet_id = data.terraform_remote_state.vpc.outputs.private_subnet_ids[0]
}
Pattern 5: Module Versioning with Git Tags
# Use versioned modules in production
module "vpc" {
  source = "git::https://github.com/your-org/terraform-modules.git//modules/vpc?ref=v2.1.0"
}
Rules:
- main branch = latest development
- Git tags (v1.0.0, v2.1.0) = stable releases
- Production ALWAYS pins to a tag
- Dev/staging can use main for testing new versions
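For non-production environments, the same module can track the development branch instead of a tag. A hypothetical dev configuration (the repo URL mirrors the production example above):

```hcl
# environments/dev/main.tf — dev tracks main to exercise unreleased module changes
module "vpc" {
  source = "git::https://github.com/your-org/terraform-modules.git//modules/vpc?ref=main"
}
```

Run terraform init -upgrade to pull the latest commit from the branch; tagged sources only change when you bump the ref.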
Pattern 6: Conditional Resource Creation
Not every environment needs every resource. Use count or for_each with conditionals to toggle resources.
variable "create_nat_gateway" {
  description = "Whether to create NAT gateways (expensive, not needed in dev)"
  type        = bool
  default     = true
}

variable "single_nat_gateway" {
  description = "Use single NAT gateway instead of one per AZ (cheaper for non-prod)"
  type        = bool
  default     = false
}

resource "aws_nat_gateway" "this" {
  count = var.create_nat_gateway ? (var.single_nat_gateway ? 1 : length(var.azs)) : 0

  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-nat-${var.azs[count.index]}"
  })
}
In dev, set create_nat_gateway = false to save $32/month per NAT gateway. In staging, set single_nat_gateway = true for a single NAT. In prod, deploy one per AZ for high availability.
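As a sketch, those per-environment choices can live in tfvars files (file paths here are illustrative):

```hcl
# environments/dev/terraform.tfvars — no NAT gateways at all
create_nat_gateway = false

# environments/staging/terraform.tfvars — one shared NAT gateway
create_nat_gateway = true
single_nat_gateway = true

# environments/prod/terraform.tfvars — one NAT gateway per AZ
create_nat_gateway = true
single_nat_gateway = false
```

The module code stays identical across environments; only the inputs change.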
The for_each Pattern for Dynamic Resources
count has a problem: if you remove an item from the middle of a list, every subsequent resource gets recreated. Use for_each with maps instead.
variable "subnets" {
  description = "Subnet configurations"
  type = map(object({
    cidr_block        = string
    availability_zone = string
    public            = bool
  }))
}

resource "aws_subnet" "this" {
  for_each = var.subnets

  vpc_id                  = aws_vpc.this.id
  cidr_block              = each.value.cidr_block
  availability_zone       = each.value.availability_zone
  map_public_ip_on_launch = each.value.public

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-${each.key}"
    Tier = each.value.public ? "public" : "private"
  })
}
Resources are keyed by map key, not by index. Remove "subnet-b" and only "subnet-b" gets destroyed. Everything else stays untouched.
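An illustrative input for the subnets variable above (CIDRs and AZs are example values):

```hcl
# terraform.tfvars — map keys become stable resource addresses like
# aws_subnet.this["private-a"], independent of position
subnets = {
  "private-a" = { cidr_block = "10.0.1.0/24", availability_zone = "us-east-1a", public = false }
  "private-b" = { cidr_block = "10.0.2.0/24", availability_zone = "us-east-1b", public = false }
  "public-a"  = { cidr_block = "10.0.101.0/24", availability_zone = "us-east-1a", public = true }
}
```

Deleting the "private-b" entry plans exactly one destroy; with count, the same deletion from a list would shift every index after it.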
Pattern 7: Output Design for Module Consumers
Outputs are your module's API. Design them like you would a REST API — think about what consumers actually need.
# outputs.tf

# Prefer IDs over full objects — consumers rarely need everything
output "vpc_id" {
  description = "The ID of the VPC"
  value       = aws_vpc.this.id
}

# Return lists for things that come in sets
output "private_subnet_ids" {
  description = "List of private subnet IDs"
  value       = [for s in aws_subnet.private : s.id]
}

output "private_subnet_cidrs" {
  description = "List of private subnet CIDR blocks"
  value       = [for s in aws_subnet.private : s.cidr_block]
}

# Return maps when consumers need to look up specific values
output "subnet_by_az" {
  description = "Map of AZ to subnet ID"
  value       = { for s in aws_subnet.private : s.availability_zone => s.id }
}

# Aggregate info for monitoring and tagging
output "module_metadata" {
  description = "Metadata about resources created by this module"
  value = {
    vpc_id               = aws_vpc.this.id
    vpc_cidr             = aws_vpc.this.cidr_block
    private_subnet_count = length(aws_subnet.private)
    nat_gateway_count    = length(aws_nat_gateway.this)
  }
}
Bad outputs force consumers to use element() and index math. Good outputs give them exactly what they need.
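The difference shows up on the consumer side. A sketch, assuming the subnet_by_az output defined above:

```hcl
# Without a map output, consumers fall back on fragile index math:
subnet_id = element(module.vpc.private_subnet_ids, 1)   # which subnet is "1"?

# With subnet_by_az, the lookup states its intent:
subnet_id = module.vpc.subnet_by_az["us-east-1b"]
```

The second form survives reordering of the underlying subnets; the first silently changes meaning.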
Pattern 8: Automated Documentation with terraform-docs
Nobody manually maintains module READMEs. Generate them.
# .terraform-docs.yml
formatter: markdown table

sections:
  show:
    - header
    - requirements
    - providers
    - inputs
    - outputs
Then generate docs with a single command:
# Generate README for a module
terraform-docs markdown table --output-file README.md ./modules/vpc
# Or use the config file
terraform-docs -c .terraform-docs.yml ./modules/vpc
Add injection markers to your module README so terraform-docs knows where to insert:
<!-- BEGIN_TF_DOCS -->
(auto-generated content appears here)
<!-- END_TF_DOCS -->
Run it in CI:
# .github/workflows/docs.yml
name: Terraform Docs

on:
  pull_request:
    paths: ['modules/**']

jobs:
  docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.ref }}
      - uses: terraform-docs/gh-actions@v1
        with:
          working-dir: modules/vpc,modules/eks,modules/rds
          output-file: README.md
          output-method: inject
          git-push: true
Every PR that changes a module auto-updates its README. Documentation that writes itself is documentation that stays current.
Pattern 9: Pre-Commit Hooks for Module Quality
Catch issues before they reach CI:
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.92.0
    hooks:
      - id: terraform_fmt
      - id: terraform_validate
      - id: terraform_tflint
        args:
          - --args=--config=__GIT_WORKING_DIR__/.tflint.hcl
      - id: terraform_docs
        args:
          - --args=--config=.terraform-docs.yml
      - id: terraform_checkov
        args:
          - --args=--quiet
          - --args=--compact
And the TFLint config for enforcing naming conventions:
# .tflint.hcl
plugin "aws" {
  enabled = true
  version = "0.31.0"
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}

rule "terraform_naming_convention" {
  enabled = true
  format  = "snake_case"
}

rule "terraform_documented_variables" {
  enabled = true
}

rule "terraform_documented_outputs" {
  enabled = true
}

rule "terraform_standard_module_structure" {
  enabled = true
}
Every variable needs a description. Every output needs a description. Every resource follows snake_case. Enforced at commit time, not review time.
Troubleshooting
Problem: Module changes break downstream consumers. Fix: Semantic versioning. Breaking changes = major version bump. New features = minor. Fixes = patch. Always provide a migration guide for major version bumps.
Problem: State locks stuck after failed apply.
Fix: terraform force-unlock LOCK_ID — but investigate WHY it was locked first. A stuck lock usually means a crashed apply — check that the apply didn't partially succeed.
Problem: Circular dependencies between modules. Fix: You have a design problem. Extract the shared component into its own module. If A depends on B and B depends on A, create module C that both A and B depend on.
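A minimal sketch of that extraction (module and variable names here are hypothetical): both former cycle members consume the shared module's outputs, and neither references the other, so the dependency graph is acyclic again.

```hcl
# Module C: the extracted shared component both A and B depended on
module "network_core" {
  source      = "../../modules/network-core"
  name        = "prod"
  environment = "prod"
}

# Former module A — depends only on C
module "service_a" {
  source = "../../modules/service-a"
  vpc_id = module.network_core.vpc_id
}

# Former module B — depends only on C
module "service_b" {
  source = "../../modules/service-b"
  vpc_id = module.network_core.vpc_id
}
```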
Problem: Module consumers use different Terraform versions.
Fix: Set upper and lower bounds in versions.tf. Run CI tests against both the minimum and maximum supported versions.
Problem: Large modules take forever to plan.
Fix: Split them. If your "networking" module manages 60+ resources, break it into vpc, subnets, nat-gateways, and route-tables. Each module should manage 5-15 resources. Smaller blast radius, faster plans.
Conclusion
Your infrastructure is software. Treat it like software — modular, versioned, tested, reviewed. These patterns scale from 5 resources to 5,000. Start with the standard file structure, compose small modules, pin your versions, enforce quality with pre-commit hooks, auto-generate docs, and manage state like it's the most important file in your repo — because it is.
Platform Engineer
Terraform enthusiast, platform builder, DRY advocate. I believe infrastructure should be versioned, reviewed, and deployed like any other code. GitOps or bust.
Related Articles
Terraform from Zero to Production: Project Structure, Modules, State, and CI/CD
Build production-grade Terraform infrastructure — project structure, module design, state management, testing, and CI/CD pipeline integration.
Terraform CLI: Cheat Sheet
Terraform CLI cheat sheet with commands organized by workflow — init, plan, apply, destroy, state manipulation, imports, and workspace management.
Testing Terraform with Terratest: A Practical Guide
How to write unit and integration tests for Terraform modules using Terratest — because untested infrastructure is a liability.