DevOpsil

Terraform Module Design Patterns for Large Teams

Zara Blackwood · 9 min read

If It's Not in Code, It Doesn't Exist

When you're a solo developer, Terraform modules are nice. When you're a team of 20+ engineers deploying to multiple environments, modules are survival. Without structured, versioned, composable modules, you get snowflake infrastructure that nobody can reproduce.

Here are the patterns I've used across platform teams managing 500+ Terraform resources.

Pattern 1: Standard Module Structure

Every module follows this file layout:

modules/
└── vpc/
    ├── main.tf          # Primary resources
    ├── variables.tf     # Input variables with descriptions
    ├── outputs.tf       # Output values
    ├── versions.tf      # Provider and Terraform version constraints
    ├── locals.tf        # Computed values and naming conventions
    ├── README.md        # Auto-generated with terraform-docs
    └── examples/
        └── basic/
            ├── main.tf
            └── terraform.tfvars

versions.tf — Pin Everything

terraform {
  required_version = ">= 1.7.0, < 2.0.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.40"
    }
  }
}

Never use >= without an upper bound. A provider major version bump will break your module and every team using it.
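For reference, the pessimistic operator (~>) already carries an implicit upper bound. A quick sketch of how the common constraint forms compare, with version numbers that are illustrative only:

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"

      # "~> 5.40"   allows 5.40 and any later 5.x release, but never 6.0
      #             (equivalent to ">= 5.40.0, < 6.0.0")
      # "~> 5.40.0" allows only 5.40.x patch releases
      # ">= 5.40"   has no upper bound, so a future 6.0 would be accepted
      version = "~> 5.40"
    }
  }
}
```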

variables.tf — Type Everything

variable "name" {
  description = "Name prefix for all resources"
  type        = string

  validation {
    condition     = can(regex("^[a-z][a-z0-9-]{2,24}$", var.name))
    error_message = "Name must be lowercase, start with a letter, 3-25 chars."
  }
}

variable "vpc_cidr" {
  description = "CIDR block for the VPC"
  type        = string
  default     = "10.0.0.0/16"

  validation {
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "Must be a valid CIDR block."
  }
}

variable "environment" {
  description = "Environment name (dev, staging, prod)"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Must be dev, staging, or prod."
  }
}

variable "tags" {
  description = "Additional tags for all resources"
  type        = map(string)
  default     = {}
}

Validations catch errors at plan time instead of apply time. Your platform should make the right thing the easy thing — and the wrong thing impossible.
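One way to keep these validations honest is to test them. The sketch below uses Terraform's native test framework and assumes the module's only required inputs are the variables shown above; the file path and the run names are hypothetical. The mock provider, available from Terraform 1.7, is there so the test needs no AWS credentials.

```hcl
# modules/vpc/tests/validation.tftest.hcl (hypothetical path)

# Replace the real AWS provider with a mock so no credentials are needed.
mock_provider "aws" {}

# Valid defaults shared by every run block below.
variables {
  name        = "core"
  environment = "dev"
}

run "rejects_uppercase_name" {
  command = plan

  variables {
    name = "Prod_VPC" # violates the lowercase regex
  }

  # The run passes only if validation of var.name fails.
  expect_failures = [var.name]
}

run "rejects_unknown_environment" {
  command = plan

  variables {
    environment = "qa" # not in the allowed list
  }

  expect_failures = [var.environment]
}
```

Run it from the module directory with terraform init followed by terraform test.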

Pattern 2: Composition Over Inheritance

Don't build God modules. Build small, focused modules and compose them.

# environments/prod/main.tf

module "vpc" {
  source      = "../../modules/vpc"
  name        = "prod"
  vpc_cidr    = "10.0.0.0/16"
  environment = "prod"
}

module "eks" {
  source      = "../../modules/eks"
  name        = "prod"
  vpc_id      = module.vpc.vpc_id
  subnet_ids  = module.vpc.private_subnet_ids
  environment = "prod"
}

module "rds" {
  source      = "../../modules/rds"
  name        = "prod"
  vpc_id      = module.vpc.vpc_id
  subnet_ids  = module.vpc.database_subnet_ids
  environment = "prod"
}

Each module does ONE thing. They connect through outputs and variables. When you need to change networking, you touch the VPC module. When you need to change the database, you touch the RDS module. No blast radius overlap.
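The contract between modules lives in the consumer's variables. As an illustration, the eks module's inputs might look like the sketch below; the file contents are an assumption, since the original only shows the call site.

```hcl
# modules/eks/variables.tf (hypothetical sketch)

variable "vpc_id" {
  description = "ID of the VPC the cluster is deployed into"
  type        = string
}

variable "subnet_ids" {
  description = "Private subnet IDs for the cluster"
  type        = list(string)

  validation {
    # Checks only the count; it cannot verify the subnets span different AZs.
    condition     = length(var.subnet_ids) >= 2
    error_message = "Provide at least two subnet IDs."
  }
}
```

Because the eks module only sees IDs, the vpc module can change how it builds subnets without the eks module noticing, as long as the output names and types stay stable.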

Pattern 3: The Locals Pattern for Naming

# locals.tf
locals {
  name_prefix = "${var.environment}-${var.name}"

  common_tags = merge(var.tags, {
    Environment = var.environment
    ManagedBy   = "terraform"
    Module      = "vpc"
    Team        = "platform"
  })
}

Every resource uses local.name_prefix and local.common_tags. Consistent naming across your entire infrastructure, enforced by code.

# main.tf
resource "aws_vpc" "this" {
  cidr_block = var.vpc_cidr

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-vpc"
  })
}

Pattern 4: Remote State for Team Collaboration

# backend.tf
terraform {
  backend "s3" {
    bucket         = "devopsil-terraform-state"
    key            = "prod/vpc/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

State management is the hardest problem in IaC. Here's how I solve it:

  1. One state file per module per environment. Never share state across modules.
  2. S3 + DynamoDB locking. Prevents two engineers from applying simultaneously.
  3. State file naming convention: ENV/MODULE/terraform.tfstate (e.g., prod/vpc/terraform.tfstate)
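The bucket and lock table have to exist before any backend can use them. A minimal bootstrap sketch follows, assuming the names from the backend block above; the bootstrap directory and the resource labels are hypothetical, and this configuration would be applied once with local state.

```hcl
# bootstrap/main.tf (hypothetical): applied once, before any backend exists

resource "aws_s3_bucket" "state" {
  bucket = "devopsil-terraform-state"
}

# Versioning lets you roll back to an earlier state file after a bad apply.
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_dynamodb_table" "locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID" # the S3 backend expects this exact key name

  attribute {
    name = "LockID"
    type = "S"
  }
}
```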

Accessing Other Module State

data "terraform_remote_state" "vpc" {
  backend = "s3"
  config = {
    bucket = "devopsil-terraform-state"
    key    = "prod/vpc/terraform.tfstate"
    region = "us-east-1"
  }
}

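# Use outputs from the VPC state. The AMI ID below is a placeholder; supply your own.
resource "aws_instance" "app" {
  ami           = "ami-xxxxxxxxxxxxxxxxx" # placeholder, not a real AMI
  instance_type = "t3.micro"
  subnet_id     = data.terraform_remote_state.vpc.outputs.private_subnet_ids[0]
}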
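Note that terraform_remote_state only exposes outputs of the root configuration, not of child modules. The configuration that owns prod/vpc/terraform.tfstate therefore has to re-export whatever the module returns. A sketch, with a hypothetical directory layout:

```hcl
# environments/prod/vpc/main.tf (hypothetical path)
# This is the root configuration whose state is stored at prod/vpc/terraform.tfstate.

module "vpc" {
  source = "../../../modules/vpc"

  name        = "prod"
  environment = "prod"
}

# Re-export the child module's output so other states can read it.
output "private_subnet_ids" {
  description = "List of private subnet IDs"
  value       = module.vpc.private_subnet_ids
}
```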

Pattern 5: Module Versioning with Git Tags

# Use versioned modules in production
module "vpc" {
  source = "git::https://github.com/your-org/terraform-modules.git//modules/vpc?ref=v2.1.0"
}

Rules:

  • main branch = latest development
  • Git tags (v1.0.0, v2.1.0) = stable releases
  • Production ALWAYS pins to a tag
  • Dev/staging can use main for testing new versions
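For contrast with the pinned production source, a dev environment tracking the branch could look like this sketch; the file path and input values are illustrative.

```hcl
# environments/dev/main.tf (hypothetical path)

module "vpc" {
  # ref=main follows the branch head instead of an immutable tag
  source = "git::https://github.com/your-org/terraform-modules.git//modules/vpc?ref=main"

  name        = "dev"
  environment = "dev"
}
```

Terraform caches module sources under .terraform/modules, so run terraform init -upgrade to pick up new commits on the branch.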

Pattern 6: Conditional Resource Creation

Not every environment needs every resource. Use count or for_each with conditionals to toggle resources.

variable "create_nat_gateway" {
  description = "Whether to create NAT gateways (expensive, not needed in dev)"
  type        = bool
  default     = true
}

variable "single_nat_gateway" {
  description = "Use single NAT gateway instead of one per AZ (cheaper for non-prod)"
  type        = bool
  default     = false
}

resource "aws_nat_gateway" "this" {
  count = var.create_nat_gateway ? (var.single_nat_gateway ? 1 : length(var.azs)) : 0

  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-nat-${var.azs[count.index]}"
  })
}

In dev, set create_nat_gateway = false to save roughly $32/month per NAT gateway in hourly charges alone (us-east-1 pricing; data processing is billed on top). In staging, set single_nat_gateway = true for a single NAT. In prod, deploy one per AZ for high availability.
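The three environment settings above can be captured in one place. This is a sketch of one possible layout, not the only one: the nat_settings local and the calling configuration are assumptions, and other module inputs (such as azs) are omitted for brevity.

```hcl
variable "environment" {
  description = "Environment name (dev, staging, prod)"
  type        = string
}

locals {
  # Hypothetical lookup table: NAT behaviour per environment.
  nat_settings = {
    dev     = { create_nat_gateway = false, single_nat_gateway = false }
    staging = { create_nat_gateway = true, single_nat_gateway = true }
    prod    = { create_nat_gateway = true, single_nat_gateway = false }
  }
}

module "vpc" {
  source = "../../modules/vpc"

  name        = "core"
  environment = var.environment

  create_nat_gateway = local.nat_settings[var.environment].create_nat_gateway
  single_nat_gateway = local.nat_settings[var.environment].single_nat_gateway
}
```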

The for_each Pattern for Dynamic Resources

count has a problem: resources are tracked by list index, so removing an item from the middle of a list shifts every subsequent item to a new index, and Terraform plans to modify or recreate each of them. Use for_each with maps instead.

variable "subnets" {
  description = "Subnet configurations"
  type = map(object({
    cidr_block        = string
    availability_zone = string
    public            = bool
  }))
}

resource "aws_subnet" "this" {
  for_each = var.subnets

  vpc_id                  = aws_vpc.this.id
  cidr_block              = each.value.cidr_block
  availability_zone       = each.value.availability_zone
  map_public_ip_on_launch = each.value.public

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-${each.key}"
    Tier = each.value.public ? "public" : "private"
  })
}

Resources are keyed by map key, not by index. Remove "subnet-b" and only "subnet-b" gets destroyed. Everything else stays untouched.
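A matching input for the subnets variable might look like the following. The keys, CIDR ranges, and availability zones are illustrative values only.

```hcl
# terraform.tfvars (illustrative values)

subnets = {
  "subnet-a" = {
    cidr_block        = "10.0.0.0/24"
    availability_zone = "us-east-1a"
    public            = true
  }
  "subnet-b" = {
    cidr_block        = "10.0.1.0/24"
    availability_zone = "us-east-1b"
    public            = false
  }
  "subnet-c" = {
    cidr_block        = "10.0.2.0/24"
    availability_zone = "us-east-1c"
    public            = false
  }
}
```

Deleting the "subnet-b" entry leaves the addresses aws_subnet.this["subnet-a"] and aws_subnet.this["subnet-c"] unchanged, which is why the plan touches nothing else.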

Pattern 7: Output Design for Module Consumers

Outputs are your module's API. Design them like you would a REST API — think about what consumers actually need.

# outputs.tf

# Prefer IDs over full objects — consumers rarely need everything
output "vpc_id" {
  description = "The ID of the VPC"
  value       = aws_vpc.this.id
}

# Return lists for things that come in sets
output "private_subnet_ids" {
  description = "List of private subnet IDs"
  value       = [for s in aws_subnet.private : s.id]
}

output "private_subnet_cidrs" {
  description = "List of private subnet CIDR blocks"
  value       = [for s in aws_subnet.private : s.cidr_block]
}

# Return maps when consumers need to look up specific values
output "subnet_by_az" {
  description = "Map of AZ to subnet ID"
  value       = { for s in aws_subnet.private : s.availability_zone => s.id }
}

# Aggregate info for monitoring and tagging
output "module_metadata" {
  description = "Metadata about resources created by this module"
  value = {
    vpc_id               = aws_vpc.this.id
    vpc_cidr             = aws_vpc.this.cidr_block
    private_subnet_count = length(aws_subnet.private)
    nat_gateway_count    = length(aws_nat_gateway.this)
  }
}

Bad outputs force consumers to use element() and index math. Good outputs give them exactly what they need.
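From the consumer's side, the difference looks roughly like this. The sketch assumes the outputs defined above; the ami_id variable, the instance resource, and the AZ name are hypothetical.

```hcl
variable "ami_id" {
  description = "AMI to launch (hypothetical input)"
  type        = string
}

module "vpc" {
  source = "../../modules/vpc"

  name        = "prod"
  environment = "prod"
}

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = "t3.micro"

  # With a map output, the consumer names the AZ it wants directly.
  subnet_id = module.vpc.subnet_by_az["us-east-1a"]

  # With only a list output, the consumer has to know the ordering:
  # subnet_id = element(module.vpc.private_subnet_ids, 0)
}
```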

Pattern 8: Automated Documentation with terraform-docs

Nobody manually maintains module READMEs. Generate them.

# .terraform-docs.yml
formatter: markdown table

sections:
  show:
    - header
    - requirements
    - providers
    - inputs
    - outputs

Then generate docs with a single command:

# Generate README for a module
terraform-docs markdown table --output-file README.md ./modules/vpc

# Or use the config file
terraform-docs -c .terraform-docs.yml ./modules/vpc

Add injection markers to your module README so terraform-docs knows where to insert:

<!-- BEGIN_TF_DOCS -->
(auto-generated content appears here)
<!-- END_TF_DOCS -->

Run it in CI:

# .github/workflows/docs.yml
name: Terraform Docs
on:
  pull_request:
    paths: ['modules/**']

jobs:
  docs:
    runs-on: ubuntu-latest
    permissions:
      contents: write # lets the action push the regenerated README to the PR branch
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.ref }}

      - uses: terraform-docs/gh-actions@v1
        with:
          working-dir: modules/vpc,modules/eks,modules/rds
          output-file: README.md
          output-method: inject
          git-push: true

Every PR that changes a module auto-updates its README. Documentation that writes itself is documentation that stays current.

Pattern 9: Pre-Commit Hooks for Module Quality

Catch issues before they reach CI:

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.92.0
    hooks:
      - id: terraform_fmt
      - id: terraform_validate
      - id: terraform_tflint
        args:
          - --args=--config=__GIT_WORKING_DIR__/.tflint.hcl
      - id: terraform_docs
        args:
          - --args=--config=.terraform-docs.yml
      - id: terraform_checkov
        args:
          - --args=--quiet
          - --args=--compact

And the TFLint config for enforcing naming conventions:

# .tflint.hcl
plugin "aws" {
  enabled = true
  version = "0.31.0"
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}

rule "terraform_naming_convention" {
  enabled = true
  format  = "snake_case"
}

rule "terraform_documented_variables" {
  enabled = true
}

rule "terraform_documented_outputs" {
  enabled = true
}

rule "terraform_standard_module_structure" {
  enabled = true
}

Every variable needs a description. Every output needs a description. Every resource follows snake_case. Enforced at commit time, not review time.

Troubleshooting

Problem: Module changes break downstream consumers. Fix: Semantic versioning. Breaking changes = major version bump. New features = minor. Fixes = patch. Always provide a migration guide for major version bumps.
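Part of a migration guide can be shipped as code. If a major version renames or re-keys resources, a moved block inside the module tells Terraform to update the state address instead of destroying and recreating the resource. The sketch below assumes a hypothetical refactor from count-indexed private subnets to the for_each resource shown earlier; the addresses are illustrative.

```hcl
# modules/vpc/moved.tf (hypothetical file in the new major version)

moved {
  from = aws_subnet.private[0]
  to   = aws_subnet.this["subnet-a"]
}

moved {
  from = aws_subnet.private[1]
  to   = aws_subnet.this["subnet-b"]
}
```

Consumers who upgrade should then see the affected subnets reported as moved in the plan, not as a destroy and create pair.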

Problem: State locks stuck after failed apply. Fix: terraform force-unlock LOCK_ID — but investigate WHY it was locked first. A stuck lock usually means a crashed apply — check that the apply didn't partially succeed.

Problem: Circular dependencies between modules. Fix: You have a design problem. Extract the shared component into its own module. If A depends on B and B depends on A, create module C that both A and B depend on.

Problem: Module consumers use different Terraform versions. Fix: Set upper and lower bounds in versions.tf. Run CI tests against both the minimum and maximum supported versions.

Problem: Large modules take forever to plan. Fix: Split them. If your "networking" module manages 60+ resources, break it into vpc, subnets, nat-gateways, and route-tables. Each module should manage 5-15 resources. Smaller blast radius, faster plans.

Conclusion

Your infrastructure is software. Treat it like software — modular, versioned, tested, reviewed. These patterns scale from 5 resources to 5,000. Start with the standard file structure, compose small modules, pin your versions, enforce quality with pre-commit hooks, auto-generate docs, and manage state like it's the most important file in your repo — because it is.

Zara Blackwood

Platform Engineer

Terraform enthusiast, platform builder, DRY advocate. I believe infrastructure should be versioned, reviewed, and deployed like any other code. GitOps or bust.
