DevOpsil

Terraform Remote State: S3 Backends, Locking, Workspaces, and State Surgery

Zara Blackwood · 9 min read

State Is the Source of Truth. Treat It That Way.

Your Terraform state file is the single most critical artifact in your infrastructure pipeline. It maps every resource Terraform manages to real cloud objects. Lose it, corrupt it, or let two engineers write to it simultaneously — and you're in for a very bad day.

Local state is a toy. If you're running terraform apply with state sitting on your laptop, you're one rm -rf away from orphaned infrastructure nobody can manage. Let's fix that.

Setting Up the S3 Backend

First, you need the backend infrastructure itself. Yes, this is the chicken-and-egg problem of IaC — you need infrastructure to store the state that manages your infrastructure.

Bootstrap Module

bootstrap/
├── main.tf
├── variables.tf
├── outputs.tf
└── terraform.tfvars

# bootstrap/main.tf

resource "aws_s3_bucket" "state" {
  bucket = "${var.org_name}-terraform-state"

  tags = {
    ManagedBy = "terraform-bootstrap"
    Purpose   = "terraform-state"
  }
}

resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
  bucket = aws_s3_bucket.state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.state.arn
    }
    bucket_key_enabled = true
  }
}

resource "aws_s3_bucket_public_access_block" "state" {
  bucket = aws_s3_bucket.state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_kms_key" "state" {
  description             = "KMS key for Terraform state encryption"
  deletion_window_in_days = 30
  enable_key_rotation     = true
}

resource "aws_dynamodb_table" "locks" {
  name         = "${var.org_name}-terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }

  tags = {
    ManagedBy = "terraform-bootstrap"
    Purpose   = "terraform-state-locking"
  }
}

KMS encryption, versioning, public access blocked, and DynamoDB for locking. This is the minimum. Apply this with local state, then migrate.

Configuring the Backend

# backend.tf
terraform {
  backend "s3" {
    bucket         = "acme-terraform-state"
    key            = "networking/vpc/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    kms_key_id     = "alias/terraform-state"
    dynamodb_table = "acme-terraform-locks"
  }
}

After adding this, run:

terraform init -migrate-state

Terraform copies your local state to S3. Verify it worked, then delete the local .tfstate file. Don't skip verification.
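The verification step is easy to script. A minimal sketch: `terraform plan -detailed-exitcode` maps the result onto an exit code (0 = no changes, 1 = error, 2 = changes pending), so a clean migration is machine-checkable. The stub function at the bottom stands in for the real binary so the logic is runnable anywhere — delete it to check a real workspace.

```shell
#!/usr/bin/env bash
# Post-migration check: a clean migration produces a zero-change plan.
verify_migration() {
  terraform plan -detailed-exitcode -input=false > /dev/null 2>&1
  case $? in
    0) echo "Migration clean: plan shows zero changes." ;;
    2) echo "Plan shows changes — do NOT delete local state yet."; return 1 ;;
    *) echo "Plan failed — investigate before touching local state."; return 1 ;;
  esac
}

# Stub standing in for the real CLI so this sketch runs anywhere;
# remove it to verify an actual workspace.
terraform() { return 0; }

verify_migration
```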

State Locking: Why DynamoDB Matters

Without locking, this happens:

  1. Engineer A runs terraform plan — sees 3 changes
  2. Engineer B runs terraform plan — sees the same 3 changes
  3. Both run terraform apply at the same time
  4. One apply succeeds, the other corrupts state or creates duplicate resources

DynamoDB locking prevents concurrent writes. When Terraform acquires a lock, it writes a record to the DynamoDB table. Any other apply attempt blocks until the lock is released.

# Lock stuck after a crashed apply?
terraform force-unlock LOCK_ID

# Get the lock ID from the error message. ALWAYS investigate why
# the lock was stuck before force-unlocking.
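Part of that investigation is looking at who holds the lock. A sketch, assuming the S3 backend's usual convention of LockID = "<bucket>/<key>" and that the lock metadata (who, what operation, when) lives in the item's Info attribute — verify both against your own table:

```shell
# Read the lock record for one state file before force-unlocking.
# The LockID format below is an assumption — confirm it in your table.
aws dynamodb get-item \
  --table-name acme-terraform-locks \
  --key '{"LockID": {"S": "acme-terraform-state/networking/vpc/terraform.tfstate"}}' \
  --query 'Item.Info.S' \
  --output text
```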

State Key Strategy

Your key path in the backend config determines how state files are organized in S3. Here's the pattern I use:

s3://acme-terraform-state/
├── networking/
│   ├── vpc/terraform.tfstate
│   └── dns/terraform.tfstate
├── compute/
│   ├── eks/terraform.tfstate
│   └── ec2-bastion/terraform.tfstate
├── data/
│   ├── rds-primary/terraform.tfstate
│   └── elasticache/terraform.tfstate
└── security/
    ├── iam/terraform.tfstate
    └── waf/terraform.tfstate

One state file per logical component. Small blast radius. If an apply goes wrong on your WAF config, your VPC state is untouched.

Workspaces: When They Work and When They Don't

Workspaces create isolated state files within the same backend config. Terraform stores them under env:/ prefixes in S3.

terraform workspace new staging
terraform workspace new prod
terraform workspace select staging

# Using workspace name in resource configuration
locals {
  env = terraform.workspace

  instance_type = {
    dev     = "t3.small"
    staging = "t3.medium"
    prod    = "m5.large"
  }
}

resource "aws_instance" "app" {
  instance_type = local.instance_type[local.env]

  tags = {
    Environment = local.env
  }
}

When workspaces work

  • Same infrastructure, different sizes per environment
  • Small teams where everyone understands the workspace model
  • Ephemeral environments for feature branches
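That last case is easy to script in CI. A sketch, assuming Terraform >= 1.4 (for `select -or-create`) and a branch identifier injected by the pipeline:

```shell
# Spin up an isolated copy of the stack for a feature branch.
BRANCH="pr-142"   # hypothetical value; CI would inject this

terraform workspace select -or-create "$BRANCH"
terraform apply -auto-approve

# Teardown when the branch merges — a workspace must be empty and
# deselected before it can be deleted.
terraform destroy -auto-approve
terraform workspace select default
terraform workspace delete "$BRANCH"
```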

When workspaces fail

  • Different environments need fundamentally different resources
  • Teams larger than ~10 engineers (workspace confusion is real)
  • When you need different backend configs per environment

For most production setups, I prefer directory-based separation over workspaces:

environments/
├── dev/
│   ├── backend.tf    # key = "dev/app/terraform.tfstate"
│   ├── main.tf
│   └── terraform.tfvars
├── staging/
│   ├── backend.tf    # key = "staging/app/terraform.tfstate"
│   ├── main.tf
│   └── terraform.tfvars
└── prod/
    ├── backend.tf    # key = "prod/app/terraform.tfstate"
    ├── main.tf
    └── terraform.tfvars

Explicit. Visible. No hidden terraform.workspace magic.
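Targeting an environment then becomes an explicit cd instead of a workspace switch. A sketch of how CI might fan out over the directories above (paths are assumptions from that layout):

```shell
# Each environment is just a directory with its own backend config;
# nothing depends on hidden workspace state.
for env in dev staging prod; do
  ( cd "environments/$env" \
      && terraform init -input=false \
      && terraform plan -out=tfplan )
done
```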

State Surgery: The Emergency Toolkit

Sometimes state gets out of sync with reality. These commands are your scalpel.

# View everything in state
terraform state list

# See details of a specific resource
terraform state show aws_s3_bucket.data

# Remove a resource from state WITHOUT destroying it
# Use this when you want Terraform to "forget" a resource
terraform state rm aws_s3_bucket.legacy

# Move a resource to a new address (after refactoring)
terraform state mv aws_instance.old aws_instance.new

# Move a resource into a module
terraform state mv aws_vpc.main module.networking.aws_vpc.this

# Import an existing resource into state
terraform import aws_s3_bucket.existing my-bucket-name

The moved Block (Terraform 1.1+)

Instead of manual state mv commands, declare moves in code:

moved {
  from = aws_instance.app
  to   = module.compute.aws_instance.app
}

This is refactoring as code. It goes through plan/apply, it's reviewable in a PR, and it's self-documenting. Always prefer moved blocks over manual state surgery.

Recovering from Disaster

S3 versioning is your safety net. If state gets corrupted:

# List state file versions
aws s3api list-object-versions \
  --bucket acme-terraform-state \
  --prefix networking/vpc/terraform.tfstate

# Download a previous version
aws s3api get-object \
  --bucket acme-terraform-state \
  --key networking/vpc/terraform.tfstate \
  --version-id "abc123" \
  recovered.tfstate

# Push the recovered state
terraform state push recovered.tfstate

This is why versioning on the state bucket is non-negotiable.

CI/CD Pipeline for State Operations

Never run terraform apply from a laptop in production. Use a CI pipeline with proper access controls.

# .github/workflows/terraform.yml
name: Terraform
on:
  push:
    branches: [main]
    paths: ['infrastructure/**']
  pull_request:
    paths: ['infrastructure/**']

permissions:
  contents: read
  id-token: write
  pull-requests: write

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.8.0"

      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/terraform-plan
          aws-region: us-east-1

      - name: Terraform Init
        run: terraform init
        working-directory: infrastructure/networking

      - name: Terraform Plan
        id: plan
        run: terraform plan -no-color -out=tfplan
        working-directory: infrastructure/networking

      - name: Comment PR with Plan
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const output = `#### Terraform Plan
            \`\`\`
            ${{ steps.plan.outputs.stdout }}
            \`\`\`
            *Pushed by: @${{ github.actor }}*`;
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: output
            })

  apply:
    needs: plan
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3

      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/terraform-apply
          aws-region: us-east-1

      - run: terraform init && terraform apply -auto-approve
        working-directory: infrastructure/networking

Two IAM roles: terraform-plan has read-only access, terraform-apply has write access. The plan role is used for PRs. The apply role is locked behind a GitHub environment with required reviewers.

State File Security

Your state file contains sensitive data — database passwords, API keys, resource ARNs. Treat it accordingly.

IAM Policy for State Access

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowStateBucketAccess",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::acme-terraform-state/*",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalTag/Team": "${s3:prefix}"
        }
      }
    },
    {
      "Sid": "AllowLockTable",
      "Effect": "Allow",
      "Action": [
        "dynamodb:PutItem",
        "dynamodb:GetItem",
        "dynamodb:DeleteItem"
      ],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/acme-terraform-locks"
    }
  ]
}

Tag-based access control: the networking team can only access state files under the networking/ prefix. The payments team can only access payments/. No one accidentally destroys another team's infrastructure.

Detecting State Drift

State drift happens when someone modifies infrastructure outside of Terraform. Detect it early.

#!/bin/bash
# drift-detection.sh — run on a schedule

MODULES=("networking/vpc" "compute/eks" "data/rds-primary")

for module in "${MODULES[@]}"; do
  echo "Checking drift for: $module"
  cd "infrastructure/$module" || { echo "ERROR: missing $module"; continue; }
  terraform init -input=false > /dev/null 2>&1
  terraform plan -detailed-exitcode -input=false -no-color > /dev/null 2>&1
  EXIT_CODE=$?

  if [ $EXIT_CODE -eq 2 ]; then
    echo "DRIFT DETECTED in $module"
    # Send alert
    curl -X POST "$SLACK_WEBHOOK" \
      -H 'Content-Type: application/json' \
      -d "{\"text\":\"Terraform drift detected in \`$module\`. Run \`terraform plan\` to review.\"}"
  elif [ $EXIT_CODE -eq 0 ]; then
    echo "No drift in $module"
  else
    echo "ERROR checking $module"
  fi
  cd - > /dev/null
done

Schedule this daily. terraform plan -detailed-exitcode returns exit code 2 when there are changes, making it scriptable. Catching drift early prevents the "someone changed this in the console and now my plan shows 47 changes" nightmare.
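A crontab entry makes the schedule concrete — a sketch; the script path and log location are assumptions:

```shell
# crontab -e — run the drift check every morning at 06:00 UTC
0 6 * * * /opt/terraform/drift-detection.sh >> /var/log/terraform-drift.log 2>&1
```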

Common Pitfalls

Pitfall 1: Storing sensitive outputs in state. Terraform stores all outputs in state as plaintext. If you output a database password, it's readable by anyone with state access. Use sensitive = true on outputs to prevent them from showing in logs, but know they're still in the state file.

output "db_password" {
  value     = random_password.db.result
  sensitive = true
}
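Keep in mind that sensitive = true changes display, not storage. The output listing is redacted, but anyone with state access can still read the value on request:

```shell
terraform output                    # listing redacts: db_password = <sensitive>
terraform output -raw db_password   # explicit request prints the real secret —
                                    # state access is secret access
```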

Pitfall 2: Running terraform state rm instead of moved blocks. Manual state operations are one-shot and unauditable. moved blocks are code-reviewed, reversible, and self-documenting. Always prefer moved blocks.

Pitfall 3: Migrating state without verifying. After terraform init -migrate-state, always run terraform plan to confirm zero changes. If the plan shows changes, the migration went wrong.

Pitfall 4: Sharing state across modules. One module's state file should never be writable by another module's pipeline. Use terraform_remote_state data sources for read-only cross-module references.

Conclusion

Remote state is not optional — it's the foundation of collaborative IaC. Set up S3 with KMS encryption, enable versioning, add DynamoDB locking, and organize your state keys by domain. Use workspaces only when they genuinely simplify your setup, and keep moved blocks and state mv in your back pocket for when refactoring day comes. Run drift detection on a schedule, lock state access with IAM policies, and run terraform apply only from CI. Your state file is your infrastructure's memory. Protect it like production data, because that's exactly what it is.

Zara Blackwood

Platform Engineer

Terraform enthusiast, platform builder, DRY advocate. I believe infrastructure should be versioned, reviewed, and deployed like any other code. GitOps or bust.
