
Ansible Dynamic Inventory: Automating Cloud Infrastructure

Dev Patel · 21 min read

Static vs Dynamic Inventory

A static inventory file works well when your infrastructure is stable: a handful of servers that rarely change, provisioned by hand and maintained for months or years. But in cloud environments, instances are ephemeral by design. Auto Scaling groups spin up and terminate EC2 instances based on demand. Kubernetes nodes come and go. Spot instances appear for hours and vanish. Blue-green deployments create entire fleets and destroy them. In this world, maintaining a static inventory is a losing battle.

You either automate the inventory file updates (fragile, prone to race conditions, and guaranteed to drift) or accept that your inventory is always stale (dangerous, because you might miss new servers or target terminated ones). Dynamic inventory solves this by querying your cloud provider's API at runtime. Every time you run a playbook, Ansible discovers the current state of your infrastructure. No manual updates, no drift, no stale hosts.

| Aspect | Static Inventory | Dynamic Inventory |
| --- | --- | --- |
| Source of truth | Text file in repo | Cloud provider API |
| Update frequency | Manual edits | Every playbook run (or cached) |
| Scale | Tens of servers | Thousands of instances |
| Accuracy | Drifts over time | Always current |
| Setup effort | Minimal | Requires credentials and plugin config |
| Cloud integration | None | Native tags, regions, metadata |
| Multi-cloud | Manual aggregation | Automatic with multiple plugins |
| Cost | Free | API calls (usually negligible) |

Inventory Plugins vs Legacy Scripts

Ansible supports two approaches to dynamic inventory:

Legacy inventory scripts are standalone executables that output JSON when called with --list or --host. You still see these in older tutorials and some vendor documentation, but they are deprecated in favor of plugins and should not be used for new projects.

Inventory plugins (the modern approach) are configured with YAML files and integrated directly into Ansible's inventory system. They support caching, composable variables, filtering, constructed groups, and templated host variables. Always use plugins for new projects.

| Feature | Legacy Scripts | Inventory Plugins |
| --- | --- | --- |
| Configuration | Command-line args or env vars | YAML configuration file |
| Caching | Must implement yourself | Built-in cache support |
| Composed groups | Not supported | keyed_groups and groups directives |
| Composed variables | Not supported | compose directive |
| Filtering | Must implement yourself | Built-in filter support |
| Maintenance | Deprecated | Actively maintained |
| Testing | Difficult | Can use ansible-inventory commands |
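For context, a legacy script called with --list had to print JSON in roughly this shape. A minimal sketch of that output format (hostnames and values here are made up); including `_meta.hostvars` up front spares Ansible a separate `--host` call per host:

```python
import json

# Shape of the JSON a legacy inventory script prints for --list.
# Top-level keys are groups; _meta.hostvars carries per-host variables.
inventory = {
    "webservers": {
        "hosts": ["web-1", "web-2"],
        "vars": {"http_port": 80},
    },
    "_meta": {
        "hostvars": {
            "web-1": {"ansible_host": "10.0.0.11"},
            "web-2": {"ansible_host": "10.0.0.12"},
        }
    },
}
print(json.dumps(inventory, indent=2))
```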

To see available inventory plugins:

ansible-doc -t inventory -l

To view documentation for a specific plugin:

ansible-doc -t inventory amazon.aws.aws_ec2
ansible-doc -t inventory azure.azcollection.azure_rm
ansible-doc -t inventory google.cloud.gcp_compute

AWS EC2 Inventory Plugin

Prerequisites

Install the AWS collection and Python dependencies:

ansible-galaxy collection install amazon.aws
pip install boto3 botocore

Configure AWS credentials. The plugin supports multiple authentication methods, evaluated in this order:

  1. Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
  2. AWS credentials file (~/.aws/credentials)
  3. AWS config file (~/.aws/config)
  4. IAM instance profile (when running on EC2)
  5. IAM role assumed via STS

For local development, use the AWS CLI to configure credentials:

aws configure
# or for named profiles:
aws configure --profile ansible

For CI/CD, use environment variables or IAM roles.
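The shared credentials file that `aws configure` writes is plain INI, with one section per named profile. A quick sketch of how it is structured (key values here are placeholders):

```python
import configparser

# ~/.aws/credentials is INI format; each profile is a section.
# The values below are placeholders, not real credentials.
sample = """
[default]
aws_access_key_id = EXAMPLEKEYID
aws_secret_access_key = examplesecretkey

[ansible]
aws_access_key_id = EXAMPLEKEYID2
aws_secret_access_key = examplesecretkey2
"""

config = configparser.ConfigParser()
config.read_string(sample)
print(config.sections())  # ['default', 'ansible']
```

The `profile: ansible` option in the plugin config (shown below) selects one of these sections.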

Minimal IAM Policy

The EC2 inventory plugin needs read access to EC2. Here is a minimal IAM policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeRegions",
        "ec2:DescribeTags",
        "ec2:DescribeInstanceStatus"
      ],
      "Resource": "*"
    }
  ]
}

If you also use RDS, ElastiCache, or other services, add their Describe permissions as well.

Configuration

Create a file that ends in aws_ec2.yml or aws_ec2.yaml (the suffix is required for Ansible to recognize it as an EC2 inventory source):

# inventory/aws_ec2.yml
plugin: amazon.aws.aws_ec2

# AWS profile (optional, uses default if not specified)
# profile: ansible

regions:
  - us-east-1
  - us-west-2
  - eu-west-1

# Only include instances matching these filters
filters:
  instance-state-name: running
  "tag:ManagedBy": ansible
  "tag:Environment":
    - production
    - staging

# Create groups from instance attributes
keyed_groups:
  # Group by AWS region
  - key: placement.region
    prefix: region
    separator: "_"

  # Group by availability zone
  - key: placement.availability_zone
    prefix: az
    separator: "_"

  # Group by instance type
  - key: instance_type
    prefix: type
    separator: "_"

  # Group by the "Role" tag
  - key: tags.Role
    prefix: role
    separator: "_"

  # Group by the "Environment" tag
  - key: tags.Environment
    prefix: env
    separator: "_"

  # Group by the "Team" tag
  - key: tags.Team | default('unassigned')
    prefix: team
    separator: "_"

  # Group by VPC ID
  - key: vpc_id
    prefix: vpc
    separator: "_"

  # Conditional grouping based on instance size
  - key: "'large' if instance_type.split('.')[1] in ['large', 'xlarge', '2xlarge', '4xlarge'] else 'small'"
    prefix: size
    separator: "_"

# Create host variables from instance attributes
compose:
  # Set ansible_host to private IP (for VPN/VPC access)
  ansible_host: private_ip_address

  # Or use public IP for direct SSH access:
  # ansible_host: public_ip_address

  # Set the SSH user based on the OS tag
  ansible_user: "'ubuntu' if tags.get('OS', '') == 'Ubuntu' else 'ec2-user'"

  # Set SSH key based on environment
  ansible_ssh_private_key_file: "'~/.ssh/prod.pem' if tags.get('Environment') == 'production' else '~/.ssh/staging.pem'"

  # Create custom variables from instance metadata
  instance_name: tags.get('Name', instance_id)
  cloud_provider: "'aws'"
  datacenter: placement.availability_zone
  instance_tags: tags

# Hostname preference order
hostnames:
  # Use Name tag as the hostname, fall back to private DNS, then instance ID
  - tag:Name
  - private-dns-name
  - instance-id

# strict: true fails the whole inventory when a keyed_groups or compose
# expression raises an error; false skips the offending value instead
strict: false

# Include only specific instance attributes to reduce memory
# include_extra_api_calls: true  # Adds EBS and ENI details

Testing the Inventory

# List all discovered hosts as JSON
ansible-inventory -i inventory/aws_ec2.yml --list

# Show the inventory as a tree graph
ansible-inventory -i inventory/aws_ec2.yml --graph

# Show graph with all variables
ansible-inventory -i inventory/aws_ec2.yml --graph --vars

# Show variables for a specific host
ansible-inventory -i inventory/aws_ec2.yml --host web-server-1

# Ping all discovered hosts
ansible -i inventory/aws_ec2.yml all -m ping

# Ping only production web servers
ansible -i inventory/aws_ec2.yml role_web:&env_production -m ping

Sample --graph output:

@all:
  |--@region_us_east_1:
  |  |--web-server-1
  |  |--web-server-2
  |  |--api-server-1
  |  |--db-primary
  |--@region_us_west_2:
  |  |--web-server-3
  |  |--web-server-4
  |--@role_web:
  |  |--web-server-1
  |  |--web-server-2
  |  |--web-server-3
  |  |--web-server-4
  |--@role_api:
  |  |--api-server-1
  |--@role_database:
  |  |--db-primary
  |--@env_production:
  |  |--web-server-1
  |  |--web-server-2
  |  |--api-server-1
  |  |--db-primary
  |--@env_staging:
  |  |--web-server-3
  |  |--web-server-4
  |--@type_t3_medium:
  |  |--web-server-1
  |  |--web-server-2
  |  |--web-server-3
  |  |--web-server-4
  |--@type_m5_large:
  |  |--api-server-1
  |--@type_r5_2xlarge:
  |  |--db-primary

Advanced keyed_groups and compose Examples

The keyed_groups directive creates Ansible groups from instance attributes. The key is a Jinja2 expression evaluated against each host's variables.

keyed_groups:
  # Group by a comma-separated tag (e.g., tag "Services" = "nginx,redis,monitoring")
  - key: tags.Services.split(',') if tags.Services is defined else []
    prefix: service

  # Group by AMI ID (useful for tracking which image version hosts run)
  - key: image_id
    prefix: ami

  # Boolean grouping: has_public_ip or no_public_ip
  - key: "'has_public_ip' if public_ip_address else 'no_public_ip'"
    prefix: network

  # Group by launch time (month)
  - key: launch_time[:7]
    prefix: launched
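The group names that keyed_groups produces follow a predictable pattern: prefix, separator, then the key's value with characters that are invalid in group names replaced by underscores. A rough Python model of that naming (the sanitization rule here approximates Ansible's default behavior, it is not the exact implementation):

```python
import re

def keyed_group_name(prefix, value, separator="_"):
    # Approximation: characters that are invalid in Ansible group names
    # (such as '-' and '.') are replaced with underscores by default
    sanitized = re.sub(r"[^A-Za-z0-9_]", "_", str(value))
    return f"{prefix}{separator}{sanitized}"

print(keyed_group_name("region", "us-east-1"))  # region_us_east_1
print(keyed_group_name("type", "t3.medium"))    # type_t3_medium
```

This is why a `placement.region` of `us-east-1` shows up as the group `region_us_east_1` in the --graph output.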

The compose directive creates host variables from instance attributes:

compose:
  # Network information
  ansible_host: private_ip_address
  public_ip: public_ip_address | default('')
  subnet_id: subnet_id
  vpc_id: vpc_id

  # Instance metadata
  instance_name: tags.get('Name', instance_id)
  cloud_region: placement.region
  cloud_az: placement.availability_zone
  cloud_provider: "'aws'"

  # Custom connection settings based on instance attributes
  ansible_ssh_private_key_file: "'~/.ssh/' + tags.get('KeyName', 'default') + '.pem'"
  ansible_python_interpreter: "'/usr/bin/python3'"

  # Application-level variables derived from tags
  app_version: tags.get('AppVersion', 'latest')
  deploy_group: tags.get('DeployGroup', 'default')
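Conceptually, compose evaluates each expression against the instance's raw attributes and stores the result as a host variable. The same derivations written in plain Python (the instance facts below are made up for illustration):

```python
# What compose does, modeled in plain Python: raw instance facts in,
# derived host variables out (instance data is made up)
instance = {
    "private_ip_address": "10.0.1.5",
    "instance_id": "i-0abc123",
    "tags": {"Name": "web-1", "KeyName": "prod", "AppVersion": "2.4.1"},
}

hostvars = {
    "ansible_host": instance["private_ip_address"],
    "instance_name": instance["tags"].get("Name", instance["instance_id"]),
    "ansible_ssh_private_key_file": "~/.ssh/" + instance["tags"].get("KeyName", "default") + ".pem",
    "app_version": instance["tags"].get("AppVersion", "latest"),
}
print(hostvars["ansible_ssh_private_key_file"])  # ~/.ssh/prod.pem
```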

Azure Inventory Plugin

Prerequisites

ansible-galaxy collection install azure.azcollection
pip install azure-identity azure-mgmt-compute azure-mgmt-network azure-mgmt-resource

Authenticate via one of these methods:

  1. Environment variables (Service Principal)
  2. Azure CLI (az login)
  3. Managed Identity (when running on Azure)
  4. Azure AD Workload Identity

Service Principal Authentication

export AZURE_SUBSCRIPTION_ID="your-subscription-id"
export AZURE_CLIENT_ID="your-client-id"
export AZURE_SECRET="your-client-secret"
export AZURE_TENANT="your-tenant-id"

Configuration

Create a file ending in azure_rm.yml:

# inventory/azure_rm.yml
plugin: azure.azcollection.azure_rm
auth_source: auto

# Include only specific resource groups
include_vm_resource_groups:
  - production-rg
  - staging-rg
  - shared-services-rg

# Use simple hostnames (not fully qualified Azure resource IDs)
plain_host_names: true

# Include VMSS instances
include_vmss_resource_groups:
  - production-rg

keyed_groups:
  # Group by resource group
  - key: resource_group | lower
    prefix: rg

  # Group by Azure location/region
  - key: location
    prefix: location

  # Group by tags
  - key: tags.Role | default('untagged')
    prefix: role

  # Group by OS type
  - key: os_profile.system | default('unknown')
    prefix: os

  # Group by VM size
  - key: virtual_machine_size | default('unknown')
    prefix: vmsize

  # Group by environment tag
  - key: tags.Environment | default('untagged')
    prefix: env

compose:
  # Prefer public IP, fall back to private
  ansible_host: public_ip_address | default(private_ip_address, true)
  ansible_user: "'azureuser'"

  # Custom variables from Azure metadata
  azure_vm_size: virtual_machine_size
  azure_resource_group: resource_group
  azure_location: location
  cloud_provider: "'azure'"

# Create groups based on VM power state
conditional_groups:
  running: powerstate == "running"
  stopped: powerstate == "deallocated"

# Exclude specific VMs
exclude_host_filters:
  - powerstate != 'running'
  - name.startswith('test-')

Test it:

ansible-inventory -i inventory/azure_rm.yml --graph
ansible-inventory -i inventory/azure_rm.yml --list

GCP Inventory Plugin

Prerequisites

ansible-galaxy collection install google.cloud
pip install google-auth requests

Authenticate via:

  1. Application Default Credentials (gcloud auth application-default login)
  2. Service account JSON key file
  3. GCE metadata server (when running on GCP)

Configuration

Create a file ending in gcp.yml:

# inventory/gcp.yml
plugin: google.cloud.gcp_compute

projects:
  - my-gcp-project-id
  - my-other-project-id

zones:
  - us-central1-a
  - us-central1-b
  - us-central1-c
  - us-east1-b
  - us-east1-c
  - europe-west1-b

# Only include running instances
filters:
  - status = RUNNING

# Authentication
auth_kind: application
# Or specify a service account file:
# auth_kind: serviceaccount
# service_account_file: /path/to/service-account.json

keyed_groups:
  # Group by zone
  - key: zone
    prefix: zone

  # Group by machine type (extract just the type name from the full URL)
  - key: machine_type | regex_search('[^/]+$')
    prefix: type

  # Group by network tags
  - key: tags.items | default([])
    prefix: tag

  # Group by labels
  - key: labels.environment | default('unlabeled')
    prefix: env

  # Group by labels (role)
  - key: labels.role | default('unassigned')
    prefix: role

  # Group by project
  - key: project
    prefix: project

  # Group by status
  - key: status | lower
    prefix: status

compose:
  # Use external NAT IP if available, otherwise internal IP
  ansible_host: networkInterfaces[0].accessConfigs[0].natIP | default(networkInterfaces[0].networkIP)
  ansible_user: "'deploy'"

  # Custom variables
  gcp_zone: zone
  gcp_machine_type: machine_type | regex_search('[^/]+$')
  gcp_project: project
  cloud_provider: "'gcp'"
  internal_ip: networkInterfaces[0].networkIP
  external_ip: networkInterfaces[0].accessConfigs[0].natIP | default('')

hostnames:
  # Use instance name
  - name
  # Fall back to internal IP
  - networkInterfaces[0].networkIP

strict: false

Test:

ansible-inventory -i inventory/gcp.yml --graph
ansible -i inventory/gcp.yml all -m ping

Filtering Instances

Effective filtering reduces API calls, keeps your inventory focused, and prevents accidentally targeting the wrong instances. Each cloud plugin supports its own filter syntax.

AWS EC2 Filters

AWS uses the EC2 API filter syntax:

filters:
  # Instance state
  instance-state-name: running

  # Single tag value
  "tag:Environment": production

  # Multiple tag values (OR logic)
  "tag:Environment":
    - production
    - staging

  # Multiple tags (AND logic between different tags)
  "tag:ManagedBy": ansible
  "tag:Environment": production

  # Instance type filter
  instance-type:
    - t3.medium
    - t3.large
    - m5.large
    - m5.xlarge

  # VPC filter
  vpc-id: vpc-0abc123def456

  # Subnet filter
  subnet-id:
    - subnet-0abc123
    - subnet-0def456

  # Security group
  "instance.group-name": web-servers

Azure Filters

Azure uses include_vm_resource_groups, exclude_host_filters, and conditional_groups:

include_vm_resource_groups:
  - production-rg
  - staging-rg

exclude_host_filters:
  - powerstate != 'running'
  - name.startswith('temp-')
  - name.endswith('-dev')

conditional_groups:
  production_running: powerstate == "running" and tags.get('Environment') == 'production'
  needs_patching: tags.get('PatchGroup', '') == 'weekly'

GCP Filters

GCP uses the Compute API filter syntax:

filters:
  - status = RUNNING
  - labels.environment = production
  - labels.managed-by = ansible
  # Note: GCP label keys and values must be lowercase; letters, numerals,
  # underscores, and hyphens are allowed

Combining Static and Dynamic Inventory

One of Ansible's most powerful features is the ability to use multiple inventory sources simultaneously. You can combine static hosts (on-premises servers, network devices) with dynamic cloud discovery in a single inventory directory. Ansible merges them transparently.

Directory Structure

inventory/
  01-static.yml           # Static hosts (on-prem servers, network devices)
  02-aws_ec2.yml          # Dynamic AWS hosts
  03-azure_rm.yml         # Dynamic Azure hosts
  04-gcp.yml              # Dynamic GCP hosts
  group_vars/
    all/
      vars.yml            # Variables for all hosts
      vault.yml           # Encrypted secrets for all hosts
    webservers/
      vars.yml
    role_web/
      vars.yml            # Variables for dynamically grouped web servers
    env_production/
      vars.yml
      vault.yml
    dbservers/
      vars.yml
  host_vars/
    bastion.example.com.yml
    legacy-db-01.yml

The static inventory file handles on-premises infrastructure:

# inventory/01-static.yml
all:
  children:
    onprem:
      children:
        onprem_webservers:
          hosts:
            onprem-web-01.example.com:
              ansible_host: 10.100.1.10
            onprem-web-02.example.com:
              ansible_host: 10.100.1.11
        onprem_dbservers:
          hosts:
            onprem-db-01.example.com:
              ansible_host: 10.100.2.10
              ansible_user: dbadmin
        network_devices:
          hosts:
            core-switch-01:
              ansible_host: 10.100.0.1
              ansible_network_os: ios
              ansible_connection: network_cli
            firewall-01:
              ansible_host: 10.100.0.2
              ansible_network_os: asa
              ansible_connection: network_cli
    bastion:
      hosts:
        bastion.example.com:
          ansible_host: 203.0.113.10
          ansible_user: admin

Point Ansible at the directory:

ansible-playbook -i inventory/ site.yml

Ansible processes files in lexicographic order (which is why the numeric prefix is useful). Static variables in group_vars/ and host_vars/ apply to dynamically discovered hosts that match the group or hostname. This is powerful: you can tag an EC2 instance with Role: web, have the dynamic inventory create a role_web group, and define variables for that group in group_vars/role_web/vars.yml.
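Note that the ordering is lexicographic, not numeric, which is worth zero-padding for. A quick illustration:

```python
# Lexicographic, not numeric: without zero-padding, '10-' sorts before '2-'
unpadded = ["2-aws.yml", "10-constructed.yml", "1-static.yml"]
padded = ["02-aws.yml", "10-constructed.yml", "01-static.yml"]

print(sorted(unpadded))  # ['1-static.yml', '10-constructed.yml', '2-aws.yml']
print(sorted(padded))    # ['01-static.yml', '02-aws.yml', '10-constructed.yml']
```

Without the leading zero, the constructed inventory (which must run last) would be processed before the AWS source.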

Cross-Cloud Group Variables

# inventory/group_vars/all/vars.yml
---
ntp_servers:
  - 0.pool.ntp.org
  - 1.pool.ntp.org
dns_servers:
  - 8.8.8.8
  - 8.8.4.4
monitoring_endpoint: https://monitoring.example.com/api/v1
log_aggregator: logs.example.com
# inventory/group_vars/role_web/vars.yml
---
nginx_worker_processes: auto
nginx_worker_connections: 4096
nginx_server_name: "{{ inventory_hostname }}"
app_document_root: /var/www/html
health_check_path: /health
health_check_port: 80
# inventory/group_vars/env_production/vars.yml
---
app_environment: production
log_level: warn
enable_debug: false
backup_enabled: true
backup_schedule: "0 2 * * *"
monitoring_interval: 30

Inventory Caching

Querying cloud APIs on every playbook run is slow, especially if you have thousands of instances across multiple regions. Enable caching to store the inventory locally and refresh it periodically.

File-Based Caching

# inventory/aws_ec2.yml
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
  - us-west-2

cache: true
cache_plugin: jsonfile
cache_connection: /tmp/ansible_inventory_cache
cache_timeout: 300  # seconds (5 minutes)
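The jsonfile cache behavior boils down to a timestamp check: if the cache file exists and is younger than cache_timeout, the API call is skipped. A simplified model of that logic (not the plugin's actual implementation):

```python
import os
import tempfile
import time

def cache_is_fresh(path, timeout_seconds):
    """Simplified model of jsonfile cache validity: the file exists
    and its age is below the timeout."""
    if not os.path.exists(path):
        return False
    return (time.time() - os.path.getmtime(path)) < timeout_seconds

# Demo with a throwaway file standing in for the cache
with tempfile.NamedTemporaryFile(delete=False) as f:
    cache_path = f.name

print(cache_is_fresh(cache_path, 300))  # True: file was just created
print(cache_is_fresh(cache_path, 0))    # False: a zero timeout always refetches
os.unlink(cache_path)
```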

Redis-Based Caching (Shared Across Team)

plugin: amazon.aws.aws_ec2
regions:
  - us-east-1

cache: true
cache_plugin: redis
cache_connection: redis://redis.internal.example.com:6379/0
cache_timeout: 600
cache_prefix: ansible_inventory_

Install the Redis Python client:

pip install redis

Cache Management Commands

# Force refresh the cache
ansible-inventory -i inventory/aws_ec2.yml --list --flush-cache

# List with cache (uses cached data if available)
ansible-inventory -i inventory/aws_ec2.yml --list

# Playbook with cache flush
ansible-playbook -i inventory/ site.yml --flush-cache

Cache Plugin Comparison

| Cache Plugin | Storage | Use Case | Shared |
| --- | --- | --- | --- |
| jsonfile | Local filesystem | Single user, simple setups | No |
| redis | Redis server | Shared cache across team | Yes |
| memcached | Memcached server | High-performance shared cache | Yes |
| mongodb | MongoDB | Persistent shared cache | Yes |
| yaml | Local YAML files | Human-readable cache for debugging | No |

For CI/CD pipelines, disable caching or set a very short timeout. Stale inventory in CI defeats the purpose of dynamic discovery:

# In CI/CD, either disable caching:
cache: false

# Or use a very short timeout:
cache: true
cache_timeout: 60

Constructed Inventory Plugin

The constructed inventory plugin creates groups and variables based on existing inventory data from any source. It acts as a second pass over your inventory, letting you create logical groups from any combination of attributes.

# inventory/05-constructed.yml
plugin: ansible.builtin.constructed
strict: false

# Create groups based on combined attributes
groups:
  # All production web servers regardless of cloud provider
  production_web: >-
    ('role_web' in group_names or 'onprem_webservers' in group_names)
    and
    ('env_production' in group_names or 'production' in group_names)

  # All databases across all providers
  all_databases: >-
    'role_database' in group_names or
    'onprem_dbservers' in group_names or
    'role_db' in group_names

  # Servers whose LastPatched tag is more than 30 days old
  needs_patching: >-
    tags is defined and
    tags.get('LastPatched', '2000-01-01') < ('%Y-%m-%d' | strftime(now().timestamp() - 30 * 86400))

  # Large instances across all clouds
  large_instances: >-
    (instance_type is defined and instance_type.split('.')[1] in ['xlarge', '2xlarge', '4xlarge', '8xlarge'])
    or
    (virtual_machine_size is defined and 'Standard_D' in virtual_machine_size)

keyed_groups:
  # Group by cloud provider (set via compose in each inventory plugin)
  - key: cloud_provider | default('onprem')
    prefix: cloud
    separator: "_"

compose:
  # Unified variables across all cloud providers
  display_name: instance_name | default(inventory_hostname)
  environment: >-
    tags.get('Environment', '') if tags is defined
    else 'unknown'
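For intuition, here is the production_web membership test evaluated in plain Python against one host's group_names (the group list is made up; in the real plugin, Jinja2 evaluates the expression per host):

```python
# Evaluating the production_web expression by hand for one host.
# group_names is what the earlier inventory sources assigned to this host.
group_names = ["role_web", "env_production", "region_us_east_1", "cloud_aws"]

production_web = (
    ("role_web" in group_names or "onprem_webservers" in group_names)
    and ("env_production" in group_names or "production" in group_names)
)
print(production_web)  # True
```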

Writing Custom Inventory Plugins

When you need to pull inventory from an internal CMDB, a custom API, a database, or any non-standard source, write a custom inventory plugin.

Plugin File Structure

# plugins/inventory/custom_cmdb.py
from ansible.plugins.inventory import BaseInventoryPlugin, Constructable, Cacheable
from ansible.errors import AnsibleParserError

DOCUMENTATION = """
    name: custom_cmdb
    plugin_type: inventory
    short_description: Pull inventory from internal CMDB API
    description:
      - Queries the internal CMDB REST API to discover hosts
      - Supports caching for performance
      - Creates groups based on CMDB roles and environments
    options:
      api_url:
        description: CMDB API endpoint URL
        required: true
        type: str
      api_token:
        description: Authentication token for the CMDB API
        required: true
        type: str
        env:
          - name: CMDB_API_TOKEN
      verify_ssl:
        description: Whether to verify SSL certificates
        type: bool
        default: true
      timeout:
        description: API request timeout in seconds
        type: int
        default: 30
"""

EXAMPLES = """
# inventory/cmdb.yml
plugin: custom_cmdb
api_url: https://cmdb.internal.example.com
verify_ssl: true
timeout: 30
"""


class InventoryModule(BaseInventoryPlugin, Constructable, Cacheable):
    NAME = "custom_cmdb"

    def verify_file(self, path):
        """Verify this is a valid inventory source file."""
        if super().verify_file(path):
            return path.endswith(("cmdb.yml", "cmdb.yaml"))
        return False

    def parse(self, inventory, loader, path, cache=True):
        """Parse the inventory source and populate inventory."""
        super().parse(inventory, loader, path, cache)
        self._read_config_data(path)

        api_url = self.get_option("api_url")
        api_token = self.get_option("api_token")
        verify_ssl = self.get_option("verify_ssl")
        timeout = self.get_option("timeout")

        # Check cache first
        cache_key = self.get_cache_key(path)
        use_cache = cache and self.use_cache
        servers = None

        if use_cache:
            try:
                servers = self._cache[cache_key]
            except KeyError:
                pass

        if servers is None:
            servers = self._fetch_servers(api_url, api_token, verify_ssl, timeout)
            if use_cache:
                self._cache[cache_key] = servers

        self._populate_inventory(servers)

    def _fetch_servers(self, api_url, api_token, verify_ssl, timeout):
        """Fetch server list from the CMDB API."""
        import requests

        try:
            response = requests.get(
                f"{api_url}/api/v1/servers",
                headers={
                    "Authorization": f"Bearer {api_token}",
                    "Accept": "application/json",
                },
                verify=verify_ssl,
                timeout=timeout,
            )
            response.raise_for_status()
            return response.json()
        except requests.RequestException as e:
            raise AnsibleParserError(f"Failed to fetch from CMDB: {e}")

    def _populate_inventory(self, servers):
        """Populate the Ansible inventory from CMDB data."""
        for server in servers:
            hostname = server["hostname"]
            self.inventory.add_host(hostname)

            # Set connection variables
            self.inventory.set_variable(hostname, "ansible_host", server["ip_address"])
            self.inventory.set_variable(
                hostname, "ansible_user", server.get("ssh_user", "deploy")
            )
            self.inventory.set_variable(
                hostname, "ansible_port", server.get("ssh_port", 22)
            )

            # Set custom variables
            for key, value in server.get("metadata", {}).items():
                self.inventory.set_variable(hostname, f"cmdb_{key}", value)

            # Group by role
            role = server.get("role", "ungrouped")
            self.inventory.add_group(role)
            self.inventory.add_host(hostname, group=role)

            # Group by environment
            env = server.get("environment", "unknown")
            self.inventory.add_group(env)
            self.inventory.add_host(hostname, group=env)

            # Group by datacenter/location
            dc = server.get("datacenter", "unknown")
            self.inventory.add_group(f"dc_{dc}")
            self.inventory.add_host(hostname, group=f"dc_{dc}")

            # Group by OS
            os_name = server.get("os", "unknown")
            self.inventory.add_group(f"os_{os_name}")
            self.inventory.add_host(hostname, group=f"os_{os_name}")
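To see what _populate_inventory produces without running Ansible, here is the same grouping logic exercised against a stand-in inventory object (FakeInventory and the sample record are illustrative, not part of Ansible's API):

```python
# Stand-in for Ansible's inventory object, just enough to exercise
# the grouping logic used in _populate_inventory above
class FakeInventory:
    def __init__(self):
        self.groups = {}
        self.hostvars = {}

    def add_host(self, host, group=None):
        self.hostvars.setdefault(host, {})
        if group is not None:
            self.groups.setdefault(group, set()).add(host)

    def add_group(self, group):
        self.groups.setdefault(group, set())

    def set_variable(self, host, key, value):
        self.hostvars[host][key] = value

# One record shaped like a CMDB API response
server = {"hostname": "app-01", "ip_address": "10.1.2.3",
          "role": "web", "environment": "production", "datacenter": "ams1"}

inv = FakeInventory()
inv.add_host(server["hostname"])
inv.set_variable(server["hostname"], "ansible_host", server["ip_address"])
for group in (server["role"], server["environment"], "dc_" + server["datacenter"]):
    inv.add_group(group)
    inv.add_host(server["hostname"], group=group)

print(sorted(inv.groups))  # ['dc_ams1', 'production', 'web']
```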

Enable it in ansible.cfg:

[defaults]
inventory_plugins = plugins/inventory

[inventory]
enable_plugins = custom_cmdb, amazon.aws.aws_ec2, azure.azcollection.azure_rm, host_list, auto

Create the inventory source file:

# inventory/cmdb.yml
plugin: custom_cmdb
api_url: https://cmdb.internal.example.com
verify_ssl: true
timeout: 30
# api_token is read from CMDB_API_TOKEN environment variable

Integrating Dynamic Inventory with CI/CD

GitHub Actions

# .github/workflows/ansible-deploy.yml
name: Ansible Deploy
on:
  push:
    branches: [main]
  workflow_dispatch:
    inputs:
      target_group:
        description: "Inventory group to target"
        required: true
        default: "role_web"

jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/ansible-deploy
          aws-region: us-east-1

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: |
          pip install ansible boto3
          ansible-galaxy collection install amazon.aws
          ansible-galaxy install -r requirements.yml

      - name: Verify inventory discovery
        run: |
          ansible-inventory -i inventory/ --graph
          echo "---"
          ansible-inventory -i inventory/ --list | python3 -c "
          import sys, json
          data = json.load(sys.stdin)
          hosts = data.get('_meta', {}).get('hostvars', {})
          print(f'Discovered {len(hosts)} hosts')
          "

      - name: Run playbook
        env:
          ANSIBLE_VAULT_PASSWORD: ${{ secrets.VAULT_PASSWORD }}
          ANSIBLE_HOST_KEY_CHECKING: "False"
        run: |
          echo "$ANSIBLE_VAULT_PASSWORD" > .vault_pass
          chmod 600 .vault_pass
          ansible-playbook -i inventory/ site.yml \
            --vault-password-file .vault_pass \
            --limit "${{ inputs.target_group || 'role_web' }}"
          rm -f .vault_pass

Terraform Integration

When using Terraform to provision infrastructure, you can generate Ansible inventory from Terraform state. However, the better approach with dynamic inventory is to simply tag your Terraform-managed resources:

resource "aws_instance" "web" {
  count         = 3
  ami           = "ami-0abcdef1234567890"
  instance_type = "t3.medium"
  subnet_id     = aws_subnet.private[count.index].id

  tags = {
    Name        = "web-server-${count.index + 1}"
    Environment = var.environment
    Role        = "web"
    ManagedBy   = "ansible"
    Team        = "platform"
    Project     = "main-app"
  }
}

The dynamic inventory plugin discovers these instances automatically based on the tags. No inventory file generation needed.

Practical Example: Multi-Cloud Web Server Fleet

Here is a complete, end-to-end example managing web servers across AWS and Azure with dynamic discovery, group variables, and a production playbook.

Inventory Directory

inventory/
  01-static.yml
  02-aws_ec2.yml
  03-azure_rm.yml
  05-constructed.yml
  group_vars/
    all/
      vars.yml
    production_web/
      vars.yml
      vault.yml

Constructed Groups

# inventory/05-constructed.yml
plugin: ansible.builtin.constructed
strict: false

groups:
  production_web: >-
    ('role_web' in group_names or 'role_webserver' in group_names)
    and 'env_production' in group_names

Group Variables for All Production Web Servers

# inventory/group_vars/production_web/vars.yml
---
nginx_port: 80
nginx_ssl_port: 443
nginx_worker_processes: auto
nginx_worker_connections: 4096
nginx_server_name: "{{ inventory_hostname }}"
app_document_root: /var/www/html
app_version: "{{ lookup('env', 'APP_VERSION') | default('latest', true) }}"
monitoring_enabled: true
log_level: warn

The Playbook

# configure-webservers.yml
---
- name: Configure web servers discovered from cloud providers
  hosts: production_web
  become: true
  serial: "30%"
  max_fail_percentage: 10

  pre_tasks:
    - name: Display discovered hosts
      debug:
        msg: >
          Configuring {{ inventory_hostname }}
          ({{ ansible_host }})
          from {{ cloud_provider | default('unknown') }}
          in {{ datacenter | default('unknown') }}

    - name: Wait for SSH to be available
      wait_for_connection:
        timeout: 120
        sleep: 5

    - name: Gather facts
      setup:

    - name: Validate host is in expected state
      assert:
        that:
          - ansible_distribution in ['Ubuntu', 'Debian']
          - ansible_memtotal_mb >= 1024
        fail_msg: "Host does not meet minimum requirements"

  tasks:
    - name: Update apt cache
      apt:
        update_cache: true
        cache_valid_time: 3600

    - name: Install required packages
      apt:
        name:
          - nginx
          - curl
          - jq
          - unzip
        state: present

    - name: Deploy Nginx configuration
      template:
        src: templates/nginx.conf.j2
        dest: /etc/nginx/sites-available/default
        owner: root
        group: root
        mode: "0644"
        validate: "nginx -t -c %s"
      notify: Reload Nginx

    - name: Create document root
      file:
        path: "{{ app_document_root }}"
        state: directory
        owner: www-data
        group: www-data
        mode: "0755"

    - name: Ensure Nginx is running
      service:
        name: nginx
        state: started
        enabled: true

    - name: Verify Nginx is responding
      uri:
        url: "http://localhost:{{ nginx_port }}/health"
        status_code: 200
      register: health_check
      retries: 3
      delay: 5
      until: health_check.status == 200

  handlers:
    - name: Reload Nginx
      service:
        name: nginx
        state: reloaded

Running It

# Preview which hosts will be targeted
ansible-inventory -i inventory/ --graph production_web

# Dry run
ansible-playbook -i inventory/ configure-webservers.yml --check --diff

# Apply
ansible-playbook -i inventory/ configure-webservers.yml

# Apply with fresh inventory (no cache)
ansible-playbook -i inventory/ configure-webservers.yml --flush-cache

# Limit to a specific cloud or region
ansible-playbook -i inventory/ configure-webservers.yml --limit cloud_aws
ansible-playbook -i inventory/ configure-webservers.yml --limit region_us_east_1

# Limit to a specific host for debugging
ansible-playbook -i inventory/ configure-webservers.yml --limit web-server-1 -vvv

Rolling Deployments

The serial and max_fail_percentage parameters in the playbook enable safe rolling deployments across your fleet:

| Parameter | Value | Effect |
| --- | --- | --- |
| serial: "30%" | 30% of hosts at a time | Deploy to roughly a third of the fleet per batch |
| serial: 5 | 5 hosts at a time | Fixed batch size |
| serial: [1, 5, "50%"] | Escalating batches | 1 host first, then 5, then 50% of the fleet per batch |
| max_fail_percentage: 10 | 10% | Abort if more than 10% of hosts in a batch fail |

This ensures that a bad deployment does not take down your entire fleet simultaneously. If the first batch fails, Ansible stops before touching the rest.
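The batching arithmetic can be modeled in a few lines. This sketch splits a fleet according to a serial spec list, taking percentages from the total host count and repeating the last entry until the fleet is exhausted (the rounding rule here is an assumption, not Ansible's exact implementation):

```python
import math

def serial_batches(total_hosts, specs):
    """Sketch of how 'serial' splits a fleet into batch sizes.

    Percentages are computed against the total host count; when the
    spec list runs out, the last entry repeats.
    """
    batches, remaining, i = [], total_hosts, 0
    while remaining > 0:
        spec = specs[min(i, len(specs) - 1)]  # last spec repeats
        if isinstance(spec, str) and spec.endswith("%"):
            size = max(1, math.floor(total_hosts * int(spec[:-1]) / 100))
        else:
            size = int(spec)
        batches.append(min(size, remaining))
        remaining -= batches[-1]
        i += 1
    return batches

print(serial_batches(10, ["30%"]))        # [3, 3, 3, 1]
print(serial_batches(10, [1, 5, "50%"]))  # [1, 5, 4]
```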

Troubleshooting Dynamic Inventory

Common Issues and Solutions

"Unable to parse" error -- The inventory file suffix is wrong. AWS needs aws_ec2.yml, Azure needs azure_rm.yml, GCP needs gcp.yml.

No hosts discovered -- Check filters and credentials:

# Verbose inventory listing
ansible-inventory -i inventory/aws_ec2.yml --list -vvv

# Check if boto3 can connect
python3 -c "import boto3; print(boto3.client('ec2', region_name='us-east-1').describe_instances()['Reservations'])"

Slow inventory runs -- Enable caching or reduce the number of regions. Each region requires a separate API call.

Hosts in wrong groups -- Debug keyed_groups by examining host variables:

ansible-inventory -i inventory/aws_ec2.yml --host web-server-1

Permission denied -- Verify your IAM policy includes the necessary Describe permissions for the services you are inventorying.

Duplicate hostnames -- Multiple instances with the same Name tag cause collisions. Use hostnames with a fallback:

hostnames:
  - tag:Name
  - instance-id  # guaranteed unique

When a new instance is launched with the appropriate tags, the next playbook run discovers it automatically and applies your configuration. No inventory file to update, no manual steps, no drift. That is the power of dynamic inventory.
