
OPA Gatekeeper: Enforcing Kubernetes Admission Control Policies That Actually Stop Misconfigurations

Amara Okafor · 10 min read

Your Cluster Is Only as Secure as What It Admits

In 2022, an engineer at a well-known SaaS company deployed a container running as root with hostPID: true into a production Kubernetes cluster. The deployment passed code review. It passed CI. It passed the "looks fine to me" test. Within 48 hours, an attacker who had compromised a different pod in the same namespace used that privileged container to escape to the host and pivot across the cluster.

The misconfiguration wasn't malicious. It was a leftover from local debugging. But without admission controls, the cluster accepted it without question.

This is the fundamental problem. Kubernetes will happily schedule anything you give it. It doesn't care if your pods are running as root, pulling images from untrusted registries, or missing resource limits. The API server is permissive by default. If you're not enforcing policies at admission time, you're relying on humans to never make mistakes. And humans always make mistakes.

OPA Gatekeeper gives you a programmable, auditable policy layer that sits between kubectl apply and your cluster. Here's how to set it up so bad configurations never land in production.

How Gatekeeper Works

Gatekeeper is a Kubernetes-native admission controller built on the Open Policy Agent (OPA) engine. It intercepts API requests through a ValidatingAdmissionWebhook and evaluates them against policies you define.

The architecture has two key concepts:

Component           Purpose
ConstraintTemplate  Defines the policy logic in Rego (OPA's policy language)
Constraint          Instantiates a template with specific parameters for enforcement

This separation is deliberate. Platform teams write the templates. Application teams see the constraints. You get reusable policy logic without requiring every developer to learn Rego.

Installing Gatekeeper

Deploy Gatekeeper using the official Helm chart:

helm repo add gatekeeper https://open-policy-agent.github.io/gatekeeper/charts
helm repo update

helm install gatekeeper gatekeeper/gatekeeper \
  --namespace gatekeeper-system \
  --create-namespace \
  --set replicas=3 \
  --set audit.replicas=2 \
  --set audit.interval=60

Verify the webhook is registered:

kubectl get validatingwebhookconfigurations | grep gatekeeper

You should see gatekeeper-validating-webhook-configuration in the output. At this point, Gatekeeper is running but not enforcing anything — you need to define policies.

Policy 1: Block Privileged Containers

This is the single most impactful policy you can deploy. Privileged containers have unrestricted access to the host kernel, making container escapes trivial. CVE-2022-0185 demonstrated this — a heap overflow in the filesystem context API allowed privilege escalation from a container with CAP_SYS_ADMIN, which privileged containers have by default.

Create the ConstraintTemplate:

# templates/k8s-block-privileged.yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sblockprivileged
spec:
  crd:
    spec:
      names:
        kind: K8sBlockPrivileged
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sblockprivileged

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          container.securityContext.privileged == true
          msg := sprintf("Privileged container not allowed: %v", [container.name])
        }

        violation[{"msg": msg}] {
          container := input.review.object.spec.initContainers[_]
          container.securityContext.privileged == true
          msg := sprintf("Privileged init container not allowed: %v", [container.name])
        }

Now create the Constraint to enforce it:

# constraints/block-privileged.yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sBlockPrivileged
metadata:
  name: deny-privileged-containers
spec:
  enforcementAction: deny
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces:
      - kube-system
      - gatekeeper-system

Apply both and test:

kubectl apply -f templates/k8s-block-privileged.yaml
kubectl apply -f constraints/block-privileged.yaml

# This should be rejected
kubectl run test-priv --image=nginx --overrides='{
  "apiVersion": "v1",
  "spec": {
    "containers": [{
      "name": "test-priv",
      "image": "nginx",
      "securityContext": {"privileged": true}
    }]
  }
}'
# Error: admission webhook "validation.gatekeeper.sh" denied the request:
# Privileged container not allowed: test-priv

Policy 2: Enforce Image Registry Restrictions

You should never allow pods to pull images from arbitrary registries. The 2024 Docker Hub typosquatting campaigns planted malicious images with names nearly identical to popular base images. If your developers can pull from anywhere, one typo in a Dockerfile is a compromise.

# templates/k8s-allowed-registries.yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sallowedregistries
spec:
  crd:
    spec:
      names:
        kind: K8sAllowedRegistries
      validation:
        openAPIV3Schema:
          type: object
          properties:
            registries:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sallowedregistries

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not registry_allowed(container.image)
          msg := sprintf(
            "Image '%v' is from a disallowed registry. Allowed: %v",
            [container.image, input.parameters.registries]
          )
        }

        registry_allowed(image) {
          registry := input.parameters.registries[_]
          startswith(image, registry)
        }

# constraints/allowed-registries.yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRegistries
metadata:
  name: restrict-image-registries
spec:
  enforcementAction: deny
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces:
      - kube-system
  parameters:
    registries:
      - "gcr.io/your-project/"
      - "us-docker.pkg.dev/your-project/"
      - "ghcr.io/your-org/"

Policy 3: Require Resource Limits

Missing resource limits aren't just a reliability issue — they're a security issue. Without limits, a compromised container can consume all node resources, causing denial of service across every pod on that node.

# templates/k8s-require-resource-limits.yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequireresourcelimits
spec:
  crd:
    spec:
      names:
        kind: K8sRequireResourceLimits
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequireresourcelimits

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not container.resources.limits.cpu
          msg := sprintf("Container '%v' missing CPU limit", [container.name])
        }

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not container.resources.limits.memory
          msg := sprintf("Container '%v' missing memory limit", [container.name])
        }
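
The matching Constraint follows the same pattern as the earlier policies (the namespace exclusions here are illustrative; adjust them to your cluster):

# constraints/require-resource-limits.yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequireResourceLimits
metadata:
  name: require-resource-limits
spec:
  enforcementAction: deny
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces:
      - kube-system
      - gatekeeper-system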

Rolling Out Policies Safely: Dry Run First

Never deploy Gatekeeper policies in deny mode on day one. You will break things. Start with dryrun to audit violations without blocking deployments:

spec:
  enforcementAction: dryrun

Then check what would have been blocked:

kubectl get k8sblockprivileged deny-privileged-containers \
  -o json | jq '.status.violations'

This gives you a list of every existing resource that violates the policy. Fix those first, then switch to deny. A staged rollout looks like this:

  1. Week 1: Deploy in dryrun mode. Collect violations.
  2. Week 2: Remediate existing violations. Notify teams.
  3. Week 3: Switch to warn mode (users see warnings but deployments proceed).
  4. Week 4: Switch to deny mode. Violations are blocked.
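
Steps 3 and 4 are each a one-field change on the constraint, so the rollout can be driven entirely through version control:

spec:
  enforcementAction: warn

With warn, the API server returns the violation message as a warning on kubectl apply but still admits the resource, giving teams a final heads-up before deny lands.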

Audit Existing Resources

Gatekeeper doesn't just check new admissions — it periodically audits existing resources against your constraints. Configure the audit interval:

helm upgrade gatekeeper gatekeeper/gatekeeper \
  --namespace gatekeeper-system \
  --set audit.interval=30 \
  --set audit.logLevel=INFO

Export all current violations across all constraints:

for constraint_kind in $(kubectl get constrainttemplates -o jsonpath='{.items[*].metadata.name}'); do
  echo "=== $constraint_kind ==="
  kubectl get "$constraint_kind" -o json | \
    jq -r '.items[].status.violations[]? | "\(.kind)/\(.namespace)/\(.name): \(.message)"'
done

Integrating With CI/CD: Shift Left

Catching policy violations at admission time is good. Catching them in the pull request is better. Use gator (Gatekeeper's CLI tool) to test policies against manifests before they ever reach the cluster:

# Install gator
go install github.com/open-policy-agent/gatekeeper/v3/cmd/gator@latest

# Test manifests against your policies
gator verify ./policy-tests/

# Or test specific manifests against specific constraints
gator test --filename=templates/ --filename=constraints/ --filename=manifests/
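
gator verify expects a test suite manifest describing which templates, constraints, and example objects to evaluate. A sketch layout (the file paths and case manifests are assumptions for illustration):

# policy-tests/suite.yaml
apiVersion: test.gatekeeper.sh/v1alpha1
kind: Suite
tests:
  - name: block-privileged
    template: ../templates/k8s-block-privileged.yaml
    constraint: ../constraints/block-privileged.yaml
    cases:
      - name: privileged-pod-denied
        object: cases/privileged-pod.yaml
        assertions:
          - violations: yes
      - name: plain-pod-allowed
        object: cases/plain-pod.yaml
        assertions:
          - violations: no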

Add this to your CI pipeline:

# .github/workflows/policy-check.yml
name: OPA Gatekeeper Policy Check
on:
  pull_request:
    paths:
      - 'k8s/**'

jobs:
  gatekeeper-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install gator
        run: |
          curl -sL https://github.com/open-policy-agent/gatekeeper/releases/download/v3.16.0/gator-v3.16.0-linux-amd64.tar.gz | \
            tar xz -C /usr/local/bin

      - name: Run policy tests
        run: |
          gator test \
            --filename=policies/templates/ \
            --filename=policies/constraints/ \
            --filename=k8s/

Monitoring Gatekeeper Health

Gatekeeper exposes Prometheus metrics. If Gatekeeper goes down, your webhook fails open by default — meaning all requests are admitted without policy checks. Monitor these:

# Prometheus alerting rule
groups:
  - name: gatekeeper
    rules:
      - alert: GatekeeperWebhookDown
        expr: up{job="gatekeeper-controller-manager"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Gatekeeper webhook is down — admission policies not enforced"

      - alert: GatekeeperAuditViolationsIncreasing
        expr: delta(gatekeeper_violations[10m]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "New policy violations detected in existing resources"

What to Enforce on Day One

If you're starting from nothing, deploy these policies in this order:

  1. Block privileged containers — highest impact, lowest false-positive rate
  2. Restrict image registries — prevents supply chain drift
  3. Require resource limits — prevents noisy-neighbor DoS
  4. Enforce read-only root filesystem — limits post-exploitation persistence
  5. Require non-root user — reduces container escape surface

Each policy should go through the dryrun-warn-deny cycle. Don't skip steps. The goal isn't to break every deployment on a Monday morning — it's to systematically close the gaps that attackers rely on.
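
For item 5, a minimal template sketch is below. It checks only the container-level field; a production version (the upstream gatekeeper-library has one) should also account for the pod-level securityContext and explicit runAsUser values:

# templates/k8s-require-nonroot.yaml -- sketch, not a hardened policy
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequirenonroot
spec:
  crd:
    spec:
      names:
        kind: K8sRequireNonRoot
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequirenonroot

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not container.securityContext.runAsNonRoot == true
          msg := sprintf("Container '%v' must set runAsNonRoot to true", [container.name])
        }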

Policy 4: Enforce Read-Only Root Filesystem

A read-only root filesystem prevents attackers from writing malicious binaries, scripts, or configuration files to the container filesystem after compromising a pod. Combined with blocking privileged containers, this significantly reduces post-exploitation options.

# templates/k8s-readonly-rootfs.yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sreadonlyrootfs
spec:
  crd:
    spec:
      names:
        kind: K8sReadOnlyRootFs
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sreadonlyrootfs

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not container.securityContext.readOnlyRootFilesystem == true
          msg := sprintf("Container '%v' must set readOnlyRootFilesystem to true", [container.name])
        }

# constraints/readonly-rootfs.yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sReadOnlyRootFs
metadata:
  name: require-readonly-rootfs
spec:
  enforcementAction: deny
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces:
      - kube-system
      - gatekeeper-system

Most applications need write access to specific paths (temp files, caches, PID files). Use emptyDir volumes for those paths instead of making the entire filesystem writable:

# Application pod with read-only root and writable temp dirs
spec:
  containers:
    - name: api
      image: registry.internal/api:v3.1.0
      securityContext:
        readOnlyRootFilesystem: true
        runAsNonRoot: true
      volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: app-cache
          mountPath: /app/cache
  volumes:
    - name: tmp
      emptyDir: {}
    - name: app-cache
      emptyDir:
        sizeLimit: 100Mi

Troubleshooting Gatekeeper Webhook Failures

Gatekeeper runs as a webhook. When it fails, the impact depends on your failure policy. By default, Gatekeeper uses failurePolicy: Ignore, which means if the webhook is unreachable, all requests are admitted without policy checks. This is a silent security gap.
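
You can confirm how your cluster will behave by inspecting the webhook configuration. The relevant portion looks roughly like this (fields abbreviated; the timeout value may differ by version):

# kubectl get validatingwebhookconfigurations \
#   gatekeeper-validating-webhook-configuration -o yaml
webhooks:
  - name: validation.gatekeeper.sh
    failurePolicy: Ignore  # fail-open: requests admitted when Gatekeeper is unreachable
    timeoutSeconds: 3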

Detecting Webhook Failures

# Check if the webhook is healthy
kubectl get pods -n gatekeeper-system
kubectl logs -n gatekeeper-system -l control-plane=controller-manager --tail=30

# Check for admission webhook warnings in cluster events
kubectl get events -A --field-selector type=Warning | grep -i webhook

Switching to Fail-Closed Mode

For security-critical clusters, switch the webhook to Fail mode so requests are rejected when Gatekeeper is unreachable. This is safer but requires Gatekeeper to be highly available. Be aware that a manual edit like the one below may be reverted on the next Helm upgrade; if your chart version exposes a failure-policy value, prefer setting it there:

kubectl get validatingwebhookconfigurations gatekeeper-validating-webhook-configuration -o yaml | \
  sed 's/failurePolicy: Ignore/failurePolicy: Fail/' | \
  kubectl apply -f -

When running in fail-closed mode, you must ensure Gatekeeper has enough replicas and resource headroom to handle admission traffic. A Gatekeeper outage in fail-closed mode blocks all cluster operations — deployments, scaling, even pod restarts. Run at least 3 replicas with proper PodDisruptionBudgets:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: gatekeeper-controller-manager
  namespace: gatekeeper-system
spec:
  minAvailable: 2
  selector:
    matchLabels:
      control-plane: controller-manager

Exempting Critical System Namespaces

Always exempt kube-system and the Gatekeeper namespace from policies. If Gatekeeper blocks its own pods from being scheduled, you've created an irrecoverable deadlock:

# Verify exemptions are in place
kubectl get config config -n gatekeeper-system -o json | \
  jq '.spec.match[].excludedNamespaces'

If you accidentally lock yourself out and Gatekeeper pods can't start, delete the webhook configuration as an emergency escape hatch:

# Emergency: remove Gatekeeper webhook to unblock cluster operations
kubectl delete validatingwebhookconfigurations gatekeeper-validating-webhook-configuration
# Then fix the policy and reinstall

This is a last resort. Document it in your runbook so the on-call engineer doesn't panic when nothing can be deployed.

Assume breach. Then make sure your cluster doesn't let the breach spread.

Amara Okafor
DevSecOps Lead

Security-first mindset in everything I ship. From zero-trust architectures to supply chain security, I make sure your pipeline doesn't become your weakest link.