OPA Gatekeeper: Enforcing Kubernetes Admission Control Policies That Actually Stop Misconfigurations
Your Cluster Is Only as Secure as What It Admits
In 2022, an engineer at a well-known SaaS company deployed a container running as root with hostPID: true into a production Kubernetes cluster. The deployment passed code review. It passed CI. It passed the "looks fine to me" test. Within 48 hours, an attacker who had compromised a different pod in the same namespace used that privileged container to escape to the host and pivot across the cluster.
The misconfiguration wasn't malicious. It was a leftover from local debugging. But without admission controls, the cluster accepted it without question.
This is the fundamental problem. Kubernetes will happily schedule anything you give it. It doesn't care if your pods are running as root, pulling images from untrusted registries, or missing resource limits. The API server is permissive by default. If you're not enforcing policies at admission time, you're relying on humans to never make mistakes. And humans always make mistakes.
OPA Gatekeeper gives you a programmable, auditable policy layer that sits between kubectl apply and your cluster. Here's how to set it up so bad configurations never land in production.
How Gatekeeper Works
Gatekeeper is a Kubernetes-native admission controller built on the Open Policy Agent (OPA) engine. It intercepts API requests through a ValidatingAdmissionWebhook and evaluates them against policies you define.
The architecture has two key concepts:
| Component | Purpose |
|---|---|
| ConstraintTemplate | Defines the policy logic in Rego (OPA's policy language) |
| Constraint | Instantiates a template with specific parameters for enforcement |
This separation is deliberate. Platform teams write the templates. Application teams see the constraints. You get reusable policy logic without requiring every developer to learn Rego.
Installing Gatekeeper
Deploy Gatekeeper using the official Helm chart:
helm repo add gatekeeper https://open-policy-agent.github.io/gatekeeper/charts
helm repo update
helm install gatekeeper gatekeeper/gatekeeper \
--namespace gatekeeper-system \
--create-namespace \
--set replicas=3 \
--set audit.replicas=2 \
--set audit.interval=60
Verify the webhook is registered:
kubectl get validatingwebhookconfigurations | grep gatekeeper
You should see gatekeeper-validating-webhook-configuration in the output. At this point, Gatekeeper is running but not enforcing anything — you need to define policies.
Policy 1: Block Privileged Containers
This is the single most impactful policy you can deploy. Privileged containers have unrestricted access to the host kernel, making container escapes trivial. CVE-2022-0185 demonstrated this — a heap overflow in the filesystem context API allowed privilege escalation from a container with CAP_SYS_ADMIN, which privileged containers have by default.
Create the ConstraintTemplate:
# templates/k8s-block-privileged.yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
name: k8sblockprivileged
spec:
crd:
spec:
names:
kind: K8sBlockPrivileged
targets:
- target: admission.k8s.gatekeeper.sh
rego: |
package k8sblockprivileged
violation[{"msg": msg}] {
container := input.review.object.spec.containers[_]
container.securityContext.privileged == true
msg := sprintf("Privileged container not allowed: %v", [container.name])
}
violation[{"msg": msg}] {
container := input.review.object.spec.initContainers[_]
container.securityContext.privileged == true
msg := sprintf("Privileged init container not allowed: %v", [container.name])
}
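One gap worth noting: `kubectl debug` attaches ephemeral containers to running pods, and those carry a securityContext too. A third rule of the same shape closes that path — a sketch, and note it only takes effect if your webhook configuration also intercepts the pods/ephemeralcontainers subresource:

```rego
violation[{"msg": msg}] {
  container := input.review.object.spec.ephemeralContainers[_]
  container.securityContext.privileged == true
  msg := sprintf("Privileged ephemeral container not allowed: %v", [container.name])
}
```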
Now create the Constraint to enforce it:
# constraints/block-privileged.yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sBlockPrivileged
metadata:
name: deny-privileged-containers
spec:
enforcementAction: deny
match:
kinds:
- apiGroups: [""]
kinds: ["Pod"]
excludedNamespaces:
- kube-system
- gatekeeper-system
Apply both and test:
kubectl apply -f templates/k8s-block-privileged.yaml
kubectl apply -f constraints/block-privileged.yaml
# This should be rejected
kubectl run test-priv --image=nginx --overrides='{
"spec": {
"containers": [{
"name": "test-priv",
"image": "nginx",
"securityContext": {"privileged": true}
}]
}
}'
# Error: admission webhook "validation.gatekeeper.sh" denied the request:
# Privileged container not allowed: test-priv
Policy 2: Enforce Image Registry Restrictions
You should never allow pods to pull images from arbitrary registries. The 2024 Docker Hub typosquatting campaigns planted malicious images with names nearly identical to popular base images. If your developers can pull from anywhere, one typo in a Dockerfile is a compromise.
# templates/k8s-allowed-registries.yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
name: k8sallowedregistries
spec:
crd:
spec:
names:
kind: K8sAllowedRegistries
validation:
openAPIV3Schema:
type: object
properties:
registries:
type: array
items:
type: string
targets:
- target: admission.k8s.gatekeeper.sh
rego: |
package k8sallowedregistries
violation[{"msg": msg}] {
container := input.review.object.spec.containers[_]
not registry_allowed(container.image)
msg := sprintf(
"Image '%v' is from a disallowed registry. Allowed: %v",
[container.image, input.parameters.registries]
)
}
registry_allowed(image) {
registry := input.parameters.registries[_]
startswith(image, registry)
}
# constraints/allowed-registries.yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRegistries
metadata:
name: restrict-image-registries
spec:
enforcementAction: deny
match:
kinds:
- apiGroups: [""]
kinds: ["Pod"]
excludedNamespaces:
- kube-system
parameters:
registries:
- "gcr.io/your-project/"
- "us-docker.pkg.dev/your-project/"
- "ghcr.io/your-org/"
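Note the trailing slashes on every allowed prefix. Because the Rego rule uses startswith, a prefix without the trailing slash also matches lookalike registry paths. A quick shell illustration of the same prefix logic (the check function here is just for demonstration):

```shell
# Mimics the Rego startswith() check: is the image under an allowed prefix?
check() {
  image="$1"; prefix="$2"
  case "$image" in
    "$prefix"*) echo allowed ;;
    *)          echo denied  ;;
  esac
}

check "gcr.io/your-project/api:v1"      "gcr.io/your-project/"   # allowed
check "gcr.io/your-project-evil/api:v1" "gcr.io/your-project/"   # denied
check "gcr.io/your-project-evil/api:v1" "gcr.io/your-project"    # allowed -- the bypass
```

The third call is the failure mode: without the trailing slash, `gcr.io/your-project-evil` slips through the prefix check.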
Policy 3: Require Resource Limits
Missing resource limits aren't just a reliability issue — they're a security issue. Without limits, a compromised container can consume all node resources, causing denial of service across every pod on that node.
# templates/k8s-require-resource-limits.yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
name: k8srequireresourcelimits
spec:
crd:
spec:
names:
kind: K8sRequireResourceLimits
targets:
- target: admission.k8s.gatekeeper.sh
rego: |
package k8srequireresourcelimits
violation[{"msg": msg}] {
container := input.review.object.spec.containers[_]
not container.resources.limits.cpu
msg := sprintf("Container '%v' missing CPU limit", [container.name])
}
violation[{"msg": msg}] {
container := input.review.object.spec.containers[_]
not container.resources.limits.memory
msg := sprintf("Container '%v' missing memory limit", [container.name])
}
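The template alone enforces nothing. Pair it with a Constraint, following the same pattern as the earlier policies (file name and namespace exclusions here are illustrative):

```yaml
# constraints/require-resource-limits.yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequireResourceLimits
metadata:
  name: require-resource-limits
spec:
  enforcementAction: deny
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces:
      - kube-system
      - gatekeeper-system
```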
Rolling Out Policies Safely: Dry Run First
Never deploy Gatekeeper policies in deny mode on day one. You will break things. Start with dryrun to audit violations without blocking deployments:
spec:
enforcementAction: dryrun
Then check what would have been blocked:
kubectl get k8sblockprivileged deny-privileged-containers \
-o json | jq '.status.violations'
This gives you a list of every existing resource that violates the policy. Fix those first, then switch to deny. A staged rollout looks like this:
- Week 1: Deploy in dryrun mode. Collect violations.
- Week 2: Remediate existing violations. Notify teams.
- Week 3: Switch to warn mode (users see warnings but deployments proceed).
- Week 4: Switch to deny mode. Violations are blocked.
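Moving between stages doesn't require editing files and reapplying — you can patch enforcementAction in place. Shown here for the privileged-container constraint from Policy 1:

```shell
# Week 3: move from dryrun to warn
kubectl patch k8sblockprivileged deny-privileged-containers \
  --type=merge -p '{"spec":{"enforcementAction":"warn"}}'

# Week 4: move from warn to deny
kubectl patch k8sblockprivileged deny-privileged-containers \
  --type=merge -p '{"spec":{"enforcementAction":"deny"}}'
```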
Audit Existing Resources
Gatekeeper doesn't just check new admissions — it periodically audits existing resources against your constraints. Configure the audit interval:
helm upgrade gatekeeper gatekeeper/gatekeeper \
--namespace gatekeeper-system \
--set audit.interval=30 \
--set audit.logLevel=INFO
Export all current violations across all constraints:
for constraint_kind in $(kubectl get constrainttemplates -o jsonpath='{.items[*].metadata.name}'); do
echo "=== $constraint_kind ==="
kubectl get "$constraint_kind" -o json | \
jq -r '.items[].status.violations[]? | "\(.kind)/\(.namespace)/\(.name): \(.message)"'
done
Integrating With CI/CD: Shift Left
Catching policy violations at admission time is good. Catching them in the pull request is better. Use gator (Gatekeeper's CLI tool) to test policies against manifests before they ever reach the cluster:
# Install gator
go install github.com/open-policy-agent/gatekeeper/v3/cmd/gator@latest
# Test manifests against your policies
gator verify ./policy-tests/
# Or test specific manifests against specific constraints
gator test --filename=templates/ --filename=constraints/ --filename=manifests/
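gator verify expects a Suite manifest describing which templates, constraints, and example objects to evaluate. A minimal sketch — the file paths and sample object names are assumptions about your repo layout:

```yaml
# policy-tests/suite.yaml
apiVersion: test.gatekeeper.sh/v1alpha1
kind: Suite
metadata:
  name: block-privileged
tests:
  - name: privileged-containers
    template: ../templates/k8s-block-privileged.yaml
    constraint: ../constraints/block-privileged.yaml
    cases:
      - name: privileged-pod-rejected
        object: samples/privileged-pod.yaml
        assertions:
          - violations: yes
      - name: plain-pod-allowed
        object: samples/plain-pod.yaml
        assertions:
          - violations: no
```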
Add this to your CI pipeline:
# .github/workflows/policy-check.yml
name: OPA Gatekeeper Policy Check
on:
pull_request:
paths:
- 'k8s/**'
jobs:
gatekeeper-test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install gator
run: |
curl -sL https://github.com/open-policy-agent/gatekeeper/releases/download/v3.16.0/gator-v3.16.0-linux-amd64.tar.gz | \
tar xz -C /usr/local/bin
- name: Run policy tests
run: |
gator test \
--filename=policies/templates/ \
--filename=policies/constraints/ \
--filename=k8s/
Monitoring Gatekeeper Health
Gatekeeper exposes Prometheus metrics. If Gatekeeper goes down, your webhook fails open by default — meaning all requests are admitted without policy checks. Monitor these:
# Prometheus alerting rule
groups:
- name: gatekeeper
rules:
- alert: GatekeeperWebhookDown
expr: up{job="gatekeeper-controller-manager"} == 0
for: 2m
labels:
severity: critical
annotations:
summary: "Gatekeeper webhook is down — admission policies not enforced"
- alert: GatekeeperAuditViolationsIncreasing
expr: delta(gatekeeper_violations[10m]) > 0
for: 5m
labels:
severity: warning
annotations:
summary: "New policy violations detected in existing resources"
What to Enforce on Day One
If you're starting from nothing, deploy these policies in this order:
- Block privileged containers — highest impact, lowest false-positive rate
- Restrict image registries — prevents supply chain drift
- Require resource limits — prevents noisy-neighbor DoS
- Enforce read-only root filesystem — limits post-exploitation persistence
- Require non-root user — reduces container escape surface
Each policy should go through the dryrun-warn-deny cycle. Don't skip steps. The goal isn't to break every deployment on a Monday morning — it's to systematically close the gaps that attackers rely on.
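The non-root policy in that list isn't shown elsewhere in this article. A sketch following the same template pattern — note this simple version checks only the container-level field, so pods that set runAsNonRoot at the pod level would need an additional rule:

```yaml
# templates/k8s-require-nonroot.yaml (names are illustrative)
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequirenonroot
spec:
  crd:
    spec:
      names:
        kind: K8sRequireNonRoot
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequirenonroot
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not container.securityContext.runAsNonRoot == true
          msg := sprintf("Container '%v' must set runAsNonRoot to true", [container.name])
        }
```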
Policy 4: Enforce Read-Only Root Filesystem
A read-only root filesystem prevents attackers from writing malicious binaries, scripts, or configuration files to the container filesystem after compromising a pod. Combined with blocking privileged containers, this significantly reduces post-exploitation options.
# templates/k8s-readonly-rootfs.yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
name: k8sreadonlyrootfs
spec:
crd:
spec:
names:
kind: K8sReadOnlyRootFs
targets:
- target: admission.k8s.gatekeeper.sh
rego: |
package k8sreadonlyrootfs
violation[{"msg": msg}] {
container := input.review.object.spec.containers[_]
not container.securityContext.readOnlyRootFilesystem == true
msg := sprintf("Container '%v' must set readOnlyRootFilesystem to true", [container.name])
}
# constraints/readonly-rootfs.yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sReadOnlyRootFs
metadata:
name: require-readonly-rootfs
spec:
enforcementAction: deny
match:
kinds:
- apiGroups: [""]
kinds: ["Pod"]
excludedNamespaces:
- kube-system
- gatekeeper-system
Most applications need write access to specific paths (temp files, caches, PID files). Use emptyDir volumes for those paths instead of making the entire filesystem writable:
# Application pod with read-only root and writable temp dirs
spec:
containers:
- name: api
image: registry.internal/api:v3.1.0
securityContext:
readOnlyRootFilesystem: true
runAsNonRoot: true
volumeMounts:
- name: tmp
mountPath: /tmp
- name: app-cache
mountPath: /app/cache
volumes:
- name: tmp
emptyDir: {}
- name: app-cache
emptyDir:
sizeLimit: 100Mi
Troubleshooting Gatekeeper Webhook Failures
Gatekeeper runs as a webhook. When it fails, the impact depends on your failure policy. By default, Gatekeeper uses failurePolicy: Ignore, which means if the webhook is unreachable, all requests are admitted without policy checks. This is a silent security gap.
Detecting Webhook Failures
# Check if the webhook is healthy
kubectl get pods -n gatekeeper-system
kubectl logs -n gatekeeper-system -l control-plane=controller-manager --tail=30
# Check for webhook timeout events in the API server
kubectl get events --field-selector reason=FailedAdmission -A
Switching to Fail-Closed Mode
For security-critical clusters, switch the webhook to Fail mode so requests are rejected when Gatekeeper is unreachable. This is safer but requires Gatekeeper to be highly available:
kubectl get validatingwebhookconfigurations gatekeeper-validating-webhook-configuration -o yaml | \
sed 's/failurePolicy: Ignore/failurePolicy: Fail/' | \
kubectl apply -f -
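Be aware that editing the webhook object directly can be reverted the next time the Helm chart reconciles it. If you manage Gatekeeper with Helm, setting the chart value is more durable (value name per the upstream chart — verify against your chart version):

```shell
helm upgrade gatekeeper gatekeeper/gatekeeper \
  --namespace gatekeeper-system \
  --reuse-values \
  --set validatingWebhookFailurePolicy=Fail
```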
When running in fail-closed mode, you must ensure Gatekeeper has enough replicas and resource headroom to handle admission traffic. A Gatekeeper outage in fail-closed mode blocks all cluster operations — deployments, scaling, even pod restarts. Run at least 3 replicas with proper PodDisruptionBudgets:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: gatekeeper-controller-manager
namespace: gatekeeper-system
spec:
minAvailable: 2
selector:
matchLabels:
control-plane: controller-manager
Exempting Critical System Namespaces
Always exempt kube-system and the Gatekeeper namespace from policies. If Gatekeeper blocks its own pods from being scheduled, you've created an irrecoverable deadlock:
# Verify exemptions are in place
kubectl get config config -n gatekeeper-system -o json | \
jq '.spec.match[].excludedNamespaces'
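Exemptions can also be declared cluster-wide in Gatekeeper's Config resource, so individual constraints don't each need their own excludedNamespaces list. A sketch — check the Gatekeeper docs for your version, since webhook-process exemptions may additionally require the controller's exempt-namespace settings:

```yaml
apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: gatekeeper-system
spec:
  match:
    - excludedNamespaces: ["kube-system", "gatekeeper-system"]
      processes: ["*"]
```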
If you accidentally lock yourself out and Gatekeeper pods can't start, delete the webhook configuration as an emergency escape hatch:
# Emergency: remove Gatekeeper webhook to unblock cluster operations
kubectl delete validatingwebhookconfigurations gatekeeper-validating-webhook-configuration
# Then fix the policy and reinstall
This is a last resort. Document it in your runbook so the on-call engineer doesn't panic when nothing can be deployed.
Assume breach. Then make sure your cluster doesn't let the breach spread.