Kubernetes Security Hardening for Production: The Complete Guide
Your Cluster Is Not Secure by Default
Let me be direct: a default Kubernetes installation is wide open. Pods can talk to any other pod. Service accounts get auto-mounted tokens. Containers run as root. There are no network boundaries, no admission policies, and no runtime security controls.
I've audited Kubernetes clusters at financial institutions, healthcare companies, and SaaS platforms. The pattern is depressingly consistent: teams deploy applications, ship features, and never revisit the security posture. Then something gets compromised, and the blast radius is the entire cluster.
This guide is the hardening checklist I run on every production cluster. It covers seven layers of defense: RBAC, network policies, pod security, secrets management, admission controllers, API server hardening, and runtime monitoring. Each layer reduces the blast radius of a breach. Together, they make your cluster defensible.
Layer 1: RBAC — Control Who Can Do What
Audit Existing Permissions First
Before you lock anything down, understand what exists:
# Find all ClusterRoleBindings granting cluster-admin
kubectl get clusterrolebindings -o json | \
  jq -r '.items[] | select(.roleRef.name=="cluster-admin") |
    .metadata.name + " -> " +
    (.subjects[]? | .kind + "/" + .name)'

# Find all service accounts with cluster-admin
kubectl get clusterrolebindings -o json | \
  jq -r '.items[] | select(.roleRef.name=="cluster-admin") |
    .subjects[]? | select(.kind=="ServiceAccount") |
    .namespace + "/" + .name'
If you see more than 2-3 entries in that second command, you have a problem. Most service accounts don't need cluster-wide admin privileges.
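Once the audit surfaces a suspicious subject, drill into what it can actually do. A quick check (the namespace and service account name below are examples, substitute what your audit found):

```shell
# List everything a given service account is allowed to do
kubectl auth can-i --list --as=system:serviceaccount:cicd:cicd-deployer

# Spot-check one especially dangerous permission
kubectl auth can-i create clusterrolebindings \
  --as=system:serviceaccount:cicd:cicd-deployer
```

If the second command prints "yes", that account can grant itself cluster-admin and should be fixed first.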
Principle of Least Privilege
Replace broad ClusterRoles with namespace-scoped Roles:
# WRONG: Giving a CI/CD service account cluster-admin
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cicd-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: cicd-deployer
    namespace: cicd
---
# RIGHT: Scoped to specific namespace and verbs
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cicd-deployer
  namespace: production
rules:
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets"]
    verbs: ["get", "list", "watch", "update", "patch"]
  - apiGroups: [""]
    resources: ["services", "configmaps"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list"] # No create/update -- secrets managed separately
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cicd-deployer
  namespace: production
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: cicd-deployer
subjects:
  - kind: ServiceAccount
    name: cicd-deployer
    namespace: cicd
Disable Auto-Mounting of Service Account Tokens
Most pods don't need to talk to the Kubernetes API. Disable the token mount by default:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-service-account
  namespace: production
automountServiceAccountToken: false
For pods that genuinely need API access, enable it explicitly at the pod level and use a projected volume with a short TTL:
apiVersion: v1
kind: Pod
metadata:
  name: api-consumer
spec:
  serviceAccountName: app-service-account
  automountServiceAccountToken: false
  containers:
    - name: app
      image: myapp:v1.2.3 # Pinned tag, not :latest
      volumeMounts:
        - name: token
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          readOnly: true
  volumes:
    - name: token
      projected:
        sources:
          - serviceAccountToken:
              expirationSeconds: 3600 # 1 hour, not infinite
              audience: api
              path: token
Layer 2: Network Policies — Segment Your Traffic
Default Deny Everything
Start from zero trust. Deny all traffic, then allowlist what's needed:
# Apply to every namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
This single manifest changes your security posture dramatically. Nothing can talk to anything unless explicitly permitted.
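You can confirm the deny is actually in effect with a throwaway pod. A minimal smoke test, assuming a `backend` service like the one in the policies below (pod and service names are illustrative):

```shell
# With default-deny applied, this request should time out rather than connect
kubectl run nettest --rm -it --restart=Never --image=busybox:1.36 -n production -- \
  wget -qO- --timeout=5 http://backend:8080 || echo "blocked, as expected"
```

Run the same command before applying the policy to see the difference.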
Allow Specific Communication Patterns
# Allow frontend to talk to backend on port 8080
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
---
# Allow backend to talk to database on port 5432
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-backend-to-db
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: database
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: backend
      ports:
        - protocol: TCP
          port: 5432
---
# Allow DNS resolution for all pods
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
Critical: Don't forget the DNS egress policy. Without it, default-deny breaks DNS resolution and everything stops working. I've seen this take down production environments.
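A quick way to verify name resolution survives the lockdown (the pod name is illustrative):

```shell
# Should succeed once allow-dns is applied alongside default-deny
kubectl run dnstest --rm -it --restart=Never --image=busybox:1.36 -n production -- \
  nslookup kubernetes.default.svc.cluster.local
```

If this hangs, DNS egress is still being blocked and every service lookup in the namespace is failing with it.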
Cross-Namespace Isolation
# Only allow monitoring namespace to scrape metrics
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-scrape
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
          podSelector:
            matchLabels:
              app: prometheus
      ports:
        - protocol: TCP
          port: 9090
        - protocol: TCP
          port: 8080 # Application metrics port
Layer 3: Pod Security Standards
Enforce Restricted Security Profile
Kubernetes Pod Security Standards (PSS) replaced PodSecurityPolicies. Enforce them at the namespace level:
# Label namespaces with security enforcement
kubectl label namespace production \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/warn=restricted \
  pod-security.kubernetes.io/audit=restricted
What restricted enforces:
- No privileged containers
- No host networking, PID, or IPC
- No hostPath volumes
- Must run as non-root
- Must drop ALL capabilities
- Note: a read-only root filesystem is not part of restricted — enforce it yourself via each container's securityContext (shown below)
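Before switching enforce on, find out which running workloads would violate the profile. A server-side dry run of the label prints warnings for each non-compliant pod without blocking anything:

```shell
# Preview violations without enforcing anything
kubectl label --dry-run=server --overwrite namespace production \
  pod-security.kubernetes.io/enforce=restricted
```

Fix the warned workloads first, then apply the labels for real.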
Security Context for Every Pod
apiVersion: apps/v1
kind: Deployment
metadata:
  name: secure-app
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: secure-app
  template:
    metadata:
      labels:
        app: secure-app
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001
        runAsGroup: 10001
        fsGroup: 10001
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: app
          image: myapp:v1.2.3 # Never use :latest in production
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "256Mi"
              cpu: "500m"
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: cache
              mountPath: /app/cache
      volumes:
        - name: tmp
          emptyDir:
            sizeLimit: 100Mi
        - name: cache
          emptyDir:
            sizeLimit: 200Mi
The readOnlyRootFilesystem with emptyDir volumes for /tmp is a pattern that breaks most malware. If an attacker gains code execution, they can't write to the filesystem outside the ephemeral mounts.
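You can verify the behavior directly against the deployment above:

```shell
# A write outside the ephemeral mounts should be rejected
kubectl exec -n production deploy/secure-app -- touch /evil || echo "root fs is read-only"

# The emptyDir mounts remain writable for legitimate scratch data
kubectl exec -n production deploy/secure-app -- touch /tmp/scratch && echo "/tmp writable"
```

The first command failing and the second succeeding is exactly the posture you want.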
Layer 4: Secrets Management
Never Store Secrets in Kubernetes Secrets (Alone)
Kubernetes Secrets are base64-encoded, not encrypted. Anyone with get secrets permission can read them. Layer your defenses.
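To see how thin that protection is, decode a value yourself. Base64 is an encoding with no key involved (the secret and key names in the comment are examples):

```shell
# A value as it appears in `kubectl get secret -o yaml`
encoded='cGFzc3dvcmQxMjM='

# Recovering the plaintext takes one pipe -- no key, no decryption
echo "$encoded" | base64 -d   # prints: password123

# Against a real cluster the equivalent is:
#   kubectl get secret app-secrets -n production \
#     -o jsonpath='{.data.password}' | base64 -d
```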
Enable Encryption at Rest
# /etc/kubernetes/encryption-config.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded-32-byte-key>
      - identity: {}
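The config does nothing until the API server is pointed at it, and existing secrets stay in plaintext until rewritten. A verification pass, following the upstream encrypt-data-at-rest procedure (the etcd path and secret name are examples):

```shell
# 1. Add this flag to kube-apiserver and restart it:
#      --encryption-provider-config=/etc/kubernetes/encryption-config.yaml

# 2. Rewrite all existing secrets so they get encrypted with the new key
kubectl get secrets --all-namespaces -o json | kubectl replace -f -

# 3. Read one secret straight from etcd -- it should be ciphertext,
#    starting with the provider prefix k8s:enc:aescbc:v1:
ETCDCTL_API=3 etcdctl get /registry/secrets/production/app-secrets | hexdump -C | head
```

If step 3 shows readable JSON instead of the `k8s:enc:` prefix, the API server is not using the encryption config.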
External Secrets Operator + AWS Secrets Manager
The better pattern: don't store secrets in Kubernetes at all. Use External Secrets Operator to sync from a proper secrets manager.
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets
  namespace: production
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-secrets
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets
    kind: SecretStore
  target:
    name: app-secrets
    creationPolicy: Owner
    deletionPolicy: Retain
  data:
    - secretKey: database-url
      remoteRef:
        key: production/app/database
        property: url
    - secretKey: api-key
      remoteRef:
        key: production/app/api-key
IRSA for AWS Access (No Long-Lived Credentials)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa
  namespace: production
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/app-production-role
    eks.amazonaws.com/audience: sts.amazonaws.com
The IAM role should follow least privilege:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue"
      ],
      "Resource": "arn:aws:secretsmanager:us-east-1:123456789012:secret:production/app/*",
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": "us-east-1"
        }
      }
    }
  ]
}
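The permissions policy alone isn't enough: the role's trust policy must allow the cluster's OIDC provider to assume it, scoped to exactly this service account. A sketch of that trust policy (the account ID and OIDC provider ID are placeholders for your cluster's values):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:sub": "system:serviceaccount:production:app-sa",
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
```

The `sub` condition is the load-bearing part: without it, any service account in the cluster could assume the role.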
Layer 5: Admission Controllers
Kyverno Policies for Automated Enforcement
Admission controllers intercept every API request before it's persisted. This is where you enforce standards automatically.
# Require resource limits on every container
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: require-limits
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "CPU and memory limits are required for all containers."
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    memory: "?*"
                    cpu: "?*"
---
# Block latest tag
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: block-latest-tag
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: block-latest
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "Using ':latest' tag is not allowed. Specify a version tag."
        pattern:
          spec:
            containers:
              - image: "!*:latest"
---
# Require labels
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-labels
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: require-team-label
      match:
        any:
          - resources:
              kinds: ["Deployment", "StatefulSet", "DaemonSet"]
      validate:
        message: "The label 'team' is required."
        pattern:
          metadata:
            labels:
              team: "?*"
---
# Mutate: Add default seccomp profile
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-seccomp
spec:
  rules:
    - name: add-seccomp
      match:
        any:
          - resources:
              kinds: ["Pod"]
      mutate:
        patchStrategicMerge:
          spec:
            securityContext:
              # +() anchor: only add the profile if the pod doesn't set its own
              +(seccompProfile):
                type: RuntimeDefault
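Once the policies are live, a non-compliant workload is rejected at admission time rather than landing in the cluster. A quick smoke test:

```shell
# Violates both block-latest-tag and require-resource-limits;
# expect an admission error naming the failing policy
kubectl run probe --image=nginx:latest -n production

# The same image with a pinned tag and limits should be accepted
kubectl run probe --image=nginx:1.27 -n production \
  --overrides='{"spec":{"containers":[{"name":"probe","image":"nginx:1.27","resources":{"limits":{"cpu":"100m","memory":"64Mi"}}}]}}'
```

Testing both the rejection and the happy path catches policies that are accidentally too broad.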
Image Signature Verification
Don't deploy images you can't verify:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures
spec:
  validationFailureAction: Enforce
  webhookTimeoutSeconds: 30
  rules:
    - name: verify-cosign-signature
      match:
        any:
          - resources:
              kinds: ["Pod"]
      verifyImages:
        - imageReferences:
            - "ghcr.io/myorg/*"
          attestors:
            - entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE...
                      -----END PUBLIC KEY-----
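This policy assumes your pipeline signs images with cosign at build time. A sketch of that side of the workflow (image name and digest are placeholders):

```shell
# Generate a keypair once; cosign.key lives in CI secrets,
# cosign.pub is what goes into the Kyverno policy above
cosign generate-key-pair

# Sign the image by digest during the release job
cosign sign --key cosign.key ghcr.io/myorg/app@sha256:<digest>

# Verify manually -- the same check Kyverno runs at admission
cosign verify --key cosign.pub ghcr.io/myorg/app@sha256:<digest>
```

Signing by digest rather than tag matters: tags are mutable, so a tag-based signature can be silently repointed at a different image.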
Layer 6: API Server Hardening
The Kubernetes API server is the control plane's front door. Harden it to limit what attackers can do even if they gain initial access.
Restrict Anonymous Authentication
# Check if anonymous auth is enabled (it shouldn't be in production)
kubectl auth can-i --list --as=system:anonymous
# If you see anything besides "no", tighten the API server config
# Add to kube-apiserver flags:
# --anonymous-auth=false
Enable API Audit Logging
API audit logs record every request to the API server. They're essential for forensics after an incident:
# audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Log all secret access at metadata level
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets"]
  # Log exec into pods (potential shell access)
  - level: Request
    resources:
      - group: ""
        resources: ["pods/exec", "pods/attach"]
  # Log RBAC changes
  - level: RequestResponse
    resources:
      - group: "rbac.authorization.k8s.io"
  # Don't log health checks or read-only system requests
  - level: None
    users: ["system:kube-proxy"]
    verbs: ["watch"]
  - level: None
    nonResourceURLs: ["/healthz*", "/readyz*"]
Limit API Server Access
On managed Kubernetes (EKS, GKE, AKS), restrict which IP ranges can reach the API server:
# EKS: Restrict public API endpoint access
# (keep the CIDR list on one line: the value can't be split across
# continuations without injecting whitespace into the argument)
aws eks update-cluster-config \
  --name production \
  --resources-vpc-config endpointPublicAccess=true,publicAccessCidrs="10.0.0.0/8","203.0.113.0/24",endpointPrivateAccess=true
For the most secure setup, disable the public endpoint entirely and access the API server only through VPN or a bastion host in the VPC.
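The same command handles that lockdown; run it only after confirming your private access path works, since it cuts off all internet-facing kubectl access:

```shell
# Disable the public endpoint entirely; the API server is then
# reachable only from inside the VPC (VPN or bastion)
aws eks update-cluster-config \
  --name production \
  --resources-vpc-config endpointPublicAccess=false,endpointPrivateAccess=true
```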
Layer 7: Runtime Security Monitoring
Prevention is essential, but detection is equally important. You need to know when something abnormal happens inside your containers.
Falco for Runtime Threat Detection
Falco monitors system calls at the kernel level and alerts on suspicious activity:
# Install Falco with Helm
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update

helm install falco falcosecurity/falco \
  --namespace falco \
  --create-namespace \
  --set falcosidekick.enabled=true \
  --set falcosidekick.config.slack.webhookurl="https://hooks.slack.com/services/..." \
  --set driver.kind=modern_ebpf
Custom Falco rules for Kubernetes-specific threats:
# falco-custom-rules.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: falco-custom-rules
  namespace: falco
data:
  custom-rules.yaml: |
    - rule: Shell Spawned in Container
      desc: Detect shell execution inside a container (possible breach)
      condition: >
        spawned_process and container and
        proc.name in (bash, sh, zsh, dash, ksh) and
        not proc.pname in (crond, sshd, containerd-shim)
      output: >
        Shell spawned in container
        (user=%user.name container=%container.name
        image=%container.image.repository
        pod=%k8s.pod.name ns=%k8s.ns.name
        command=%proc.cmdline)
      priority: WARNING
      tags: [container, shell, mitre_execution]
    - rule: Sensitive File Access
      desc: Detect access to sensitive files inside container
      condition: >
        open_read and container and
        (fd.name startswith /etc/shadow or
        fd.name startswith /etc/passwd or
        fd.name startswith /proc/1/environ)
      output: >
        Sensitive file accessed in container
        (file=%fd.name user=%user.name
        container=%container.name pod=%k8s.pod.name)
      priority: CRITICAL
      tags: [container, filesystem, mitre_credential_access]
    - rule: Unexpected Outbound Connection
      desc: Detect outbound connections to non-standard ports
      condition: >
        outbound and container and
        not fd.sport in (80, 443, 53, 8080, 8443, 9090, 5432, 6379) and
        not k8s.ns.name in (kube-system, monitoring)
      output: >
        Unexpected outbound connection
        (port=%fd.sport ip=%fd.sip
        container=%container.name pod=%k8s.pod.name)
      priority: WARNING
      tags: [container, network, mitre_exfiltration]
Extending the Audit Policy
The starter policy from Layer 6 covers secrets, exec, and RBAC. A fuller production policy also tracks workload changes and ends with a metadata-level catch-all so nothing goes unrecorded:
# audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Log all changes to secrets at the metadata level (don't log the data)
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets"]
  # Log all RBAC changes at the request level
  - level: Request
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["clusterroles", "clusterrolebindings", "roles", "rolebindings"]
  # Log pod exec/attach at the request level (potential shell access)
  - level: Request
    resources:
      - group: ""
        resources: ["pods/exec", "pods/attach", "pods/portforward"]
  # Log all changes to deployments and statefulsets
  - level: Request
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: "apps"
        resources: ["deployments", "statefulsets", "daemonsets"]
  # Catch-all: log everything else at metadata level
  - level: Metadata
    omitStages:
      - RequestReceived
Security Scanning in CI/CD
Shift security left by scanning images before they reach the cluster:
# Scan container images with Trivy
trivy image --severity HIGH,CRITICAL \
  --exit-code 1 \
  --ignore-unfixed \
  myapp/api-server:v1.2.3

# Scan Kubernetes manifests for misconfigurations
trivy config --severity HIGH,CRITICAL \
  --exit-code 1 \
  k8s/

# GitHub Actions security scanning step
- name: Scan container image
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: ghcr.io/${{ github.repository }}:${{ github.sha }}
    format: sarif
    output: trivy-results.sarif
    severity: HIGH,CRITICAL
    exit-code: 1
- name: Upload scan results
  uses: github/codeql-action/upload-sarif@v3
  if: always()
  with:
    sarif_file: trivy-results.sarif
Automated Security Auditing
Don't rely on manual checks. Automate your security posture assessment.
kube-bench for CIS Benchmarks
# Run CIS Kubernetes Benchmark checks
kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml
# View results
kubectl logs job/kube-bench
# Common findings to address:
# [FAIL] 1.2.6 Ensure that the --kubelet-certificate-authority argument is set
# [FAIL] 4.2.6 Ensure that the --protect-kernel-defaults argument is set to true
# [WARN] 5.1.6 Ensure that Service Account Tokens are only mounted where necessary
Automated RBAC Review Script
Run this monthly to catch permission creep:
#!/bin/bash
# rbac-audit.sh - Monthly RBAC review

echo "=== RBAC Security Audit ==="
echo "Date: $(date)"
echo ""

echo "--- Cluster-Admin Bindings ---"
kubectl get clusterrolebindings -o json | \
  jq -r '.items[] |
    select(.roleRef.name=="cluster-admin") |
    " \(.metadata.name): " +
    (.subjects[]? | "\(.kind)/\(.name) (ns: \(.namespace // "cluster-wide"))")'

echo ""
echo "--- Service Accounts with Cluster Roles ---"
kubectl get clusterrolebindings -o json | \
  jq -r '.items[] |
    .metadata.name as $binding |
    .roleRef.name as $role |
    .subjects[]? |
    select(.kind=="ServiceAccount") |
    " \($binding) -> \($role) (SA: \(.namespace)/\(.name))"'

echo ""
echo "--- Pods with automountServiceAccountToken ---"
kubectl get pods --all-namespaces -o json | \
  jq -r '.items[] |
    select(.spec.automountServiceAccountToken != false) |
    select(.metadata.namespace | test("^kube-") | not) |
    " \(.metadata.namespace)/\(.metadata.name)"' | head -20

echo ""
echo "--- Pods Running as Root ---"
kubectl get pods --all-namespaces -o json | \
  jq -r '.items[] |
    select(.spec.securityContext.runAsNonRoot != true) |
    select(.metadata.namespace | test("^kube-") | not) |
    " \(.metadata.namespace)/\(.metadata.name)"' | head -20
The Hardening Checklist
Apply these in order. Each layer builds on the previous one.
| # | Layer | Action | Impact |
|---|---|---|---|
| 1 | RBAC | Remove unnecessary cluster-admin bindings | Prevents privilege escalation |
| 2 | RBAC | Disable auto-mount of SA tokens | Reduces token theft surface |
| 3 | RBAC | Scope CI/CD accounts to specific namespaces | Contains blast radius |
| 4 | Network | Apply default-deny in all namespaces | Prevents lateral movement |
| 5 | Network | Allowlist specific pod-to-pod traffic | Micro-segmentation |
| 6 | Network | Allow DNS egress explicitly | Required for name resolution |
| 7 | Pod Security | Enforce restricted PSS on production namespaces | Blocks privileged containers |
| 8 | Pod Security | Add SecurityContext to every deployment | Defense in depth |
| 9 | Secrets | Enable encryption at rest | Protects etcd |
| 10 | Secrets | Deploy External Secrets Operator | No secrets in Git, auto-rotation |
| 11 | Secrets | Use IRSA/workload identity | No long-lived credentials |
| 12 | Admission | Deploy Kyverno/OPA Gatekeeper | Automated policy enforcement |
| 13 | Admission | Block latest tag, require limits | Operational hygiene |
| 14 | Admission | Verify image signatures | Supply chain security |
| 15 | Runtime | Deploy Falco for threat detection | Detects active breaches |
| 16 | Runtime | Enable audit logging | Forensics and compliance |
| 17 | CI/CD | Scan images with Trivy before deployment | Prevents known vulnerabilities |
Security Is Layers, Not Walls
No single control prevents all attacks. What stops breaches is the combination: RBAC limits what an attacker can do, network policies limit where they can go, pod security limits what they can execute, secrets management limits what they can steal, admission controllers prevent misconfigurations from reaching the cluster, and runtime monitoring detects when all other layers have been bypassed.
The clusters that survive incidents are the ones where every layer works. The ones that don't are the ones that relied on a single perimeter and hoped for the best. Hope is not a security strategy.
Start with the audit. Find your cluster-admin bindings, your missing network policies, your pods running as root. Fix the critical findings first. Then work through the checklist methodically. Schedule monthly RBAC reviews and quarterly security assessments. Security isn't a state you achieve — it's a practice you maintain.
Your future incident responders will thank you. And when the inevitable compromise attempt happens, the difference between "we detected it in minutes and the blast radius was one namespace" and "they had cluster-admin for three weeks" is whether you invested in these layers.
Finally, run tabletop exercises. Walk through a scenario: "An attacker compromises a pod in the staging namespace. What can they access? What do they see? How do we detect it? How do we respond?" If you can't answer those questions confidently, you know which layer needs attention next. Security is a practice, not a destination, and regular testing is what separates a hardened cluster from a cluster that just looks hardened on paper.