Part 1 of 6 in Security Hardening

Kubernetes Security Hardening for Production: The Complete Guide

Amara Okafor · 15 min read

Your Cluster Is Not Secure by Default

Let me be direct: a default Kubernetes installation is wide open. Pods can talk to any other pod. Service accounts get auto-mounted tokens. Containers run as root. There are no network boundaries, no admission policies, and no runtime security controls.

I've audited Kubernetes clusters at financial institutions, healthcare companies, and SaaS platforms. The pattern is depressingly consistent: teams deploy applications, ship features, and never revisit the security posture. Then something gets compromised, and the blast radius is the entire cluster.

This guide is the hardening checklist I run on every production cluster. It covers seven layers of defense: RBAC, network policies, pod security, secrets management, admission controllers, API server hardening, and runtime monitoring. Each layer reduces the blast radius of a breach. Together, they make your cluster defensible.

Layer 1: RBAC — Control Who Can Do What

Audit Existing Permissions First

Before you lock anything down, understand what exists:

# Find all ClusterRoleBindings granting cluster-admin
kubectl get clusterrolebindings -o json | \
  jq -r '.items[] | select(.roleRef.name=="cluster-admin") |
    .metadata.name + " -> " +
    (.subjects[]? | .kind + "/" + .name)'

# Find all service accounts with cluster-admin
kubectl get clusterrolebindings -o json | \
  jq -r '.items[] | select(.roleRef.name=="cluster-admin") |
    .subjects[]? | select(.kind=="ServiceAccount") |
    .namespace + "/" + .name'

If you see more than 2-3 entries in that second command, you have a problem. Most service accounts don't need cluster-wide admin privileges.
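Before deleting a suspicious binding, confirm what its subject can actually do; impersonation makes this a one-liner (the service account name here is just an example):

```shell
# List everything this service account is permitted to do
kubectl auth can-i --list \
  --as=system:serviceaccount:cicd:cicd-deployer

# Spot-check a specific dangerous permission
kubectl auth can-i delete deployments \
  --as=system:serviceaccount:cicd:cicd-deployer -n production
```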

Principle of Least Privilege

Replace broad ClusterRoles with namespace-scoped Roles:

# WRONG: Giving a CI/CD service account cluster-admin
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cicd-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: cicd-deployer
    namespace: cicd

---
# RIGHT: Scoped to specific namespace and verbs
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cicd-deployer
  namespace: production
rules:
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets"]
    verbs: ["get", "list", "watch", "update", "patch"]
  - apiGroups: [""]
    resources: ["services", "configmaps"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list"]  # No create/update — secrets managed separately
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cicd-deployer
  namespace: production
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: cicd-deployer
subjects:
  - kind: ServiceAccount
    name: cicd-deployer
    namespace: cicd

Disable Auto-Mounting of Service Account Tokens

Most pods don't need to talk to the Kubernetes API. Disable the token mount by default:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-service-account
  namespace: production
automountServiceAccountToken: false

For pods that genuinely need API access, enable it explicitly at the pod level and use a projected volume with a short TTL:

apiVersion: v1
kind: Pod
metadata:
  name: api-consumer
spec:
  serviceAccountName: app-service-account
  automountServiceAccountToken: false
  containers:
    - name: app
      image: myapp:v1.2.3  # Pinned tag, never :latest
      volumeMounts:
        - name: token
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          readOnly: true
  volumes:
    - name: token
      projected:
        sources:
          - serviceAccountToken:
              expirationSeconds: 3600  # 1 hour, not infinite
              audience: api
              path: token

Layer 2: Network Policies — Segment Your Traffic

Default Deny Everything

Start from zero trust. Deny all traffic, then allowlist what's needed:

# Apply to every namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress

This single manifest changes your security posture dramatically. Nothing can talk to anything unless explicitly permitted.

Allow Specific Communication Patterns

# Allow frontend to talk to backend on port 8080
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080

---
# Allow backend to talk to database on port 5432
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-backend-to-db
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: database
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: backend
      ports:
        - protocol: TCP
          port: 5432

---
# Allow DNS resolution for all pods
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53

Critical: Don't forget the DNS egress policy. Without it, default-deny breaks DNS resolution and everything stops working. I've seen this take down production environments.

Cross-Namespace Isolation

# Only allow monitoring namespace to scrape metrics
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-scrape
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
          podSelector:
            matchLabels:
              app: prometheus
      ports:
        - protocol: TCP
          port: 9090
        - protocol: TCP
          port: 8080  # Application metrics port

Layer 3: Pod Security Standards

Enforce Restricted Security Profile

Kubernetes Pod Security Standards (PSS) replaced PodSecurityPolicies. Enforce them at the namespace level:

# Label namespaces with security enforcement
kubectl label namespace production \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/warn=restricted \
  pod-security.kubernetes.io/audit=restricted
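If you manage namespaces declaratively, the same labels belong in the Namespace manifest so the enforcement level lives in version control:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```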

What restricted enforces:

  • No privileged containers
  • No host networking, PID, or IPC namespaces
  • No hostPath volumes
  • Must run as non-root
  • Must drop ALL capabilities (only NET_BIND_SERVICE may be added back)
  • allowPrivilegeEscalation must be false
  • Seccomp profile must be RuntimeDefault or Localhost

Note that a read-only root filesystem is not part of restricted; you still need to set readOnlyRootFilesystem yourself in each container's securityContext.

Security Context for Every Pod

apiVersion: apps/v1
kind: Deployment
metadata:
  name: secure-app
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: secure-app
  template:
    metadata:
      labels:
        app: secure-app
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001
        runAsGroup: 10001
        fsGroup: 10001
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: app
          image: myapp:v1.2.3  # Never use :latest in production
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "256Mi"
              cpu: "500m"
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: cache
              mountPath: /app/cache
      volumes:
        - name: tmp
          emptyDir:
            sizeLimit: 100Mi
        - name: cache
          emptyDir:
            sizeLimit: 200Mi

The readOnlyRootFilesystem with emptyDir volumes for /tmp is a pattern that breaks most malware. If an attacker gains code execution, they can't write to the filesystem outside the ephemeral mounts.

Layer 4: Secrets Management

Never Store Secrets in Kubernetes Secrets (Alone)

Kubernetes Secrets are base64-encoded, not encrypted. Anyone with get secrets permission can read them. Layer your defenses.
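To see how little protection that encoding provides, round-trip a value through base64; the plaintext here is a made-up example:

```shell
# base64 is an encoding, not encryption: anyone can reverse it
encoded=$(printf 's3cr3t-password' | base64)
echo "as stored in the Secret: $encoded"

# Decoding recovers the plaintext instantly
printf '%s' "$encoded" | base64 -d
echo
```

Against a live cluster the equivalent is `kubectl get secret app-secrets -o jsonpath='{.data.api-key}' | base64 -d`, which is exactly what any principal with get on Secrets can run.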

Enable Encryption at Rest

# /etc/kubernetes/encryption-config.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded-32-byte-key>
      - identity: {}
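The config file does nothing on its own, and Secrets written before encryption was enabled stay in plaintext until rewritten. A sketch of both remaining steps (flag name and path follow upstream convention; verify against your control plane setup):

```shell
# 1. Add to the kube-apiserver flags (static pod manifest on each control plane node):
#    --encryption-provider-config=/etc/kubernetes/encryption-config.yaml

# 2. Rewrite existing Secrets so they pass through the new encryption provider
kubectl get secrets --all-namespaces -o json | kubectl replace -f -
```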

External Secrets Operator + AWS Secrets Manager

The better pattern: don't store secrets in Kubernetes at all. Use External Secrets Operator to sync from a proper secrets manager.

apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets
  namespace: production
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa

---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-secrets
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets
    kind: SecretStore
  target:
    name: app-secrets
    creationPolicy: Owner
    deletionPolicy: Retain
  data:
    - secretKey: database-url
      remoteRef:
        key: production/app/database
        property: url
    - secretKey: api-key
      remoteRef:
        key: production/app/api-key

IRSA for AWS Access (No Long-Lived Credentials)

apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa
  namespace: production
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/app-production-role
    eks.amazonaws.com/audience: sts.amazonaws.com

The IAM role should follow least privilege:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue"
      ],
      "Resource": "arn:aws:secretsmanager:us-east-1:123456789012:secret:production/app/*",
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": "us-east-1"
        }
      }
    }
  ]
}
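The permissions policy is only half of IRSA; the role's trust policy must bind it to exactly one Kubernetes service account via the cluster's OIDC provider. A sketch, with the OIDC provider ID as a placeholder:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLEOIDCID"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLEOIDCID:sub": "system:serviceaccount:production:app-sa",
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLEOIDCID:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
```

The sub condition is what stops any other pod in the cluster from assuming the role.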

Layer 5: Admission Controllers

Kyverno Policies for Automated Enforcement

Admission controllers intercept every API request before it's persisted. This is where you enforce standards automatically.

# Require resource limits on every container
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: require-limits
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "CPU and memory limits are required for all containers."
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    memory: "?*"
                    cpu: "?*"

---
# Block latest tag
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: block-latest-tag
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: block-latest
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "Using ':latest' tag is not allowed. Specify a version tag."
        pattern:
          spec:
            containers:
              - image: "!*:latest"

---
# Require labels
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-labels
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: require-team-label
      match:
        any:
          - resources:
              kinds: ["Deployment", "StatefulSet", "DaemonSet"]
      validate:
        message: "The label 'team' is required."
        pattern:
          metadata:
            labels:
              team: "?*"

---
# Mutate: Add default seccomp profile
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-seccomp
spec:
  rules:
    - name: add-seccomp
      match:
        any:
          - resources:
              kinds: ["Pod"]
      mutate:
        patchStrategicMerge:
          spec:
            securityContext:
              seccompProfile:
                type: RuntimeDefault

Image Signature Verification

Don't deploy images you can't verify:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures
spec:
  validationFailureAction: Enforce
  webhookTimeoutSeconds: 30
  rules:
    - name: verify-cosign-signature
      match:
        any:
          - resources:
              kinds: ["Pod"]
      verifyImages:
        - imageReferences:
            - "ghcr.io/myorg/*"
          attestors:
            - entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE...
                      -----END PUBLIC KEY-----
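This policy only passes for images that were actually signed with the matching private key at build time, e.g. with Cosign in your CI pipeline (key path and image name are placeholders):

```shell
# Sign the image after pushing it; the public half of this key pair
# is what the Kyverno policy verifies at admission time
cosign sign --key cosign.key ghcr.io/myorg/app:v1.2.3
```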

Layer 6: API Server Hardening

The Kubernetes API server is the control plane's front door. Harden it to limit what attackers can do even if they gain initial access.

Restrict Anonymous Authentication

# Check what anonymous users are allowed to do (should be almost nothing)
kubectl auth can-i --list --as=system:anonymous

# If the list shows more than the default discovery endpoints,
# tighten the API server config. Add to kube-apiserver flags:
# --anonymous-auth=false

Enable API Audit Logging

API audit logs record every request to the API server. They're essential for forensics after an incident:

# audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Log all secret access at metadata level
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets"]
  # Log exec into pods (potential shell access)
  - level: Request
    resources:
      - group: ""
        resources: ["pods/exec", "pods/attach"]
  # Log RBAC changes
  - level: RequestResponse
    resources:
      - group: "rbac.authorization.k8s.io"
  # Don't log health checks or read-only system requests
  - level: None
    users: ["system:kube-proxy"]
    verbs: ["watch"]
  - level: None
    nonResourceURLs: ["/healthz*", "/readyz*"]
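The policy file is inert until the API server loads it. The standard wiring, with paths as assumptions:

```shell
# Add to kube-apiserver flags:
# --audit-policy-file=/etc/kubernetes/audit-policy.yaml
# --audit-log-path=/var/log/kubernetes/audit.log
# --audit-log-maxage=30       # days of logs to retain
# --audit-log-maxbackup=10    # rotated files to keep
# --audit-log-maxsize=100     # MB before rotation
```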

Limit API Server Access

On managed Kubernetes (EKS, GKE, AKS), restrict which IP ranges can reach the API server:

# EKS: Restrict public API endpoint access
aws eks update-cluster-config \
  --name production \
  --resources-vpc-config \
endpointPublicAccess=true,publicAccessCidrs="10.0.0.0/8","203.0.113.0/24",endpointPrivateAccess=true

For the most secure setup, disable the public endpoint entirely and access the API server only through VPN or a bastion host in the VPC.
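On EKS that lockdown is a single config change (cluster name reused from the example above):

```shell
# Fully private control plane: API reachable only from inside the VPC
aws eks update-cluster-config \
  --name production \
  --resources-vpc-config endpointPublicAccess=false,endpointPrivateAccess=true
```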

Layer 7: Runtime Security Monitoring

Prevention is essential, but detection is equally important. You need to know when something abnormal happens inside your containers.

Falco for Runtime Threat Detection

Falco monitors system calls at the kernel level and alerts on suspicious activity:

# Install Falco with Helm
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update

helm install falco falcosecurity/falco \
  --namespace falco \
  --create-namespace \
  --set falcosidekick.enabled=true \
  --set falcosidekick.config.slack.webhookurl="https://hooks.slack.com/services/..." \
  --set driver.kind=modern_ebpf

Custom Falco rules for Kubernetes-specific threats:

# falco-custom-rules.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: falco-custom-rules
  namespace: falco
data:
  custom-rules.yaml: |
    - rule: Shell Spawned in Container
      desc: Detect shell execution inside a container (possible breach)
      condition: >
        spawned_process and container and
        proc.name in (bash, sh, zsh, dash, ksh) and
        not proc.pname in (crond, sshd, containerd-shim)
      output: >
        Shell spawned in container
        (user=%user.name container=%container.name
        image=%container.image.repository
        pod=%k8s.pod.name ns=%k8s.ns.name
        command=%proc.cmdline)
      priority: WARNING
      tags: [container, shell, mitre_execution]

    - rule: Sensitive File Access
      desc: Detect access to sensitive files inside container
      condition: >
        open_read and container and
        (fd.name startswith /etc/shadow or
         fd.name startswith /etc/passwd or
         fd.name startswith /proc/1/environ)
      output: >
        Sensitive file accessed in container
        (file=%fd.name user=%user.name
        container=%container.name pod=%k8s.pod.name)
      priority: CRITICAL
      tags: [container, filesystem, mitre_credential_access]

    - rule: Unexpected Outbound Connection
      desc: Detect outbound connections to non-standard ports
      condition: >
        outbound and container and
        not fd.sport in (80, 443, 53, 8080, 8443, 9090, 5432, 6379) and
        not k8s.ns.name in (kube-system, monitoring)
      output: >
        Unexpected outbound connection
        (port=%fd.sport ip=%fd.sip
        container=%container.name pod=%k8s.pod.name)
      priority: WARNING
      tags: [container, network, mitre_exfiltration]

A More Complete Audit Policy

Layer 6 introduced a minimal audit policy. For production, extend it to also capture workload changes and add a catch-all:

# audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Log all changes to secrets at the metadata level (don't log the data)
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets"]

  # Log all RBAC changes at the request level
  - level: Request
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["clusterroles", "clusterrolebindings", "roles", "rolebindings"]

  # Log pod exec/attach at the request level (potential shell access)
  - level: Request
    resources:
      - group: ""
        resources: ["pods/exec", "pods/attach", "pods/portforward"]

  # Log all changes to deployments and statefulsets
  - level: Request
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: "apps"
        resources: ["deployments", "statefulsets", "daemonsets"]

  # Catch-all: log everything else at metadata level
  - level: Metadata
    omitStages:
      - RequestReceived

Security Scanning in CI/CD

Shift security left by scanning images before they reach the cluster:

# Scan container images with Trivy
trivy image --severity HIGH,CRITICAL \
  --exit-code 1 \
  --ignore-unfixed \
  myapp/api-server:v1.2.3

# Scan Kubernetes manifests for misconfigurations
trivy config --severity HIGH,CRITICAL \
  --exit-code 1 \
  k8s/
# GitHub Actions security scanning step
- name: Scan container image
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: ghcr.io/${{ github.repository }}:${{ github.sha }}
    format: sarif
    output: trivy-results.sarif
    severity: HIGH,CRITICAL
    exit-code: 1

- name: Upload scan results
  uses: github/codeql-action/upload-sarif@v3
  if: always()
  with:
    sarif_file: trivy-results.sarif

Automated Security Auditing

Don't rely on manual checks. Automate your security posture assessment.

kube-bench for CIS Benchmarks

# Run CIS Kubernetes Benchmark checks
kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml

# View results
kubectl logs job/kube-bench

# Common findings to address:
# [FAIL] 1.2.6 Ensure that the --kubelet-certificate-authority argument is set
# [FAIL] 4.2.6 Ensure that the --protect-kernel-defaults argument is set to true
# [WARN] 5.1.6 Ensure that Service Account Tokens are only mounted where necessary

Automated RBAC Review Script

Run this monthly to catch permission creep:

#!/bin/bash
# rbac-audit.sh - Monthly RBAC review

echo "=== RBAC Security Audit ==="
echo "Date: $(date)"
echo ""

echo "--- Cluster-Admin Bindings ---"
kubectl get clusterrolebindings -o json | \
  jq -r '.items[] |
    select(.roleRef.name=="cluster-admin") |
    "  \(.metadata.name): " +
    (.subjects[]? | "\(.kind)/\(.name) (ns: \(.namespace // "cluster-wide"))")'

echo ""
echo "--- Service Accounts with Cluster Roles ---"
kubectl get clusterrolebindings -o json | \
  jq -r '.items[] |
    .metadata.name as $binding |
    .roleRef.name as $role |
    .subjects[]? |
    select(.kind=="ServiceAccount") |
    "  \($binding) -> \($role) (SA: \(.namespace)/\(.name))"'

echo ""
echo "--- Pods with automountServiceAccountToken ---"
kubectl get pods --all-namespaces -o json | \
  jq -r '.items[] |
    select(.spec.automountServiceAccountToken != false) |
    select(.metadata.namespace | test("^kube-") | not) |
    "  \(.metadata.namespace)/\(.metadata.name)"' | head -20

echo ""
echo "--- Pods Running as Root ---"
kubectl get pods --all-namespaces -o json | \
  jq -r '.items[] |
    select(.spec.securityContext.runAsNonRoot != true) |
    select(.metadata.namespace | test("^kube-") | not) |
    "  \(.metadata.namespace)/\(.metadata.name)"' | head -20
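"Monthly" only happens if it's scheduled. A minimal cron entry, with the script path and log location as assumptions:

```shell
# /etc/cron.d/rbac-audit: run at 09:00 on the first of every month
# 0 9 1 * * root /opt/scripts/rbac-audit.sh >> /var/log/rbac-audit.log 2>&1
```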

The Hardening Checklist

Apply these in order. Each layer builds on the previous one.

  1. RBAC: Remove unnecessary cluster-admin bindings → prevents privilege escalation
  2. RBAC: Disable auto-mount of SA tokens → reduces token theft surface
  3. RBAC: Scope CI/CD accounts to specific namespaces → contains blast radius
  4. Network: Apply default-deny in all namespaces → prevents lateral movement
  5. Network: Allowlist specific pod-to-pod traffic → micro-segmentation
  6. Network: Allow DNS egress explicitly → required for name resolution
  7. Pod Security: Enforce restricted PSS on production namespaces → blocks privileged containers
  8. Pod Security: Add a SecurityContext to every deployment → defense in depth
  9. Secrets: Enable encryption at rest → protects etcd
  10. Secrets: Deploy External Secrets Operator → no secrets in Git, automatic rotation
  11. Secrets: Use IRSA/workload identity → no long-lived credentials
  12. Admission: Deploy Kyverno or OPA Gatekeeper → automated policy enforcement
  13. Admission: Block the latest tag, require resource limits → operational hygiene
  14. Admission: Verify image signatures → supply chain security
  15. Runtime: Deploy Falco for threat detection → detects active breaches
  16. Runtime: Enable audit logging → forensics and compliance
  17. CI/CD: Scan images with Trivy before deployment → prevents known vulnerabilities

Security Is Layers, Not Walls

No single control prevents all attacks. What stops breaches is the combination: RBAC limits what an attacker can do, network policies limit where they can go, pod security limits what they can execute, secrets management limits what they can steal, admission controllers prevent misconfigurations from reaching the cluster, and runtime monitoring detects when all other layers have been bypassed.

The clusters that survive incidents are the ones where every layer works. The ones that don't are the ones that relied on a single perimeter and hoped for the best. Hope is not a security strategy.

Start with the audit. Find your cluster-admin bindings, your missing network policies, your pods running as root. Fix the critical findings first. Then work through the checklist methodically. Schedule monthly RBAC reviews and quarterly security assessments. Security isn't a state you achieve — it's a practice you maintain.

Your future incident responders will thank you. And when the inevitable compromise attempt happens, the difference between "we detected it in minutes and the blast radius was one namespace" and "they had cluster-admin for three weeks" is whether you invested in these layers.

Finally, run tabletop exercises. Walk through a scenario: "An attacker compromises a pod in the staging namespace. What can they access? What do they see? How do we detect it? How do we respond?" If you can't answer those questions confidently, you know which layer needs attention next. Security is a practice, not a destination, and regular testing is what separates a hardened cluster from a cluster that just looks hardened on paper.

Amara Okafor

DevSecOps Lead

Security-first mindset in everything I ship. From zero-trust architectures to supply chain security, I make sure your pipeline doesn't become your weakest link.
