Part 5 of 8 in Kubernetes from Zero to Hero

Zero-Trust Networking in Kubernetes with Network Policies

Aareez Asif · 15 min read

Your Cluster Is a Flat Network (and That's Terrifying)

Here's the thing about a default Kubernetes cluster — every pod can talk to every other pod. No questions asked. Your frontend can reach your database directly. Your monitoring stack can hit your payment service. A compromised pod in a dev namespace can probe every service in production.

This is not a theoretical risk. I've been on incident calls where a vulnerable sidecar container was used to pivot across namespaces and exfiltrate data from a completely unrelated service. The attacker didn't need to escape the cluster. The flat network gave them everything.

Zero-trust networking means: no traffic is allowed unless explicitly permitted. In Kubernetes, NetworkPolicies are how you enforce that. Let me tell you why you should treat them as non-negotiable infrastructure, not an afterthought.

Prerequisites: You Need a CNI That Enforces Policies

Before writing a single NetworkPolicy, check your CNI plugin. Here's the thing most tutorials don't mention upfront: the default Kubernetes networking (kubenet) does not enforce NetworkPolicies. You apply them, Kubernetes accepts them, and absolutely nothing happens.

CNI plugins that enforce NetworkPolicies:

CNI Plugin    Policy Support    Extra Features
Calico        Full              GlobalNetworkPolicy, DNS policies
Cilium        Full              L7 policies, eBPF-based
Weave Net     Full              Encrypted overlay
Antrea        Full              Tiered policies
Flannel       None              No policy enforcement

If you're on EKS with the default VPC CNI, you need to enable network policy support explicitly or run Calico alongside it. On GKE, enable Dataplane V2 (Cilium-based). On AKS, use Azure CNI with Calico.

Verify enforcement is working before you rely on it:

# Deploy two test pods
kubectl run client --image=busybox --restart=Never -- sleep 3600
kubectl run server --image=nginx --restart=Never --labels="app=server"

# Verify connectivity works before any policy
kubectl exec client -- wget -qO- --timeout=3 http://$(kubectl get pod server -o jsonpath='{.status.podIP}')

# Apply a deny-all policy (see next section)
# Then test again — if the connection still succeeds, your CNI isn't enforcing

Step 1: Default Deny Everything

Zero trust starts with denying all traffic. Apply this to every namespace:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

The empty podSelector: {} means this applies to every pod in the namespace. Both ingress and egress are denied. After applying this, nothing can talk to anything in the namespace — including DNS resolution.

Let me tell you why denying egress matters just as much as ingress. Without egress restrictions, a compromised pod can phone home to a command-and-control server, exfiltrate data to external endpoints, or scan your internal network. Ingress-only policies give you half the protection.

Step 2: Allow DNS (or Everything Breaks)

The moment you deny egress, every pod loses DNS resolution. Your apps can't resolve service names, health checks fail, and everything cascades. Allow DNS first:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53

Notice I'm scoping DNS egress to the kube-system namespace where CoreDNS runs. You could use a broader rule, but in zero-trust we allow the minimum necessary.
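
If you want to go one step tighter, pin the rule to the CoreDNS pods themselves rather than the whole kube-system namespace. The k8s-app: kube-dns label is the common CoreDNS convention, but verify it in your cluster before relying on it:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-coredns-only
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns  # common CoreDNS label; confirm with kubectl get pods -n kube-system --show-labels
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53

Note that the namespaceSelector and podSelector sit in the same to entry, so both must match (the AND semantics covered later in the cross-namespace section).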

Step 3: Build Allow Rules for Each Service

Now you selectively open the paths your services need. Here's a real-world example for a typical web application stack:

# Allow frontend to receive external traffic via ingress controller
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-ingress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: frontend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx
      podSelector:
        matchLabels:
          app.kubernetes.io/name: ingress-nginx
    ports:
    - protocol: TCP
      port: 8080
---
# Allow frontend to talk to the API service
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: frontend
  policyTypes:
  - Egress
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: api-server
    ports:
    - protocol: TCP
      port: 3000
---
# Allow API to talk to the database
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-to-database
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
  - Egress
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: postgres
    ports:
    - protocol: TCP
      port: 5432
---
# Allow database to receive traffic only from the API
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-database-ingress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: postgres
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: api-server
    ports:
    - protocol: TCP
      port: 5432

Here's the thing about this approach: you're building an explicit map of allowed communication paths. If someone deploys a new service that tries to hit the database directly, it gets blocked. That's not a bug — that's the policy working exactly as intended.

Cross-Namespace Policies

Real clusters have services spread across namespaces. Here's how the monitoring namespace gets access to scrape metrics:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-scrape
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: monitoring
      podSelector:
        matchLabels:
          app: prometheus
    ports:
    - protocol: TCP
      port: 9090
    - protocol: TCP
      port: 8080

The combination of namespaceSelector and podSelector in the same from entry means BOTH conditions must match. This is a common mistake — if you put them as separate list items, they become OR conditions, which is far more permissive than you intended.

# WRONG - This allows ALL pods from monitoring namespace
# OR any pod labeled app=prometheus from ANY namespace
ingress:
- from:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: monitoring
  - podSelector:
      matchLabels:
        app: prometheus

# CORRECT - Only prometheus pods in the monitoring namespace
ingress:
- from:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: monitoring
    podSelector:
      matchLabels:
        app: prometheus

I cannot stress this enough. That YAML indentation difference has caused security incidents. The difference between a list of selectors (OR) and a compound selector (AND) is a single hyphen.

Automating Policy Generation

Writing policies by hand is tedious and error-prone. Here's my recommended workflow:

  1. Observe first: Deploy a tool like Hubble (for Cilium) or Calico's flow logs to capture real traffic patterns.

  2. Generate baseline policies: Use the observed traffic to auto-generate policies.

# With Cilium/Hubble, export observed flows
hubble observe --namespace production --output json > flows.json

# Tools like https://editor.networkpolicy.io/ can help visualize policies

  3. Apply in audit mode first: If using Calico, you can set policies to log instead of deny:
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: audit-default-deny
  namespace: production
spec:
  selector: all()
  types:
  - Ingress
  - Egress
  ingress:
  - action: Log
  egress:
  - action: Log

Testing Your Policies

Never deploy network policies without testing. Here's a testing script I use on every cluster:

#!/bin/bash
# test-network-policies.sh

NAMESPACE="production"

echo "=== Testing Network Policy Enforcement ==="

# Test 1: Frontend should reach API
echo -n "Frontend -> API (should PASS): "
kubectl exec -n $NAMESPACE deploy/frontend -- \
  wget -qO- --timeout=3 http://api-server:3000/health 2>/dev/null && echo "PASS" || echo "FAIL"

# Test 2: Frontend should NOT reach database directly
# Use a raw TCP check here (requires nc in the image): wget exits nonzero even
# when the connection succeeds, because postgres doesn't speak HTTP
echo -n "Frontend -> Database (should BLOCK): "
kubectl exec -n $NAMESPACE deploy/frontend -- \
  nc -z -w 3 postgres 5432 2>/dev/null && echo "FAIL (not blocked!)" || echo "PASS (blocked)"

# Test 3: API should reach database
echo -n "API -> Database (should PASS): "
kubectl exec -n $NAMESPACE deploy/api-server -- \
  pg_isready -h postgres -p 5432 2>/dev/null && echo "PASS" || echo "FAIL"

# Test 4: Random pod should not reach anything
echo -n "Rogue pod -> API (should BLOCK): "
kubectl run rogue-test --rm -i --restart=Never --image=busybox -n $NAMESPACE -- \
  wget -qO- --timeout=3 http://api-server:3000/health 2>/dev/null && echo "FAIL (not blocked!)" || echo "PASS (blocked)"

Common Pitfalls I've Seen in Production

  1. Forgetting about health check traffic. Readiness and liveness probes come from the kubelet on the node. Some CNIs exempt that traffic automatically, but where yours doesn't, you must allow ingress from the node CIDR or the pod will never become ready.

  2. Breaking kube-system traffic. The API server, CoreDNS, and other system components need connectivity. Be careful applying default-deny to kube-system.

  3. StatefulSet headless services. Pods in a StatefulSet need to talk to each other for clustering (e.g., etcd, Kafka). Don't forget intra-service communication.

  4. Init containers with different network needs. If your init container pulls config from a vault or external service, it needs egress rules too.

  5. Policy ordering confusion. NetworkPolicies are additive. There is no deny rule — only the absence of an allow. If any policy allows the traffic, it flows.
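
For pitfall 3, a peer rule that selects the same pods on both sides usually does the trick. Here's a sketch for a Kafka StatefulSet, where the app label and port are assumptions to adapt to your deployment:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-kafka-peers
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: kafka
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: kafka  # brokers accept traffic from fellow brokers
    ports:
    - protocol: TCP
      port: 9092  # adjust to your listener config
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: kafka
    ports:
    - protocol: TCP
      port: 9092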

Egress Policies for External Services

In a real environment, your pods need to talk to external APIs, cloud services, and third-party endpoints. Here's how to control that egress without blocking legitimate traffic.

Allow Egress to Specific External CIDRs

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-external-egress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
  - Egress
  egress:
  # Allow egress to AWS services via VPC endpoints
  - to:
    - ipBlock:
        cidr: 10.0.0.0/8  # Internal VPC range
    ports:
    - protocol: TCP
      port: 443
  # Allow egress to a specific partner API
  - to:
    - ipBlock:
        cidr: 203.0.113.0/24
    ports:
    - protocol: TCP
      port: 443
  # Allow egress to payment processor
  - to:
    - ipBlock:
        cidr: 198.51.100.0/24
    ports:
    - protocol: TCP
      port: 443

Controlling Egress to the Internet

If a pod needs internet access (for pulling packages, calling external APIs), be explicit about it:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-limited-internet
  namespace: production
spec:
  podSelector:
    matchLabels:
      internet-access: required
  policyTypes:
  - Egress
  egress:
  # Allow HTTPS to any destination except internal ranges
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
          - 10.0.0.0/8
          - 172.16.0.0/12
          - 192.168.0.0/16
    ports:
    - protocol: TCP
      port: 443

The except block prevents a pod with internet access from using that egress rule to reach internal services. This matters because without it, a compromised pod could bypass your internal network policies by routing through the internet-facing path.

Advanced: Cilium L7 Network Policies

Standard Kubernetes NetworkPolicies operate at L3/L4 — they filter based on IP addresses and ports. Cilium extends this to L7, letting you write policies based on HTTP methods, paths, and headers.

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-l7-policy
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: api-server
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: GET
                path: "/api/v1/products.*"
              - method: POST
                path: "/api/v1/orders"
                headers:
                  - 'Content-Type: application/json'

This policy allows the frontend to GET product data and POST orders, but nothing else. A compromised frontend can't DELETE resources or access admin endpoints, even though it has network connectivity to the API server. This is defense in depth at the protocol level.

Cilium DNS-Based Policies

Instead of maintaining IP allowlists for external services, Cilium lets you write policies based on DNS names:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-external-apis
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: api-server
  egress:
    - toFQDNs:
        - matchName: "api.stripe.com"
        - matchName: "api.sendgrid.com"
        - matchPattern: "*.amazonaws.com"
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
          rules:
            dns:
              - matchPattern: "*"  # Cilium's DNS proxy must observe lookups for toFQDNs to map names to IPs

This is significantly easier to maintain than IP-based egress rules. When Stripe adds a new IP range, your policy still works because it's filtering on the DNS name, not the resolved address.

Monitoring Network Policy Enforcement

You can't improve what you can't measure. Set up monitoring for dropped traffic so you know when policies are blocking legitimate requests.

Calico Flow Logs

# Enable flow logs in Calico
kubectl patch felixconfiguration default \
  --type merge \
  -p '{"spec":{"flowLogsFlushInterval":"15s","flowLogsFileEnabled":true}}'

# View denied flows
kubectl logs -n calico-system -l k8s-app=calico-node --tail=100 | grep "action=deny"

Cilium Hubble for Network Observability

# Install Hubble UI
cilium hubble enable --ui

# Port-forward to Hubble UI
kubectl port-forward -n kube-system svc/hubble-ui 12000:80

# CLI: Watch dropped flows in real time
hubble observe --namespace production --verdict DROPPED

# CLI: Show all flows between two services
hubble observe --from-label app=frontend --to-label app=api-server

Prometheus Metrics for Network Policies

If you're running Calico or Cilium, export policy metrics to Prometheus:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: network-policy-alerts
  namespace: monitoring
spec:
  groups:
    - name: network-policy.rules
      rules:
        - alert: HighNetworkPolicyDenials
          expr: |
            sum(rate(
              cilium_policy_verdict_total{verdict="denied"}[5m]
            )) by (namespace) > 10
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High rate of network policy denials in namespace {{ $labels.namespace }}"
            description: "More than 10 denied connections per second. This may indicate a missing allow rule or an attack."

Network Policy Patterns for Common Architectures

Microservices with API Gateway

                ┌──────────────┐
   Internet ──> │ API Gateway  │
                └──────┬───────┘
           ┌───────────┼───────────┐
           │           │           │
     ┌─────▼─────┐ ┌───▼─────┐ ┌───▼───────┐
     │ Auth Svc  │ │ Product │ │ Order Svc │
     └───────────┘ │ Svc     │ └─────┬─────┘
                   └─────────┘       │
                                ┌────▼────┐
                                │ Payment │
                                │ Svc     │
                                └────┬────┘
                                ┌────▼────┐
                                │ Database│
                                └─────────┘
# Only API gateway receives external traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: gateway-ingress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api-gateway
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx
    ports:
    - protocol: TCP
      port: 8080
---
# Auth service only accepts traffic from gateway
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: auth-ingress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: auth-service
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: api-gateway
    ports:
    - protocol: TCP
      port: 8080
---
# Payment service only accepts traffic from order service
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payment-ingress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: payment-service
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: order-service
    ports:
    - protocol: TCP
      port: 8080
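
Note that these are ingress rules. With the namespace-wide default-deny on egress, each caller also needs the matching egress rule on its side before any of these paths carry traffic. The order service's half of the payment path, assuming the same labels, looks like this:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: order-to-payment-egress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: order-service
  policyTypes:
  - Egress
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: payment-service
    ports:
    - protocol: TCP
      port: 8080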

Multi-Tenant Namespace Isolation

For teams sharing a cluster, isolate namespaces completely:

# Template: Apply to every tenant namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-cross-namespace
  namespace: tenant-a
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
  ingress:
  # Only allow traffic from same namespace
  - from:
    - podSelector: {}
  egress:
  # Allow DNS
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
  # Allow traffic within namespace
  - to:
    - podSelector: {}
  # Allow traffic to shared services namespace
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: shared-services
    ports:
    - protocol: TCP
      port: 443
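
The shared-services namespace needs the mirror-image ingress rule, or the tenants' egress never lands. A sketch that assumes each tenant namespace carries a tenant label (adapt to your own labeling scheme):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-tenant-ingress
  namespace: shared-services
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          tenant: "true"  # assumed label on every tenant namespace
    ports:
    - protocol: TCP
      port: 443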

Scaling This to a Real Organization

For teams managing many namespaces, I recommend:

  • Use a policy-as-code tool like OPA/Gatekeeper to mandate that every namespace has a default-deny policy
  • Template common patterns — create Helm charts or Kustomize overlays for standard policy sets
  • Label everything consistently — network policies are only as good as your labeling discipline
  • Review policies in CI — tools like kubeconform (the maintained successor to kubeval) and conftest can validate policies before they're applied

Enforcing Default-Deny with Kyverno

Automate the requirement that every namespace must have a default-deny policy:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-default-deny
spec:
  background: true
  rules:
    - name: require-network-policy
      match:
        any:
          - resources:
              kinds: ["Namespace"]
      exclude:
        any:
          - resources:
              namespaces: ["kube-system", "kube-public", "kube-node-lease"]
      generate:
        synchronize: true
        kind: NetworkPolicy
        apiVersion: networking.k8s.io/v1
        name: default-deny-all
        namespace: "{{request.object.metadata.name}}"
        data:
          spec:
            podSelector: {}
            policyTypes:
            - Ingress
            - Egress

This Kyverno policy automatically generates a default-deny NetworkPolicy in every new namespace. No one can forget to add it because it's created automatically.

Troubleshooting Network Policies

When connectivity breaks after applying policies, follow this systematic approach:

# 1. List all network policies in the namespace
kubectl get networkpolicies -n production -o wide

# 2. Describe each policy to understand what's allowed
kubectl describe networkpolicy <name> -n production

# 3. Check if the pod's labels match the policy's podSelector
kubectl get pod <pod-name> -n production --show-labels

# 4. Test connectivity from inside a pod
kubectl exec -n production deploy/frontend -- \
  wget -qO- --timeout=3 http://api-server:3000/health

# 5. If using Cilium, check the endpoint policy status
kubectl get ciliumendpoints -n production

# 6. If using Calico, check policy evaluation
calicoctl get workloadEndpoint --namespace production -o yaml

The most common issues I encounter:

  1. Labels don't match: The policy selects app: api but the pod is labeled app: api-server.
  2. Port mismatch: The policy allows port 80 but the service runs on 8080.
  3. Missing egress rule: Ingress is allowed but the source pod doesn't have an egress rule to the destination.
  4. DNS not allowed: Default-deny blocks DNS, and nobody added the DNS egress rule.
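
Because policies are additive (if any policy allows the traffic, it flows), a quick way to bisect is to temporarily layer a broad allow policy onto the suspect pod. If connectivity comes back, a missing allow rule is your culprit. Delete it as soon as you're done:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: debug-allow-all-frontend  # temporary, remove after debugging
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: frontend
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - {}  # empty rule allows all ingress
  egress:
  - {}  # empty rule allows all egress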

Conclusion

Zero-trust networking isn't a project you finish. It's a practice you maintain. Every new service, every new integration path needs a corresponding network policy. Build that into your deployment pipeline, and it becomes second nature instead of an afterthought.

The payoff is significant: when a container gets compromised — and eventually one will — the attacker finds themselves in a box. They can't reach the database. They can't probe other namespaces. They can't exfiltrate data to external endpoints. That containment is the difference between a minor security event and a major breach.

Start with default-deny in one namespace. Add allow rules for its known traffic patterns. Test that everything works. Then expand to the next namespace. Within a few weeks, you'll have a cluster where every communication path is explicit, documented, and enforced. That's zero trust in practice.

One final piece of advice: document your network policies. A diagram showing allowed traffic flows is worth more than reading 50 YAML files. Tools like Cilium's Hubble UI or the Network Policy Editor (editor.networkpolicy.io) can visualize your policies and show you exactly what's allowed and what's blocked. Use them during incident response, during onboarding, and during every architecture review. When someone asks "can service A talk to service B?", the answer should be visible in seconds, not buried in a namespace you haven't looked at in months.
