DevOpsil

Istio mTLS & Security: Zero-Trust Service Communication

Riku Tanaka --- 19 min read

In a traditional network, services behind the firewall trust each other implicitly. A compromised service can freely communicate with any other internal service, move laterally, and exfiltrate data. This perimeter-based security model has repeatedly proven inadequate: from the Target breach in 2013 to the SolarWinds attack in 2020, attackers who gained a foothold inside the network faced few barriers to lateral movement. Zero-trust networking eliminates this assumption by requiring every service to prove its identity and hold explicit authorization for each request. Istio implements zero-trust through mutual TLS for identity verification and AuthorizationPolicies for fine-grained access control, all without requiring changes to your application code.

This guide covers how to enable and enforce mTLS across your mesh, build a comprehensive authorization policy layer, integrate external authentication providers, handle edge cases like non-mesh services and legacy protocols, and debug security issues in production environments.

Why mTLS Matters

Standard TLS (what your browser uses) is one-directional: the client verifies the server's identity, but the server does not verify the client. Mutual TLS adds client-side certificate verification, meaning both parties authenticate each other. In the context of a service mesh, this distinction is critical.

Concretely, mTLS provides:

  • Identity --- Each service has a cryptographic identity (SPIFFE ID) issued by Istio's certificate authority. This identity is bound to the Kubernetes service account, not to network addresses.
  • Encryption --- All traffic between services is encrypted with TLS 1.3, even within the cluster network. Anyone sniffing the network sees only encrypted traffic.
  • Integrity --- Tampering with in-flight data is detected through cryptographic message authentication.
  • No code changes --- The Envoy sidecar handles all TLS operations transparently. Your application communicates over plaintext localhost, and the sidecar encrypts/decrypts at the pod boundary.
  • Automatic rotation --- Certificates are rotated automatically (default: every 24 hours), eliminating the operational burden of manual certificate management.

Without mTLS, anyone who gains access to your cluster network can sniff inter-service traffic, impersonate services, and inject malicious responses. With mTLS, even a compromised node can only see encrypted traffic for which it does not have the private keys.

Zero-Trust Security Model

The zero-trust model that Istio implements follows these principles:

  • Never trust, always verify --- mTLS verifies identity on every connection
  • Least privilege access --- AuthorizationPolicies restrict what each service can access
  • Assume breach --- encryption prevents lateral eavesdropping
  • Verify explicitly --- JWT validation for external requests
  • Continuous monitoring --- access logs and metrics for all traffic

PeerAuthentication: Controlling mTLS Mode

The PeerAuthentication resource controls whether services require mTLS for incoming connections. It operates at three levels of granularity: mesh-wide, namespace, and workload.

Modes

  • PERMISSIVE --- accepts both plaintext and mTLS connections (the default). Use during migration to mTLS.
  • STRICT --- accepts only mTLS connections. Use in production and fully meshed namespaces.
  • DISABLE --- accepts only plaintext connections. Use for debugging, or for services that cannot use mTLS.
  • UNSET --- inherits from the parent scope. Use when relying on the policy hierarchy for configuration.

Mesh-Wide mTLS

Enable STRICT mTLS for the entire mesh by placing the policy in the istio-system namespace:

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system  # Mesh-wide when in istio-system
spec:
  mtls:
    mode: STRICT

This is the target state for a production mesh. Every connection between meshed services must use mTLS. Any plaintext connection attempt will be rejected.

Namespace-Level mTLS

Override the mesh-wide setting for a specific namespace:

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: payments  # Only applies to this namespace
spec:
  mtls:
    mode: STRICT

This is useful when migrating namespace by namespace. You can enable STRICT for namespaces that are fully meshed while keeping others in PERMISSIVE mode.

Workload-Level mTLS

Target a specific workload within a namespace:

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: mysql-permissive
  namespace: data
spec:
  selector:
    matchLabels:
      app: mysql
  mtls:
    mode: PERMISSIVE  # MySQL clients outside the mesh need plaintext access

Port-Level mTLS

Disable mTLS on specific ports while keeping it strict on others:

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: api-service
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-service
  mtls:
    mode: STRICT
  portLevelMtls:
    8080:
      mode: STRICT      # Application traffic
    9090:
      mode: PERMISSIVE  # Prometheus scrape port
    15021:
      mode: DISABLE     # Health check port

Policy Precedence

When multiple PeerAuthentication policies overlap, the most specific one wins:

Workload-level (most specific)
    |
    v
Namespace-level
    |
    v
Mesh-wide (least specific)

If a workload-level policy exists for a pod, it takes full precedence over namespace and mesh-wide policies. There is no merging --- the most specific policy replaces everything above it.
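For example, a workload-level policy can carve out an exception to a mesh-wide STRICT default (workload names here are illustrative):

```yaml
# Exception for a single legacy workload that still needs plaintext,
# even when the mesh-wide default (in istio-system) is STRICT
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: legacy-exception
  namespace: production
spec:
  selector:
    matchLabels:
      app: legacy-app
  mtls:
    mode: DISABLE  # Replaces (does not merge with) the mesh-wide STRICT for this workload
```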

Certificate Management

Istio's control plane (istiod) acts as a certificate authority (CA). It automatically issues X.509 certificates to every workload in the mesh without any manual intervention.

How Certificate Provisioning Works

  1. When a pod with an Envoy sidecar starts, the sidecar generates a private key and a Certificate Signing Request (CSR)
  2. The CSR is sent to istiod over a secure gRPC channel, authenticated using the Kubernetes service account token
  3. istiod validates the CSR against the pod's service account and namespace
  4. istiod signs the certificate using its CA key and returns the signed certificate
  5. The sidecar loads the certificate and begins accepting mTLS connections
  6. Before the certificate expires (default: 24 hours), the sidecar automatically generates a new CSR and repeats the process

This entire flow happens without any operator intervention.

SPIFFE Identity

Each workload receives a SPIFFE-compliant identity encoded in the certificate's Subject Alternative Name (SAN):

spiffe://cluster.local/ns/NAMESPACE/sa/SERVICE_ACCOUNT

For example, a service running as the reviews service account in the default namespace gets:

spiffe://cluster.local/ns/default/sa/reviews

This identity is what AuthorizationPolicies use to control access. The identity is cryptographically bound to the workload through the certificate chain, making it unforgeable.
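You can read the SPIFFE ID straight out of a running workload's certificate; a sketch assuming a deployment named reviews in the default namespace:

```shell
# Extract the workload certificate and print its SAN, which carries the SPIFFE ID
istioctl proxy-config secret deploy/reviews -n default -o json | \
  jq -r '.dynamicActiveSecrets[0].secret.tlsCertificate.certificateChain.inlineBytes' | \
  base64 -d | openssl x509 -text -noout | grep -A1 "Subject Alternative Name"
```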

Certificate Rotation and Lifetimes

  • Workload cert lifetime --- default 24 hours; recommended 12-24 hours in production. Change with --set values.pilot.env.DEFAULT_WORKLOAD_CERT_TTL=12h.
  • CA cert lifetime --- default 10 years (self-signed); recommended 1-3 years with a custom CA. Change by supplying custom CA certificates.
  • Root cert lifetime --- default 10 years; recommended 5-10 years. Generate with a long expiry.
  • Rotation grace period --- 50% of the certificate lifetime. Automatic, not configurable.

Short-lived workload certificates are a security advantage: even if a certificate is compromised, it expires quickly and cannot be reused.

Using Custom CA Certificates

For production, you typically want to use your own root CA instead of Istio's self-signed one. This is critical when:

  • You need certificates trusted by external systems
  • Compliance requires a specific certificate chain
  • You use multiple clusters that need to trust each other
  • You have an existing PKI infrastructure
# Generate a root CA (if you don't have one)
openssl req -new -newkey rsa:4096 -x509 -sha256 \
  -days 3650 -nodes \
  -subj "/O=Company Inc./CN=Root CA" \
  -keyout root-key.pem -out root-cert.pem

# Generate an intermediate CA for Istio
openssl req -new -newkey rsa:4096 -sha256 -nodes \
  -subj "/O=Company Inc./CN=Istio Intermediate CA" \
  -keyout ca-key.pem -out ca-csr.pem

openssl x509 -req -days 730 -sha256 \
  -CA root-cert.pem -CAkey root-key.pem -CAcreateserial \
  -in ca-csr.pem -out ca-cert.pem \
  -extfile <(printf "basicConstraints=CA:TRUE\nkeyUsage=critical,digitalSignature,keyCertSign,cRLSign")

# Create the certificate chain
cat ca-cert.pem root-cert.pem > cert-chain.pem

# Create the Kubernetes secret
kubectl create secret generic cacerts -n istio-system \
  --from-file=ca-cert.pem \
  --from-file=ca-key.pem \
  --from-file=root-cert.pem \
  --from-file=cert-chain.pem

# Restart istiod to pick up the new CA
kubectl rollout restart deployment istiod -n istio-system

# Verify the new CA is being used
istioctl proxy-config secret deploy/myapp -o json | \
  jq -r '.dynamicActiveSecrets[0].secret.tlsCertificate.certificateChain.inlineBytes' | \
  base64 -d | openssl x509 -text -noout | grep "Issuer:"

Multi-Cluster Certificate Trust

For multi-cluster meshes where services in different clusters need to communicate with mTLS, all clusters must share the same root CA:

# Use the same root-cert.pem and generate unique intermediate CAs per cluster
# Cluster 1
kubectl create secret generic cacerts -n istio-system \
  --from-file=ca-cert.pem=cluster1-ca-cert.pem \
  --from-file=ca-key.pem=cluster1-ca-key.pem \
  --from-file=root-cert.pem \
  --from-file=cert-chain.pem=cluster1-cert-chain.pem \
  --context=cluster1

# Cluster 2
kubectl create secret generic cacerts -n istio-system \
  --from-file=ca-cert.pem=cluster2-ca-cert.pem \
  --from-file=ca-key.pem=cluster2-ca-key.pem \
  --from-file=root-cert.pem \
  --from-file=cert-chain.pem=cluster2-cert-chain.pem \
  --context=cluster2

AuthorizationPolicy: Access Control

AuthorizationPolicies define who can access what. They operate on the identity established by mTLS (for in-mesh traffic) or JWT tokens (for external clients).

Policy Actions

  • CUSTOM --- delegates the decision to an external authorization service. Evaluated first.
  • DENY --- denies matching requests. Evaluated second.
  • ALLOW --- allows matching requests; once any ALLOW policy exists, all non-matching requests are denied. Evaluated third.
  • AUDIT --- logs matching requests; does not affect the allow/deny decision.

When multiple policies exist for a workload, the evaluation order is: CUSTOM, DENY, ALLOW. If a DENY policy matches, the request is denied regardless of any ALLOW policies. If no ALLOW policy exists, all traffic is allowed (unless a DENY matches).

Deny-All Baseline

Start with a deny-all policy, then explicitly allow required communication. This is the foundation of zero-trust:

# Deny all traffic in the namespace
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: production
spec:
  {}  # Empty spec defaults to an ALLOW action with no rules: nothing matches, so all requests are denied

After applying this, all traffic to services in the production namespace will be denied. You then add ALLOW policies for each legitimate communication path.
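To confirm the baseline is active, send a request from any meshed pod; a hypothetical sleep test pod and API port are assumed here:

```shell
# Expect HTTP 403 with body "RBAC: access denied" once deny-all is in place
kubectl exec deploy/sleep -n production -c sleep -- \
  curl -s -o /dev/null -w "%{http_code}\n" http://api-service.production:8080/
```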

Allow Specific Service Communication

# Allow frontend to call the API service on specific paths
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-service
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - "cluster.local/ns/production/sa/frontend"
      to:
        - operation:
            methods: ["GET", "POST"]
            paths: ["/api/*"]

Allow by Namespace

# Allow any service in the monitoring namespace to scrape metrics
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-monitoring
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-service
  action: ALLOW
  rules:
    - from:
        - source:
            namespaces: ["monitoring", "istio-system"]
      to:
        - operation:
            methods: ["GET"]
            paths: ["/metrics", "/healthz", "/readyz"]

Deny Specific Sources

# Block a compromised service immediately
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: block-compromised
  namespace: production
spec:
  selector:
    matchLabels:
      app: database
  action: DENY
  rules:
    - from:
        - source:
            principals:
              - "cluster.local/ns/production/sa/compromised-service"

Complex Authorization Rules

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: api-access
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-service
  action: ALLOW
  rules:
    # Internal services can call any endpoint
    - from:
        - source:
            principals:
              - "cluster.local/ns/production/sa/frontend"
              - "cluster.local/ns/production/sa/mobile-bff"
      to:
        - operation:
            methods: ["GET", "POST", "PUT", "DELETE"]

    # Batch processing service can only access batch endpoints
    - from:
        - source:
            principals:
              - "cluster.local/ns/batch/sa/batch-processor"
      to:
        - operation:
            methods: ["POST"]
            paths: ["/api/v2/batch/*"]

    # External JWT-authenticated users can only read with v2 API
    - from:
        - source:
            requestPrincipals: ["https://auth.example.com/*"]
      to:
        - operation:
            methods: ["GET"]
      when:
        - key: request.headers[x-api-version]
          values: ["v2"]

    # Allow health checks from anywhere (no source restriction)
    - to:
        - operation:
            methods: ["GET"]
            paths: ["/healthz", "/readyz"]

Production Authorization Policy Pattern

For a typical microservices application, build policies layer by layer:

# 1. Deny all by default
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: production
spec: {}
---
# 2. Allow ingress gateway to reach frontend
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-ingress-to-frontend
  namespace: production
spec:
  selector:
    matchLabels:
      app: frontend
  action: ALLOW
  rules:
    - from:
        - source:
            namespaces: ["istio-ingress"]
---
# 3. Allow frontend to reach API
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  selector:
    matchLabels:
      app: api
  action: ALLOW
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/production/sa/frontend"]
---
# 4. Allow API to reach database
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-api-to-db
  namespace: production
spec:
  selector:
    matchLabels:
      app: database
  action: ALLOW
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/production/sa/api"]
      to:
        - operation:
            ports: ["5432"]
---
# 5. Allow monitoring everywhere
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-monitoring
  namespace: production
spec:
  action: ALLOW
  rules:
    - from:
        - source:
            namespaces: ["monitoring"]
      to:
        - operation:
            methods: ["GET"]
            paths: ["/metrics"]

JWT Authentication with RequestAuthentication

For external traffic (from outside the mesh), use JWT tokens for authentication:

apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
  name: jwt-auth
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-service
  jwtRules:
    - issuer: "https://auth.example.com"
      jwksUri: "https://auth.example.com/.well-known/jwks.json"
      audiences:
        - "api.example.com"
      forwardOriginalToken: true
      outputPayloadToHeader: "x-jwt-payload"
      fromHeaders:
        - name: Authorization
          prefix: "Bearer "
      fromParams:
        - "access_token"
    # Support multiple identity providers
    - issuer: "https://accounts.google.com"
      jwksUri: "https://www.googleapis.com/oauth2/v3/certs"
      audiences:
        - "your-google-client-id.apps.googleusercontent.com"

Then combine with an AuthorizationPolicy to enforce JWT claims:

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: require-jwt
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-service
  action: ALLOW
  rules:
    # Admin users can access everything
    - from:
        - source:
            requestPrincipals: ["https://auth.example.com/*"]
      when:
        - key: request.auth.claims[role]
          values: ["admin"]

    # Editor users can read and write
    - from:
        - source:
            requestPrincipals: ["https://auth.example.com/*"]
      to:
        - operation:
            methods: ["GET", "POST", "PUT"]
      when:
        - key: request.auth.claims[role]
          values: ["editor"]

    # Viewer users can only read
    - from:
        - source:
            requestPrincipals: ["https://auth.example.com/*"]
      to:
        - operation:
            methods: ["GET"]
      when:
        - key: request.auth.claims[role]
          values: ["viewer"]

    # Reject requests without valid JWT (this is implicit when
    # RequestAuthentication rejects invalid tokens, but we need
    # to ensure requests without tokens are also rejected)

Important: RequestAuthentication only validates tokens that are present. It does not reject requests without tokens. To require a token, pair it with an AuthorizationPolicy that checks for requestPrincipals.
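To reject requests that carry no token at all, the usual pattern is a DENY policy matching sources without any request principal:

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: require-jwt-token
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-service
  action: DENY
  rules:
    - from:
        - source:
            notRequestPrincipals: ["*"]  # Matches requests with no validated JWT principal
```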

External Authorization with OPA

For complex authorization logic that cannot be expressed in AuthorizationPolicy, delegate to Open Policy Agent (OPA):

Deploy OPA

apiVersion: apps/v1
kind: Deployment
metadata:
  name: opa-authorizer
  namespace: istio-system
spec:
  replicas: 2
  selector:
    matchLabels:
      app: opa-authorizer
  template:
    metadata:
      labels:
        app: opa-authorizer
      annotations:
        sidecar.istio.io/inject: "false"  # OPA doesn't need a sidecar
    spec:
      containers:
        - name: opa
          image: openpolicyagent/opa:latest-envoy
          ports:
            - containerPort: 9191  # gRPC for Envoy ext_authz
            - containerPort: 8181  # HTTP API for policy management
            - containerPort: 8282  # Diagnostics
          args:
            - "run"
            - "--server"
            - "--addr=0.0.0.0:8181"
            - "--diagnostic-addr=0.0.0.0:8282"
            - "--set=plugins.envoy_ext_authz_grpc.addr=0.0.0.0:9191"
            - "--set=plugins.envoy_ext_authz_grpc.path=istio/authz/allow"
            - "--set=decision_logs.console=true"
            - "/policies"
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
          volumeMounts:
            - name: policies
              mountPath: /policies
          livenessProbe:
            httpGet:
              path: /health?plugins
              port: 8282
            initialDelaySeconds: 5
          readinessProbe:
            httpGet:
              path: /health?plugins
              port: 8282
            initialDelaySeconds: 5
      volumes:
        - name: policies
          configMap:
            name: opa-policies
---
apiVersion: v1
kind: Service
metadata:
  name: opa-authorizer
  namespace: istio-system
spec:
  selector:
    app: opa-authorizer
  ports:
    - name: grpc
      port: 9191
      targetPort: 9191
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: opa-policies
  namespace: istio-system
data:
  policy.rego: |
    package istio.authz

    import input.attributes.request.http as http_request
    import input.attributes.source.principal as source_principal

    default allow = false

    # Allow health checks
    allow {
        http_request.method == "GET"
        http_request.path == "/healthz"
    }

    # Allow requests from known services during business hours
    allow {
        source_principal != ""
        is_business_hours
    }

    # Rate limit: deny if source has made too many requests
    # (simplified, real implementation would check a shared counter)
    allow {
        source_principal != ""
        not is_rate_limited
    }

    is_business_hours {
        [hour, _, _] := time.clock(time.now_ns())
        hour >= 8
        hour < 22
    }

    is_rate_limited = false
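Rego policies can be exercised offline with the OPA CLI before deploying; a minimal sketch with an illustrative input file:

```shell
# Simulate the health-check rule locally (file names are illustrative)
cat > input.json <<'EOF'
{"attributes": {"request": {"http": {"method": "GET", "path": "/healthz"}}}}
EOF
opa eval --data policy.rego --input input.json --format raw "data.istio.authz.allow"
```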

Configure Istio to Use OPA

Register the provider in the Istio mesh configuration:

# In IstioOperator or Helm values
meshConfig:
  extensionProviders:
    - name: opa-authorizer
      envoyExtAuthzGrpc:
        service: opa-authorizer.istio-system.svc.cluster.local
        port: 9191
        timeout: 500ms
        failOpen: false  # Deny if OPA is unreachable

Then create the AuthorizationPolicy:

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: opa-ext-auth
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-service
  action: CUSTOM
  provider:
    name: opa-authorizer
  rules:
    - to:
        - operation:
            paths: ["/api/*"]

Migrating to Strict mTLS Without Downtime

Moving from PERMISSIVE to STRICT mTLS requires careful planning to avoid breaking non-mesh services. This is a multi-step process that should be executed over days or weeks, not hours.

Step 1: Audit Current State

# Check which services are using mTLS (describe pod takes a pod name, not a deployment)
istioctl x describe pod -n production \
  $(kubectl get pod -n production -l app=api-service -o jsonpath='{.items[0].metadata.name}')

# Check all PeerAuthentication policies
kubectl get peerauthentication --all-namespaces

# Find pods without sidecars (these will break under STRICT mTLS)
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name} {.spec.containers[*].name}{"\n"}{end}' | grep -v istio-proxy | grep -v kube-system

# Check actual mTLS status between services
istioctl proxy-config listeners deploy/api-service -o json | \
  jq '.[].filterChains[].transportSocket.typedConfig.commonTlsContext'

Step 2: Ensure PERMISSIVE Mode is the Starting State

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: PERMISSIVE

Step 3: Add Sidecars to All Workloads

Label all application namespaces for injection and restart deployments:

# Label namespaces
for ns in production staging batch; do
  kubectl label namespace $ns istio-injection=enabled --overwrite
done

# Restart deployments to inject sidecars
for ns in production staging batch; do
  kubectl rollout restart deployment -n $ns
  kubectl rollout status deployment --all -n $ns --timeout=300s
done

Step 4: Verify mTLS is Working (While Still Permissive)

# Use Kiali to verify mTLS connections
istioctl dashboard kiali

# Check mTLS status between specific services
istioctl proxy-config endpoints deploy/frontend -n production | grep reviews

# Look for mTLS indicators in access logs
kubectl logs deploy/api-service -c istio-proxy -n production | \
  grep -o '"upstream_transport_failure_reason":"[^"]*"' | sort | uniq -c
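If you run Prometheus with Istio's standard metrics, the connection_security_policy label shows how much traffic is still plaintext (the service name and port-forward target are illustrative):

```shell
# Expose Prometheus locally, then count request rate by security policy
kubectl -n istio-system port-forward svc/prometheus 9090:9090 &
curl -sG http://localhost:9090/api/v1/query --data-urlencode \
  'query=sum(rate(istio_requests_total[5m])) by (connection_security_policy)'
# "mutual_tls" vs "none" buckets reveal the remaining plaintext traffic
```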

Step 5: Migrate Namespace by Namespace

Start with the least critical namespace:

# Enable strict mTLS for staging first
kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: staging
spec:
  mtls:
    mode: STRICT
EOF

# Verify no broken connections
kubectl logs -n staging -l app=api-service -c istio-proxy --tail=50 | grep -i "tls\|error\|refused"

# Run integration tests against staging
# If tests pass, move to the next namespace

Step 6: Handle Non-Mesh Services

For services that cannot run sidecars (databases, legacy systems, external services), keep specific workloads in PERMISSIVE mode:

# Allow plaintext from external MySQL client
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: mysql-permissive
  namespace: production
spec:
  selector:
    matchLabels:
      app: mysql
  mtls:
    mode: PERMISSIVE

Also create a DestinationRule to configure how the mesh communicates with non-mesh services:

# Disable mTLS when talking to external database
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: external-database
  namespace: production
spec:
  host: database.legacy.svc.cluster.local
  trafficPolicy:
    tls:
      mode: DISABLE

Step 7: Enable Mesh-Wide STRICT

Once all namespaces are verified:

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
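A quick negative test: from a pod without a sidecar (this sketch assumes the default namespace is not injection-enabled), plaintext connections should now be rejected:

```shell
# Run a one-off curl pod in a non-injected namespace
kubectl run mtls-check --rm -i --restart=Never -n default \
  --image=curlimages/curl -- curl -s -m 5 http://api-service.production:8080/ \
  || echo "plaintext rejected: STRICT mTLS is enforced"
```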

Debugging mTLS and Security Issues

Connection Refused

# Check if the destination has a sidecar
kubectl get pod -n production -l app=api-service -o jsonpath='{.items[0].spec.containers[*].name}'
# If istio-proxy is missing, the pod cannot terminate mTLS

# Check mTLS mode mismatch
istioctl proxy-config listeners deploy/frontend -n production --port 8080

# Check the actual TLS handshake
kubectl exec deploy/frontend -c istio-proxy -n production -- \
  openssl s_client -connect api-service.production:8080 -servername api-service.production

Certificate Errors

# Check certificate validity and issuer
istioctl proxy-config secret deploy/api-service -n production -o json | \
  jq -r '.dynamicActiveSecrets[0].secret.tlsCertificate.certificateChain.inlineBytes' | \
  base64 -d | openssl x509 -text -noout

# Check certificate expiration
istioctl proxy-config secret deploy/api-service -n production -o json | \
  jq -r '.dynamicActiveSecrets[0].secret.tlsCertificate.certificateChain.inlineBytes' | \
  base64 -d | openssl x509 -enddate -noout

# Verify CA certificates match between services
istioctl proxy-config secret deploy/frontend -n production
istioctl proxy-config secret deploy/api-service -n production

Authorization Policy Not Taking Effect

# Evaluation order: CUSTOM -> DENY -> ALLOW
# DENY policies always override ALLOW

# Verify the policy selects the right workload
kubectl get authorizationpolicy -n production -o yaml | grep -A5 "selector"

# Check Envoy logs for RBAC denials
kubectl logs deploy/api-service -c istio-proxy -n production | grep "rbac_access_denied"

# Get detailed RBAC debug info
kubectl logs deploy/api-service -c istio-proxy -n production | grep "enforced_policy"

# Analyze all policies affecting a workload
istioctl x authz check deploy/api-service -n production

Common Issues and Solutions

  • mTLS mode mismatch --- symptom: connection resets, 503 errors. Fix: ensure source and destination have matching TLS settings.
  • Missing sidecar --- symptom: pod cannot terminate mTLS. Fix: add the injection label and restart the deployment.
  • Stale certificates --- symptom: TLS handshake failures. Fix: restart istiod and check the CA secret.
  • Wrong principal in policy --- symptom: RBAC denied. Fix: verify the service account name matches the policy's principals.
  • Policy not applied --- symptom: all traffic allowed or denied. Fix: check that selector labels match the pod labels.
  • JWT validation failure --- symptom: 401 Unauthorized. Fix: verify the JWKS URI is reachable and the token has not expired.

Security Best Practices Checklist

  1. Enable STRICT mTLS mesh-wide after migrating all workloads
  2. Use deny-all baseline in every namespace, then add specific ALLOW policies
  3. Use custom CA certificates for production --- do not rely on Istio's self-signed CA
  4. Rotate CA certificates before they expire, with overlap period
  5. Apply AuthorizationPolicies per service, not per namespace
  6. Use SPIFFE principals, not source IPs, in policies (IPs change, identities do not)
  7. Require JWT for external traffic via RequestAuthentication
  8. Set failOpen: false for external authorization providers
  9. Monitor RBAC denials and alert on unexpected patterns
  10. Audit policies regularly --- use istioctl analyze and Kiali's validation
  11. Use separate service accounts for each deployment (not the default SA)
  12. Keep workload certificate TTL short (24 hours or less)
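Item 11 in practice: giving each deployment its own service account makes its SPIFFE principal unique and addressable in policies (all names here are illustrative):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: frontend
  namespace: production
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  namespace: production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      serviceAccountName: frontend  # principal: cluster.local/ns/production/sa/frontend
      containers:
        - name: frontend
          image: registry.example.com/frontend:1.0.0
```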

Summary

Istio's security model gives you zero-trust networking without application changes. Start with PERMISSIVE mTLS to verify everything works, migrate to STRICT namespace by namespace, and layer AuthorizationPolicies on top to control exactly which services can communicate. Use RequestAuthentication for external JWT validation, delegate complex authorization to OPA when Istio's built-in policies are not expressive enough, and always maintain a deny-all baseline policy. The goal is a mesh where every connection is encrypted, every identity is verified, and every request is explicitly authorized. The migration path from a permissive network to full zero-trust is incremental --- there is no need for a big-bang switch that risks breaking everything at once. Take it namespace by namespace, verify at each step, and use Istio's observability tools to confirm that mTLS and authorization are working as expected before moving forward.
