DevOpsil

Istio mTLS & Security: Zero-Trust Service Communication

Riku Tanaka --- 19 min read

In a traditional network, services behind the firewall trust each other implicitly. A compromised service can freely communicate with any other internal service, move laterally, and exfiltrate data. This perimeter-based security model has repeatedly proven inadequate: from the Target breach in 2013 to the SolarWinds attack in 2020, attackers who gained a foothold inside the network faced few barriers to lateral movement. Zero-trust networking eliminates this assumption by requiring every service to prove its identity and hold explicit authorization for each request. Istio implements zero-trust through mutual TLS for identity verification and AuthorizationPolicies for fine-grained access control, all without requiring changes to your application code.

This guide covers how to enable and enforce mTLS across your mesh, build a comprehensive authorization policy layer, integrate external authentication providers, handle edge cases like non-mesh services and legacy protocols, and debug security issues in production environments.

Why mTLS Matters

Standard TLS (what your browser uses) is one-directional: the client verifies the server's identity, but the server does not verify the client. Mutual TLS adds client-side certificate verification, meaning both parties authenticate each other. In the context of a service mesh, this distinction is critical.

Concretely, mTLS provides:

  • Identity --- Each service has a cryptographic identity (SPIFFE ID) issued by Istio's certificate authority. This identity is bound to the Kubernetes service account, not to network addresses.
  • Encryption --- All traffic between services is encrypted with TLS 1.3, even within the cluster network. Anyone sniffing the network sees only encrypted traffic.
  • Integrity --- Tampering with in-flight data is detected through cryptographic message authentication.
  • No code changes --- The Envoy sidecar handles all TLS operations transparently. Your application communicates over plaintext localhost, and the sidecar encrypts/decrypts at the pod boundary.
  • Automatic rotation --- Certificates are rotated automatically (default: every 24 hours), eliminating the operational burden of manual certificate management.

Without mTLS, anyone who gains access to your cluster network can sniff inter-service traffic, impersonate services, and inject malicious responses. With mTLS, even a compromised node can only see encrypted traffic for which it does not have the private keys.

Zero-Trust Security Model

The zero-trust model that Istio implements follows these principles:

  • Never trust, always verify --- mTLS verifies identity on every connection
  • Least privilege access --- AuthorizationPolicies restrict what each service can access
  • Assume breach --- encryption prevents lateral eavesdropping
  • Verify explicitly --- JWT validation for external requests
  • Continuous monitoring --- access logs and metrics for all traffic

PeerAuthentication: Controlling mTLS Mode

The PeerAuthentication resource controls whether services require mTLS for incoming connections. It operates at three levels of granularity: mesh-wide, namespace, and workload.

Modes

  • PERMISSIVE --- accepts both plaintext and mTLS connections (the default). Use during migration to mTLS.
  • STRICT --- accepts only mTLS connections. Use in production and fully meshed namespaces.
  • DISABLE --- accepts only plaintext connections. Use for debugging, or for services that cannot use mTLS.
  • UNSET --- inherits from the parent scope. Use when relying on the policy hierarchy for configuration.

Mesh-Wide mTLS

Enable STRICT mTLS for the entire mesh by placing the policy in the istio-system namespace:

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system  # Mesh-wide when in istio-system
spec:
  mtls:
    mode: STRICT

This is the target state for a production mesh. Every connection between meshed services must use mTLS. Any plaintext connection attempt will be rejected.

Namespace-Level mTLS

Override the mesh-wide setting for a specific namespace:

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: payments  # Only applies to this namespace
spec:
  mtls:
    mode: STRICT

This is useful when migrating namespace by namespace. You can enable STRICT for namespaces that are fully meshed while keeping others in PERMISSIVE mode.

Workload-Level mTLS

Target a specific workload within a namespace:

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: mysql-permissive
  namespace: data
spec:
  selector:
    matchLabels:
      app: mysql
  mtls:
    mode: PERMISSIVE  # MySQL clients outside the mesh need plaintext access

Port-Level mTLS

Disable mTLS on specific ports while keeping it strict on others:

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: api-service
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-service
  mtls:
    mode: STRICT
  portLevelMtls:
    8080:
      mode: STRICT      # Application traffic
    9090:
      mode: PERMISSIVE  # Prometheus scrape port
    15021:
      mode: DISABLE     # Health check port

Policy Precedence

When multiple PeerAuthentication policies overlap, the most specific one wins:

Workload-level (most specific)
    |
    v
Namespace-level
    |
    v
Mesh-wide (least specific)

If a workload-level policy exists for a pod, it takes full precedence over namespace and mesh-wide policies. There is no merging --- the most specific policy replaces everything above it.
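For example, a workload-level policy can carve out an exception to a mesh-wide STRICT default (workload names here are illustrative):

```yaml
# Exception for a single legacy workload that still needs plaintext,
# even when the mesh-wide default (in istio-system) is STRICT
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: legacy-exception
  namespace: production
spec:
  selector:
    matchLabels:
      app: legacy-app
  mtls:
    mode: DISABLE  # Replaces (does not merge with) the mesh-wide STRICT for this workload
```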

Certificate Management

Istio's control plane (istiod) acts as a certificate authority (CA). It automatically issues X.509 certificates to every workload in the mesh without any manual intervention.

How Certificate Provisioning Works

  1. When a pod with an Envoy sidecar starts, the sidecar generates a private key and a Certificate Signing Request (CSR)
  2. The CSR is sent to istiod over a secure gRPC channel, authenticated using the Kubernetes service account token
  3. istiod validates the CSR against the pod's service account and namespace
  4. istiod signs the certificate using its CA key and returns the signed certificate
  5. The sidecar loads the certificate and begins accepting mTLS connections
  6. Before the certificate expires (default: 24 hours), the sidecar automatically generates a new CSR and repeats the process

This entire flow happens without any operator intervention.

SPIFFE Identity

Each workload receives a SPIFFE-compliant identity encoded in the certificate's Subject Alternative Name (SAN):

spiffe://cluster.local/ns/NAMESPACE/sa/SERVICE_ACCOUNT

For example, a service running as the reviews service account in the default namespace gets:

spiffe://cluster.local/ns/default/sa/reviews

This identity is what AuthorizationPolicies use to control access. The identity is cryptographically bound to the workload through the certificate chain, making it unforgeable.
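You can read the SPIFFE ID straight out of a running workload's certificate; a sketch assuming a deployment named reviews in the default namespace:

```shell
# Extract the workload certificate and print its SAN, which carries the SPIFFE ID
istioctl proxy-config secret deploy/reviews -n default -o json | \
  jq -r '.dynamicActiveSecrets[0].secret.tlsCertificate.certificateChain.inlineBytes' | \
  base64 -d | openssl x509 -text -noout | grep -A1 "Subject Alternative Name"
```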

Certificate Rotation and Lifetimes

  • Workload cert lifetime --- default 24 hours; recommended 12-24 hours in production. Change with --set values.pilot.env.DEFAULT_WORKLOAD_CERT_TTL=12h.
  • CA cert lifetime --- default 10 years (self-signed); recommended 1-3 years with a custom CA. Change by supplying custom CA certificates.
  • Root cert lifetime --- default 10 years; recommended 5-10 years. Generate with a long expiry.
  • Rotation grace period --- 50% of the certificate lifetime. Automatic, not configurable.

Short-lived workload certificates are a security advantage: even if a certificate is compromised, it expires quickly and cannot be reused.

Using Custom CA Certificates

For production, you typically want to use your own root CA instead of Istio's self-signed one. This is critical when:

  • You need certificates trusted by external systems
  • Compliance requires a specific certificate chain
  • You use multiple clusters that need to trust each other
  • You have an existing PKI infrastructure
# Generate a root CA (if you don't have one)
openssl req -new -newkey rsa:4096 -x509 -sha256 \
  -days 3650 -nodes \
  -subj "/O=Company Inc./CN=Root CA" \
  -keyout root-key.pem -out root-cert.pem

# Generate an intermediate CA for Istio
openssl req -new -newkey rsa:4096 -sha256 -nodes \
  -subj "/O=Company Inc./CN=Istio Intermediate CA" \
  -keyout ca-key.pem -out ca-csr.pem

openssl x509 -req -days 730 -sha256 \
  -CA root-cert.pem -CAkey root-key.pem -CAcreateserial \
  -in ca-csr.pem -out ca-cert.pem \
  -extfile <(printf "basicConstraints=CA:TRUE\nkeyUsage=critical,digitalSignature,keyCertSign,cRLSign")

# Create the certificate chain
cat ca-cert.pem root-cert.pem > cert-chain.pem

# Create the Kubernetes secret
kubectl create secret generic cacerts -n istio-system \
  --from-file=ca-cert.pem \
  --from-file=ca-key.pem \
  --from-file=root-cert.pem \
  --from-file=cert-chain.pem

# Restart istiod to pick up the new CA
kubectl rollout restart deployment istiod -n istio-system

# Verify the new CA is being used
istioctl proxy-config secret deploy/myapp -o json | \
  jq -r '.dynamicActiveSecrets[0].secret.tlsCertificate.certificateChain.inlineBytes' | \
  base64 -d | openssl x509 -text -noout | grep "Issuer:"

Multi-Cluster Certificate Trust

For multi-cluster meshes where services in different clusters need to communicate with mTLS, all clusters must share the same root CA:

# Use the same root-cert.pem and generate unique intermediate CAs per cluster
# Cluster 1
kubectl create secret generic cacerts -n istio-system \
  --from-file=ca-cert.pem=cluster1-ca-cert.pem \
  --from-file=ca-key.pem=cluster1-ca-key.pem \
  --from-file=root-cert.pem \
  --from-file=cert-chain.pem=cluster1-cert-chain.pem \
  --context=cluster1

# Cluster 2
kubectl create secret generic cacerts -n istio-system \
  --from-file=ca-cert.pem=cluster2-ca-cert.pem \
  --from-file=ca-key.pem=cluster2-ca-key.pem \
  --from-file=root-cert.pem \
  --from-file=cert-chain.pem=cluster2-cert-chain.pem \
  --context=cluster2

AuthorizationPolicy: Access Control

AuthorizationPolicies define who can access what. They operate on the identity established by mTLS (for in-mesh traffic) or JWT tokens (for external clients).

Policy Actions

  • CUSTOM --- delegates the decision to an external authorization service. Evaluated first.
  • DENY --- denies matching requests. Evaluated second.
  • ALLOW --- allows matching requests; once any ALLOW policy exists, all non-matching requests are denied. Evaluated third.
  • AUDIT --- logs matching requests; does not affect the allow/deny decision.

When multiple policies exist for a workload, the evaluation order is: CUSTOM, DENY, ALLOW. If a DENY policy matches, the request is denied regardless of any ALLOW policies. If no ALLOW policy exists, all traffic is allowed (unless a DENY matches).

Deny-All Baseline

Start with a deny-all policy, then explicitly allow required communication. This is the foundation of zero-trust:

# Deny all traffic in the namespace
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: production
spec:
  {}  # Empty spec defaults to an ALLOW action with no rules: nothing matches, so all requests are denied

After applying this, all traffic to services in the production namespace will be denied. You then add ALLOW policies for each legitimate communication path.
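To confirm the baseline is active, send a request from any meshed pod; a hypothetical sleep test pod and API port are assumed here:

```shell
# Expect HTTP 403 with body "RBAC: access denied" once deny-all is in place
kubectl exec deploy/sleep -n production -c sleep -- \
  curl -s -o /dev/null -w "%{http_code}\n" http://api-service.production:8080/
```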

Allow Specific Service Communication

# Allow frontend to call the API service on specific paths
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-service
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - "cluster.local/ns/production/sa/frontend"
      to:
        - operation:
            methods: ["GET", "POST"]
            paths: ["/api/*"]

Allow by Namespace

# Allow any service in the monitoring namespace to scrape metrics
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-monitoring
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-service
  action: ALLOW
  rules:
    - from:
        - source:
            namespaces: ["monitoring", "istio-system"]
      to:
        - operation:
            methods: ["GET"]
            paths: ["/metrics", "/healthz", "/readyz"]

Deny Specific Sources

# Block a compromised service immediately
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: block-compromised
  namespace: production
spec:
  selector:
    matchLabels:
      app: database
  action: DENY
  rules:
    - from:
        - source:
            principals:
              - "cluster.local/ns/production/sa/compromised-service"

Complex Authorization Rules

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: api-access
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-service
  action: ALLOW
  rules:
    # Internal services can call any endpoint
    - from:
        - source:
            principals:
              - "cluster.local/ns/production/sa/frontend"
              - "cluster.local/ns/production/sa/mobile-bff"
      to:
        - operation:
            methods: ["GET", "POST", "PUT", "DELETE"]

    # Batch processing service can only access batch endpoints
    - from:
        - source:
            principals:
              - "cluster.local/ns/batch/sa/batch-processor"
      to:
        - operation:
            methods: ["POST"]
            paths: ["/api/v2/batch/*"]

    # External JWT-authenticated users can only read with v2 API
    - from:
        - source:
            requestPrincipals: ["https://auth.example.com/*"]
      to:
        - operation:
            methods: ["GET"]
      when:
        - key: request.headers[x-api-version]
          values: ["v2"]

    # Allow health checks from anywhere (no source restriction)
    - to:
        - operation:
            methods: ["GET"]
            paths: ["/healthz", "/readyz"]

Production Authorization Policy Pattern

For a typical microservices application, build policies layer by layer:

# 1. Deny all by default
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: production
spec: {}
---
# 2. Allow ingress gateway to reach frontend
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-ingress-to-frontend
  namespace: production
spec:
  selector:
    matchLabels:
      app: frontend
  action: ALLOW
  rules:
    - from:
        - source:
            namespaces: ["istio-ingress"]
---
# 3. Allow frontend to reach API
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  selector:
    matchLabels:
      app: api
  action: ALLOW
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/production/sa/frontend"]
---
# 4. Allow API to reach database
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-api-to-db
  namespace: production
spec:
  selector:
    matchLabels:
      app: database
  action: ALLOW
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/production/sa/api"]
      to:
        - operation:
            ports: ["5432"]
---
# 5. Allow monitoring everywhere
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-monitoring
  namespace: production
spec:
  action: ALLOW
  rules:
    - from:
        - source:
            namespaces: ["monitoring"]
      to:
        - operation:
            methods: ["GET"]
            paths: ["/metrics"]

JWT Authentication with RequestAuthentication

For external traffic (from outside the mesh), use JWT tokens for authentication:

apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
  name: jwt-auth
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-service
  jwtRules:
    - issuer: "https://auth.example.com"
      jwksUri: "https://auth.example.com/.well-known/jwks.json"
      audiences:
        - "api.example.com"
      forwardOriginalToken: true
      outputPayloadToHeader: "x-jwt-payload"
      fromHeaders:
        - name: Authorization
          prefix: "Bearer "
      fromParams:
        - "access_token"
    # Support multiple identity providers
    - issuer: "https://accounts.google.com"
      jwksUri: "https://www.googleapis.com/oauth2/v3/certs"
      audiences:
        - "your-google-client-id.apps.googleusercontent.com"

Then combine with an AuthorizationPolicy to enforce JWT claims:

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: require-jwt
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-service
  action: ALLOW
  rules:
    # Admin users can access everything
    - from:
        - source:
            requestPrincipals: ["https://auth.example.com/*"]
      when:
        - key: request.auth.claims[role]
          values: ["admin"]

    # Editor users can read and write
    - from:
        - source:
            requestPrincipals: ["https://auth.example.com/*"]
      to:
        - operation:
            methods: ["GET", "POST", "PUT"]
      when:
        - key: request.auth.claims[role]
          values: ["editor"]

    # Viewer users can only read
    - from:
        - source:
            requestPrincipals: ["https://auth.example.com/*"]
      to:
        - operation:
            methods: ["GET"]
      when:
        - key: request.auth.claims[role]
          values: ["viewer"]

    # Reject requests without valid JWT (this is implicit when
    # RequestAuthentication rejects invalid tokens, but we need
    # to ensure requests without tokens are also rejected)

Important: RequestAuthentication only validates tokens that are present. It does not reject requests without tokens. To require a token, pair it with an AuthorizationPolicy that checks for requestPrincipals.
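To reject requests that carry no token at all, the usual pattern is a DENY policy matching sources without any request principal:

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: require-jwt-token
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-service
  action: DENY
  rules:
    - from:
        - source:
            notRequestPrincipals: ["*"]  # Matches requests with no validated JWT principal
```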

External Authorization with OPA

For complex authorization logic that cannot be expressed in AuthorizationPolicy, delegate to Open Policy Agent (OPA):

Deploy OPA

apiVersion: apps/v1
kind: Deployment
metadata:
  name: opa-authorizer
  namespace: istio-system
spec:
  replicas: 2
  selector:
    matchLabels:
      app: opa-authorizer
  template:
    metadata:
      labels:
        app: opa-authorizer
      annotations:
        sidecar.istio.io/inject: "false"  # OPA doesn't need a sidecar
    spec:
      containers:
        - name: opa
          image: openpolicyagent/opa:latest-envoy
          ports:
            - containerPort: 9191  # gRPC for Envoy ext_authz
            - containerPort: 8181  # HTTP API for policy management
            - containerPort: 8282  # Diagnostics
          args:
            - "run"
            - "--server"
            - "--addr=0.0.0.0:8181"
            - "--diagnostic-addr=0.0.0.0:8282"
            - "--set=plugins.envoy_ext_authz_grpc.addr=0.0.0.0:9191"
            - "--set=plugins.envoy_ext_authz_grpc.path=istio/authz/allow"
            - "--set=decision_logs.console=true"
            - "/policies"
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
          volumeMounts:
            - name: policies
              mountPath: /policies
          livenessProbe:
            httpGet:
              path: /health?plugins
              port: 8282
            initialDelaySeconds: 5
          readinessProbe:
            httpGet:
              path: /health?plugins
              port: 8282
            initialDelaySeconds: 5
      volumes:
        - name: policies
          configMap:
            name: opa-policies
---
apiVersion: v1
kind: Service
metadata:
  name: opa-authorizer
  namespace: istio-system
spec:
  selector:
    app: opa-authorizer
  ports:
    - name: grpc
      port: 9191
      targetPort: 9191
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: opa-policies
  namespace: istio-system
data:
  policy.rego: |
    package istio.authz

    import input.attributes.request.http as http_request
    import input.attributes.source.principal as source_principal

    default allow = false

    # Allow health checks
    allow {
        http_request.method == "GET"
        http_request.path == "/healthz"
    }

    # Allow requests from known services during business hours
    allow {
        source_principal != ""
        is_business_hours
    }

    # Rate limit: deny if source has made too many requests
    # (simplified, real implementation would check a shared counter)
    allow {
        source_principal != ""
        not is_rate_limited
    }

    is_business_hours {
        [hour, _, _] := time.clock(time.now_ns())
        hour >= 8
        hour < 22
    }

    is_rate_limited = false
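Rego policies can be exercised offline with the OPA CLI before deploying; a minimal sketch with an illustrative input file:

```shell
# Simulate the health-check rule locally (file names are illustrative)
cat > input.json <<'EOF'
{"attributes": {"request": {"http": {"method": "GET", "path": "/healthz"}}}}
EOF
opa eval --data policy.rego --input input.json --format raw "data.istio.authz.allow"
```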

Configure Istio to Use OPA

Register the provider in the Istio mesh configuration:

# In IstioOperator or Helm values
meshConfig:
  extensionProviders:
    - name: opa-authorizer
      envoyExtAuthzGrpc:
        service: opa-authorizer.istio-system.svc.cluster.local
        port: 9191
        timeout: 500ms
        failOpen: false  # Deny if OPA is unreachable

Then create the AuthorizationPolicy:

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: opa-ext-auth
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-service
  action: CUSTOM
  provider:
    name: opa-authorizer
  rules:
    - to:
        - operation:
            paths: ["/api/*"]

Migrating to Strict mTLS Without Downtime

Moving from PERMISSIVE to STRICT mTLS requires careful planning to avoid breaking non-mesh services. This is a multi-step process that should be executed over days or weeks, not hours.

Step 1: Audit Current State

# Check which services are using mTLS (describe pod takes a pod name, not a deployment)
istioctl x describe pod -n production \
  $(kubectl get pod -n production -l app=api-service -o jsonpath='{.items[0].metadata.name}')

# Check all PeerAuthentication policies
kubectl get peerauthentication --all-namespaces

# Find pods without sidecars (these will break under STRICT mTLS)
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name} {.spec.containers[*].name}{"\n"}{end}' | grep -v istio-proxy | grep -v kube-system

# Check actual mTLS status between services
istioctl proxy-config listeners deploy/api-service -o json | \
  jq '.[].filterChains[].transportSocket.typedConfig.commonTlsContext'

Step 2: Ensure PERMISSIVE Mode is the Starting State

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: PERMISSIVE

Step 3: Add Sidecars to All Workloads

Label all application namespaces for injection and restart deployments:

# Label namespaces
for ns in production staging batch; do
  kubectl label namespace $ns istio-injection=enabled --overwrite
done

# Restart deployments to inject sidecars
for ns in production staging batch; do
  kubectl rollout restart deployment -n $ns
  kubectl rollout status deployment --all -n $ns --timeout=300s
done

Step 4: Verify mTLS is Working (While Still Permissive)

# Use Kiali to verify mTLS connections
istioctl dashboard kiali

# Check mTLS status between specific services
istioctl proxy-config endpoints deploy/frontend -n production | grep reviews

# Look for mTLS indicators in access logs
kubectl logs deploy/api-service -c istio-proxy -n production | \
  grep -o '"upstream_transport_failure_reason":"[^"]*"' | sort | uniq -c
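If you run Prometheus with Istio's standard metrics, the connection_security_policy label shows how much traffic is still plaintext (the service name and port-forward target are illustrative):

```shell
# Expose Prometheus locally, then count request rate by security policy
kubectl -n istio-system port-forward svc/prometheus 9090:9090 &
curl -sG http://localhost:9090/api/v1/query --data-urlencode \
  'query=sum(rate(istio_requests_total[5m])) by (connection_security_policy)'
# "mutual_tls" vs "none" buckets reveal the remaining plaintext traffic
```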

Step 5: Migrate Namespace by Namespace

Start with the least critical namespace:

# Enable strict mTLS for staging first
kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: staging
spec:
  mtls:
    mode: STRICT
EOF

# Verify no broken connections
kubectl logs -n staging -l app=api-service -c istio-proxy --tail=50 | grep -i "tls\|error\|refused"

# Run integration tests against staging
# If tests pass, move to the next namespace

Step 6: Handle Non-Mesh Services

For services that cannot run sidecars (databases, legacy systems, external services), keep specific workloads in PERMISSIVE mode:

# Allow plaintext from external MySQL client
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: mysql-permissive
  namespace: production
spec:
  selector:
    matchLabels:
      app: mysql
  mtls:
    mode: PERMISSIVE

Also create a DestinationRule to configure how the mesh communicates with non-mesh services:

# Disable mTLS when talking to external database
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: external-database
  namespace: production
spec:
  host: database.legacy.svc.cluster.local
  trafficPolicy:
    tls:
      mode: DISABLE

Step 7: Enable Mesh-Wide STRICT

Once all namespaces are verified:

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
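A quick negative test: from a pod without a sidecar (this sketch assumes the default namespace is not injection-enabled), plaintext connections should now be rejected:

```shell
# Run a one-off curl pod in a non-injected namespace
kubectl run mtls-check --rm -i --restart=Never -n default \
  --image=curlimages/curl -- curl -s -m 5 http://api-service.production:8080/ \
  || echo "plaintext rejected: STRICT mTLS is enforced"
```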

Debugging mTLS and Security Issues

Connection Refused

# Check if the destination has a sidecar
kubectl get pod -n production -l app=api-service -o jsonpath='{.items[0].spec.containers[*].name}'
# If istio-proxy is missing, the pod cannot terminate mTLS

# Check mTLS mode mismatch
istioctl proxy-config listeners deploy/frontend -n production --port 8080

# Check the actual TLS handshake
kubectl exec deploy/frontend -c istio-proxy -n production -- \
  openssl s_client -connect api-service.production:8080 -servername api-service.production

Certificate Errors

# Check certificate validity and issuer
istioctl proxy-config secret deploy/api-service -n production -o json | \
  jq -r '.dynamicActiveSecrets[0].secret.tlsCertificate.certificateChain.inlineBytes' | \
  base64 -d | openssl x509 -text -noout

# Check certificate expiration
istioctl proxy-config secret deploy/api-service -n production -o json | \
  jq -r '.dynamicActiveSecrets[0].secret.tlsCertificate.certificateChain.inlineBytes' | \
  base64 -d | openssl x509 -enddate -noout

# Verify CA certificates match between services
istioctl proxy-config secret deploy/frontend -n production
istioctl proxy-config secret deploy/api-service -n production

Authorization Policy Not Taking Effect

# Evaluation order: CUSTOM -> DENY -> ALLOW
# DENY policies always override ALLOW

# Verify the policy selects the right workload
kubectl get authorizationpolicy -n production -o yaml | grep -A5 "selector"

# Check Envoy logs for RBAC denials
kubectl logs deploy/api-service -c istio-proxy -n production | grep "rbac_access_denied"

# Get detailed RBAC debug info
kubectl logs deploy/api-service -c istio-proxy -n production | grep "enforced_policy"

# Analyze all policies affecting a workload
istioctl x authz check deploy/api-service -n production

Common Issues and Solutions

  • mTLS mode mismatch --- symptom: connection resets, 503 errors. Fix: ensure source and destination have matching TLS settings.
  • Missing sidecar --- symptom: pod cannot terminate mTLS. Fix: add the injection label and restart the deployment.
  • Stale certificates --- symptom: TLS handshake failures. Fix: restart istiod and check the CA secret.
  • Wrong principal in policy --- symptom: RBAC denied. Fix: verify the service account name matches the policy's principals.
  • Policy not applied --- symptom: all traffic allowed or denied. Fix: check that selector labels match the pod labels.
  • JWT validation failure --- symptom: 401 Unauthorized. Fix: verify the JWKS URI is reachable and the token has not expired.

Security Best Practices Checklist

  1. Enable STRICT mTLS mesh-wide after migrating all workloads
  2. Use deny-all baseline in every namespace, then add specific ALLOW policies
  3. Use custom CA certificates for production --- do not rely on Istio's self-signed CA
  4. Rotate CA certificates before they expire, with overlap period
  5. Apply AuthorizationPolicies per service, not per namespace
  6. Use SPIFFE principals, not source IPs, in policies (IPs change, identities do not)
  7. Require JWT for external traffic via RequestAuthentication
  8. Set failOpen: false for external authorization providers
  9. Monitor RBAC denials and alert on unexpected patterns
  10. Audit policies regularly --- use istioctl analyze and Kiali's validation
  11. Use separate service accounts for each deployment (not the default SA)
  12. Keep workload certificate TTL short (24 hours or less)
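Item 11 in practice: giving each deployment its own service account makes its SPIFFE principal unique and addressable in policies (all names here are illustrative):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: frontend
  namespace: production
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  namespace: production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      serviceAccountName: frontend  # principal: cluster.local/ns/production/sa/frontend
      containers:
        - name: frontend
          image: registry.example.com/frontend:1.0.0
```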

Summary

Istio's security model gives you zero-trust networking without application changes. Start with PERMISSIVE mTLS to verify everything works, migrate to STRICT namespace by namespace, and layer AuthorizationPolicies on top to control exactly which services can communicate. Use RequestAuthentication for external JWT validation, delegate complex authorization to OPA when Istio's built-in policies are not expressive enough, and always maintain a deny-all baseline policy. The goal is a mesh where every connection is encrypted, every identity is verified, and every request is explicitly authorized. The migration path from a permissive network to full zero-trust is incremental --- there is no need for a big-bang switch that risks breaking everything at once. Take it namespace by namespace, verify at each step, and use Istio's observability tools to confirm that mTLS and authorization are working as expected before moving forward.
