Istio mTLS & Security: Zero-Trust Service Communication
In a traditional network, services behind the firewall trust each other implicitly. A compromised service can freely communicate with any other internal service, move laterally, and exfiltrate data. This perimeter-based security model has repeatedly proven inadequate: from the Target breach in 2013 to the SolarWinds attack in 2020, attackers who gained a foothold inside the network faced few barriers to lateral movement. Zero-trust networking eliminates this assumption by requiring every service to prove its identity and obtain explicit authorization for each request. Istio implements zero-trust through mutual TLS for identity verification and AuthorizationPolicies for fine-grained access control, all without requiring changes to your application code.
This guide covers how to enable and enforce mTLS across your mesh, build a comprehensive authorization policy layer, integrate external authentication providers, handle edge cases like non-mesh services and legacy protocols, and debug security issues in production environments.
Why mTLS Matters
Standard TLS (what your browser uses) is one-directional: the client verifies the server's identity, but the server does not verify the client. Mutual TLS adds client-side certificate verification, meaning both parties authenticate each other. In the context of a service mesh, this distinction is critical.
In a service mesh context, mTLS provides:
- Identity --- Each service has a cryptographic identity (SPIFFE ID) issued by Istio's certificate authority. This identity is bound to the Kubernetes service account, not to network addresses.
- Encryption --- All traffic between services is encrypted in transit (TLS 1.3 by default in current Istio releases), even within the cluster network. Anyone sniffing the network sees only encrypted traffic.
- Integrity --- Tampering with in-flight data is detected through cryptographic message authentication.
- No code changes --- The Envoy sidecar handles all TLS operations transparently. Your application communicates over plaintext localhost, and the sidecar encrypts/decrypts at the pod boundary.
- Automatic rotation --- Certificates are rotated automatically (default: every 24 hours), eliminating the operational burden of manual certificate management.
Without mTLS, anyone who gains access to your cluster network can sniff inter-service traffic, impersonate services, and inject malicious responses. With mTLS, even a compromised node can only see encrypted traffic for which it does not have the private keys.
Zero-Trust Security Model
The zero-trust model that Istio implements follows these principles:
| Principle | How Istio Implements It |
|---|---|
| Never trust, always verify | mTLS verifies identity on every connection |
| Least privilege access | AuthorizationPolicies restrict what each service can access |
| Assume breach | Encryption prevents lateral eavesdropping |
| Verify explicitly | JWT validation for external requests |
| Continuous monitoring | Access logs and metrics for all traffic |
PeerAuthentication: Controlling mTLS Mode
The PeerAuthentication resource controls whether services require mTLS for incoming connections. It operates at three levels of granularity: mesh-wide, namespace, and workload.
Modes
| Mode | Behavior | Use When |
|---|---|---|
| PERMISSIVE | Accept both plaintext and mTLS connections (default) | During migration to mTLS |
| STRICT | Only accept mTLS connections | Production, fully meshed namespaces |
| DISABLE | Accept only plaintext connections | Debugging, or services that cannot use mTLS |
| UNSET | Inherit from parent scope | Using the hierarchy for configuration |
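UNSET is easiest to see in a sketch. The workload-level policy below (names are illustrative) leaves the mode unset, so the workload inherits whatever mode its namespace-level policy specifies, or the mesh-wide mode if no namespace policy exists:

```yaml
# Illustrative only: mode UNSET means this workload inherits the
# namespace-level mTLS mode (or the mesh-wide mode if none exists).
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: inherit-example
  namespace: payments
spec:
  selector:
    matchLabels:
      app: worker
  mtls:
    mode: UNSET
```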
Mesh-Wide mTLS
Enable STRICT mTLS for the entire mesh by placing the policy in the istio-system namespace:
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
name: default
namespace: istio-system # Mesh-wide when in istio-system
spec:
mtls:
mode: STRICT
This is the target state for a production mesh. Every connection between meshed services must use mTLS. Any plaintext connection attempt will be rejected.
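One caveat worth noting: PeerAuthentication governs the server side, while client-side TLS behavior is driven by DestinationRules, and an explicit tls setting there can override Istio's automatic mTLS. If you carry DestinationRules over from a pre-mTLS setup, audit them. A rule like the following (host and name are illustrative) forces ISTIO_MUTUAL explicitly, which is usually unnecessary but guards against an older rule that forced plaintext:

```yaml
# Illustrative: explicitly tell clients to use Istio mTLS for this host.
# Auto-mTLS normally handles this, but an explicit setting overrides any
# stale DestinationRule that forced plaintext with mode: DISABLE.
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: api-service-mtls
  namespace: production
spec:
  host: api-service.production.svc.cluster.local
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
```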
Namespace-Level mTLS
Override the mesh-wide setting for a specific namespace:
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
name: default
namespace: payments # Only applies to this namespace
spec:
mtls:
mode: STRICT
This is useful when migrating namespace by namespace. You can enable STRICT for namespaces that are fully meshed while keeping others in PERMISSIVE mode.
Workload-Level mTLS
Target a specific workload within a namespace:
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
name: mysql-permissive
namespace: data
spec:
selector:
matchLabels:
app: mysql
mtls:
mode: PERMISSIVE # MySQL clients outside the mesh need plaintext access
Port-Level mTLS
Disable mTLS on specific ports while keeping it strict on others:
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
name: api-service
namespace: production
spec:
selector:
matchLabels:
app: api-service
mtls:
mode: STRICT
portLevelMtls:
8080:
mode: STRICT # Application traffic
9090:
mode: PERMISSIVE # Prometheus scrape port
15021:
mode: DISABLE # Health check port
Policy Precedence
When multiple PeerAuthentication policies overlap, the most specific one wins:
Workload-level (most specific)
|
v
Namespace-level
|
v
Mesh-wide (least specific)
If a workload-level policy exists for a pod, it takes full precedence over namespace and mesh-wide policies. There is no merging --- the most specific policy replaces everything above it.
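A sketch of how the override plays out (names are illustrative): with both policies below applied, pods labeled app: mysql accept plaintext even though every other workload in the namespace is held to STRICT:

```yaml
# Namespace-level: STRICT for everything in the data namespace...
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: data
spec:
  mtls:
    mode: STRICT
---
# ...except this workload, whose more specific policy wins outright.
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: mysql-override
  namespace: data
spec:
  selector:
    matchLabels:
      app: mysql
  mtls:
    mode: PERMISSIVE
```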
Certificate Management
Istio's control plane (istiod) acts as a certificate authority (CA). It automatically issues X.509 certificates to every workload in the mesh without any manual intervention.
How Certificate Provisioning Works
- When a pod with an Envoy sidecar starts, the sidecar generates a private key and a Certificate Signing Request (CSR)
- The CSR is sent to istiod over a secure gRPC channel, authenticated using the Kubernetes service account token
- istiod validates the CSR against the pod's service account and namespace
- istiod signs the certificate using its CA key and returns the signed certificate
- The sidecar loads the certificate and begins accepting mTLS connections
- Before the certificate expires (default: 24 hours), the sidecar automatically generates a new CSR and repeats the process
This entire flow happens without any operator intervention.
SPIFFE Identity
Each workload receives a SPIFFE-compliant identity encoded in the certificate's Subject Alternative Name (SAN):
spiffe://cluster.local/ns/NAMESPACE/sa/SERVICE_ACCOUNT
For example, a service running as the reviews service account in the default namespace gets:
spiffe://cluster.local/ns/default/sa/reviews
This identity is what AuthorizationPolicies use to control access. The identity is cryptographically bound to the workload through the certificate chain, making it unforgeable.
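Note that AuthorizationPolicies reference this identity without the spiffe:// scheme. For the reviews example above, an ALLOW rule would look like this (the target workload labels are illustrative):

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-reviews
  namespace: default
spec:
  selector:
    matchLabels:
      app: ratings      # illustrative target workload
  action: ALLOW
  rules:
  - from:
    - source:
        # The SPIFFE ID spiffe://cluster.local/ns/default/sa/reviews
        # is written as a principal without the spiffe:// prefix:
        principals: ["cluster.local/ns/default/sa/reviews"]
```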
Certificate Rotation and Lifetimes
| Setting | Default | Recommended Production | How to Change |
|---|---|---|---|
| Workload cert lifetime | 24 hours | 12-24 hours | --set values.pilot.env.DEFAULT_WORKLOAD_CERT_TTL=12h |
| CA cert lifetime | 10 years (self-signed) | 1-3 years (custom CA) | Use custom CA certificates |
| Root cert lifetime | 10 years | 5-10 years | Generate with long expiry |
| Grace period for rotation | 50% of lifetime | 50% of lifetime | Automatic, not configurable |
Short-lived workload certificates are a security advantage: even if a certificate is compromised, it expires quickly and cannot be reused.
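The TTL flag from the table can also be set declaratively. A minimal sketch using the IstioOperator API (assuming you install via istioctl or the operator; the value shown is illustrative):

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    pilot:
      env:
        # Shorter workload certificate lifetime; sidecars rotate
        # automatically at roughly half of this TTL.
        DEFAULT_WORKLOAD_CERT_TTL: "12h"
```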
Using Custom CA Certificates
For production, you typically want to use your own root CA instead of Istio's self-signed one. This is critical when:
- You need certificates trusted by external systems
- Compliance requires a specific certificate chain
- You use multiple clusters that need to trust each other
- You have an existing PKI infrastructure
# Generate a root CA (if you don't have one)
openssl req -new -newkey rsa:4096 -x509 -sha256 \
-days 3650 -nodes \
-subj "/O=Company Inc./CN=Root CA" \
-keyout root-key.pem -out root-cert.pem
# Generate an intermediate CA for Istio
openssl req -new -newkey rsa:4096 -sha256 -nodes \
-subj "/O=Company Inc./CN=Istio Intermediate CA" \
-keyout ca-key.pem -out ca-csr.pem
openssl x509 -req -days 730 -sha256 \
-CA root-cert.pem -CAkey root-key.pem -CAcreateserial \
-in ca-csr.pem -out ca-cert.pem \
-extfile <(printf "basicConstraints=CA:TRUE\nkeyUsage=critical,digitalSignature,keyCertSign,cRLSign")
# Create the certificate chain
cat ca-cert.pem root-cert.pem > cert-chain.pem
# Create the Kubernetes secret
kubectl create secret generic cacerts -n istio-system \
--from-file=ca-cert.pem \
--from-file=ca-key.pem \
--from-file=root-cert.pem \
--from-file=cert-chain.pem
# Restart istiod to pick up the new CA
kubectl rollout restart deployment istiod -n istio-system
# Verify the new CA is being used
istioctl proxy-config secret deploy/myapp -o json | \
jq -r '.dynamicActiveSecrets[0].secret.tlsCertificate.certificateChain.inlineBytes' | \
base64 -d | openssl x509 -text -noout | grep "Issuer:"
Multi-Cluster Certificate Trust
For multi-cluster meshes where services in different clusters need to communicate with mTLS, all clusters must share the same root CA:
# Use the same root-cert.pem and generate unique intermediate CAs per cluster
# Cluster 1
kubectl create secret generic cacerts -n istio-system \
--from-file=ca-cert.pem=cluster1-ca-cert.pem \
--from-file=ca-key.pem=cluster1-ca-key.pem \
--from-file=root-cert.pem \
--from-file=cert-chain.pem=cluster1-cert-chain.pem \
--context=cluster1
# Cluster 2
kubectl create secret generic cacerts -n istio-system \
--from-file=ca-cert.pem=cluster2-ca-cert.pem \
--from-file=ca-key.pem=cluster2-ca-key.pem \
--from-file=root-cert.pem \
--from-file=cert-chain.pem=cluster2-cert-chain.pem \
--context=cluster2
AuthorizationPolicy: Access Control
AuthorizationPolicies define who can access what. They operate on the identity established by mTLS (for in-mesh traffic) or JWT tokens (for external clients).
Policy Actions
| Action | Behavior | Evaluation Order |
|---|---|---|
| CUSTOM | Delegate to an external authorization service | Evaluated first |
| DENY | Deny matching requests | Evaluated second |
| ALLOW | Allow matching requests (deny all others when any ALLOW policy exists) | Evaluated third |
| AUDIT | Log matching requests (does not affect allow/deny) | Evaluated alongside others |
When multiple policies exist for a workload, the evaluation order is: CUSTOM, DENY, ALLOW. If a DENY policy matches, the request is denied regardless of any ALLOW policies. If no ALLOW policy exists, all traffic is allowed (unless a DENY matches).
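The "DENY always wins" rule can be sketched with a pair of policies (names and paths are illustrative): even though the ALLOW policy admits the frontend principal, the DENY policy on /admin/* is evaluated first and blocks those requests for everyone:

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-service
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/production/sa/frontend"]
---
# DENY is evaluated before ALLOW, so /admin/* is blocked even for
# principals the ALLOW policy would otherwise admit.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: deny-admin-paths
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-service
  action: DENY
  rules:
  - to:
    - operation:
        paths: ["/admin/*"]
```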
Deny-All Baseline
Start with a deny-all policy, then explicitly allow required communication. This is the foundation of zero-trust:
# Deny all traffic in the namespace
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
name: deny-all
namespace: production
spec:
{} # Empty spec means deny all
After applying this, all traffic to services in the production namespace will be denied. You then add ALLOW policies for each legitimate communication path.
Allow Specific Service Communication
# Allow frontend to call the API service on specific paths
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
name: allow-frontend-to-api
namespace: production
spec:
selector:
matchLabels:
app: api-service
action: ALLOW
rules:
- from:
- source:
principals:
- "cluster.local/ns/production/sa/frontend"
to:
- operation:
methods: ["GET", "POST"]
paths: ["/api/*"]
Allow by Namespace
# Allow any service in the monitoring namespace to scrape metrics
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
name: allow-monitoring
namespace: production
spec:
selector:
matchLabels:
app: api-service
action: ALLOW
rules:
- from:
- source:
namespaces: ["monitoring", "istio-system"]
to:
- operation:
methods: ["GET"]
paths: ["/metrics", "/healthz", "/readyz"]
Deny Specific Sources
# Block a compromised service immediately
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
name: block-compromised
namespace: production
spec:
selector:
matchLabels:
app: database
action: DENY
rules:
- from:
- source:
principals:
- "cluster.local/ns/production/sa/compromised-service"
Complex Authorization Rules
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
name: api-access
namespace: production
spec:
selector:
matchLabels:
app: api-service
action: ALLOW
rules:
# Internal services can call any endpoint
- from:
- source:
principals:
- "cluster.local/ns/production/sa/frontend"
- "cluster.local/ns/production/sa/mobile-bff"
to:
- operation:
methods: ["GET", "POST", "PUT", "DELETE"]
# Batch processing service can only access batch endpoints
- from:
- source:
principals:
- "cluster.local/ns/batch/sa/batch-processor"
to:
- operation:
methods: ["POST"]
paths: ["/api/v2/batch/*"]
# External JWT-authenticated users can only read with v2 API
- from:
- source:
requestPrincipals: ["https://auth.example.com/*"]
to:
- operation:
methods: ["GET"]
when:
- key: request.headers[x-api-version]
values: ["v2"]
# Allow health checks from anywhere (no source restriction)
- to:
- operation:
methods: ["GET"]
paths: ["/healthz", "/readyz"]
Production Authorization Policy Pattern
For a typical microservices application, build policies layer by layer:
# 1. Deny all by default
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
name: deny-all
namespace: production
spec: {}
---
# 2. Allow ingress gateway to reach frontend
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
name: allow-ingress-to-frontend
namespace: production
spec:
selector:
matchLabels:
app: frontend
action: ALLOW
rules:
- from:
- source:
namespaces: ["istio-ingress"]
---
# 3. Allow frontend to reach API
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
name: allow-frontend-to-api
namespace: production
spec:
selector:
matchLabels:
app: api
action: ALLOW
rules:
- from:
- source:
principals: ["cluster.local/ns/production/sa/frontend"]
---
# 4. Allow API to reach database
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
name: allow-api-to-db
namespace: production
spec:
selector:
matchLabels:
app: database
action: ALLOW
rules:
- from:
- source:
principals: ["cluster.local/ns/production/sa/api"]
to:
- operation:
ports: ["5432"]
---
# 5. Allow monitoring everywhere
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
name: allow-monitoring
namespace: production
spec:
action: ALLOW
rules:
- from:
- source:
namespaces: ["monitoring"]
to:
- operation:
methods: ["GET"]
paths: ["/metrics"]
JWT Authentication with RequestAuthentication
For external traffic (from outside the mesh), use JWT tokens for authentication:
apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
name: jwt-auth
namespace: production
spec:
selector:
matchLabels:
app: api-service
jwtRules:
- issuer: "https://auth.example.com"
jwksUri: "https://auth.example.com/.well-known/jwks.json"
audiences:
- "api.example.com"
forwardOriginalToken: true
outputPayloadToHeader: "x-jwt-payload"
fromHeaders:
- name: Authorization
prefix: "Bearer "
fromParams:
- "access_token"
# Support multiple identity providers
- issuer: "https://accounts.google.com"
jwksUri: "https://www.googleapis.com/oauth2/v3/certs"
audiences:
- "your-google-client-id.apps.googleusercontent.com"
Then combine with an AuthorizationPolicy to enforce JWT claims:
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
name: require-jwt
namespace: production
spec:
selector:
matchLabels:
app: api-service
action: ALLOW
rules:
# Admin users can access everything
- from:
- source:
requestPrincipals: ["https://auth.example.com/*"]
when:
- key: request.auth.claims[role]
values: ["admin"]
# Editor users can read and write
- from:
- source:
requestPrincipals: ["https://auth.example.com/*"]
to:
- operation:
methods: ["GET", "POST", "PUT"]
when:
- key: request.auth.claims[role]
values: ["editor"]
# Viewer users can only read
- from:
- source:
requestPrincipals: ["https://auth.example.com/*"]
to:
- operation:
methods: ["GET"]
when:
- key: request.auth.claims[role]
values: ["viewer"]
# Reject requests without valid JWT (this is implicit when
# RequestAuthentication rejects invalid tokens, but we need
# to ensure requests without tokens are also rejected)
Important: RequestAuthentication only validates tokens that are present. It does not reject requests without tokens. To require a token, pair it with an AuthorizationPolicy that checks for requestPrincipals.
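The standard pattern for rejecting token-less requests is a DENY policy that matches sources with no validated request principal (this mirrors the approach in Istio's own documentation; the names here are illustrative):

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: require-jwt-token
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-service
  action: DENY
  rules:
  - from:
    - source:
        # Matches requests that carry no validated JWT at all.
        notRequestPrincipals: ["*"]
```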
External Authorization with OPA
For complex authorization logic that cannot be expressed in AuthorizationPolicy, delegate to Open Policy Agent (OPA):
Deploy OPA
apiVersion: apps/v1
kind: Deployment
metadata:
name: opa-authorizer
namespace: istio-system
spec:
replicas: 2
selector:
matchLabels:
app: opa-authorizer
template:
metadata:
labels:
app: opa-authorizer
annotations:
sidecar.istio.io/inject: "false" # OPA doesn't need a sidecar
spec:
containers:
- name: opa
image: openpolicyagent/opa:latest-envoy
ports:
- containerPort: 9191 # gRPC for Envoy ext_authz
- containerPort: 8181 # HTTP API for policy management
- containerPort: 8282 # Diagnostics
args:
- "run"
- "--server"
- "--addr=0.0.0.0:8181"
- "--diagnostic-addr=0.0.0.0:8282"
- "--set=plugins.envoy_ext_authz_grpc.addr=0.0.0.0:9191"
- "--set=plugins.envoy_ext_authz_grpc.path=istio/authz/allow"
- "--set=decision_logs.console=true"
- "/policies"
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 256Mi
volumeMounts:
- name: policies
mountPath: /policies
livenessProbe:
httpGet:
path: /health?plugins
port: 8282
initialDelaySeconds: 5
readinessProbe:
httpGet:
path: /health?plugins
port: 8282
initialDelaySeconds: 5
volumes:
- name: policies
configMap:
name: opa-policies
---
apiVersion: v1
kind: Service
metadata:
name: opa-authorizer
namespace: istio-system
spec:
selector:
app: opa-authorizer
ports:
- name: grpc
port: 9191
targetPort: 9191
---
apiVersion: v1
kind: ConfigMap
metadata:
name: opa-policies
namespace: istio-system
data:
policy.rego: |
package istio.authz
import input.attributes.request.http as http_request
import input.attributes.source.principal as source_principal
default allow = false
# Allow health checks
allow {
http_request.method == "GET"
http_request.path == "/healthz"
}
# Allow requests from known services during business hours
allow {
source_principal != ""
is_business_hours
}
# Rate limit: deny if source has made too many requests
# (simplified, real implementation would check a shared counter)
allow {
source_principal != ""
not is_rate_limited
}
is_business_hours {
[h, _, _] := time.clock(time.now_ns())
h >= 8
h < 22
}
is_rate_limited = false
Configure Istio to Use OPA
Register the provider in the Istio mesh configuration:
# In IstioOperator or Helm values
meshConfig:
extensionProviders:
- name: opa-authorizer
envoyExtAuthzGrpc:
service: opa-authorizer.istio-system.svc.cluster.local
port: 9191
timeout: 500ms
failOpen: false # Deny if OPA is unreachable
Then create the AuthorizationPolicy:
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
name: opa-ext-auth
namespace: production
spec:
selector:
matchLabels:
app: api-service
action: CUSTOM
provider:
name: opa-authorizer
rules:
- to:
- operation:
paths: ["/api/*"]
Migrating to Strict mTLS Without Downtime
Moving from PERMISSIVE to STRICT mTLS requires careful planning to avoid breaking non-mesh services. This is a multi-step process that should be executed over days or weeks, not hours.
Step 1: Audit Current State
# Check which services are using mTLS (describe takes a pod name)
istioctl x describe pod -n production $(kubectl get pod -n production -l app=api-service -o jsonpath='{.items[0].metadata.name}')
# Check all PeerAuthentication policies
kubectl get peerauthentication --all-namespaces
# Find pods without sidecars (these will break under STRICT mTLS)
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name} {.spec.containers[*].name}{"\n"}{end}' | grep -v istio-proxy | grep -v kube-system
# Check actual mTLS status between services
istioctl proxy-config listeners deploy/api-service -o json | \
jq '.[].filterChains[].transportSocket.typedConfig.commonTlsContext'
Step 2: Ensure PERMISSIVE Mode is the Starting State
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
name: default
namespace: istio-system
spec:
mtls:
mode: PERMISSIVE
Step 3: Add Sidecars to All Workloads
Label all application namespaces for injection and restart deployments:
# Label namespaces
for ns in production staging batch; do
kubectl label namespace $ns istio-injection=enabled --overwrite
done
# Restart deployments to inject sidecars
for ns in production staging batch; do
kubectl rollout restart deployment -n $ns
kubectl rollout status deployment --all -n $ns --timeout=300s
done
Step 4: Verify mTLS is Working (While Still Permissive)
# Use Kiali to verify mTLS connections
istioctl dashboard kiali
# Check mTLS status between specific services
istioctl proxy-config endpoints deploy/frontend -n production | grep reviews
# Look for mTLS indicators in access logs
kubectl logs deploy/api-service -c istio-proxy -n production | \
grep -o '"upstream_transport_failure_reason":"[^"]*"' | sort | uniq -c
Step 5: Migrate Namespace by Namespace
Start with the least critical namespace:
# Enable strict mTLS for staging first
kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
name: default
namespace: staging
spec:
mtls:
mode: STRICT
EOF
# Verify no broken connections
kubectl logs -n staging -l app=api-service -c istio-proxy --tail=50 | grep -i "tls\|error\|refused"
# Run integration tests against staging
# If tests pass, move to the next namespace
Step 6: Handle Non-Mesh Services
For services that cannot run sidecars (databases, legacy systems, external services), keep specific workloads in PERMISSIVE mode:
# Allow plaintext from external MySQL client
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
name: mysql-permissive
namespace: production
spec:
selector:
matchLabels:
app: mysql
mtls:
mode: PERMISSIVE
Also create a DestinationRule to configure how the mesh communicates with non-mesh services:
# Disable mTLS when talking to external database
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
name: external-database
namespace: production
spec:
host: database.legacy.svc.cluster.local
trafficPolicy:
tls:
mode: DISABLE
Step 7: Enable Mesh-Wide STRICT
Once all namespaces are verified:
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
name: default
namespace: istio-system
spec:
mtls:
mode: STRICT
Debugging mTLS and Security Issues
Connection Refused
# Check if the destination has a sidecar
kubectl get pod -n production -l app=api-service -o jsonpath='{.items[0].spec.containers[*].name}'
# If istio-proxy is missing, the pod cannot terminate mTLS
# Check mTLS mode mismatch
istioctl proxy-config listeners deploy/frontend -n production --port 8080
# Check the actual TLS handshake (requires a proxy image that ships
# openssl; the default distroless istio-proxy images do not)
kubectl exec deploy/frontend -c istio-proxy -n production -- \
openssl s_client -connect api-service.production:8080 -servername api-service.production
Certificate Errors
# Check certificate validity and issuer
istioctl proxy-config secret deploy/api-service -n production -o json | \
jq -r '.dynamicActiveSecrets[0].secret.tlsCertificate.certificateChain.inlineBytes' | \
base64 -d | openssl x509 -text -noout
# Check certificate expiration
istioctl proxy-config secret deploy/api-service -n production -o json | \
jq -r '.dynamicActiveSecrets[0].secret.tlsCertificate.certificateChain.inlineBytes' | \
base64 -d | openssl x509 -enddate -noout
# Verify CA certificates match between services
istioctl proxy-config secret deploy/frontend -n production
istioctl proxy-config secret deploy/api-service -n production
Authorization Policy Not Taking Effect
# Evaluation order: CUSTOM -> DENY -> ALLOW
# DENY policies always override ALLOW
# Verify the policy selects the right workload
kubectl get authorizationpolicy -n production -o yaml | grep -A5 "selector"
# Enable RBAC debug logging on the proxy, then watch for denials
istioctl proxy-config log deploy/api-service -n production --level rbac:debug
kubectl logs deploy/api-service -c istio-proxy -n production | grep -i rbac
# See which authentication and authorization policies apply to a pod
istioctl x describe pod -n production $(kubectl get pod -n production -l app=api-service -o jsonpath='{.items[0].metadata.name}')
Common Issues and Solutions
| Issue | Symptom | Solution |
|---|---|---|
| mTLS mode mismatch | Connection reset, 503 errors | Ensure both source and destination have matching TLS settings |
| Missing sidecar | Cannot terminate mTLS | Add injection label, restart deployment |
| Stale certificates | TLS handshake failure | Restart istiod, check CA secret |
| Wrong principal in policy | RBAC denied | Verify service account name matches principals |
| Policy not applied | All traffic allowed/denied | Check selector labels match pod labels |
| JWT validation failure | 401 Unauthorized | Verify JWKS URI is accessible, check token expiration |
Security Best Practices Checklist
- Enable STRICT mTLS mesh-wide after migrating all workloads
- Use deny-all baseline in every namespace, then add specific ALLOW policies
- Use custom CA certificates for production --- do not rely on Istio's self-signed CA
- Rotate CA certificates before they expire, with overlap period
- Apply AuthorizationPolicies per service, not per namespace
- Use SPIFFE principals, not source IPs, in policies (IPs change, identities do not)
- Require JWT for external traffic via RequestAuthentication
- Set failOpen: false for external authorization providers
- Monitor RBAC denials and alert on unexpected patterns
- Audit policies regularly --- use istioctl analyze and Kiali's validation
- Use separate service accounts for each deployment (not the default SA)
- Keep workload certificate TTL short (24 hours or less)
Summary
Istio's security model gives you zero-trust networking without application changes. Start with PERMISSIVE mTLS to verify everything works, migrate to STRICT namespace by namespace, and layer AuthorizationPolicies on top to control exactly which services can communicate. Use RequestAuthentication for external JWT validation, delegate complex authorization to OPA when Istio's built-in policies are not expressive enough, and always maintain a deny-all baseline policy. The goal is a mesh where every connection is encrypted, every identity is verified, and every request is explicitly authorized. The migration path from a permissive network to full zero-trust is incremental --- there is no need for a big-bang switch that risks breaking everything at once. Take it namespace by namespace, verify at each step, and use Istio's observability tools to confirm that mTLS and authorization are working as expected before moving forward.