Envoy Proxy for Microservices: Edge and Sidecar Patterns
Envoy is a high-performance proxy designed for microservices architectures. Built at Lyft and now a CNCF graduated project, it powers the data plane of service meshes like Istio, Consul Connect, and AWS App Mesh. Whether you use it standalone or as part of a mesh, understanding Envoy's architecture gives you fine-grained control over service-to-service communication, with features like automatic retries, circuit breaking, outlier detection, and distributed tracing built directly into the proxy layer.
This guide covers Envoy's core architecture in depth, complete configuration examples for both edge proxy and sidecar deployments, the xDS dynamic configuration model, SSL/TLS termination and mutual TLS, circuit breaking tuning, retry policies, rate limiting strategies, health checking, full observability setup, and a thorough comparison with Nginx and HAProxy to help you choose the right tool.
What Envoy Is and Why It Exists
Traditional proxies like Nginx and HAProxy were built for the north-south traffic pattern: clients on the internet connecting to servers in a data center. They excel at this. Envoy was built for a different problem: east-west traffic, where services inside a cluster talk to each other over the network.
In a microservices environment, a single user request may fan out to 10, 20, or 50 calls to other services. Each of those calls is a potential failure point, and a single slow service can cascade and bring down the entire system. Traditional application-level libraries (like Netflix Hystrix or resilience4j) address this, but they require code changes in every service, in every language. Envoy moves these capabilities into the infrastructure layer:
- Automatic retries with configurable backoff and retry budgets
- Circuit breaking to prevent cascade failures across the service graph
- Outlier detection to eject misbehaving hosts from load balancing pools
- Distributed tracing with automatic span generation and context propagation
- Dynamic configuration through APIs (xDS) rather than config file reloads
- Protocol-aware routing for HTTP/1.1, HTTP/2, gRPC, and raw TCP
- Weighted traffic splitting for canary deployments and A/B testing
- Mutual TLS for zero-trust service-to-service authentication
The key insight is that by embedding these capabilities in the proxy, every service gets them without changing application code. A Python service, a Go service, and a Java service all benefit equally from the same Envoy sidecar configuration.
Architecture Overview
Envoy's configuration model is built on a small set of core concepts that map directly to how network traffic flows:
Incoming Connection
|
v
+---------------------+
| Listener | Accepts connections on IP:port
| (network address) |
+---------------------+
|
v
+---------------------+
| Filter Chain | Processes data through ordered filters
| (L4 + L7 filters) | (TLS, HTTP parsing, auth, rate limit)
+---------------------+
|
v
+---------------------+
| Route Table | Maps request attributes to clusters
| (path, headers) | (path prefix, header match, weighted)
+---------------------+
|
v
+---------------------+
| Cluster | Group of upstream hosts
| (load balancing) | (health checks, circuit breakers)
+---------------------+
|
v
+---------------------+
| Endpoint | Individual backend host:port
+---------------------+
Listeners
A listener is a named network location (IP + port) where Envoy accepts connections. Each listener has one or more filter chains that process the traffic:
listeners:
- name: http_listener
address:
socket_address:
address: 0.0.0.0
port_value: 8080
per_connection_buffer_limit_bytes: 32768
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress_http
codec_type: AUTO
use_remote_address: true
common_http_protocol_options:
idle_timeout: 3600s
headers_with_underscores_action: REJECT_REQUEST
http2_protocol_options:
max_concurrent_streams: 128
initial_stream_window_size: 65536
initial_connection_window_size: 1048576
route_config:
name: local_route
virtual_hosts:
- name: backend
domains: ["*"]
routes:
- match: { prefix: "/" }
route: { cluster: app_service }
http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
Key listener parameters:
| Parameter | Purpose | Recommended Value |
|---|---|---|
| per_connection_buffer_limit_bytes | Memory limit per connection | 32768 (32 KB) |
| use_remote_address | Use the real client IP for access logs and rate limiting | true for edge proxy |
| codec_type: AUTO | Auto-detect HTTP/1.1 vs HTTP/2 | AUTO for edge, HTTP2 for gRPC |
| headers_with_underscores_action | Reject headers with underscores (security) | REJECT_REQUEST |
| max_concurrent_streams | HTTP/2 stream multiplexing limit | 100-256 |
Filter Chains
Filters are the heart of Envoy's extensibility. They process traffic at both the network (L4) and HTTP (L7) layers. Network filters handle raw bytes; HTTP filters handle parsed requests. Filters are composable and execute in order:
| Filter Type | Name | Purpose |
|---|---|---|
| Network | http_connection_manager | Parse HTTP, apply HTTP filters, route to clusters |
| Network | tcp_proxy | Plain TCP proxying without HTTP awareness |
| Network | redis_proxy | Redis protocol-aware proxying with command splitting |
| Network | mongo_proxy | MongoDB wire protocol sniffing for metrics |
| HTTP | router | Route requests to clusters (required, always last) |
| HTTP | local_ratelimit | Per-instance token bucket rate limiting |
| HTTP | ratelimit | Global rate limiting via external service |
| HTTP | cors | Cross-Origin Resource Sharing handling |
| HTTP | jwt_authn | JWT token validation |
| HTTP | ext_authz | External authorization via gRPC/HTTP callout |
| HTTP | fault | Fault injection for testing (delays, aborts) |
| HTTP | compressor | Response compression (gzip, brotli) |
| HTTP | health_check | Respond to health checks without hitting upstream |
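Filter order matters: a request passes through the HTTP filters top to bottom, and the router filter must always come last. A sketch of a chain that authenticates, then rate limits, then routes (the provider name, issuer URL, and `jwks_cluster` are placeholders, not values from this guide's configs):

```yaml
http_filters:
  - name: envoy.filters.http.jwt_authn
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.jwt_authn.v3.JwtAuthentication
      providers:
        example_provider:                 # placeholder provider name
          issuer: https://issuer.example.com
          remote_jwks:
            http_uri:
              uri: https://issuer.example.com/.well-known/jwks.json
              cluster: jwks_cluster       # must also be defined as a cluster
              timeout: 5s
      rules:
        - match: { prefix: "/" }
          requires: { provider_name: example_provider }
  - name: envoy.filters.http.local_ratelimit
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
      stat_prefix: authed_rate_limit
      token_bucket: { max_tokens: 100, tokens_per_fill: 100, fill_interval: 1s }
  - name: envoy.filters.http.router
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
```

With this ordering, a request with an invalid JWT is rejected before it consumes a rate limit token or reaches an upstream.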
Clusters
A cluster is a group of upstream hosts that Envoy routes traffic to. Clusters are where you configure load balancing, health checks, circuit breakers, and connection pooling:
clusters:
- name: app_service
connect_timeout: 5s
type: STRICT_DNS
dns_lookup_family: V4_ONLY
lb_policy: ROUND_ROBIN
common_lb_config:
healthy_panic_threshold:
value: 50
transport_socket:
name: envoy.transport_sockets.tls
typed_config:
"@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
sni: app-service.internal
load_assignment:
cluster_name: app_service
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: app-service-1
port_value: 8080
load_balancing_weight: 3
- endpoint:
address:
socket_address:
address: app-service-2
port_value: 8080
load_balancing_weight: 2
Cluster service discovery types:
| Type | Behavior | Use Case |
|---|---|---|
| STATIC | Endpoints hardcoded in config | Fixed infrastructure, testing |
| STRICT_DNS | DNS resolution, all returned IPs used | Docker Compose, simple DNS-based discovery |
| LOGICAL_DNS | DNS resolution, only first IP used | External services behind a load balancer |
| EDS | Endpoints from xDS control plane | Service mesh, dynamic environments |
| ORIGINAL_DST | Route to the original destination IP | Transparent proxy, iptables redirect |
The healthy_panic_threshold of 50% puts Envoy into panic mode when the share of healthy hosts drops below 50%: it then routes to all hosts, healthy or not. This prevents a bad health check configuration from concentrating all traffic on a few surviving hosts and taking out your entire cluster.
Routes
Routes map incoming requests to clusters based on path, headers, query parameters, or other criteria:
route_config:
virtual_hosts:
- name: api
domains: ["api.example.com"]
request_headers_to_add:
- header:
key: x-custom-header
value: "from-envoy"
append_action: OVERWRITE_IF_EXISTS_OR_ADD
routes:
# Header-based routing for canary deployments
- match:
prefix: "/"
headers:
- name: x-canary
exact_match: "true"
route:
cluster: api_canary
timeout: 30s
# Path-based routing with regex
- match:
safe_regex:
regex: "/users/[0-9]+"
route:
cluster: user_service
timeout: 10s
# Weighted routing for gradual rollouts
- match:
prefix: "/api/v2"
route:
weighted_clusters:
clusters:
- name: api_v2_stable
weight: 90
- name: api_v2_canary
weight: 10
# Prefix rewrite (strip /api prefix)
- match:
prefix: "/api/"
route:
cluster: api_service
prefix_rewrite: "/"
timeout: 15s
# Direct response (no upstream)
- match:
prefix: "/healthz"
direct_response:
status: 200
body:
inline_string: "ok"
# Default route
- match:
prefix: "/"
route:
cluster: api_stable
timeout: 30s
Weighted clusters are particularly useful for canary deployments. Send 10% of traffic to the new version and monitor error rates. If metrics look good, gradually increase the weight. If something goes wrong, shift back to 0% instantly without a deployment.
Static vs Dynamic Configuration (xDS)
Envoy supports two fundamentally different configuration approaches.
Static Configuration
Everything is defined in a YAML file loaded at startup. Changes require a restart (or hot restart). This is suitable for edge proxies, development environments, and simple deployments:
admin:
address:
socket_address:
address: 0.0.0.0
port_value: 9901
static_resources:
listeners:
- name: main
address:
socket_address: { address: 0.0.0.0, port_value: 8080 }
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress
codec_type: AUTO
route_config:
virtual_hosts:
- name: default
domains: ["*"]
routes:
- match: { prefix: "/api" }
route: { cluster: api_cluster }
- match: { prefix: "/" }
route: { cluster: web_cluster }
http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
clusters:
- name: api_cluster
connect_timeout: 2s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: api_cluster
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address: { address: api-service, port_value: 8080 }
- name: web_cluster
connect_timeout: 2s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: web_cluster
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address: { address: web-service, port_value: 3000 }
Dynamic Configuration (xDS APIs)
In dynamic mode, Envoy fetches configuration from a control plane via gRPC or REST. This is what makes service meshes possible. The control plane pushes configuration changes to Envoy sidecars without restarts.
The xDS API family:
| API | Full Name | What It Configures |
|---|---|---|
| LDS | Listener Discovery Service | Listeners and filter chains |
| RDS | Route Discovery Service | Route tables and virtual hosts |
| CDS | Cluster Discovery Service | Upstream cluster definitions |
| EDS | Endpoint Discovery Service | Individual endpoints within clusters |
| SDS | Secret Discovery Service | TLS certificates and keys |
| ECDS | Extension Config Discovery Service | HTTP filter configurations |
| VHDS | Virtual Host Discovery Service | Virtual hosts (granular RDS) |
Bootstrap configuration for dynamic mode:
node:
cluster: my-cluster
id: my-node-1
metadata:
region: us-east-1
az: us-east-1a
admin:
address:
socket_address:
address: 127.0.0.1
port_value: 9901
dynamic_resources:
lds_config:
resource_api_version: V3
api_config_source:
api_type: GRPC
grpc_services:
- envoy_grpc:
cluster_name: xds_cluster
transport_api_version: V3
set_node_on_first_message_only: true
cds_config:
resource_api_version: V3
api_config_source:
api_type: GRPC
grpc_services:
- envoy_grpc:
cluster_name: xds_cluster
transport_api_version: V3
set_node_on_first_message_only: true
static_resources:
clusters:
- name: xds_cluster
connect_timeout: 5s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
typed_extension_protocol_options:
envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
"@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
explicit_http_config:
http2_protocol_options: {}
load_assignment:
cluster_name: xds_cluster
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: control-plane
port_value: 18000
The control plane itself can be built with frameworks like go-control-plane (Go), java-control-plane (Java), or commercial solutions like Istio, Consul, or Gloo. The xDS protocol is the universal interface -- any control plane that speaks xDS works with Envoy.
ADS (Aggregated Discovery Service)
For production deployments, use ADS to ensure consistent configuration updates. Without ADS, updates to CDS and EDS can arrive in a different order, potentially referencing clusters that do not exist yet:
dynamic_resources:
ads_config:
api_type: GRPC
grpc_services:
- envoy_grpc:
cluster_name: xds_cluster
transport_api_version: V3
lds_config:
resource_api_version: V3
ads: {}
cds_config:
resource_api_version: V3
ads: {}
SSL/TLS Termination and Mutual TLS
Edge TLS Termination
For an edge proxy that terminates TLS from external clients:
listeners:
- name: https_listener
address:
socket_address:
address: 0.0.0.0
port_value: 8443
filter_chains:
- transport_socket:
name: envoy.transport_sockets.tls
typed_config:
"@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
common_tls_context:
tls_params:
tls_minimum_protocol_version: TLSv1_2
tls_maximum_protocol_version: TLSv1_3
cipher_suites:
- ECDHE-ECDSA-AES128-GCM-SHA256
- ECDHE-RSA-AES128-GCM-SHA256
- ECDHE-ECDSA-AES256-GCM-SHA384
- ECDHE-RSA-AES256-GCM-SHA384
tls_certificates:
- certificate_chain:
filename: /etc/envoy/certs/server.crt
private_key:
filename: /etc/envoy/certs/server.key
alpn_protocols: ["h2", "http/1.1"]
filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress_https
codec_type: AUTO
route_config:
virtual_hosts:
- name: default
domains: ["*"]
routes:
- match: { prefix: "/" }
route: { cluster: app_service }
http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
Mutual TLS (mTLS) Between Services
mTLS is the foundation of zero-trust networking. Both client and server verify each other's certificates:
# On the server side (downstream TLS context)
transport_socket:
name: envoy.transport_sockets.tls
typed_config:
"@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
require_client_certificate: true
common_tls_context:
tls_certificates:
- certificate_chain:
filename: /etc/envoy/certs/server.crt
private_key:
filename: /etc/envoy/certs/server.key
validation_context:
trusted_ca:
filename: /etc/envoy/certs/ca.crt
match_typed_subject_alt_names:
- san_type: DNS
matcher:
exact: "client-service.internal"
# On the client side (upstream TLS context in cluster)
clusters:
- name: secure_service
transport_socket:
name: envoy.transport_sockets.tls
typed_config:
"@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
common_tls_context:
tls_certificates:
- certificate_chain:
filename: /etc/envoy/certs/client.crt
private_key:
filename: /etc/envoy/certs/client.key
validation_context:
trusted_ca:
filename: /etc/envoy/certs/ca.crt
sni: secure-service.internal
In a service mesh like Istio, mTLS is configured automatically through SDS (Secret Discovery Service). The control plane provisions and rotates certificates for every sidecar without manual intervention.
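To make the mechanism concrete, here is a sketch of what an SDS-driven certificate reference looks like in place of the file-based `tls_certificates` above. The secret names (`server_cert`, `validation_context`) and the `sds_cluster` are placeholders; in a real mesh the control plane generates this configuration and serves the secrets:

```yaml
transport_socket:
  name: envoy.transport_sockets.tls
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
    require_client_certificate: true
    common_tls_context:
      tls_certificate_sds_secret_configs:
        - name: server_cert               # placeholder secret name
          sds_config:
            resource_api_version: V3
            api_config_source:
              api_type: GRPC
              grpc_services:
                - envoy_grpc:
                    cluster_name: sds_cluster   # placeholder SDS server cluster
      validation_context_sds_secret_config:
        name: validation_context          # placeholder secret name
        sds_config:
          resource_api_version: V3
          api_config_source:
            api_type: GRPC
            grpc_services:
              - envoy_grpc:
                  cluster_name: sds_cluster
```

Because the certificates arrive over SDS rather than from disk, rotation requires no file writes and no Envoy restart.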
Deploying as Edge Proxy
As an edge proxy, Envoy replaces Nginx or HAProxy at the ingress point of your infrastructure:
# docker-compose.yml
services:
envoy:
image: envoyproxy/envoy:v1.30-latest
ports:
- "80:8080"
- "443:8443"
- "9901:9901"
volumes:
- ./envoy.yaml:/etc/envoy/envoy.yaml
- ./certs:/etc/envoy/certs:ro
command: ["-c", "/etc/envoy/envoy.yaml", "--service-cluster", "edge", "--service-node", "edge-1"]
deploy:
resources:
limits:
cpus: '2'
memory: 512M
restart: unless-stopped
Complete Edge Proxy Configuration
admin:
address:
socket_address:
address: 0.0.0.0
port_value: 9901
static_resources:
listeners:
- name: http_listener
address:
socket_address:
address: 0.0.0.0
port_value: 8080
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress_http
codec_type: AUTO
use_remote_address: true
route_config:
virtual_hosts:
- name: redirect
domains: ["*"]
routes:
- match: { prefix: "/" }
redirect:
https_redirect: true
response_code: MOVED_PERMANENTLY
http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
- name: https_listener
address:
socket_address:
address: 0.0.0.0
port_value: 8443
filter_chains:
- transport_socket:
name: envoy.transport_sockets.tls
typed_config:
"@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
common_tls_context:
tls_params:
tls_minimum_protocol_version: TLSv1_2
tls_certificates:
- certificate_chain: { filename: /etc/envoy/certs/server.crt }
private_key: { filename: /etc/envoy/certs/server.key }
alpn_protocols: ["h2", "http/1.1"]
filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress_https
codec_type: AUTO
use_remote_address: true
access_log:
- name: envoy.access_loggers.stdout
typed_config:
"@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
log_format:
json_format:
timestamp: "%START_TIME%"
method: "%REQ(:METHOD)%"
path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
protocol: "%PROTOCOL%"
status: "%RESPONSE_CODE%"
duration: "%DURATION%"
bytes: "%BYTES_SENT%"
upstream: "%UPSTREAM_HOST%"
request_id: "%REQ(X-REQUEST-ID)%"
route_config:
virtual_hosts:
- name: api
domains: ["api.example.com"]
routes:
- match: { prefix: "/" }
route:
cluster: api_service
timeout: 30s
- name: web
domains: ["www.example.com", "example.com"]
routes:
- match: { prefix: "/static/" }
route:
cluster: static_service
timeout: 10s
- match: { prefix: "/" }
route:
cluster: web_service
timeout: 30s
http_filters:
- name: envoy.filters.http.local_ratelimit
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
stat_prefix: local_rate_limit
token_bucket:
max_tokens: 1000
tokens_per_fill: 1000
fill_interval: 1s
filter_enabled:
runtime_key: local_rate_limit_enabled
default_value: { numerator: 100, denominator: HUNDRED }
filter_enforced:
runtime_key: local_rate_limit_enforced
default_value: { numerator: 100, denominator: HUNDRED }
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
clusters:
- name: api_service
connect_timeout: 2s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
health_checks:
- timeout: 3s
interval: 10s
unhealthy_threshold: 3
healthy_threshold: 2
http_health_check:
path: /healthz
circuit_breakers:
thresholds:
- priority: DEFAULT
max_connections: 1000
max_pending_requests: 500
max_requests: 2000
max_retries: 10
load_assignment:
cluster_name: api_service
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address: { address: api-svc, port_value: 8080 }
- name: web_service
connect_timeout: 2s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
health_checks:
- timeout: 3s
interval: 10s
unhealthy_threshold: 3
healthy_threshold: 2
http_health_check:
path: /health
load_assignment:
cluster_name: web_service
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address: { address: web-svc, port_value: 3000 }
- name: static_service
connect_timeout: 1s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: static_service
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address: { address: static-svc, port_value: 80 }
Deploying as Sidecar
The sidecar pattern runs an Envoy instance alongside each service instance. In Kubernetes, it runs as a container in the same pod, sharing the network namespace:
apiVersion: v1
kind: Pod
metadata:
name: my-service
labels:
app: my-service
spec:
containers:
- name: app
image: my-app:latest
ports:
- containerPort: 8080
env:
- name: HTTP_PROXY
value: "http://127.0.0.1:9211"
resources:
requests:
cpu: 100m
memory: 128Mi
- name: envoy-sidecar
image: envoyproxy/envoy:v1.30-latest
ports:
- containerPort: 9901
name: envoy-admin
- containerPort: 9211
name: envoy-egress
- containerPort: 9212
name: envoy-ingress
volumeMounts:
- name: envoy-config
mountPath: /etc/envoy
resources:
requests:
cpu: 50m
memory: 64Mi
limits:
cpu: 200m
memory: 128Mi
readinessProbe:
httpGet:
path: /ready
port: 9901
initialDelaySeconds: 2
periodSeconds: 5
livenessProbe:
httpGet:
path: /server_info
port: 9901
initialDelaySeconds: 5
periodSeconds: 15
volumes:
- name: envoy-config
configMap:
name: envoy-sidecar-config
initContainers:
- name: init-iptables
image: envoyproxy/envoy:v1.30-latest
securityContext:
capabilities:
add: ["NET_ADMIN"]
command:
- sh
- -c
- |
iptables -t nat -A PREROUTING -p tcp --dport 8080 -j REDIRECT --to-port 9212
iptables -t nat -A OUTPUT -p tcp --dport 8080 -m owner ! --uid-owner 1337 -j REDIRECT --to-port 9211
The init container sets up iptables rules that transparently redirect traffic through Envoy. Inbound traffic to port 8080 is redirected to Envoy's ingress listener (9212), and outbound traffic from the app is redirected to Envoy's egress listener (9211). The --uid-owner 1337 exclusion prevents Envoy's own outbound traffic from being redirected back to itself in an infinite loop; for it to work, the Envoy container must actually run as UID 1337 (set securityContext.runAsUser on the sidecar container).
In production service meshes like Istio, all of this is automated. The Istio sidecar injector automatically adds the Envoy container and iptables init container to every pod.
Circuit Breaking
Circuit breaking prevents a failing service from consuming all available resources and cascading the failure to its callers. When a service becomes slow or unresponsive, Envoy stops sending it traffic:
clusters:
- name: payment_service
connect_timeout: 2s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
circuit_breakers:
thresholds:
- priority: DEFAULT
max_connections: 100
max_pending_requests: 50
max_requests: 200
max_retries: 3
track_remaining: true
retry_budget:
budget_percent:
value: 20.0
min_retry_concurrency: 3
- priority: HIGH
max_connections: 200
max_pending_requests: 100
max_requests: 400
max_retries: 5
load_assignment:
cluster_name: payment_service
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address: { address: payment-svc, port_value: 8080 }
Circuit breaker thresholds explained:
| Threshold | What It Limits | When Circuit Opens |
|---|---|---|
| max_connections | Concurrent TCP connections to the cluster | New connections return 503 |
| max_pending_requests | Requests waiting for a connection from the pool | Queued requests return 503 |
| max_requests | Total concurrent requests (HTTP/2 multiplexed) | New requests return 503 |
| max_retries | Concurrent retry attempts across the cluster | Retries are skipped |
| retry_budget | Percentage of active requests that can be retries | Prevents retry storms |
The retry_budget is particularly important. Without it, a failing service can experience a "retry storm" where every failed request generates retries, which also fail and generate more retries. Setting budget_percent to 20% means only 20% of active requests can be retries at any time.
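The arithmetic behind the budget is simple. A minimal sketch in plain Python (not Envoy code) of how the two fields from the config above interact:

```python
def allowed_retry_concurrency(active_requests: int,
                              budget_percent: float = 20.0,
                              min_retry_concurrency: int = 3) -> int:
    """Retries may use at most budget_percent of currently active
    requests, but never fewer than min_retry_concurrency, so retries
    remain possible even at very low traffic."""
    budget = int(active_requests * budget_percent / 100.0)
    return max(budget, min_retry_concurrency)

# With 200 active requests and a 20% budget, at most 40 may be retries;
# with only 5 active requests, the floor of 3 applies.
print(allowed_retry_concurrency(200))  # 40
print(allowed_retry_concurrency(5))    # 3
```

The floor matters: without it, a quiet service could never retry at all, since 20% of a handful of requests rounds down to zero.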
Monitor circuit breaker state via Envoy's stats:
curl -s http://localhost:9901/stats | grep circuit_breakers
# cluster.payment_service.circuit_breakers.default.cx_open: 0
# cluster.payment_service.circuit_breakers.default.cx_pool_open: 0
# cluster.payment_service.circuit_breakers.default.rq_open: 0
# cluster.payment_service.circuit_breakers.default.rq_pending_open: 0
# cluster.payment_service.circuit_breakers.default.remaining_cx: 100
# cluster.payment_service.circuit_breakers.default.remaining_pending: 50
When any _open counter is non-zero, the circuit is open for that threshold. The remaining_* counters (enabled by track_remaining: true) show headroom.
Retries and Timeouts
Configure retries per route for transient failures:
routes:
- match:
prefix: "/api/"
route:
cluster: api_service
timeout: 15s
retry_policy:
retry_on: "5xx,reset,connect-failure,retriable-4xx,refused-stream"
num_retries: 3
per_try_timeout: 5s
per_try_idle_timeout: 3s
retry_back_off:
base_interval: 0.1s
max_interval: 1s
retriable_status_codes:
- 503
- 429
retry_host_predicate:
- name: envoy.retry_host_predicates.previous_hosts
host_selection_retry_max_attempts: 5
Retry configuration reference:
| Setting | Purpose | Recommendation |
|---|---|---|
| retry_on | Conditions that trigger a retry | Include 5xx, connect-failure, reset |
| num_retries | Maximum retry attempts | 2-3 for most services |
| per_try_timeout | Timeout for each individual attempt | Less than the overall route timeout |
| retry_back_off | Exponential backoff between retries | Start at 100ms, cap at 1s |
| retry_host_predicate.previous_hosts | Retry on a different host than the one that failed | Always enable |
| retriable_status_codes | Additional HTTP status codes to retry on | 503, 429 |
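To see what the backoff settings above actually produce, here is a sketch of a fully jittered exponential backoff in the style Envoy documents (retry N sleeps for a random value in [0, (2^N - 1) * base_interval), with the upper bound capped at max_interval); the function is illustrative, not Envoy's implementation:

```python
import random

def retry_backoff(attempt: int, base: float = 0.1, max_interval: float = 1.0) -> float:
    """Random back-off for the Nth retry: uniform in [0, upper), where
    upper doubles-ish each attempt and is capped at max_interval."""
    upper = min((2 ** attempt - 1) * base, max_interval)
    return random.uniform(0, upper)

# Upper bounds for base_interval 0.1s / max_interval 1s:
for n in range(1, 6):
    print(n, round(min((2 ** n - 1) * 0.1, 1.0), 1))
# attempt 1 -> 0.1, 2 -> 0.3, 3 -> 0.7, 4 -> 1.0 (capped), 5 -> 1.0
```

The jitter (picking a random value under the cap rather than the cap itself) is what keeps a fleet of clients from retrying in lockstep after a shared failure.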
Critical rule: only retry idempotent operations. GET requests are safe to retry. POST requests should generally not be retried unless your API is designed for idempotency (e.g., uses idempotency keys). Retrying a non-idempotent POST can cause duplicate charges, duplicate messages, or other data corruption.
For non-idempotent routes, use a separate retry policy or disable retries entirely:
routes:
- match:
prefix: "/api/payments"
headers:
- name: ":method"
exact_match: "POST"
route:
cluster: payment_service
timeout: 30s
# No retry_policy -- do not retry payments
Rate Limiting
Local Rate Limiting
Applied per Envoy instance using a token bucket algorithm:
http_filters:
- name: envoy.filters.http.local_ratelimit
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
stat_prefix: http_local_rate_limiter
token_bucket:
max_tokens: 1000
tokens_per_fill: 1000
fill_interval: 1s
filter_enabled:
runtime_key: local_rate_limit_enabled
default_value: { numerator: 100, denominator: HUNDRED }
filter_enforced:
runtime_key: local_rate_limit_enforced
default_value: { numerator: 100, denominator: HUNDRED }
response_headers_to_add:
- append_action: OVERWRITE_IF_EXISTS_OR_ADD
header:
key: x-ratelimit-limit
value: "1000"
- append_action: OVERWRITE_IF_EXISTS_OR_ADD
header:
key: x-ratelimit-remaining
value: "0"
status:
code: TooManyRequests
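The three token_bucket fields map onto a classic token-bucket algorithm. A small self-contained sketch (plain Python, not Envoy's implementation) showing how max_tokens, tokens_per_fill, and fill_interval interact:

```python
import time

class TokenBucket:
    """Bucket holds at most max_tokens; tokens_per_fill tokens are added
    every fill_interval seconds; each request consumes one token."""
    def __init__(self, max_tokens: int, tokens_per_fill: int, fill_interval: float):
        self.max_tokens = max_tokens
        self.tokens_per_fill = tokens_per_fill
        self.fill_interval = fill_interval
        self.tokens = max_tokens
        self.last_fill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        fills = int((now - self.last_fill) / self.fill_interval)
        if fills:
            self.tokens = min(self.max_tokens, self.tokens + fills * self.tokens_per_fill)
            self.last_fill += fills * self.fill_interval
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False  # the filter would answer 429 here

bucket = TokenBucket(max_tokens=3, tokens_per_fill=3, fill_interval=1.0)
print([bucket.allow() for _ in range(4)])  # [True, True, True, False]
```

With max_tokens equal to tokens_per_fill, the bucket enforces a steady rate; setting max_tokens higher than tokens_per_fill additionally allows short bursts above the sustained rate.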
Global Rate Limiting
For coordinated rate limiting across all Envoy instances, use an external rate limit service such as the reference implementation, envoyproxy/ratelimit:
http_filters:
- name: envoy.filters.http.ratelimit
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
domain: my_domain
failure_mode_deny: false
rate_limit_service:
grpc_service:
envoy_grpc:
cluster_name: rate_limit_service
transport_api_version: V3
The failure_mode_deny: false setting means that if the rate limit service is unreachable, requests are allowed through. Set to true if you want to fail closed (deny requests when the rate limiter is down).
Configure per-route rate limit actions:
routes:
- match:
prefix: "/api/"
route:
cluster: api_service
rate_limits:
- actions:
- remote_address: {}
- actions:
- request_headers:
header_name: x-api-key
descriptor_key: api_key
Health Checking and Outlier Detection
Envoy supports both active and passive health checking.
Active Health Checks
clusters:
- name: api_service
health_checks:
- timeout: 3s
interval: 10s
unhealthy_threshold: 3
healthy_threshold: 2
no_traffic_interval: 60s
no_traffic_healthy_interval: 120s
http_health_check:
path: /healthz
host: api-service.internal
expected_statuses:
- start: 200
end: 200
request_headers_to_add:
- header:
key: x-health-check
value: "envoy"
append_action: OVERWRITE_IF_EXISTS_OR_ADD
The no_traffic_interval reduces health check frequency for clusters that are not receiving real traffic. This saves resources in large deployments with many clusters.
gRPC Health Checks
For gRPC services implementing the standard health checking protocol:
health_checks:
- timeout: 2s
interval: 10s
unhealthy_threshold: 3
healthy_threshold: 2
grpc_health_check:
service_name: my.service.Name
Outlier Detection (Passive Health Checking)
Outlier detection watches real traffic and ejects hosts that are performing badly. It catches issues that active health checks miss, like a service that responds to /healthz but fails on real requests:
clusters:
- name: api_service
outlier_detection:
consecutive_5xx: 5
interval: 10s
base_ejection_time: 30s
max_ejection_percent: 50
enforcing_consecutive_5xx: 100
enforcing_success_rate: 100
success_rate_minimum_hosts: 3
success_rate_request_volume: 100
success_rate_stdev_factor: 1900
consecutive_gateway_failure: 3
enforcing_consecutive_gateway_failure: 100
split_external_local_origin_errors: true
Outlier detection parameters:
| Parameter | Purpose | Default |
|---|---|---|
| consecutive_5xx | Eject after N consecutive 5xx responses | 5 |
| interval | How often to evaluate outlier status | 10s |
| base_ejection_time | Base duration of ejection (multiplied by ejection count) | 30s |
| max_ejection_percent | Max percentage of hosts that can be ejected | 10 |
| success_rate_minimum_hosts | Minimum hosts needed for success rate analysis | 5 |
| success_rate_stdev_factor | Standard deviations from the mean before ejection, divided by 1000 (1900 = 1.9 stdevs) | 1900 |
The max_ejection_percent is a safety valve. Even if every host is failing, Envoy will not eject more than this percentage, preventing a complete cluster outage.
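Ejection durations also back off: each ejection lasts base_ejection_time multiplied by the number of times that host has been ejected, up to a ceiling (max_ejection_time, 300s by default). A quick sketch of the resulting schedule:

```python
def ejection_duration(base_seconds: int, times_ejected: int,
                      max_seconds: int = 300) -> int:
    """Ejection window grows linearly with the host's ejection count,
    capped at max_ejection_time (default 300s)."""
    return min(base_seconds * times_ejected, max_seconds)

# With base_ejection_time 30s, a repeatedly failing host sits out for:
print([ejection_duration(30, n) for n in (1, 2, 3, 11)])  # [30, 60, 90, 300]
```

A host that recovers after one ejection pays only 30 seconds; a host that keeps failing is sidelined for progressively longer without ever being permanently removed.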
Built-In Observability
Envoy's observability is its strongest differentiator. Every proxy instance exposes rich telemetry without any application code changes.
Stats and Prometheus Metrics
Envoy exposes thousands of metrics via the admin interface:
# All stats
curl http://localhost:9901/stats
# Prometheus format
curl http://localhost:9901/stats/prometheus
# Filter by pattern
curl "http://localhost:9901/stats?filter=cluster.api_service"
# Only counters
curl "http://localhost:9901/stats?type=Counters"
Key metrics to monitor and alert on:
| Metric | What It Tells You | Alert When |
|---|---|---|
| upstream_rq_total | Total requests to a cluster | N/A (informational) |
| upstream_rq_5xx | 5xx error count | Rate exceeds baseline |
| upstream_rq_time | Request latency histogram | p99 exceeds SLA |
| upstream_cx_active | Active connections to upstream | Near circuit breaker limit |
| upstream_cx_connect_fail | Connection failures | Any non-zero count |
| membership_healthy | Healthy hosts in cluster | Below minimum threshold |
| membership_total | Total hosts in cluster | Unexpected changes |
| upstream_rq_retry | Retry count | High retry rate |
| upstream_rq_pending_overflow | Requests rejected by circuit breaker | Any non-zero count |
| downstream_cx_active | Active client connections | Near capacity |
| downstream_rq_total | Total incoming requests | Unexpected spikes |
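These metrics feed Prometheus alerting directly. A sketch of two alert rules; the metric and label names here follow the common Prometheus mapping of Envoy stats (envoy_cluster_*, envoy_cluster_name) but may vary with your Envoy version and scrape setup, so verify them against your own /stats/prometheus output:

```yaml
groups:
  - name: envoy
    rules:
      - alert: EnvoyUpstreamConnectFailures
        expr: rate(envoy_cluster_upstream_cx_connect_fail[5m]) > 0
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "Envoy cannot connect to upstream {{ $labels.envoy_cluster_name }}"
      - alert: EnvoyClusterBelowMinHealthy
        expr: envoy_cluster_membership_healthy < 2
        for: 2m
        labels: { severity: critical }
        annotations:
          summary: "Fewer than 2 healthy hosts in {{ $labels.envoy_cluster_name }}"
```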
Distributed Tracing
Envoy can propagate trace headers and report spans to Zipkin, Jaeger, or any OpenTelemetry collector:
tracing:
http:
name: envoy.tracers.opentelemetry
typed_config:
"@type": type.googleapis.com/envoy.config.trace.v3.OpenTelemetryConfig
grpc_service:
envoy_grpc:
cluster_name: otel_collector
service_name: my-service
Envoy automatically generates spans for each request and propagates trace context headers between services. The key headers propagated:
| Header | Tracing System |
|---|---|
| x-request-id | Envoy internal |
| x-b3-traceid, x-b3-spanid, x-b3-parentspanid | Zipkin/B3 |
| traceparent, tracestate | W3C Trace Context |
| x-cloud-trace-context | Google Cloud Trace |
Your application code only needs to forward these headers on outbound requests. Envoy handles span creation, timing, and reporting.
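The forwarding step is trivial but easy to forget. A minimal Python sketch of the idea (the helper name and exact header set are illustrative; extend the set for whichever tracing systems you use):

```python
# Headers Envoy expects the application to copy from the incoming
# request onto every outbound request, so spans join into one trace.
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid", "x-b3-spanid", "x-b3-parentspanid",
    "x-b3-sampled", "x-b3-flags",
    "traceparent", "tracestate",
    "x-cloud-trace-context",
)

def propagate_trace_headers(incoming: dict) -> dict:
    """Return the subset of incoming headers to attach to outbound calls."""
    lowered = {k.lower(): v for k, v in incoming.items()}
    return {h: lowered[h] for h in TRACE_HEADERS if h in lowered}

# Example: only trace headers survive, case-insensitively.
incoming = {
    "X-Request-Id": "abc123",
    "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
    "content-type": "application/json",
}
print(propagate_trace_headers(incoming))
```

Whatever HTTP client your services use, wiring this helper into its outbound-request hook is usually a one-line change.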
Access Logs
Configure structured access logging for request-level debugging:
```yaml
access_log:
- name: envoy.access_loggers.file
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
    path: /var/log/envoy/access.json
    log_format:
      json_format:
        timestamp: "%START_TIME%"
        method: "%REQ(:METHOD)%"
        path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
        protocol: "%PROTOCOL%"
        response_code: "%RESPONSE_CODE%"
        response_flags: "%RESPONSE_FLAGS%"
        duration_ms: "%DURATION%"
        upstream_host: "%UPSTREAM_HOST%"
        upstream_cluster: "%UPSTREAM_CLUSTER%"
        upstream_local_address: "%UPSTREAM_LOCAL_ADDRESS%"
        bytes_received: "%BYTES_RECEIVED%"
        bytes_sent: "%BYTES_SENT%"
        request_id: "%REQ(X-REQUEST-ID)%"
        user_agent: "%REQ(USER-AGENT)%"
        downstream_remote_address: "%DOWNSTREAM_REMOTE_ADDRESS%"
```
Response flags are particularly useful for debugging:
| Flag | Meaning |
|---|---|
| UH | No healthy upstream hosts |
| UF | Upstream connection failure |
| UO | Upstream overflow (circuit breaker triggered) |
| UT | Upstream request timeout |
| UC | Upstream connection termination |
| LR | Connection local reset |
| RL | Rate limited |
| DC | Downstream connection termination |
| NR | No route configured |
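Because the access log above emits JSON, response flags are easy to aggregate offline. A small Python sketch of the idea, assuming the `json_format` keys shown earlier (the function name is illustrative):

```python
import json
from collections import Counter

def tally_response_flags(log_lines):
    """Count each response_flags value in JSON access-log lines,
    skipping entries where no flag was set ('-')."""
    counts = Counter()
    for line in log_lines:
        entry = json.loads(line)
        flags = entry.get("response_flags", "-")
        if flags and flags != "-":
            counts[flags] += 1
    return counts

sample = [
    '{"response_flags": "-", "response_code": 200}',
    '{"response_flags": "UH", "response_code": 503}',
    '{"response_flags": "UT", "response_code": 504}',
    '{"response_flags": "UH", "response_code": 503}',
]
print(tally_response_flags(sample).most_common())  # → [('UH', 2), ('UT', 1)]
```

A sudden spike in UH or UO in this tally is usually the fastest signal that health checks or circuit breakers, not the application, are rejecting traffic.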
Comparison with Nginx and HAProxy
| Feature | Envoy | Nginx | HAProxy |
|---|---|---|---|
| Primary use case | Service mesh, east-west traffic | Web server, reverse proxy, north-south | Load balancing, north-south |
| Configuration | YAML, dynamic via xDS APIs | Config files, reload signal | Config files, reload signal |
| Hot reload | Draining + hot restart, or xDS (no restart) | Worker process reload | Seamless reload with fd passing |
| gRPC support | Native, first-class, bidirectional streaming | Basic reverse proxy (since 1.13) | TCP mode only (no L7 awareness) |
| HTTP/2 | Full support, including upstream H2 | Full downstream, limited upstream H2 | Full support |
| Circuit breaking | Built-in with configurable thresholds | Not built-in | Not built-in |
| Retry policies | Configurable per-route with backoff and budgets | Limited retry with proxy_next_upstream | Retries with retries directive |
| Distributed tracing | Built-in (Zipkin, Jaeger, OTel) | Via third-party modules | Not built-in |
| Observability | Thousands of metrics, histograms | Basic stub_status + modules | Stats page + Prometheus exporter |
| Outlier detection | Built-in passive health checking | max_fails (basic) | Health checks (active only) |
| Rate limiting | Local + global (external service) | limit_req_zone (built-in) | Stick tables (built-in) |
| Sidecar pattern | Designed for it, minimal footprint | Possible but heavier | Not designed for it |
| Dynamic config | Full xDS API, no restart needed | Reload required | Reload + runtime API |
| WebAssembly plugins | Built-in Wasm support | Not supported | Not supported |
| Learning curve | Steep (verbose YAML, many concepts) | Moderate (intuitive config syntax) | Moderate (four-section model) |
| Memory footprint | ~30-50MB per sidecar | ~5-10MB per worker | ~5-10MB base |
| Community | CNCF, service mesh ecosystem | Broad, web-focused, largest install base | Load balancing focused, proven at scale |
When to Use Each
Choose Envoy when: You are running microservices and need circuit breaking, retries, distributed tracing, and dynamic configuration. Essential if adopting a service mesh. Best for gRPC-heavy environments and Kubernetes-native architectures.
Choose Nginx when: You need a web server that also does reverse proxying, caching, and static file serving. Best for traditional architectures, simple deployments, and when your team already knows Nginx well.
Choose HAProxy when: You need dedicated, high-performance load balancing with advanced health checking, stick tables, and TCP proxying. Excellent for database load balancing and environments where connection-level control matters.
Combine them: Many production architectures use Envoy for east-west traffic (sidecar mesh) while using Nginx or HAProxy at the edge for north-south traffic. The tools are complementary, not mutually exclusive.
Troubleshooting
Admin Interface
The admin interface at port 9901 is your primary debugging tool:
```bash
# View all registered clusters and their health
curl http://localhost:9901/clusters

# View all registered listeners
curl http://localhost:9901/listeners

# View current configuration dump
curl http://localhost:9901/config_dump

# View server info (version, uptime, state)
curl http://localhost:9901/server_info

# Check readiness
curl http://localhost:9901/ready

# View hot restart version
curl http://localhost:9901/hot_restart_version

# Log level adjustment at runtime
curl -X POST "http://localhost:9901/logging?level=debug"
curl -X POST "http://localhost:9901/logging?level=info"
```
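The `/clusters` output is plain text in a `cluster::host::field::value` shape, which makes it easy to post-process. A Python sketch that tallies healthy hosts per cluster (the field layout is assumed from typical output; verify against your Envoy version):

```python
from collections import defaultdict

def healthy_host_counts(clusters_text):
    """Parse admin /clusters text and return {cluster: (healthy, total)}
    based on the per-host health_flags lines."""
    healthy = defaultdict(int)
    total = defaultdict(int)
    for line in clusters_text.splitlines():
        parts = line.split("::")
        if len(parts) == 4 and parts[2] == "health_flags":
            cluster = parts[0]
            total[cluster] += 1
            if parts[3] == "healthy":
                healthy[cluster] += 1
    return {c: (healthy[c], total[c]) for c in total}

# Example input mimicking /clusters output (addresses are illustrative).
sample = (
    "api_service::10.0.0.1:8080::health_flags::healthy\n"
    "api_service::10.0.0.2:8080::health_flags::/failed_active_hc\n"
    "api_service::10.0.0.1:8080::cx_active::3\n"
)
print(healthy_host_counts(sample))  # → {'api_service': (1, 2)}
```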
Common Issues
| Symptom | Response Flag | Likely Cause | Fix |
|---|---|---|---|
| 503 No Healthy Upstream | UH | All backends failed health checks | Check backend health, verify health check path |
| 503 Upstream Overflow | UO | Circuit breaker tripped | Increase circuit breaker thresholds or add capacity |
| 504 Upstream Timeout | UT | Backend too slow | Increase route timeout or per_try_timeout |
| 503 No Route | NR | No matching route for the request | Check route config, domain matching, path prefixes |
| Connection reset | UC | Backend closed connection unexpectedly | Check backend connection limits, keepalive settings |
| Retry storms | High retry count | Too many retries without budget | Add retry_budget, reduce num_retries |
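For the retry-storm case, the budget lives in the cluster's circuit breaker thresholds. A hedged sketch of the relevant fragment (the numbers are illustrative starting points, not recommendations):

```yaml
circuit_breakers:
  thresholds:
    - priority: DEFAULT
      max_retries: 3            # cap on concurrent retries
      retry_budget:
        budget_percent:
          value: 20.0           # retries may be at most 20% of active requests
        min_retry_concurrency: 3  # but always allow at least 3 retries
```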
Debug Logging
Enable debug logging temporarily to trace request flow:
```bash
# Set all loggers to debug
curl -X POST "http://localhost:9901/logging?level=debug"

# Set specific loggers
curl -X POST "http://localhost:9901/logging?connection=debug"
curl -X POST "http://localhost:9901/logging?http=debug"
curl -X POST "http://localhost:9901/logging?router=debug"

# Reset to info after debugging
curl -X POST "http://localhost:9901/logging?level=info"
```
Getting Started
A minimal Docker Compose setup to try Envoy as an edge proxy:
```yaml
services:
  envoy:
    image: envoyproxy/envoy:v1.30-latest
    volumes:
      - ./envoy.yaml:/etc/envoy/envoy.yaml
    ports:
      - "8080:8080"
      - "9901:9901"
  app:
    image: your-app:latest
    expose:
      - "3000"
```
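To make this runnable you also need the `envoy.yaml` it mounts. A minimal static configuration matching the compose file above might look like this (names like `app_service` are illustrative):

```yaml
admin:
  address:
    socket_address: { address: 0.0.0.0, port_value: 9901 }
static_resources:
  listeners:
  - name: main
    address:
      socket_address: { address: 0.0.0.0, port_value: 8080 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: edge
          route_config:
            name: local_route
            virtual_hosts:
            - name: app
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: app_service }
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: app_service
    type: STRICT_DNS             # resolve the compose service name "app"
    load_assignment:
      cluster_name: app_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: app, port_value: 3000 }
```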
Start with static configuration. Learn the listener-route-cluster model by building a simple edge proxy. Once that is comfortable, add health checks and circuit breakers. Then explore xDS for dynamic configuration. If you are running Kubernetes, consider Istio or Consul Connect -- they use Envoy as the data plane but manage the configuration complexity for you through higher-level abstractions.
Key Takeaways
- Envoy excels at east-west (service-to-service) traffic in microservices architectures. Its circuit breaking, retries, and observability features are specifically designed for this problem.
- The core mental model is listeners (where connections arrive), filter chains (how they are processed), routes (where they go), and clusters (the upstream services).
- Use static configuration for edge proxies and development. Use xDS with a control plane for service mesh deployments where services are dynamic.
- Circuit breaking with retry budgets prevents cascade failures. Without retry budgets, retries can amplify failures instead of recovering from them.
- Configure retries only for idempotent operations. Retrying a non-idempotent POST can cause data corruption.
- Outlier detection (passive health checking) catches failures that active health checks miss, like services that respond to `/healthz` but fail on real requests.
- Envoy's built-in stats, distributed tracing, and access logs give you deep visibility into every hop without modifying application code.
- The admin interface on port 9901 is your primary debugging tool. Use it to inspect clusters, check health, adjust log levels, and dump configuration.
- Start simple with static config and Docker Compose before investing in a full service mesh. Understand the fundamentals before adding automation.
- Envoy, Nginx, and HAProxy are complementary tools. Many production architectures use Envoy for the mesh and Nginx or HAProxy at the edge.