Envoy Proxy for Microservices: Edge and Sidecar Patterns
Envoy is a high-performance proxy designed for microservices architectures. Built at Lyft and now a CNCF graduated project, it powers the data plane of service meshes like Istio, Consul Connect, and AWS App Mesh. Whether you use it standalone or as part of a mesh, understanding Envoy's architecture gives you fine-grained control over service-to-service communication, with features like automatic retries, circuit breaking, outlier detection, and distributed tracing built directly into the proxy layer.
This guide covers Envoy's core architecture in depth, complete configuration examples for both edge proxy and sidecar deployments, the xDS dynamic configuration model, SSL/TLS termination and mutual TLS, circuit breaking tuning, retry policies, rate limiting strategies, health checking, full observability setup, and a thorough comparison with Nginx and HAProxy to help you choose the right tool.
What Envoy Is and Why It Exists
Traditional proxies like Nginx and HAProxy were built for the north-south traffic pattern: clients on the internet connecting to servers in a data center. They excel at this. Envoy was built for a different problem: east-west traffic, where services inside a cluster talk to each other over the network.
In a microservices environment, a single user request may fan out to 10, 20, or 50 calls to other services. Each of those calls is a potential failure point, and a single slow service can cascade and bring down the entire system. Traditional application-level libraries (like Netflix Hystrix or resilience4j) address this, but they require code changes in every service, in every language. Envoy moves these capabilities into the infrastructure layer:
- Automatic retries with configurable backoff and retry budgets
- Circuit breaking to prevent cascade failures across the service graph
- Outlier detection to eject misbehaving hosts from load balancing pools
- Distributed tracing with automatic span generation and context propagation
- Dynamic configuration through APIs (xDS) rather than config file reloads
- Protocol-aware routing for HTTP/1.1, HTTP/2, gRPC, and raw TCP
- Weighted traffic splitting for canary deployments and A/B testing
- Mutual TLS for zero-trust service-to-service authentication
The key insight is that by embedding these capabilities in the proxy, every service gets them without changing application code. A Python service, a Go service, and a Java service all benefit equally from the same Envoy sidecar configuration.
Architecture Overview
Envoy's configuration model is built on a small set of core concepts that map directly to how network traffic flows:
Incoming Connection
|
v
+---------------------+
| Listener | Accepts connections on IP:port
| (network address) |
+---------------------+
|
v
+---------------------+
| Filter Chain | Processes data through ordered filters
| (L4 + L7 filters) | (TLS, HTTP parsing, auth, rate limit)
+---------------------+
|
v
+---------------------+
| Route Table | Maps request attributes to clusters
| (path, headers) | (path prefix, header match, weighted)
+---------------------+
|
v
+---------------------+
| Cluster | Group of upstream hosts
| (load balancing) | (health checks, circuit breakers)
+---------------------+
|
v
+---------------------+
| Endpoint | Individual backend host:port
+---------------------+
Listeners
A listener is a named network location (IP + port) where Envoy accepts connections. Each listener has one or more filter chains that process the traffic:
listeners:
- name: http_listener
address:
socket_address:
address: 0.0.0.0
port_value: 8080
per_connection_buffer_limit_bytes: 32768
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress_http
codec_type: AUTO
use_remote_address: true
common_http_protocol_options:
idle_timeout: 3600s
headers_with_underscores_action: REJECT_REQUEST
http2_protocol_options:
max_concurrent_streams: 128
initial_stream_window_size: 65536
initial_connection_window_size: 1048576
route_config:
name: local_route
virtual_hosts:
- name: backend
domains: ["*"]
routes:
- match: { prefix: "/" }
route: { cluster: app_service }
http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
Key listener parameters:
| Parameter | Purpose | Recommended Value |
|---|---|---|
| per_connection_buffer_limit_bytes | Memory limit per connection | 32768 (32 KB) |
| use_remote_address | Use the real client IP for access logs and rate limiting | true for edge proxy |
| codec_type: AUTO | Auto-detect HTTP/1.1 vs HTTP/2 | AUTO for edge, HTTP2 for gRPC |
| headers_with_underscores_action | Reject headers with underscores (security) | REJECT_REQUEST |
| max_concurrent_streams | HTTP/2 stream multiplexing limit | 100-256 |
Filter Chains
Filters are the heart of Envoy's extensibility. They process traffic at both the network (L4) and HTTP (L7) layers. Network filters handle raw bytes; HTTP filters handle parsed requests. Filters are composable and execute in order:
| Filter Type | Name | Purpose |
|---|---|---|
| Network | http_connection_manager | Parse HTTP, apply HTTP filters, route to clusters |
| Network | tcp_proxy | Plain TCP proxying without HTTP awareness |
| Network | redis_proxy | Redis protocol-aware proxying with command splitting |
| Network | mongo_proxy | MongoDB wire protocol sniffing for metrics |
| HTTP | router | Route requests to clusters (required, always last) |
| HTTP | local_ratelimit | Per-instance token bucket rate limiting |
| HTTP | ratelimit | Global rate limiting via external service |
| HTTP | cors | Cross-Origin Resource Sharing handling |
| HTTP | jwt_authn | JWT token validation |
| HTTP | ext_authz | External authorization via gRPC/HTTP callout |
| HTTP | fault | Fault injection for testing (delays, aborts) |
| HTTP | compressor | Response compression (gzip, brotli) |
| HTTP | health_check | Respond to health checks without hitting upstream |
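Filter order matters: a request passes through the HTTP filters top to bottom, and the router filter must always come last. A sketch of a chain that authenticates, then rate limits, then routes (the provider name, issuer URL, and `jwks_cluster` are placeholders, not values from this guide's configs):

```yaml
http_filters:
  - name: envoy.filters.http.jwt_authn
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.jwt_authn.v3.JwtAuthentication
      providers:
        example_provider:                 # placeholder provider name
          issuer: https://issuer.example.com
          remote_jwks:
            http_uri:
              uri: https://issuer.example.com/.well-known/jwks.json
              cluster: jwks_cluster       # must also be defined as a cluster
              timeout: 5s
      rules:
        - match: { prefix: "/" }
          requires: { provider_name: example_provider }
  - name: envoy.filters.http.local_ratelimit
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
      stat_prefix: authed_rate_limit
      token_bucket: { max_tokens: 100, tokens_per_fill: 100, fill_interval: 1s }
  - name: envoy.filters.http.router
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
```

With this ordering, a request with an invalid JWT is rejected before it consumes a rate limit token or reaches an upstream.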
Clusters
A cluster is a group of upstream hosts that Envoy routes traffic to. Clusters are where you configure load balancing, health checks, circuit breakers, and connection pooling:
clusters:
- name: app_service
connect_timeout: 5s
type: STRICT_DNS
dns_lookup_family: V4_ONLY
lb_policy: ROUND_ROBIN
common_lb_config:
healthy_panic_threshold:
value: 50
transport_socket:
name: envoy.transport_sockets.tls
typed_config:
"@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
sni: app-service.internal
load_assignment:
cluster_name: app_service
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: app-service-1
port_value: 8080
load_balancing_weight: 3
- endpoint:
address:
socket_address:
address: app-service-2
port_value: 8080
load_balancing_weight: 2
Cluster service discovery types:
| Type | Behavior | Use Case |
|---|---|---|
| STATIC | Endpoints hardcoded in config | Fixed infrastructure, testing |
| STRICT_DNS | DNS resolution, all returned IPs used | Docker Compose, simple DNS-based discovery |
| LOGICAL_DNS | DNS resolution, only first IP used | External services behind a load balancer |
| EDS | Endpoints from xDS control plane | Service mesh, dynamic environments |
| ORIGINAL_DST | Route to the original destination IP | Transparent proxy, iptables redirect |
The healthy_panic_threshold of 50% puts Envoy into panic mode when the share of healthy hosts drops below 50%: it then routes to all hosts, healthy or not. This prevents a bad health check configuration from concentrating all traffic on a few surviving hosts and taking out your entire cluster.
Routes
Routes map incoming requests to clusters based on path, headers, query parameters, or other criteria:
route_config:
virtual_hosts:
- name: api
domains: ["api.example.com"]
request_headers_to_add:
- header:
key: x-custom-header
value: "from-envoy"
append_action: OVERWRITE_IF_EXISTS_OR_ADD
routes:
# Header-based routing for canary deployments
- match:
prefix: "/"
headers:
- name: x-canary
exact_match: "true"
route:
cluster: api_canary
timeout: 30s
# Path-based routing with regex
- match:
safe_regex:
regex: "/users/[0-9]+"
route:
cluster: user_service
timeout: 10s
# Weighted routing for gradual rollouts
- match:
prefix: "/api/v2"
route:
weighted_clusters:
clusters:
- name: api_v2_stable
weight: 90
- name: api_v2_canary
weight: 10
# Prefix rewrite (strip /api prefix)
- match:
prefix: "/api/"
route:
cluster: api_service
prefix_rewrite: "/"
timeout: 15s
# Direct response (no upstream)
- match:
prefix: "/healthz"
direct_response:
status: 200
body:
inline_string: "ok"
# Default route
- match:
prefix: "/"
route:
cluster: api_stable
timeout: 30s
Weighted clusters are particularly useful for canary deployments. Send 10% of traffic to the new version and monitor error rates. If metrics look good, gradually increase the weight. If something goes wrong, shift back to 0% instantly without a deployment.
Static vs Dynamic Configuration (xDS)
Envoy supports two fundamentally different configuration approaches.
Static Configuration
Everything is defined in a YAML file loaded at startup. Changes require a restart (or hot restart). This is suitable for edge proxies, development environments, and simple deployments:
admin:
address:
socket_address:
address: 0.0.0.0
port_value: 9901
static_resources:
listeners:
- name: main
address:
socket_address: { address: 0.0.0.0, port_value: 8080 }
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress
codec_type: AUTO
route_config:
virtual_hosts:
- name: default
domains: ["*"]
routes:
- match: { prefix: "/api" }
route: { cluster: api_cluster }
- match: { prefix: "/" }
route: { cluster: web_cluster }
http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
clusters:
- name: api_cluster
connect_timeout: 2s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: api_cluster
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address: { address: api-service, port_value: 8080 }
- name: web_cluster
connect_timeout: 2s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: web_cluster
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address: { address: web-service, port_value: 3000 }
Dynamic Configuration (xDS APIs)
In dynamic mode, Envoy fetches configuration from a control plane via gRPC or REST. This is what makes service meshes possible. The control plane pushes configuration changes to Envoy sidecars without restarts.
The xDS API family:
| API | Full Name | What It Configures |
|---|---|---|
| LDS | Listener Discovery Service | Listeners and filter chains |
| RDS | Route Discovery Service | Route tables and virtual hosts |
| CDS | Cluster Discovery Service | Upstream cluster definitions |
| EDS | Endpoint Discovery Service | Individual endpoints within clusters |
| SDS | Secret Discovery Service | TLS certificates and keys |
| ECDS | Extension Config Discovery Service | HTTP filter configurations |
| VHDS | Virtual Host Discovery Service | Virtual hosts (granular RDS) |
Bootstrap configuration for dynamic mode:
node:
cluster: my-cluster
id: my-node-1
metadata:
region: us-east-1
az: us-east-1a
admin:
address:
socket_address:
address: 127.0.0.1
port_value: 9901
dynamic_resources:
lds_config:
resource_api_version: V3
api_config_source:
api_type: GRPC
grpc_services:
- envoy_grpc:
cluster_name: xds_cluster
transport_api_version: V3
set_node_on_first_message_only: true
cds_config:
resource_api_version: V3
api_config_source:
api_type: GRPC
grpc_services:
- envoy_grpc:
cluster_name: xds_cluster
transport_api_version: V3
set_node_on_first_message_only: true
static_resources:
clusters:
- name: xds_cluster
connect_timeout: 5s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
typed_extension_protocol_options:
envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
"@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
explicit_http_config:
http2_protocol_options: {}
load_assignment:
cluster_name: xds_cluster
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: control-plane
port_value: 18000
The control plane itself can be built with frameworks like go-control-plane (Go), java-control-plane (Java), or commercial solutions like Istio, Consul, or Gloo. The xDS protocol is the universal interface -- any control plane that speaks xDS works with Envoy.
ADS (Aggregated Discovery Service)
For production deployments, use ADS to ensure consistent configuration updates. Without ADS, updates to CDS and EDS can arrive in a different order, potentially referencing clusters that do not exist yet:
dynamic_resources:
ads_config:
api_type: GRPC
grpc_services:
- envoy_grpc:
cluster_name: xds_cluster
transport_api_version: V3
lds_config:
resource_api_version: V3
ads: {}
cds_config:
resource_api_version: V3
ads: {}
SSL/TLS Termination and Mutual TLS
Edge TLS Termination
For an edge proxy that terminates TLS from external clients:
listeners:
- name: https_listener
address:
socket_address:
address: 0.0.0.0
port_value: 8443
filter_chains:
- transport_socket:
name: envoy.transport_sockets.tls
typed_config:
"@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
common_tls_context:
tls_params:
tls_minimum_protocol_version: TLSv1_2
tls_maximum_protocol_version: TLSv1_3
cipher_suites:
- ECDHE-ECDSA-AES128-GCM-SHA256
- ECDHE-RSA-AES128-GCM-SHA256
- ECDHE-ECDSA-AES256-GCM-SHA384
- ECDHE-RSA-AES256-GCM-SHA384
tls_certificates:
- certificate_chain:
filename: /etc/envoy/certs/server.crt
private_key:
filename: /etc/envoy/certs/server.key
alpn_protocols: ["h2", "http/1.1"]
filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress_https
codec_type: AUTO
route_config:
virtual_hosts:
- name: default
domains: ["*"]
routes:
- match: { prefix: "/" }
route: { cluster: app_service }
http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
Mutual TLS (mTLS) Between Services
mTLS is the foundation of zero-trust networking. Both client and server verify each other's certificates:
# On the server side (downstream TLS context)
transport_socket:
name: envoy.transport_sockets.tls
typed_config:
"@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
require_client_certificate: true
common_tls_context:
tls_certificates:
- certificate_chain:
filename: /etc/envoy/certs/server.crt
private_key:
filename: /etc/envoy/certs/server.key
validation_context:
trusted_ca:
filename: /etc/envoy/certs/ca.crt
match_typed_subject_alt_names:
- san_type: DNS
matcher:
exact: "client-service.internal"
# On the client side (upstream TLS context in cluster)
clusters:
- name: secure_service
transport_socket:
name: envoy.transport_sockets.tls
typed_config:
"@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
common_tls_context:
tls_certificates:
- certificate_chain:
filename: /etc/envoy/certs/client.crt
private_key:
filename: /etc/envoy/certs/client.key
validation_context:
trusted_ca:
filename: /etc/envoy/certs/ca.crt
sni: secure-service.internal
In a service mesh like Istio, mTLS is configured automatically through SDS (Secret Discovery Service). The control plane provisions and rotates certificates for every sidecar without manual intervention.
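To make the mechanism concrete, here is a sketch of what an SDS-driven certificate reference looks like in place of the file-based `tls_certificates` above. The secret names (`server_cert`, `validation_context`) and the `sds_cluster` are placeholders; in a real mesh the control plane generates this configuration and serves the secrets:

```yaml
transport_socket:
  name: envoy.transport_sockets.tls
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
    require_client_certificate: true
    common_tls_context:
      tls_certificate_sds_secret_configs:
        - name: server_cert               # placeholder secret name
          sds_config:
            resource_api_version: V3
            api_config_source:
              api_type: GRPC
              grpc_services:
                - envoy_grpc:
                    cluster_name: sds_cluster   # placeholder SDS server cluster
      validation_context_sds_secret_config:
        name: validation_context          # placeholder secret name
        sds_config:
          resource_api_version: V3
          api_config_source:
            api_type: GRPC
            grpc_services:
              - envoy_grpc:
                  cluster_name: sds_cluster
```

Because the certificates arrive over SDS rather than from disk, rotation requires no file writes and no Envoy restart.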
Deploying as Edge Proxy
As an edge proxy, Envoy replaces Nginx or HAProxy at the ingress point of your infrastructure:
# docker-compose.yml
services:
envoy:
image: envoyproxy/envoy:v1.30-latest
ports:
- "80:8080"
- "443:8443"
- "9901:9901"
volumes:
- ./envoy.yaml:/etc/envoy/envoy.yaml
- ./certs:/etc/envoy/certs:ro
command: ["-c", "/etc/envoy/envoy.yaml", "--service-cluster", "edge", "--service-node", "edge-1"]
deploy:
resources:
limits:
cpus: '2'
memory: 512M
restart: unless-stopped
Complete Edge Proxy Configuration
admin:
address:
socket_address:
address: 0.0.0.0
port_value: 9901
static_resources:
listeners:
- name: http_listener
address:
socket_address:
address: 0.0.0.0
port_value: 8080
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress_http
codec_type: AUTO
use_remote_address: true
route_config:
virtual_hosts:
- name: redirect
domains: ["*"]
routes:
- match: { prefix: "/" }
redirect:
https_redirect: true
response_code: MOVED_PERMANENTLY
http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
- name: https_listener
address:
socket_address:
address: 0.0.0.0
port_value: 8443
filter_chains:
- transport_socket:
name: envoy.transport_sockets.tls
typed_config:
"@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
common_tls_context:
tls_params:
tls_minimum_protocol_version: TLSv1_2
tls_certificates:
- certificate_chain: { filename: /etc/envoy/certs/server.crt }
private_key: { filename: /etc/envoy/certs/server.key }
alpn_protocols: ["h2", "http/1.1"]
filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress_https
codec_type: AUTO
use_remote_address: true
access_log:
- name: envoy.access_loggers.stdout
typed_config:
"@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
log_format:
json_format:
timestamp: "%START_TIME%"
method: "%REQ(:METHOD)%"
path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
protocol: "%PROTOCOL%"
status: "%RESPONSE_CODE%"
duration: "%DURATION%"
bytes: "%BYTES_SENT%"
upstream: "%UPSTREAM_HOST%"
request_id: "%REQ(X-REQUEST-ID)%"
route_config:
virtual_hosts:
- name: api
domains: ["api.example.com"]
routes:
- match: { prefix: "/" }
route:
cluster: api_service
timeout: 30s
- name: web
domains: ["www.example.com", "example.com"]
routes:
- match: { prefix: "/static/" }
route:
cluster: static_service
timeout: 10s
- match: { prefix: "/" }
route:
cluster: web_service
timeout: 30s
http_filters:
- name: envoy.filters.http.local_ratelimit
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
stat_prefix: local_rate_limit
token_bucket:
max_tokens: 1000
tokens_per_fill: 1000
fill_interval: 1s
filter_enabled:
runtime_key: local_rate_limit_enabled
default_value: { numerator: 100, denominator: HUNDRED }
filter_enforced:
runtime_key: local_rate_limit_enforced
default_value: { numerator: 100, denominator: HUNDRED }
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
clusters:
- name: api_service
connect_timeout: 2s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
health_checks:
- timeout: 3s
interval: 10s
unhealthy_threshold: 3
healthy_threshold: 2
http_health_check:
path: /healthz
circuit_breakers:
thresholds:
- priority: DEFAULT
max_connections: 1000
max_pending_requests: 500
max_requests: 2000
max_retries: 10
load_assignment:
cluster_name: api_service
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address: { address: api-svc, port_value: 8080 }
- name: web_service
connect_timeout: 2s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
health_checks:
- timeout: 3s
interval: 10s
unhealthy_threshold: 3
healthy_threshold: 2
http_health_check:
path: /health
load_assignment:
cluster_name: web_service
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address: { address: web-svc, port_value: 3000 }
- name: static_service
connect_timeout: 1s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: static_service
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address: { address: static-svc, port_value: 80 }
Deploying as Sidecar
The sidecar pattern runs an Envoy instance alongside each service instance. In Kubernetes, it runs as a container in the same pod, sharing the network namespace:
apiVersion: v1
kind: Pod
metadata:
name: my-service
labels:
app: my-service
spec:
containers:
- name: app
image: my-app:latest
ports:
- containerPort: 8080
env:
- name: HTTP_PROXY
value: "http://127.0.0.1:9211"
resources:
requests:
cpu: 100m
memory: 128Mi
- name: envoy-sidecar
image: envoyproxy/envoy:v1.30-latest
ports:
- containerPort: 9901
name: envoy-admin
- containerPort: 9211
name: envoy-egress
- containerPort: 9212
name: envoy-ingress
volumeMounts:
- name: envoy-config
mountPath: /etc/envoy
resources:
requests:
cpu: 50m
memory: 64Mi
limits:
cpu: 200m
memory: 128Mi
readinessProbe:
httpGet:
path: /ready
port: 9901
initialDelaySeconds: 2
periodSeconds: 5
livenessProbe:
httpGet:
path: /server_info
port: 9901
initialDelaySeconds: 5
periodSeconds: 15
volumes:
- name: envoy-config
configMap:
name: envoy-sidecar-config
initContainers:
- name: init-iptables
image: envoyproxy/envoy:v1.30-latest
securityContext:
capabilities:
add: ["NET_ADMIN"]
command:
- sh
- -c
- |
iptables -t nat -A PREROUTING -p tcp --dport 8080 -j REDIRECT --to-port 9212
iptables -t nat -A OUTPUT -p tcp --dport 8080 -m owner ! --uid-owner 1337 -j REDIRECT --to-port 9211
The init container sets up iptables rules that transparently redirect traffic through Envoy. Inbound traffic to port 8080 is redirected to Envoy's ingress listener (9212), and outbound traffic from the app is redirected to Envoy's egress listener (9211). The --uid-owner 1337 exclusion prevents Envoy's own outbound traffic from being redirected back to itself in an infinite loop; for it to work, the Envoy container must actually run as UID 1337 (set securityContext.runAsUser on the sidecar container).
In production service meshes like Istio, all of this is automated. The Istio sidecar injector automatically adds the Envoy container and iptables init container to every pod.
Circuit Breaking
Circuit breaking prevents a failing service from consuming all available resources and cascading the failure to its callers. When a service becomes slow or unresponsive, Envoy stops sending it traffic:
clusters:
- name: payment_service
connect_timeout: 2s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
circuit_breakers:
thresholds:
- priority: DEFAULT
max_connections: 100
max_pending_requests: 50
max_requests: 200
max_retries: 3
track_remaining: true
retry_budget:
budget_percent:
value: 20.0
min_retry_concurrency: 3
- priority: HIGH
max_connections: 200
max_pending_requests: 100
max_requests: 400
max_retries: 5
load_assignment:
cluster_name: payment_service
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address: { address: payment-svc, port_value: 8080 }
Circuit breaker thresholds explained:
| Threshold | What It Limits | When Circuit Opens |
|---|---|---|
| max_connections | Concurrent TCP connections to the cluster | New connections return 503 |
| max_pending_requests | Requests waiting for a connection from the pool | Queued requests return 503 |
| max_requests | Total concurrent requests (HTTP/2 multiplexed) | New requests return 503 |
| max_retries | Concurrent retry attempts across the cluster | Retries are skipped |
| retry_budget | Percentage of active requests that can be retries | Prevents retry storms |
The retry_budget is particularly important. Without it, a failing service can experience a "retry storm" where every failed request generates retries, which also fail and generate more retries. Setting budget_percent to 20% means only 20% of active requests can be retries at any time.
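The arithmetic behind the budget is simple. A minimal sketch in plain Python (not Envoy code) of how the two fields from the config above interact:

```python
def allowed_retry_concurrency(active_requests: int,
                              budget_percent: float = 20.0,
                              min_retry_concurrency: int = 3) -> int:
    """Retries may use at most budget_percent of currently active
    requests, but never fewer than min_retry_concurrency, so retries
    remain possible even at very low traffic."""
    budget = int(active_requests * budget_percent / 100.0)
    return max(budget, min_retry_concurrency)

# With 200 active requests and a 20% budget, at most 40 may be retries;
# with only 5 active requests, the floor of 3 applies.
print(allowed_retry_concurrency(200))  # 40
print(allowed_retry_concurrency(5))    # 3
```

The floor matters: without it, a quiet service could never retry at all, since 20% of a handful of requests rounds down to zero.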
Monitor circuit breaker state via Envoy's stats:
curl -s http://localhost:9901/stats | grep circuit_breakers
# cluster.payment_service.circuit_breakers.default.cx_open: 0
# cluster.payment_service.circuit_breakers.default.cx_pool_open: 0
# cluster.payment_service.circuit_breakers.default.rq_open: 0
# cluster.payment_service.circuit_breakers.default.rq_pending_open: 0
# cluster.payment_service.circuit_breakers.default.remaining_cx: 100
# cluster.payment_service.circuit_breakers.default.remaining_pending: 50
When any _open counter is non-zero, the circuit is open for that threshold. The remaining_* counters (enabled by track_remaining: true) show headroom.
Retries and Timeouts
Configure retries per route for transient failures:
routes:
- match:
prefix: "/api/"
route:
cluster: api_service
timeout: 15s
retry_policy:
retry_on: "5xx,reset,connect-failure,retriable-4xx,refused-stream"
num_retries: 3
per_try_timeout: 5s
per_try_idle_timeout: 3s
retry_back_off:
base_interval: 0.1s
max_interval: 1s
retriable_status_codes:
- 503
- 429
retry_host_predicate:
- name: envoy.retry_host_predicates.previous_hosts
host_selection_retry_max_attempts: 5
Retry configuration reference:
| Setting | Purpose | Recommendation |
|---|---|---|
| retry_on | Conditions that trigger a retry | Include 5xx, connect-failure, reset |
| num_retries | Maximum retry attempts | 2-3 for most services |
| per_try_timeout | Timeout for each individual attempt | Less than the overall route timeout |
| retry_back_off | Exponential backoff between retries | Start at 100ms, cap at 1s |
| retry_host_predicate.previous_hosts | Retry on a different host than the one that failed | Always enable |
| retriable_status_codes | Additional HTTP status codes to retry on | 503, 429 |
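To see what the backoff settings above actually produce, here is a sketch of a fully jittered exponential backoff in the style Envoy documents (retry N sleeps for a random value in [0, (2^N - 1) * base_interval), with the upper bound capped at max_interval); the function is illustrative, not Envoy's implementation:

```python
import random

def retry_backoff(attempt: int, base: float = 0.1, max_interval: float = 1.0) -> float:
    """Random back-off for the Nth retry: uniform in [0, upper), where
    upper doubles-ish each attempt and is capped at max_interval."""
    upper = min((2 ** attempt - 1) * base, max_interval)
    return random.uniform(0, upper)

# Upper bounds for base_interval 0.1s / max_interval 1s:
for n in range(1, 6):
    print(n, round(min((2 ** n - 1) * 0.1, 1.0), 1))
# attempt 1 -> 0.1, 2 -> 0.3, 3 -> 0.7, 4 -> 1.0 (capped), 5 -> 1.0
```

The jitter (picking a random value under the cap rather than the cap itself) is what keeps a fleet of clients from retrying in lockstep after a shared failure.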
Critical rule: only retry idempotent operations. GET requests are safe to retry. POST requests should generally not be retried unless your API is designed for idempotency (e.g., uses idempotency keys). Retrying a non-idempotent POST can cause duplicate charges, duplicate messages, or other data corruption.
For non-idempotent routes, use a separate retry policy or disable retries entirely:
routes:
- match:
prefix: "/api/payments"
headers:
- name: ":method"
exact_match: "POST"
route:
cluster: payment_service
timeout: 30s
# No retry_policy -- do not retry payments
Rate Limiting
Local Rate Limiting
Applied per Envoy instance using a token bucket algorithm:
http_filters:
- name: envoy.filters.http.local_ratelimit
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
stat_prefix: http_local_rate_limiter
token_bucket:
max_tokens: 1000
tokens_per_fill: 1000
fill_interval: 1s
filter_enabled:
runtime_key: local_rate_limit_enabled
default_value: { numerator: 100, denominator: HUNDRED }
filter_enforced:
runtime_key: local_rate_limit_enforced
default_value: { numerator: 100, denominator: HUNDRED }
response_headers_to_add:
- append_action: OVERWRITE_IF_EXISTS_OR_ADD
header:
key: x-ratelimit-limit
value: "1000"
- append_action: OVERWRITE_IF_EXISTS_OR_ADD
header:
key: x-ratelimit-remaining
value: "0"
status:
code: TooManyRequests
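The three token_bucket fields map onto a classic token-bucket algorithm. A small self-contained sketch (plain Python, not Envoy's implementation) showing how max_tokens, tokens_per_fill, and fill_interval interact:

```python
import time

class TokenBucket:
    """Bucket holds at most max_tokens; tokens_per_fill tokens are added
    every fill_interval seconds; each request consumes one token."""
    def __init__(self, max_tokens: int, tokens_per_fill: int, fill_interval: float):
        self.max_tokens = max_tokens
        self.tokens_per_fill = tokens_per_fill
        self.fill_interval = fill_interval
        self.tokens = max_tokens
        self.last_fill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        fills = int((now - self.last_fill) / self.fill_interval)
        if fills:
            self.tokens = min(self.max_tokens, self.tokens + fills * self.tokens_per_fill)
            self.last_fill += fills * self.fill_interval
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False  # the filter would answer 429 here

bucket = TokenBucket(max_tokens=3, tokens_per_fill=3, fill_interval=1.0)
print([bucket.allow() for _ in range(4)])  # [True, True, True, False]
```

With max_tokens equal to tokens_per_fill, the bucket enforces a steady rate; setting max_tokens higher than tokens_per_fill additionally allows short bursts above the sustained rate.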
Global Rate Limiting
For coordinated rate limiting across all Envoy instances, use an external rate limit service such as the reference implementation, envoyproxy/ratelimit:
http_filters:
- name: envoy.filters.http.ratelimit
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
domain: my_domain
failure_mode_deny: false
rate_limit_service:
grpc_service:
envoy_grpc:
cluster_name: rate_limit_service
transport_api_version: V3
The failure_mode_deny: false setting means that if the rate limit service is unreachable, requests are allowed through. Set to true if you want to fail closed (deny requests when the rate limiter is down).
Configure per-route rate limit actions:
routes:
- match:
prefix: "/api/"
route:
cluster: api_service
rate_limits:
- actions:
- remote_address: {}
- actions:
- request_headers:
header_name: x-api-key
descriptor_key: api_key
Health Checking and Outlier Detection
Envoy supports both active and passive health checking.
Active Health Checks
clusters:
- name: api_service
health_checks:
- timeout: 3s
interval: 10s
unhealthy_threshold: 3
healthy_threshold: 2
no_traffic_interval: 60s
no_traffic_healthy_interval: 120s
http_health_check:
path: /healthz
host: api-service.internal
expected_statuses:
- start: 200
end: 200
request_headers_to_add:
- header:
key: x-health-check
value: "envoy"
append_action: OVERWRITE_IF_EXISTS_OR_ADD
The no_traffic_interval reduces health check frequency for clusters that are not receiving real traffic. This saves resources in large deployments with many clusters.
gRPC Health Checks
For gRPC services implementing the standard health checking protocol:
health_checks:
- timeout: 2s
interval: 10s
unhealthy_threshold: 3
healthy_threshold: 2
grpc_health_check:
service_name: my.service.Name
Outlier Detection (Passive Health Checking)
Outlier detection watches real traffic and ejects hosts that are performing badly. It catches issues that active health checks miss, like a service that responds to /healthz but fails on real requests:
clusters:
- name: api_service
outlier_detection:
consecutive_5xx: 5
interval: 10s
base_ejection_time: 30s
max_ejection_percent: 50
enforcing_consecutive_5xx: 100
enforcing_success_rate: 100
success_rate_minimum_hosts: 3
success_rate_request_volume: 100
success_rate_stdev_factor: 1900
consecutive_gateway_failure: 3
enforcing_consecutive_gateway_failure: 100
split_external_local_origin_errors: true
Outlier detection parameters:
| Parameter | Purpose | Default |
|---|---|---|
| consecutive_5xx | Eject after N consecutive 5xx responses | 5 |
| interval | How often to evaluate outlier status | 10s |
| base_ejection_time | Base duration of ejection (multiplied by ejection count) | 30s |
| max_ejection_percent | Max percentage of hosts that can be ejected | 10 |
| success_rate_minimum_hosts | Minimum hosts needed for success rate analysis | 5 |
| success_rate_stdev_factor | Standard deviations from the mean before ejection, divided by 1000 (1900 = 1.9 stdevs) | 1900 |
The max_ejection_percent is a safety valve. Even if every host is failing, Envoy will not eject more than this percentage, preventing a complete cluster outage.
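Ejection durations also back off: each ejection lasts base_ejection_time multiplied by the number of times that host has been ejected, up to a ceiling (max_ejection_time, 300s by default). A quick sketch of the resulting schedule:

```python
def ejection_duration(base_seconds: int, times_ejected: int,
                      max_seconds: int = 300) -> int:
    """Ejection window grows linearly with the host's ejection count,
    capped at max_ejection_time (default 300s)."""
    return min(base_seconds * times_ejected, max_seconds)

# With base_ejection_time 30s, a repeatedly failing host sits out for:
print([ejection_duration(30, n) for n in (1, 2, 3, 11)])  # [30, 60, 90, 300]
```

A host that recovers after one ejection pays only 30 seconds; a host that keeps failing is sidelined for progressively longer without ever being permanently removed.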
Built-In Observability
Envoy's observability is its strongest differentiator. Every proxy instance exposes rich telemetry without any application code changes.
Stats and Prometheus Metrics
Envoy exposes thousands of metrics via the admin interface:
# All stats
curl http://localhost:9901/stats
# Prometheus format
curl http://localhost:9901/stats/prometheus
# Filter by pattern
curl "http://localhost:9901/stats?filter=cluster.api_service"
# Only counters
curl "http://localhost:9901/stats?type=Counters"
Key metrics to monitor and alert on:
| Metric | What It Tells You | Alert When |
|---|---|---|
| upstream_rq_total | Total requests to a cluster | N/A (informational) |
| upstream_rq_5xx | 5xx error count | Rate exceeds baseline |
| upstream_rq_time | Request latency histogram | p99 exceeds SLA |
| upstream_cx_active | Active connections to upstream | Near circuit breaker limit |
| upstream_cx_connect_fail | Connection failures | Any non-zero count |
| membership_healthy | Healthy hosts in cluster | Below minimum threshold |
| membership_total | Total hosts in cluster | Unexpected changes |
| upstream_rq_retry | Retry count | High retry rate |
| upstream_rq_pending_overflow | Requests rejected by circuit breaker | Any non-zero count |
| downstream_cx_active | Active client connections | Near capacity |
| downstream_rq_total | Total incoming requests | Unexpected spikes |
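These metrics feed Prometheus alerting directly. A sketch of two alert rules; the metric and label names here follow the common Prometheus mapping of Envoy stats (envoy_cluster_*, envoy_cluster_name) but may vary with your Envoy version and scrape setup, so verify them against your own /stats/prometheus output:

```yaml
groups:
  - name: envoy
    rules:
      - alert: EnvoyUpstreamConnectFailures
        expr: rate(envoy_cluster_upstream_cx_connect_fail[5m]) > 0
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "Envoy cannot connect to upstream {{ $labels.envoy_cluster_name }}"
      - alert: EnvoyClusterBelowMinHealthy
        expr: envoy_cluster_membership_healthy < 2
        for: 2m
        labels: { severity: critical }
        annotations:
          summary: "Fewer than 2 healthy hosts in {{ $labels.envoy_cluster_name }}"
```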
Distributed Tracing
Envoy can propagate trace headers and report spans to Zipkin, Jaeger, or any OpenTelemetry collector:
tracing:
http:
name: envoy.tracers.opentelemetry
typed_config:
"@type": type.googleapis.com/envoy.config.trace.v3.OpenTelemetryConfig
grpc_service:
envoy_grpc:
cluster_name: otel_collector
service_name: my-service
Envoy automatically generates spans for each request and propagates trace context headers between services. The key headers propagated:
| Header | Tracing System |
|---|---|
| x-request-id | Envoy internal |
| x-b3-traceid, x-b3-spanid, x-b3-parentspanid | Zipkin/B3 |
| traceparent, tracestate | W3C Trace Context |
| x-cloud-trace-context | Google Cloud Trace |
Your application code only needs to forward these headers on outbound requests. Envoy handles span creation, timing, and reporting.
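The forwarding step is trivial but easy to forget. A minimal Python sketch of the idea (the helper name and exact header set are illustrative; extend the set for whichever tracing systems you use):

```python
# Headers Envoy expects the application to copy from the incoming
# request onto every outbound request, so spans join into one trace.
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid", "x-b3-spanid", "x-b3-parentspanid",
    "x-b3-sampled", "x-b3-flags",
    "traceparent", "tracestate",
    "x-cloud-trace-context",
)

def propagate_trace_headers(incoming: dict) -> dict:
    """Return the subset of incoming headers to attach to outbound calls."""
    lowered = {k.lower(): v for k, v in incoming.items()}
    return {h: lowered[h] for h in TRACE_HEADERS if h in lowered}

# Example: only trace headers survive, case-insensitively.
incoming = {
    "X-Request-Id": "abc123",
    "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
    "content-type": "application/json",
}
print(propagate_trace_headers(incoming))
```

Whatever HTTP client your services use, wiring this helper into its outbound-request hook is usually a one-line change.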
Access Logs
Configure structured access logging for request-level debugging:
```yaml
access_log:
- name: envoy.access_loggers.file
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
    path: /var/log/envoy/access.json
    log_format:
      json_format:
        timestamp: "%START_TIME%"
        method: "%REQ(:METHOD)%"
        path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
        protocol: "%PROTOCOL%"
        response_code: "%RESPONSE_CODE%"
        response_flags: "%RESPONSE_FLAGS%"
        duration_ms: "%DURATION%"
        upstream_host: "%UPSTREAM_HOST%"
        upstream_cluster: "%UPSTREAM_CLUSTER%"
        upstream_local_address: "%UPSTREAM_LOCAL_ADDRESS%"
        bytes_received: "%BYTES_RECEIVED%"
        bytes_sent: "%BYTES_SENT%"
        request_id: "%REQ(X-REQUEST-ID)%"
        user_agent: "%REQ(USER-AGENT)%"
        downstream_remote_address: "%DOWNSTREAM_REMOTE_ADDRESS%"
```
Response flags are particularly useful for debugging:
| Flag | Meaning |
|---|---|
| UH | No healthy upstream hosts |
| UF | Upstream connection failure |
| UO | Upstream overflow (circuit breaker triggered) |
| UT | Upstream request timeout |
| UC | Upstream connection termination |
| LR | Connection local reset |
| RL | Rate limited |
| DC | Downstream connection termination |
| NR | No route configured |
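Because the access log above emits JSON, response flags are easy to aggregate offline. A small Python sketch of the idea, assuming the `json_format` keys shown earlier (the function name is illustrative):

```python
import json
from collections import Counter

def tally_response_flags(log_lines):
    """Count each response_flags value in JSON access-log lines,
    skipping entries where no flag was set ('-')."""
    counts = Counter()
    for line in log_lines:
        entry = json.loads(line)
        flags = entry.get("response_flags", "-")
        if flags and flags != "-":
            counts[flags] += 1
    return counts

sample = [
    '{"response_flags": "-", "response_code": 200}',
    '{"response_flags": "UH", "response_code": 503}',
    '{"response_flags": "UT", "response_code": 504}',
    '{"response_flags": "UH", "response_code": 503}',
]
print(tally_response_flags(sample).most_common())  # → [('UH', 2), ('UT', 1)]
```

A sudden spike in UH or UO in this tally is usually the fastest signal that health checks or circuit breakers, not the application, are rejecting traffic.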
Comparison with Nginx and HAProxy
| Feature | Envoy | Nginx | HAProxy |
|---|---|---|---|
| Primary use case | Service mesh, east-west traffic | Web server, reverse proxy, north-south | Load balancing, north-south |
| Configuration | YAML, dynamic via xDS APIs | Config files, reload signal | Config files, reload signal |
| Hot reload | Draining + hot restart, or xDS (no restart) | Worker process reload | Seamless reload with fd passing |
| gRPC support | Native, first-class, bidirectional streaming | Basic reverse proxy (since 1.13) | TCP mode only (no L7 awareness) |
| HTTP/2 | Full support, including upstream H2 | Full downstream, limited upstream H2 | Full support |
| Circuit breaking | Built-in with configurable thresholds | Not built-in | Not built-in |
| Retry policies | Configurable per-route with backoff and budgets | Limited retry with proxy_next_upstream | Retries with retries directive |
| Distributed tracing | Built-in (Zipkin, Jaeger, OTel) | Via third-party modules | Not built-in |
| Observability | Thousands of metrics, histograms | Basic stub_status + modules | Stats page + Prometheus exporter |
| Outlier detection | Built-in passive health checking | max_fails (basic) | Health checks (active only) |
| Rate limiting | Local + global (external service) | limit_req_zone (built-in) | Stick tables (built-in) |
| Sidecar pattern | Designed for it, minimal footprint | Possible but heavier | Not designed for it |
| Dynamic config | Full xDS API, no restart needed | Reload required | Reload + runtime API |
| WebAssembly plugins | Built-in Wasm support | Not supported | Not supported |
| Learning curve | Steep (verbose YAML, many concepts) | Moderate (intuitive config syntax) | Moderate (four-section model) |
| Memory footprint | ~30-50MB per sidecar | ~5-10MB per worker | ~5-10MB base |
| Community | CNCF, service mesh ecosystem | Broad, web-focused, largest install base | Load balancing focused, proven at scale |
When to Use Each
Choose Envoy when: You are running microservices and need circuit breaking, retries, distributed tracing, and dynamic configuration. Essential if adopting a service mesh. Best for gRPC-heavy environments and Kubernetes-native architectures.
Choose Nginx when: You need a web server that also does reverse proxying, caching, and static file serving. Best for traditional architectures, simple deployments, and when your team already knows Nginx well.
Choose HAProxy when: You need dedicated, high-performance load balancing with advanced health checking, stick tables, and TCP proxying. Excellent for database load balancing and environments where connection-level control matters.
Combine them: Many production architectures use Envoy for east-west traffic (sidecar mesh) while using Nginx or HAProxy at the edge for north-south traffic. The tools are complementary, not mutually exclusive.
Troubleshooting
Admin Interface
The admin interface at port 9901 is your primary debugging tool:
```bash
# View all registered clusters and their health
curl http://localhost:9901/clusters

# View all registered listeners
curl http://localhost:9901/listeners

# View current configuration dump
curl http://localhost:9901/config_dump

# View server info (version, uptime, state)
curl http://localhost:9901/server_info

# Check readiness
curl http://localhost:9901/ready

# View hot restart version
curl http://localhost:9901/hot_restart_version

# Log level adjustment at runtime
curl -X POST "http://localhost:9901/logging?level=debug"
curl -X POST "http://localhost:9901/logging?level=info"
```
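The `/clusters` output is plain text in a `cluster::host::field::value` shape, which makes it easy to post-process. A Python sketch that tallies healthy hosts per cluster (the field layout is assumed from typical output; verify against your Envoy version):

```python
from collections import defaultdict

def healthy_host_counts(clusters_text):
    """Parse admin /clusters text and return {cluster: (healthy, total)}
    based on the per-host health_flags lines."""
    healthy = defaultdict(int)
    total = defaultdict(int)
    for line in clusters_text.splitlines():
        parts = line.split("::")
        if len(parts) == 4 and parts[2] == "health_flags":
            cluster = parts[0]
            total[cluster] += 1
            if parts[3] == "healthy":
                healthy[cluster] += 1
    return {c: (healthy[c], total[c]) for c in total}

# Example input mimicking /clusters output (addresses are illustrative).
sample = (
    "api_service::10.0.0.1:8080::health_flags::healthy\n"
    "api_service::10.0.0.2:8080::health_flags::/failed_active_hc\n"
    "api_service::10.0.0.1:8080::cx_active::3\n"
)
print(healthy_host_counts(sample))  # → {'api_service': (1, 2)}
```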
Common Issues
| Symptom | Response Flag | Likely Cause | Fix |
|---|---|---|---|
| 503 No Healthy Upstream | UH | All backends failed health checks | Check backend health, verify health check path |
| 503 Upstream Overflow | UO | Circuit breaker tripped | Increase circuit breaker thresholds or add capacity |
| 504 Upstream Timeout | UT | Backend too slow | Increase route timeout or per_try_timeout |
| 503 No Route | NR | No matching route for the request | Check route config, domain matching, path prefixes |
| Connection reset | UC | Backend closed connection unexpectedly | Check backend connection limits, keepalive settings |
| Retry storms | High retry count | Too many retries without budget | Add retry_budget, reduce num_retries |
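For the retry-storm case, the budget lives in the cluster's circuit breaker thresholds. A hedged sketch of the relevant fragment (the numbers are illustrative starting points, not recommendations):

```yaml
circuit_breakers:
  thresholds:
    - priority: DEFAULT
      max_retries: 3            # cap on concurrent retries
      retry_budget:
        budget_percent:
          value: 20.0           # retries may be at most 20% of active requests
        min_retry_concurrency: 3  # but always allow at least 3 retries
```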
Debug Logging
Enable debug logging temporarily to trace request flow:
```bash
# Set all loggers to debug
curl -X POST "http://localhost:9901/logging?level=debug"

# Set specific loggers
curl -X POST "http://localhost:9901/logging?connection=debug"
curl -X POST "http://localhost:9901/logging?http=debug"
curl -X POST "http://localhost:9901/logging?router=debug"

# Reset to info after debugging
curl -X POST "http://localhost:9901/logging?level=info"
```
Getting Started
A minimal Docker Compose setup to try Envoy as an edge proxy:
```yaml
services:
  envoy:
    image: envoyproxy/envoy:v1.30-latest
    volumes:
      - ./envoy.yaml:/etc/envoy/envoy.yaml
    ports:
      - "8080:8080"
      - "9901:9901"
  app:
    image: your-app:latest
    expose:
      - "3000"
```
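To make this runnable you also need the `envoy.yaml` it mounts. A minimal static configuration matching the compose file above might look like this (names like `app_service` are illustrative):

```yaml
admin:
  address:
    socket_address: { address: 0.0.0.0, port_value: 9901 }
static_resources:
  listeners:
  - name: main
    address:
      socket_address: { address: 0.0.0.0, port_value: 8080 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: edge
          route_config:
            name: local_route
            virtual_hosts:
            - name: app
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: app_service }
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: app_service
    type: STRICT_DNS             # resolve the compose service name "app"
    load_assignment:
      cluster_name: app_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: app, port_value: 3000 }
```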
Start with static configuration. Learn the listener-route-cluster model by building a simple edge proxy. Once that is comfortable, add health checks and circuit breakers. Then explore xDS for dynamic configuration. If you are running Kubernetes, consider Istio or Consul Connect -- they use Envoy as the data plane but manage the configuration complexity for you through higher-level abstractions.
Key Takeaways
- Envoy excels at east-west (service-to-service) traffic in microservices architectures. Its circuit breaking, retries, and observability features are specifically designed for this problem.
- The core mental model is listeners (where connections arrive), filter chains (how they are processed), routes (where they go), and clusters (the upstream services).
- Use static configuration for edge proxies and development. Use xDS with a control plane for service mesh deployments where services are dynamic.
- Circuit breaking with retry budgets prevents cascade failures. Without retry budgets, retries can amplify failures instead of recovering from them.
- Configure retries only for idempotent operations. Retrying a non-idempotent POST can cause data corruption.
- Outlier detection (passive health checking) catches failures that active health checks miss, like services that respond to `/healthz` but fail on real requests.
- Envoy's built-in stats, distributed tracing, and access logs give you deep visibility into every hop without modifying application code.
- The admin interface on port 9901 is your primary debugging tool. Use it to inspect clusters, check health, adjust log levels, and dump configuration.
- Start simple with static config and Docker Compose before investing in a full service mesh. Understand the fundamentals before adding automation.
- Envoy, Nginx, and HAProxy are complementary tools. Many production architectures use Envoy for the mesh and Nginx or HAProxy at the edge.