Envoy Proxy for Microservices: Edge and Sidecar Patterns

Riku Tanaka

Envoy is a high-performance proxy designed for microservices architectures. Built at Lyft and now a CNCF graduated project, it powers the data plane of service meshes like Istio, Consul Connect, and AWS App Mesh. Whether you use it standalone or as part of a mesh, understanding Envoy's architecture gives you fine-grained control over service-to-service communication, with features like automatic retries, circuit breaking, outlier detection, and distributed tracing built directly into the proxy layer.

This guide covers Envoy's core architecture in depth, complete configuration examples for both edge proxy and sidecar deployments, the xDS dynamic configuration model, SSL/TLS termination and mutual TLS, circuit breaking tuning, retry policies, rate limiting strategies, health checking, full observability setup, and a thorough comparison with Nginx and HAProxy to help you choose the right tool.

What Envoy Is and Why It Exists

Traditional proxies like Nginx and HAProxy were built for the north-south traffic pattern: clients on the internet connecting to servers in a data center. They excel at this. Envoy was built for a different problem: east-west traffic, where services inside a cluster talk to each other over the network.

In a microservices environment, a single user request may fan out to 10, 20, or 50 upstream service calls. Each of those calls is a potential failure point, and a single slow service can cascade and bring down the entire system. Application-level resilience libraries (like Netflix Hystrix or resilience4j) address this, but they require code changes in every service, in every language. Envoy moves these capabilities into the infrastructure layer:

  • Automatic retries with configurable backoff and retry budgets
  • Circuit breaking to prevent cascade failures across the service graph
  • Outlier detection to eject misbehaving hosts from load balancing pools
  • Distributed tracing with automatic span generation and context propagation
  • Dynamic configuration through APIs (xDS) rather than config file reloads
  • Protocol-aware routing for HTTP/1.1, HTTP/2, gRPC, and raw TCP
  • Weighted traffic splitting for canary deployments and A/B testing
  • Mutual TLS for zero-trust service-to-service authentication

The key insight is that by embedding these capabilities in the proxy, every service gets them without changing application code. A Python service, a Go service, and a Java service all benefit equally from the same Envoy sidecar configuration.

Architecture Overview

Envoy's configuration model is built around four core concepts -- listeners, routes, clusters, and endpoints -- connected by filter chains, and they map directly to how network traffic flows:

Incoming Connection
      |
      v
+---------------------+
|  Listener           |  Accepts connections on IP:port
|  (network address)  |
+---------------------+
      |
      v
+---------------------+
|  Filter Chain       |  Processes data through ordered filters
|  (L4 + L7 filters)  |  (TLS, HTTP parsing, auth, rate limit)
+---------------------+
      |
      v
+---------------------+
|  Route Table        |  Maps request attributes to clusters
|  (path, headers)    |  (path prefix, header match, weighted)
+---------------------+
      |
      v
+---------------------+
|  Cluster            |  Group of upstream hosts
|  (load balancing)   |  (health checks, circuit breakers)
+---------------------+
      |
      v
+---------------------+
|  Endpoint           |  Individual backend host:port
+---------------------+

Listeners

A listener is a named network location (IP + port) where Envoy accepts connections. Each listener has one or more filter chains that process the traffic:

listeners:
  - name: http_listener
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 8080
    per_connection_buffer_limit_bytes: 32768
    filter_chains:
      - filters:
          - name: envoy.filters.network.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              stat_prefix: ingress_http
              codec_type: AUTO
              use_remote_address: true
              common_http_protocol_options:
                idle_timeout: 3600s
                headers_with_underscores_action: REJECT_REQUEST
              http2_protocol_options:
                max_concurrent_streams: 128
                initial_stream_window_size: 65536
                initial_connection_window_size: 1048576
              route_config:
                name: local_route
                virtual_hosts:
                  - name: backend
                    domains: ["*"]
                    routes:
                      - match: { prefix: "/" }
                        route: { cluster: app_service }
              http_filters:
                - name: envoy.filters.http.router
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

Key listener parameters:

| Parameter | Purpose | Recommended Value |
|---|---|---|
| per_connection_buffer_limit_bytes | Memory limit per connection | 32768 (32 KB) |
| use_remote_address | Use the real client IP for access logs and rate limiting | true for edge proxy |
| codec_type | Auto-detect HTTP/1.1 vs HTTP/2 | AUTO for edge, HTTP2 for gRPC |
| headers_with_underscores_action | Reject headers with underscores (security) | REJECT_REQUEST |
| max_concurrent_streams | HTTP/2 stream multiplexing limit | 100-256 |

Filter Chains

Filters are the heart of Envoy's extensibility. They process traffic at both the network (L4) and HTTP (L7) layers. Network filters handle raw bytes; HTTP filters handle parsed requests. Filters are composable and execute in order:

| Filter Type | Name | Purpose |
|---|---|---|
| Network | http_connection_manager | Parse HTTP, apply HTTP filters, route to clusters |
| Network | tcp_proxy | Plain TCP proxying without HTTP awareness |
| Network | redis_proxy | Redis protocol-aware proxying with command splitting |
| Network | mongo_proxy | MongoDB wire protocol sniffing for metrics |
| HTTP | router | Route requests to clusters (required, always last) |
| HTTP | local_ratelimit | Per-instance token bucket rate limiting |
| HTTP | ratelimit | Global rate limiting via an external service |
| HTTP | cors | Cross-Origin Resource Sharing handling |
| HTTP | jwt_authn | JWT token validation |
| HTTP | ext_authz | External authorization via gRPC/HTTP callout |
| HTTP | fault | Fault injection for testing (delays, aborts) |
| HTTP | compressor | Response compression (gzip, brotli) |
| HTTP | health_check | Respond to health checks without hitting upstreams |

Clusters

A cluster is a group of upstream hosts that Envoy routes traffic to. Clusters are where you configure load balancing, health checks, circuit breakers, and connection pooling:

clusters:
  - name: app_service
    connect_timeout: 5s
    type: STRICT_DNS
    dns_lookup_family: V4_ONLY
    lb_policy: ROUND_ROBIN
    common_lb_config:
      healthy_panic_threshold:
        value: 50
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
        sni: app-service.internal
    load_assignment:
      cluster_name: app_service
      endpoints:
        - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: app-service-1
                    port_value: 8080
              load_balancing_weight: 3
            - endpoint:
                address:
                  socket_address:
                    address: app-service-2
                    port_value: 8080
              load_balancing_weight: 2

Cluster service discovery types:

| Type | Behavior | Use Case |
|---|---|---|
| STATIC | Endpoints hardcoded in config | Fixed infrastructure, testing |
| STRICT_DNS | DNS resolution; all returned IPs used | Docker Compose, simple DNS-based discovery |
| LOGICAL_DNS | DNS resolution; only the first IP used | External services behind a load balancer |
| EDS | Endpoints from an xDS control plane | Service mesh, dynamic environments |
| ORIGINAL_DST | Route to the original destination IP | Transparent proxy, iptables redirect |

The healthy_panic_threshold at 50% means Envoy will route to all hosts (including unhealthy ones) if more than 50% of hosts are marked unhealthy. This prevents a cascading failure where one bad health check config takes out your entire cluster.
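The panic-mode decision can be sketched in a few lines of Python (an illustrative model of the behavior, not Envoy's actual implementation):

```python
def select_pool(hosts, panic_threshold=0.5):
    """Pick the load-balancing pool the way panic mode does: if the
    fraction of healthy hosts drops below the threshold, route to ALL
    hosts rather than hammering the few that remain healthy."""
    healthy = [h for h in hosts if h["healthy"]]
    if len(healthy) / len(hosts) < panic_threshold:
        return hosts      # panic mode: every host is a candidate
    return healthy        # normal mode: only healthy hosts

# 2 of 6 hosts healthy -> below the 50% threshold -> panic routing
hosts = [{"name": f"app-{i}", "healthy": i < 2} for i in range(6)]
pool = select_pool(hosts)
print(len(pool))   # 6
```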

Routes

Routes map incoming requests to clusters based on path, headers, query parameters, or other criteria:

route_config:
  virtual_hosts:
    - name: api
      domains: ["api.example.com"]
      request_headers_to_add:
        - header:
            key: x-custom-header
            value: "from-envoy"
          append_action: OVERWRITE_IF_EXISTS_OR_ADD
      routes:
        # Header-based routing for canary deployments
        - match:
            prefix: "/"
            headers:
              - name: x-canary
                exact_match: "true"
          route:
            cluster: api_canary
            timeout: 30s

        # Path-based routing with regex
        - match:
            safe_regex:
              regex: "/users/[0-9]+"
          route:
            cluster: user_service
            timeout: 10s

        # Weighted routing for gradual rollouts
        - match:
            prefix: "/api/v2"
          route:
            weighted_clusters:
              clusters:
                - name: api_v2_stable
                  weight: 90
                - name: api_v2_canary
                  weight: 10

        # Prefix rewrite (strip /api prefix)
        - match:
            prefix: "/api/"
          route:
            cluster: api_service
            prefix_rewrite: "/"
            timeout: 15s

        # Direct response (no upstream)
        - match:
            prefix: "/healthz"
          direct_response:
            status: 200
            body:
              inline_string: "ok"

        # Default route
        - match:
            prefix: "/"
          route:
            cluster: api_stable
            timeout: 30s

Weighted clusters are particularly useful for canary deployments. Send 10% of traffic to the new version and monitor error rates. If metrics look good, gradually increase the weight. If something goes wrong, shift back to 0% instantly without a deployment.
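Per request, a weighted split is effectively a weighted random draw. A rough Python model, using the names and weights from the route above (the helper itself is hypothetical):

```python
import random

random.seed(42)  # deterministic demo

def pick_cluster(weighted_clusters, rng=random):
    """Choose a cluster for one request in proportion to its weight,
    mirroring how weighted_clusters splits traffic between versions."""
    names = [c["name"] for c in weighted_clusters]
    weights = [c["weight"] for c in weighted_clusters]
    return rng.choices(names, weights=weights, k=1)[0]

split = [{"name": "api_v2_stable", "weight": 90},
         {"name": "api_v2_canary", "weight": 10}]
counts = {"api_v2_stable": 0, "api_v2_canary": 0}
for _ in range(10_000):
    counts[pick_cluster(split)] += 1
print(counts)   # roughly a 90/10 split across 10,000 requests
```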

Static vs Dynamic Configuration (xDS)

Envoy supports two fundamentally different configuration approaches.

Static Configuration

Everything is defined in a YAML file loaded at startup. Changes require a restart (or hot restart). This is suitable for edge proxies, development environments, and simple deployments:

admin:
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9901

static_resources:
  listeners:
    - name: main
      address:
        socket_address: { address: 0.0.0.0, port_value: 8080 }
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress
                codec_type: AUTO
                route_config:
                  virtual_hosts:
                    - name: default
                      domains: ["*"]
                      routes:
                        - match: { prefix: "/api" }
                          route: { cluster: api_cluster }
                        - match: { prefix: "/" }
                          route: { cluster: web_cluster }
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

  clusters:
    - name: api_cluster
      connect_timeout: 2s
      type: STRICT_DNS
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: api_cluster
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address: { address: api-service, port_value: 8080 }

    - name: web_cluster
      connect_timeout: 2s
      type: STRICT_DNS
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: web_cluster
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address: { address: web-service, port_value: 3000 }

Dynamic Configuration (xDS APIs)

In dynamic mode, Envoy fetches configuration from a control plane via gRPC or REST. This is what makes service meshes possible. The control plane pushes configuration changes to Envoy sidecars without restarts.
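At the protocol level each subscription is an ACK/NACK loop: Envoy echoes the nonce of every DiscoveryResponse back to the control plane, and advances version_info only when the config applied cleanly. A rough Python model of that request-building logic (illustrative only, not an Envoy API):

```python
def ack_or_nack(response, applied_ok, last_good_version=""):
    """Build the DiscoveryRequest Envoy sends after a DiscoveryResponse.
    ACK: version_info advances to the newly applied version.
    NACK: version_info stays at the last successfully applied version,
    and error_detail explains why the update was rejected."""
    request = {
        "type_url": response["type_url"],
        "response_nonce": response["nonce"],   # always echo the nonce
    }
    if applied_ok:
        request["version_info"] = response["version_info"]
    else:
        request["version_info"] = last_good_version
        request["error_detail"] = {"message": "failed to apply config"}
    return request

resp = {"type_url": "type.googleapis.com/envoy.config.cluster.v3.Cluster",
        "version_info": "v2", "nonce": "n-42"}
print(ack_or_nack(resp, True))                           # ACK, now on v2
print(ack_or_nack(resp, False, last_good_version="v1"))  # NACK, stays on v1
```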

The xDS API family:

| API | Full Name | What It Configures |
|---|---|---|
| LDS | Listener Discovery Service | Listeners and filter chains |
| RDS | Route Discovery Service | Route tables and virtual hosts |
| CDS | Cluster Discovery Service | Upstream cluster definitions |
| EDS | Endpoint Discovery Service | Individual endpoints within clusters |
| SDS | Secret Discovery Service | TLS certificates and keys |
| ECDS | Extension Config Discovery Service | HTTP filter configurations |
| VHDS | Virtual Host Discovery Service | Virtual hosts (granular RDS) |

Bootstrap configuration for dynamic mode:

node:
  cluster: my-cluster
  id: my-node-1
  metadata:
    region: us-east-1
    az: us-east-1a

admin:
  address:
    socket_address:
      address: 127.0.0.1
      port_value: 9901

dynamic_resources:
  lds_config:
    resource_api_version: V3
    api_config_source:
      api_type: GRPC
      grpc_services:
        - envoy_grpc:
            cluster_name: xds_cluster
      transport_api_version: V3
      set_node_on_first_message_only: true
  cds_config:
    resource_api_version: V3
    api_config_source:
      api_type: GRPC
      grpc_services:
        - envoy_grpc:
            cluster_name: xds_cluster
      transport_api_version: V3
      set_node_on_first_message_only: true

static_resources:
  clusters:
    - name: xds_cluster
      connect_timeout: 5s
      type: STRICT_DNS
      lb_policy: ROUND_ROBIN
      typed_extension_protocol_options:
        envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
          "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
          explicit_http_config:
            http2_protocol_options: {}
      load_assignment:
        cluster_name: xds_cluster
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: control-plane
                      port_value: 18000

The control plane itself can be built with frameworks like go-control-plane (Go), java-control-plane (Java), or commercial solutions like Istio, Consul, or Gloo. The xDS protocol is the universal interface -- any control plane that speaks xDS works with Envoy.

ADS (Aggregated Discovery Service)

For production deployments, use ADS to deliver all resources over a single, ordered gRPC stream. Without ADS, CDS and EDS updates can arrive in a different order, potentially leaving routes pointing at clusters that do not exist yet:

dynamic_resources:
  ads_config:
    api_type: GRPC
    grpc_services:
      - envoy_grpc:
          cluster_name: xds_cluster
    transport_api_version: V3
  lds_config:
    resource_api_version: V3
    ads: {}
  cds_config:
    resource_api_version: V3
    ads: {}

SSL/TLS Termination and Mutual TLS

Edge TLS Termination

For an edge proxy that terminates TLS from external clients:

listeners:
  - name: https_listener
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 8443
    filter_chains:
      - transport_socket:
          name: envoy.transport_sockets.tls
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
            common_tls_context:
              tls_params:
                tls_minimum_protocol_version: TLSv1_2
                tls_maximum_protocol_version: TLSv1_3
                cipher_suites:
                  - ECDHE-ECDSA-AES128-GCM-SHA256
                  - ECDHE-RSA-AES128-GCM-SHA256
                  - ECDHE-ECDSA-AES256-GCM-SHA384
                  - ECDHE-RSA-AES256-GCM-SHA384
              tls_certificates:
                - certificate_chain:
                    filename: /etc/envoy/certs/server.crt
                  private_key:
                    filename: /etc/envoy/certs/server.key
              alpn_protocols: ["h2", "http/1.1"]
        filters:
          - name: envoy.filters.network.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              stat_prefix: ingress_https
              codec_type: AUTO
              route_config:
                virtual_hosts:
                  - name: default
                    domains: ["*"]
                    routes:
                      - match: { prefix: "/" }
                        route: { cluster: app_service }
              http_filters:
                - name: envoy.filters.http.router
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

Mutual TLS (mTLS) Between Services

mTLS is the foundation of zero-trust networking. Both client and server verify each other's certificates:

# On the server side (downstream TLS context)
transport_socket:
  name: envoy.transport_sockets.tls
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
    require_client_certificate: true
    common_tls_context:
      tls_certificates:
        - certificate_chain:
            filename: /etc/envoy/certs/server.crt
          private_key:
            filename: /etc/envoy/certs/server.key
      validation_context:
        trusted_ca:
          filename: /etc/envoy/certs/ca.crt
        match_typed_subject_alt_names:
          - san_type: DNS
            matcher:
              exact: "client-service.internal"
# On the client side (upstream TLS context in cluster)
clusters:
  - name: secure_service
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
        common_tls_context:
          tls_certificates:
            - certificate_chain:
                filename: /etc/envoy/certs/client.crt
              private_key:
                filename: /etc/envoy/certs/client.key
          validation_context:
            trusted_ca:
              filename: /etc/envoy/certs/ca.crt
        sni: secure-service.internal

In a service mesh like Istio, mTLS is configured automatically through SDS (Secret Discovery Service). The control plane provisions and rotates certificates for every sidecar without manual intervention.

Deploying as Edge Proxy

As an edge proxy, Envoy replaces Nginx or HAProxy at the ingress point of your infrastructure:

# docker-compose.yml
services:
  envoy:
    image: envoyproxy/envoy:v1.30-latest
    ports:
      - "80:8080"
      - "443:8443"
      - "9901:9901"
    volumes:
      - ./envoy.yaml:/etc/envoy/envoy.yaml
      - ./certs:/etc/envoy/certs:ro
    command: ["-c", "/etc/envoy/envoy.yaml", "--service-cluster", "edge", "--service-node", "edge-1"]
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 512M
    restart: unless-stopped

Complete Edge Proxy Configuration

admin:
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9901

static_resources:
  listeners:
    - name: http_listener
      address:
        socket_address:
          address: 0.0.0.0
          port_value: 8080
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                codec_type: AUTO
                use_remote_address: true
                route_config:
                  virtual_hosts:
                    - name: redirect
                      domains: ["*"]
                      routes:
                        - match: { prefix: "/" }
                          redirect:
                            https_redirect: true
                            response_code: MOVED_PERMANENTLY
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

    - name: https_listener
      address:
        socket_address:
          address: 0.0.0.0
          port_value: 8443
      filter_chains:
        - transport_socket:
            name: envoy.transport_sockets.tls
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
              common_tls_context:
                tls_params:
                  tls_minimum_protocol_version: TLSv1_2
                tls_certificates:
                  - certificate_chain: { filename: /etc/envoy/certs/server.crt }
                    private_key: { filename: /etc/envoy/certs/server.key }
                alpn_protocols: ["h2", "http/1.1"]
          filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_https
                codec_type: AUTO
                use_remote_address: true
                access_log:
                  - name: envoy.access_loggers.stdout
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
                      log_format:
                        json_format:
                          timestamp: "%START_TIME%"
                          method: "%REQ(:METHOD)%"
                          path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
                          protocol: "%PROTOCOL%"
                          status: "%RESPONSE_CODE%"
                          duration: "%DURATION%"
                          bytes: "%BYTES_SENT%"
                          upstream: "%UPSTREAM_HOST%"
                          request_id: "%REQ(X-REQUEST-ID)%"
                route_config:
                  virtual_hosts:
                    - name: api
                      domains: ["api.example.com"]
                      routes:
                        - match: { prefix: "/" }
                          route:
                            cluster: api_service
                            timeout: 30s
                    - name: web
                      domains: ["www.example.com", "example.com"]
                      routes:
                        - match: { prefix: "/static/" }
                          route:
                            cluster: static_service
                            timeout: 10s
                        - match: { prefix: "/" }
                          route:
                            cluster: web_service
                            timeout: 30s
                http_filters:
                  - name: envoy.filters.http.local_ratelimit
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
                      stat_prefix: local_rate_limit
                      token_bucket:
                        max_tokens: 1000
                        tokens_per_fill: 1000
                        fill_interval: 1s
                      filter_enabled:
                        runtime_key: local_rate_limit_enabled
                        default_value: { numerator: 100, denominator: HUNDRED }
                      filter_enforced:
                        runtime_key: local_rate_limit_enforced
                        default_value: { numerator: 100, denominator: HUNDRED }
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

  clusters:
    - name: api_service
      connect_timeout: 2s
      type: STRICT_DNS
      lb_policy: ROUND_ROBIN
      health_checks:
        - timeout: 3s
          interval: 10s
          unhealthy_threshold: 3
          healthy_threshold: 2
          http_health_check:
            path: /healthz
      circuit_breakers:
        thresholds:
          - priority: DEFAULT
            max_connections: 1000
            max_pending_requests: 500
            max_requests: 2000
            max_retries: 10
      load_assignment:
        cluster_name: api_service
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address: { address: api-svc, port_value: 8080 }

    - name: web_service
      connect_timeout: 2s
      type: STRICT_DNS
      lb_policy: ROUND_ROBIN
      health_checks:
        - timeout: 3s
          interval: 10s
          unhealthy_threshold: 3
          healthy_threshold: 2
          http_health_check:
            path: /health
      load_assignment:
        cluster_name: web_service
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address: { address: web-svc, port_value: 3000 }

    - name: static_service
      connect_timeout: 1s
      type: STRICT_DNS
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: static_service
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address: { address: static-svc, port_value: 80 }

Deploying as Sidecar

The sidecar pattern runs an Envoy instance alongside each service instance. In Kubernetes, it runs as a container in the same pod, sharing the network namespace:

apiVersion: v1
kind: Pod
metadata:
  name: my-service
  labels:
    app: my-service
spec:
  containers:
    - name: app
      image: my-app:latest
      ports:
        - containerPort: 8080
      env:
        - name: HTTP_PROXY
          value: "http://127.0.0.1:9211"
      resources:
        requests:
          cpu: 100m
          memory: 128Mi

    - name: envoy-sidecar
      image: envoyproxy/envoy:v1.30-latest
      ports:
        - containerPort: 9901
          name: envoy-admin
        - containerPort: 9211
          name: envoy-egress
        - containerPort: 9212
          name: envoy-ingress
      volumeMounts:
        - name: envoy-config
          mountPath: /etc/envoy
      resources:
        requests:
          cpu: 50m
          memory: 64Mi
        limits:
          cpu: 200m
          memory: 128Mi
      readinessProbe:
        httpGet:
          path: /ready
          port: 9901
        initialDelaySeconds: 2
        periodSeconds: 5
      livenessProbe:
        httpGet:
          path: /server_info
          port: 9901
        initialDelaySeconds: 5
        periodSeconds: 15

  volumes:
    - name: envoy-config
      configMap:
        name: envoy-sidecar-config

  initContainers:
    - name: init-iptables
      image: envoyproxy/envoy:v1.30-latest
      securityContext:
        capabilities:
          add: ["NET_ADMIN"]
      command:
        - sh
        - -c
        - |
          iptables -t nat -A PREROUTING -p tcp --dport 8080 -j REDIRECT --to-port 9212
          iptables -t nat -A OUTPUT -p tcp --dport 8080 -m owner ! --uid-owner 1337 -j REDIRECT --to-port 9211

The init container sets up iptables rules that transparently redirect traffic through Envoy. Inbound traffic to port 8080 is redirected to Envoy's ingress listener (9212), and outbound traffic from the app is redirected to Envoy's egress listener (9211). The --uid-owner 1337 exclusion assumes the Envoy process runs as UID 1337; without it, Envoy's own outbound traffic would be redirected back into itself in an infinite loop.

In production service meshes like Istio, all of this is automated. The Istio sidecar injector automatically adds the Envoy container and iptables init container to every pod.

Circuit Breaking

Circuit breaking prevents a failing service from consuming all available resources and cascading the failure to its callers. When a service becomes slow or unresponsive, Envoy stops sending it traffic:

clusters:
  - name: payment_service
    connect_timeout: 2s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    circuit_breakers:
      thresholds:
        - priority: DEFAULT
          max_connections: 100
          max_pending_requests: 50
          max_requests: 200
          max_retries: 3
          track_remaining: true
          retry_budget:
            budget_percent:
              value: 20.0
            min_retry_concurrency: 3
        - priority: HIGH
          max_connections: 200
          max_pending_requests: 100
          max_requests: 400
          max_retries: 5
    load_assignment:
      cluster_name: payment_service
      endpoints:
        - lb_endpoints:
            - endpoint:
                address:
                  socket_address: { address: payment-svc, port_value: 8080 }

Circuit breaker thresholds explained:

| Threshold | What It Limits | When Circuit Opens |
|---|---|---|
| max_connections | Concurrent TCP connections to the cluster | New connections return 503 |
| max_pending_requests | Requests waiting for a connection from the pool | Queued requests return 503 |
| max_requests | Total concurrent requests (HTTP/2 multiplexed) | New requests return 503 |
| max_retries | Concurrent retry attempts across the cluster | Retries are skipped |
| retry_budget | Percentage of active requests that can be retries | Prevents retry storms |

The retry_budget is particularly important. Without it, a failing service can experience a "retry storm" where every failed request generates retries, which also fail and generate more retries. Setting budget_percent to 20% means only 20% of active requests can be retries at any time.
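The budget arithmetic itself is simple; a sketch of the limit it enforces (an illustrative formula based on the budget_percent and min_retry_concurrency fields, not Envoy's code):

```python
def max_concurrent_retries(active_requests, budget_percent=20.0,
                           min_retry_concurrency=3):
    """Retry budget: allow retries up to budget_percent of the currently
    active requests, but never fewer than min_retry_concurrency, so a
    near-idle cluster can still retry at all."""
    budget = int(active_requests * budget_percent / 100)
    return max(budget, min_retry_concurrency)

print(max_concurrent_retries(200))   # 20% of 200 active requests -> 40
print(max_concurrent_retries(5))     # 20% of 5 is 1; the floor lifts it to 3
```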

Monitor circuit breaker state via Envoy's stats:

curl -s http://localhost:9901/stats | grep circuit_breakers
# cluster.payment_service.circuit_breakers.default.cx_open: 0
# cluster.payment_service.circuit_breakers.default.cx_pool_open: 0
# cluster.payment_service.circuit_breakers.default.rq_open: 0
# cluster.payment_service.circuit_breakers.default.rq_pending_open: 0
# cluster.payment_service.circuit_breakers.default.remaining_cx: 100
# cluster.payment_service.circuit_breakers.default.remaining_pending: 50

When any _open counter is non-zero, the circuit is open for that threshold. The remaining_* counters (enabled by track_remaining: true) show headroom.
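A small script can flag tripped thresholds from that stats output; the parser below is a hypothetical helper, keyed to the stat names shown above:

```python
def open_circuits(stats_text):
    """Return the circuit-breaker gauges whose *_open value is non-zero,
    i.e. thresholds whose circuit is currently open."""
    tripped = {}
    for line in stats_text.splitlines():
        line = line.strip().lstrip("# ")
        if "circuit_breakers" not in line or ":" not in line:
            continue
        name, _, value = line.rpartition(":")
        if name.endswith("_open") and int(value) > 0:
            tripped[name] = int(value)
    return tripped

sample = """\
cluster.payment_service.circuit_breakers.default.cx_open: 0
cluster.payment_service.circuit_breakers.default.rq_open: 1
cluster.payment_service.circuit_breakers.default.remaining_cx: 100
"""
print(open_circuits(sample))
# {'cluster.payment_service.circuit_breakers.default.rq_open': 1}
```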

Retries and Timeouts

Configure retries per route for transient failures:

routes:
  - match:
      prefix: "/api/"
    route:
      cluster: api_service
      timeout: 15s
      retry_policy:
        retry_on: "5xx,reset,connect-failure,retriable-4xx,refused-stream"
        num_retries: 3
        per_try_timeout: 5s
        per_try_idle_timeout: 3s
        retry_back_off:
          base_interval: 0.1s
          max_interval: 1s
        retriable_status_codes:
          - 503
          - 429
        retry_host_predicate:
          - name: envoy.retry_host_predicates.previous_hosts
        host_selection_retry_max_attempts: 5

Retry configuration reference:

| Setting | Purpose | Recommendation |
|---|---|---|
| retry_on | Conditions that trigger a retry | Include 5xx, connect-failure, reset |
| num_retries | Maximum retry attempts | 2-3 for most services |
| per_try_timeout | Timeout for each individual attempt | Less than the overall route timeout |
| retry_host_predicate: previous_hosts | Retry on a different host than the one that failed | Always enable |
| retry_back_off | Exponential backoff between retries | Start at 100ms, cap at 1s |
| retriable_status_codes | Additional HTTP status codes to retry on | 503, 429 |
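The retry_back_off schedule is exponential with full jitter: per Envoy's documentation, retry N waits a random amount in [0, (2^N - 1) x base_interval], capped at max_interval. A Python sketch of that window:

```python
import random

def backoff_ceiling(retry_number, base=0.1, cap=1.0):
    """Upper bound of the jittered back-off window for a given retry."""
    return min((2 ** retry_number - 1) * base, cap)

def backoff(retry_number, base=0.1, cap=1.0, rng=random):
    """Actual wait: a uniform random value within the window (full jitter)."""
    return rng.uniform(0, backoff_ceiling(retry_number, base, cap))

for n in (1, 2, 3, 4):
    print(f"retry {n}: up to {backoff_ceiling(n):.1f}s")
# retry 1: up to 0.1s; retry 2: up to 0.3s; retry 3: up to 0.7s;
# retry 4: 1.5s, capped at 1.0s by max_interval
```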

Critical rule: only retry idempotent operations. GET requests are safe to retry. POST requests should generally not be retried unless your API is designed for idempotency (e.g., uses idempotency keys). Retrying a non-idempotent POST can cause duplicate charges, duplicate messages, or other data corruption.

For non-idempotent routes, use a separate retry policy or disable retries entirely:

routes:
  - match:
      prefix: "/api/payments"
      headers:
        - name: ":method"
          exact_match: "POST"
    route:
      cluster: payment_service
      timeout: 30s
      # No retry_policy -- do not retry payments
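If you do want safe retries on POST, the idempotency-key pattern mentioned above is the standard approach. A minimal server-side sketch (the header name, in-memory store, and charge function are illustrative; in practice the store is a database with a unique constraint on the key):

```python
import uuid

# Results keyed by idempotency key; a retried POST with the same key
# replays the stored result instead of charging again.
_processed = {}

def charge(idempotency_key, amount_cents):
    if idempotency_key in _processed:
        # Replay: return the original result without charging twice.
        return _processed[idempotency_key]
    result = {"charged": amount_cents, "id": str(uuid.uuid4())}
    _processed[idempotency_key] = result
    return result

key = str(uuid.uuid4())        # client generates one key per logical operation
first = charge(key, 1999)
retried = charge(key, 1999)    # e.g. the client retries after a timeout
assert first == retried        # same charge, not a duplicate
```

With this in place, a retry (whether from Envoy or the client) becomes harmless, and the route above could carry a retry_policy again.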

Rate Limiting

Local Rate Limiting

Applied per Envoy instance using a token bucket algorithm:

http_filters:
  - name: envoy.filters.http.local_ratelimit
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
      stat_prefix: http_local_rate_limiter
      token_bucket:
        max_tokens: 1000
        tokens_per_fill: 1000
        fill_interval: 1s
      filter_enabled:
        runtime_key: local_rate_limit_enabled
        default_value: { numerator: 100, denominator: HUNDRED }
      filter_enforced:
        runtime_key: local_rate_limit_enforced
        default_value: { numerator: 100, denominator: HUNDRED }
      response_headers_to_add:
        - append_action: OVERWRITE_IF_EXISTS_OR_ADD
          header:
            key: x-ratelimit-limit
            value: "1000"
        - append_action: OVERWRITE_IF_EXISTS_OR_ADD
          header:
            key: x-ratelimit-remaining
            value: "0"
      status:
        code: TooManyRequests
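The token_bucket above admits up to max_tokens immediately and adds tokens_per_fill every fill_interval. A Python sketch of the same semantics (time is passed in explicitly so the behavior is deterministic):

```python
class TokenBucket:
    """Sketch of the local_ratelimit token bucket: max_tokens capacity,
    tokens_per_fill added every fill_interval seconds."""
    def __init__(self, max_tokens, tokens_per_fill, fill_interval, now=0.0):
        self.max_tokens = max_tokens
        self.tokens_per_fill = tokens_per_fill
        self.fill_interval = fill_interval
        self.tokens = max_tokens          # bucket starts full
        self.last_fill = now

    def allow(self, now):
        # Add tokens_per_fill for each whole fill_interval elapsed, capped.
        fills = int((now - self.last_fill) / self.fill_interval)
        if fills:
            self.tokens = min(self.max_tokens,
                              self.tokens + fills * self.tokens_per_fill)
            self.last_fill += fills * self.fill_interval
        if self.tokens > 0:
            self.tokens -= 1
            return True                   # request admitted
        return False                      # would be answered 429

bucket = TokenBucket(max_tokens=1000, tokens_per_fill=1000, fill_interval=1.0)
admitted = sum(bucket.allow(now=0.5) for _ in range(1500))
print(admitted)   # 1000 admitted; the remaining 500 are rejected until the next fill
```

Note the "local" in local rate limiting: each Envoy instance has its own bucket, so with 10 sidecars this config allows up to 10,000 requests per second cluster-wide.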

Global Rate Limiting

For coordinated rate limiting across all Envoy instances, use an external service implementing Envoy's Rate Limit Service (RLS) gRPC protocol, such as the reference envoyproxy/ratelimit implementation:

http_filters:
  - name: envoy.filters.http.ratelimit
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
      domain: my_domain
      failure_mode_deny: false
      rate_limit_service:
        grpc_service:
          envoy_grpc:
            cluster_name: rate_limit_service
        transport_api_version: V3

The failure_mode_deny: false setting means that if the rate limit service is unreachable, requests are allowed through. Set to true if you want to fail closed (deny requests when the rate limiter is down).

Configure per-route rate limit actions:

routes:
  - match:
      prefix: "/api/"
    route:
      cluster: api_service
      rate_limits:
        - actions:
            - remote_address: {}
        - actions:
            - request_headers:
                header_name: x-api-key
                descriptor_key: api_key

Health Checking and Outlier Detection

Envoy supports both active and passive health checking.

Active Health Checks

clusters:
  - name: api_service
    health_checks:
      - timeout: 3s
        interval: 10s
        unhealthy_threshold: 3
        healthy_threshold: 2
        no_traffic_interval: 60s
        no_traffic_healthy_interval: 120s
        http_health_check:
          path: /healthz
          host: api-service.internal
          expected_statuses:
            - start: 200
              end: 200
          request_headers_to_add:
            - header:
                key: x-health-check
                value: "envoy"
              append_action: OVERWRITE_IF_EXISTS_OR_ADD

The no_traffic_interval reduces health check frequency for clusters that are not receiving real traffic. This saves resources in large deployments with many clusters.

gRPC Health Checks

For gRPC services implementing the standard health checking protocol:

health_checks:
  - timeout: 2s
    interval: 10s
    unhealthy_threshold: 3
    healthy_threshold: 2
    grpc_health_check:
      service_name: my.service.Name

Outlier Detection (Passive Health Checking)

Outlier detection watches real traffic and ejects hosts that are performing badly. It catches issues that active health checks miss, like a service that responds to /healthz but fails on real requests:

clusters:
  - name: api_service
    outlier_detection:
      consecutive_5xx: 5
      interval: 10s
      base_ejection_time: 30s
      max_ejection_percent: 50
      enforcing_consecutive_5xx: 100
      enforcing_success_rate: 100
      success_rate_minimum_hosts: 3
      success_rate_request_volume: 100
      success_rate_stdev_factor: 1900
      consecutive_gateway_failure: 3
      enforcing_consecutive_gateway_failure: 100
      split_external_local_origin_errors: true

Outlier detection parameters:

| Parameter | Purpose | Default |
| --- | --- | --- |
| consecutive_5xx | Eject after N consecutive 5xx responses | 5 |
| interval | How often to evaluate outlier status | 10s |
| base_ejection_time | Base duration of ejection (multiplied by ejection count) | 30s |
| max_ejection_percent | Max percentage of hosts that can be ejected | 10 |
| success_rate_minimum_hosts | Minimum hosts needed for success rate analysis | 5 |
| success_rate_stdev_factor | Standard deviations from mean before ejection | 1900 (1.9x) |

The max_ejection_percent is a safety valve. Even if every host is failing, Envoy will not eject more than this percentage, preventing a complete cluster outage.
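Two of these parameters are easier to grasp as arithmetic. Ejection duration grows linearly with how often a host has been ejected (capped by max_ejection_time, default 300s), and success_rate_stdev_factor is divided by 1000, so 1900 means "eject hosts more than 1.9 standard deviations below the mean success rate":

```python
def ejection_seconds(ejection_count, base=30, max_ejection=300):
    """Duration of the Nth ejection: base_ejection_time * count,
    capped at max_ejection_time (default 300s)."""
    return min(base * ejection_count, max_ejection)

def success_rate_ejection_threshold(mean, stdev, stdev_factor=1900):
    """Hosts below this success rate get ejected; the factor is
    divided by 1000, so 1900 -> 1.9 standard deviations."""
    return mean - (stdev_factor / 1000.0) * stdev

print([ejection_seconds(n) for n in (1, 2, 5, 10, 20)])
# → [30, 60, 150, 300, 300]
```

A repeatedly misbehaving host is therefore kept out of the pool for longer and longer, while a one-off blip costs it only 30 seconds.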

Built-In Observability

Envoy's observability is its strongest differentiator. Every proxy instance exposes rich telemetry without any application code changes.

Stats and Prometheus Metrics

Envoy exposes thousands of metrics via the admin interface:

# All stats
curl http://localhost:9901/stats

# Prometheus format
curl http://localhost:9901/stats/prometheus

# Filter by pattern
curl "http://localhost:9901/stats?filter=cluster.api_service"

# Only counters
curl "http://localhost:9901/stats?type=Counters"

Key metrics to monitor and alert on:

| Metric | What It Tells You | Alert When |
| --- | --- | --- |
| upstream_rq_total | Total requests to a cluster | N/A (informational) |
| upstream_rq_5xx | 5xx error count | Rate exceeds baseline |
| upstream_rq_time | Request latency histogram | p99 exceeds SLA |
| upstream_cx_active | Active connections to upstream | Near circuit breaker limit |
| upstream_cx_connect_fail | Connection failures | Any non-zero count |
| membership_healthy | Healthy hosts in cluster | Below minimum threshold |
| membership_total | Total hosts in cluster | Unexpected changes |
| upstream_rq_retry | Retry count | High retry rate |
| upstream_rq_pending_overflow | Requests rejected by circuit breaker | Any non-zero count |
| downstream_cx_active | Active client connections | Near capacity |
| downstream_rq_total | Total incoming requests | Unexpected spikes |
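These counters are cumulative, so alerting works on deltas between scrapes rather than raw values. A small sketch of computing the 5xx error rate over a scrape interval (the snapshot values here are made up):

```python
def error_rate(prev, curr):
    """5xx rate over an interval from two cumulative counter snapshots
    of cluster.<name>.upstream_rq_total / upstream_rq_5xx."""
    d_total = curr["upstream_rq_total"] - prev["upstream_rq_total"]
    d_5xx = curr["upstream_rq_5xx"] - prev["upstream_rq_5xx"]
    return d_5xx / d_total if d_total else 0.0

prev = {"upstream_rq_total": 10_000, "upstream_rq_5xx": 40}
curr = {"upstream_rq_total": 12_000, "upstream_rq_5xx": 140}
print(f"{error_rate(prev, curr):.1%}")   # 100 errors / 2,000 requests → 5.0%
```

In practice a Prometheus `rate()` over the /stats/prometheus endpoint does exactly this computation for you.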

Distributed Tracing

Envoy can propagate trace headers and report spans to Zipkin, Jaeger, or any OpenTelemetry collector:

tracing:
  http:
    name: envoy.tracers.opentelemetry
    typed_config:
      "@type": type.googleapis.com/envoy.config.trace.v3.OpenTelemetryConfig
      grpc_service:
        envoy_grpc:
          cluster_name: otel_collector
      service_name: my-service

Envoy automatically generates spans for each request and propagates trace context headers between services. The key headers propagated:

| Header | Tracing System |
| --- | --- |
| x-request-id | Envoy internal |
| x-b3-traceid, x-b3-spanid, x-b3-parentspanid | Zipkin/B3 |
| traceparent, tracestate | W3C Trace Context |
| x-cloud-trace-context | Google Cloud Trace |

Your application code only needs to forward these headers on outbound requests. Envoy handles span creation, timing, and reporting.
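A sketch of that forwarding step, framework-agnostic and assuming inbound headers arrive as a dict (the header list mirrors the table above, plus x-b3-sampled from the B3 family; most HTTP frameworks offer middleware that does this for you):

```python
# The only tracing duty left to application code: copy trace context
# headers from the inbound request onto every outbound request.
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid", "x-b3-spanid", "x-b3-parentspanid", "x-b3-sampled",
    "traceparent", "tracestate",
    "x-cloud-trace-context",
)

def propagate(inbound_headers):
    """Return the subset of inbound headers that must be forwarded."""
    lowered = {k.lower(): v for k, v in inbound_headers.items()}
    return {h: lowered[h] for h in TRACE_HEADERS if h in lowered}

outbound = propagate({"X-Request-Id": "abc-123",
                      "Traceparent": "00-aaaa-bbbb-01",
                      "Accept": "application/json"})
print(outbound)   # only the trace context headers survive
```

If you drop these headers on an outbound call, the trace breaks at that hop: Envoy will start a fresh trace ID and the two halves of the request will not join up.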

Access Logs

Configure structured access logging for request-level debugging:

access_log:
  - name: envoy.access_loggers.file
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
      path: /var/log/envoy/access.json
      log_format:
        json_format:
          timestamp: "%START_TIME%"
          method: "%REQ(:METHOD)%"
          path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
          protocol: "%PROTOCOL%"
          response_code: "%RESPONSE_CODE%"
          response_flags: "%RESPONSE_FLAGS%"
          duration_ms: "%DURATION%"
          upstream_host: "%UPSTREAM_HOST%"
          upstream_cluster: "%UPSTREAM_CLUSTER%"
          upstream_local_address: "%UPSTREAM_LOCAL_ADDRESS%"
          bytes_received: "%BYTES_RECEIVED%"
          bytes_sent: "%BYTES_SENT%"
          request_id: "%REQ(X-REQUEST-ID)%"
          user_agent: "%REQ(USER-AGENT)%"
          downstream_remote_address: "%DOWNSTREAM_REMOTE_ADDRESS%"

Response flags are particularly useful for debugging:

| Flag | Meaning |
| --- | --- |
| UH | No healthy upstream hosts |
| UF | Upstream connection failure |
| UO | Upstream overflow (circuit breaker triggered) |
| UT | Upstream request timeout |
| UC | Upstream connection termination |
| LR | Connection local reset |
| RL | Rate limited |
| DC | Downstream connection termination |
| NR | No route configured |
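With the JSON log format shown above, tallying response flags takes a few lines of scripting and gives an immediate picture of why requests are failing. A sketch with made-up log lines:

```python
import json
from collections import Counter

# One JSON object per line, as produced by the json_format config above.
sample_log = """\
{"response_code": 200, "response_flags": "-"}
{"response_code": 503, "response_flags": "UO"}
{"response_code": 503, "response_flags": "UO"}
{"response_code": 504, "response_flags": "UT"}
"""

flags = Counter(
    json.loads(line)["response_flags"]
    for line in sample_log.splitlines()
    if line
)
print(flags.most_common())   # [('UO', 2), ('-', 1), ('UT', 1)]
```

Here the dominant flag is UO, pointing straight at a tripped circuit breaker rather than, say, slow backends (UT) or missing routes (NR).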

Comparison with Nginx and HAProxy

| Feature | Envoy | Nginx | HAProxy |
| --- | --- | --- | --- |
| Primary use case | Service mesh, east-west traffic | Web server, reverse proxy, north-south | Load balancing, north-south |
| Configuration | YAML, dynamic via xDS APIs | Config files, reload signal | Config files, reload signal |
| Hot reload | Draining + hot restart, or xDS (no restart) | Worker process reload | Seamless reload with fd passing |
| gRPC support | Native, first-class, bidirectional streaming | Basic reverse proxy (since 1.13) | Supported via end-to-end HTTP/2 (since 2.0) |
| HTTP/2 | Full support, including upstream H2 | Full downstream, limited upstream H2 | Full support |
| Circuit breaking | Built-in with configurable thresholds | Not built-in | Not built-in |
| Retry policies | Configurable per-route with backoff and budgets | Limited retry with proxy_next_upstream | Retries with retries directive |
| Distributed tracing | Built-in (Zipkin, Jaeger, OTel) | Via third-party modules | Not built-in |
| Observability | Thousands of metrics, histograms | Basic stub_status + modules | Stats page + Prometheus exporter |
| Outlier detection | Built-in passive health checking | max_fails (basic) | Health checks (active only) |
| Rate limiting | Local + global (external service) | limit_req_zone (built-in) | Stick tables (built-in) |
| Sidecar pattern | Designed for it, minimal footprint | Possible but heavier | Not designed for it |
| Dynamic config | Full xDS API, no restart needed | Reload required | Reload + runtime API |
| WebAssembly plugins | Built-in Wasm support | Not built-in | Not built-in |
| Learning curve | Steep (verbose YAML, many concepts) | Moderate (intuitive config syntax) | Moderate (four-section model) |
| Memory footprint | ~30-50MB per sidecar | ~5-10MB per worker | ~5-10MB base |
| Community | CNCF, service mesh ecosystem | Broad, web-focused, largest install base | Load balancing focused, proven at scale |

When to Use Each

Choose Envoy when: You are running microservices and need circuit breaking, retries, distributed tracing, and dynamic configuration. Essential if adopting a service mesh. Best for gRPC-heavy environments and Kubernetes-native architectures.

Choose Nginx when: You need a web server that also does reverse proxying, caching, and static file serving. Best for traditional architectures, simple deployments, and when your team already knows Nginx well.

Choose HAProxy when: You need dedicated, high-performance load balancing with advanced health checking, stick tables, and TCP proxying. Excellent for database load balancing and environments where connection-level control matters.

Combine them: Many production architectures use Envoy for east-west traffic (sidecar mesh) while using Nginx or HAProxy at the edge for north-south traffic. The tools are complementary, not mutually exclusive.

Troubleshooting

Admin Interface

The admin interface at port 9901 is your primary debugging tool:

# View all registered clusters and their health
curl http://localhost:9901/clusters

# View all registered listeners
curl http://localhost:9901/listeners

# View current configuration dump
curl http://localhost:9901/config_dump

# View server info (version, uptime, state)
curl http://localhost:9901/server_info

# Check readiness
curl http://localhost:9901/ready

# View hot restart version
curl http://localhost:9901/hot_restart_version

# Log level adjustment at runtime
curl -X POST "http://localhost:9901/logging?level=debug"
curl -X POST "http://localhost:9901/logging?level=info"

Common Issues

| Symptom | Response Flag | Likely Cause | Fix |
| --- | --- | --- | --- |
| 503 No Healthy Upstream | UH | All backends failed health checks | Check backend health, verify health check path |
| 503 Upstream Overflow | UO | Circuit breaker tripped | Increase circuit breaker thresholds or add capacity |
| 504 Upstream Timeout | UT | Backend too slow | Increase route timeout or per_try_timeout |
| 404 No Route | NR | No matching route for the request | Check route config, domain matching, path prefixes |
| Connection reset | UC | Backend closed connection unexpectedly | Check backend connection limits, keepalive settings |
| Retry storms | High retry count | Too many retries without a budget | Add retry_budget, reduce num_retries |
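A retry budget is configured inside the cluster's circuit breaker thresholds. A sketch using the documented defaults (retries limited to 20% of active requests, with a minimum concurrency of 3 so low-traffic clusters can still retry):

```yaml
clusters:
  - name: api_service
    circuit_breakers:
      thresholds:
        - priority: DEFAULT
          retry_budget:
            budget_percent:
              value: 20.0             # retries capped at 20% of active requests
            min_retry_concurrency: 3  # always allow at least 3 concurrent retries
```

When a retry budget is set, it takes precedence over the static max_retries circuit breaker, which is usually what you want: the allowance scales with load instead of being a fixed number.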

Debug Logging

Enable debug logging temporarily to trace request flow:

# Set all loggers to debug
curl -X POST "http://localhost:9901/logging?level=debug"

# Set specific logger
curl -X POST "http://localhost:9901/logging?connection=debug"
curl -X POST "http://localhost:9901/logging?http=debug"
curl -X POST "http://localhost:9901/logging?router=debug"

# Reset to info after debugging
curl -X POST "http://localhost:9901/logging?level=info"

Getting Started

A minimal Docker Compose setup to try Envoy as an edge proxy:

services:
  envoy:
    image: envoyproxy/envoy:v1.30-latest
    volumes:
      - ./envoy.yaml:/etc/envoy/envoy.yaml
    ports:
      - "8080:8080"
      - "9901:9901"

  app:
    image: your-app:latest
    expose:
      - "3000"
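The compose file mounts an envoy.yaml that is not shown above. A minimal static configuration matching it could look like this (listener on 8080 routing everything to the app container on port 3000, admin on 9901; names like main, local_route, and app are illustrative):

```yaml
static_resources:
  listeners:
    - name: main
      address:
        socket_address: { address: 0.0.0.0, port_value: 8080 }
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: app
                      domains: ["*"]
                      routes:
                        - match: { prefix: "/" }
                          route: { cluster: app }
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
    - name: app
      type: STRICT_DNS        # resolve the compose service name "app"
      load_assignment:
        cluster_name: app
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address: { address: app, port_value: 3000 }
admin:
  address:
    socket_address: { address: 0.0.0.0, port_value: 9901 }
```

With both containers up, `curl http://localhost:8080/` should reach the app through Envoy, and the admin interface on 9901 will show the app cluster and its endpoints.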

Start with static configuration. Learn the listener-route-cluster model by building a simple edge proxy. Once that is comfortable, add health checks and circuit breakers. Then explore xDS for dynamic configuration. If you are running Kubernetes, consider Istio or Consul Connect -- they use Envoy as the data plane but manage the configuration complexity for you through higher-level abstractions.

Key Takeaways

  • Envoy excels at east-west (service-to-service) traffic in microservices architectures. Its circuit breaking, retries, and observability features are specifically designed for this problem.
  • The core mental model is listeners (where connections arrive), filter chains (how they are processed), routes (where they go), and clusters (the upstream services).
  • Use static configuration for edge proxies and development. Use xDS with a control plane for service mesh deployments where services are dynamic.
  • Circuit breaking with retry budgets prevents cascade failures. Without retry budgets, retries can amplify failures instead of recovering from them.
  • Configure retries only for idempotent operations. Retrying a non-idempotent POST can cause data corruption.
  • Outlier detection (passive health checking) catches failures that active health checks miss, like services that respond to /healthz but fail on real requests.
  • Envoy's built-in stats, distributed tracing, and access logs give you deep visibility into every hop without modifying application code.
  • The admin interface on port 9901 is your primary debugging tool. Use it to inspect clusters, check health, adjust log levels, and dump configuration.
  • Start simple with static config and Docker Compose before investing in a full service mesh. Understand the fundamentals before adding automation.
  • Envoy, Nginx, and HAProxy are complementary tools. Many production architectures use Envoy for the mesh and Nginx or HAProxy at the edge.