Scalable Log Aggregation with Grafana Loki and Promtail
Why Elasticsearch Isn't the Only Answer Anymore
Every team I've worked with that ran Elasticsearch for logs eventually hit the same wall: storage costs spiral, cluster management becomes a full-time job, and the JVM heap tuning alone requires a PhD. Elasticsearch is powerful, but it indexes everything by default — and most of that index is never queried.
Loki takes a fundamentally different approach. It indexes only metadata (labels), not log content. The actual log lines are stored compressed in object storage. This makes it dramatically cheaper to operate and pairs naturally with the Prometheus label model your team already understands.
As the Google SRE book reminds us, the cost of your observability stack should be proportional to the value it delivers. Loki gets that balance right for most teams.
Architecture Overview
┌──────────┐     ┌────────────┐     ┌───────────┐     ┌─────────────┐
│ App Pods │────▶│  Promtail  │────▶│   Loki    │────▶│   Grafana   │
│ (stdout) │     │ (DaemonSet)│     │ (Gateway) │     │ (LogQL UI)  │
└──────────┘     └────────────┘     └───────────┘     └─────────────┘
                                          │
                                     ┌────┴────┐
                                     │  S3 /   │
                                     │  MinIO  │
                                     └─────────┘
Promtail runs on every node, tails container log files, attaches Kubernetes labels, and pushes to Loki. Loki stores chunks in object storage and maintains a small index for label lookups. Grafana queries Loki using LogQL.
Deploying Loki with Helm
For production, use the Simple Scalable deployment mode. It separates read and write paths for independent scaling.
# values-loki.yaml
loki:
  auth_enabled: false
  commonConfig:
    replication_factor: 1
  storage:
    type: s3
    bucketNames:
      chunks: loki-chunks
      ruler: loki-ruler
    s3:
      endpoint: minio.storage:9000
      accessKeyId: ${MINIO_ACCESS_KEY}
      secretAccessKey: ${MINIO_SECRET_KEY}
      s3ForcePathStyle: true
      insecure: true
  schemaConfig:
    configs:
      - from: "2024-01-01"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
  limits_config:
    retention_period: 744h  # 31 days
    max_query_length: 721h
    max_query_parallelism: 32
    ingestion_rate_mb: 10
    ingestion_burst_size_mb: 20
    per_stream_rate_limit: 5MB
    per_stream_rate_limit_burst: 15MB
  compactor:
    retention_enabled: true
    working_directory: /tmp/compactor

write:
  replicas: 3
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: "2"
      memory: 2Gi

read:
  replicas: 2
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: "2"
      memory: 2Gi

gateway:
  replicas: 2
helm install loki grafana/loki -n observability -f values-loki.yaml
A few of these configuration choices matter more than others. The per_stream_rate_limit prevents a single noisy service from overwhelming the cluster. The retention_period of 31 days is enough for most operational use; if you need logs older than that, you probably need an audit system, not a log aggregator.
Deploying Promtail
Promtail runs as a DaemonSet, reading container logs from the node filesystem.
# values-promtail.yaml
config:
  clients:
    - url: http://loki-gateway.observability/loki/api/v1/push
      tenant_id: default
      batchwait: 1s
      batchsize: 1048576  # 1 MiB
  positions:
    filename: /run/promtail/positions.yaml
  scrape_configs:
    - job_name: kubernetes-pods
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
        # Only collect logs from pods with the opt-in annotation
        - source_labels: [__meta_kubernetes_pod_annotation_logging_enabled]
          action: keep
          regex: "true"
        # Set namespace label
        - source_labels: [__meta_kubernetes_namespace]
          target_label: namespace
        # Set pod name label
        - source_labels: [__meta_kubernetes_pod_name]
          target_label: pod
        # Set container name label
        - source_labels: [__meta_kubernetes_pod_container_name]
          target_label: container
        # Set app label from pod label
        - source_labels: [__meta_kubernetes_pod_label_app]
          target_label: app
      pipeline_stages:
        # Parse JSON logs. The timestamp field must be extracted here
        # so the timestamp stage below can find it.
        - json:
            expressions:
              level: level
              msg: message
              trace_id: trace_id
              timestamp: timestamp
        # Set level as a label for filtering
        - labels:
            level:
        # Drop debug logs in production
        - match:
            selector: '{level="debug"}'
            action: drop
        # Use the timestamp from the log line, not the ingest time
        - timestamp:
            source: timestamp
            format: RFC3339Nano
helm install promtail grafana/promtail -n observability -f values-promtail.yaml
Notice the drop stage for debug logs. In production, debug logs are almost never queried but account for 60-70% of log volume in most services. Dropping them at the agent saves storage, bandwidth, and money.
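To make that back-of-the-envelope math concrete (the 100 GB/day and 65% figures below are hypothetical, picked as a midpoint of the range above):

```shell
# Hypothetical numbers: 100 GB/day ingested, 65% of it debug-level
daily_gb=100
debug_pct=65
dropped=$((daily_gb * debug_pct / 100))
retained=$((daily_gb - dropped))
echo "dropped: ${dropped} GB/day, retained: ${retained} GB/day"
```

At those rates, dropping debug at the agent cuts roughly two-thirds of storage, bandwidth, and rate-limit budget before a single byte reaches Loki.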
The keep relabel on logging_enabled is equally important. Opt-in logging means new services don't accidentally flood your pipeline. Add the annotation when you're ready.
# Pod annotation to enable log collection
metadata:
  annotations:
    logging.enabled: "true"
Label Cardinality: The One Thing That Will Break Loki
Loki's index is label-based. Every unique combination of labels creates a stream. Too many streams and Loki grinds to a halt.
Good labels: namespace, app, container, level — low cardinality, stable values.
Bad labels: user_id, request_id, trace_id, ip_address — high cardinality, creates millions of streams.
# WRONG: this creates a stream per request ID
- source_labels: [__meta_kubernetes_pod_annotation_request_id]
  target_label: request_id

# RIGHT: keep request_id in the log line, not as a label,
# and query it with a LogQL filter instead:
#   {app="api"} |= "request_id=abc123"
If you need to search by trace_id, use a LogQL filter on the log content, not a label. Loki is designed for this — content filtering is fast because chunks are compressed, not indexed.
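The stream arithmetic makes the danger obvious. Loki creates one stream per unique label combination, so the stream count is the product of per-label cardinalities. A sketch with made-up but realistic counts:

```shell
# Hypothetical cluster: stream count = product of per-label cardinalities
namespaces=20; apps=50; containers=2; levels=4
good=$((namespaces * apps * containers * levels))
echo "streams with low-cardinality labels: ${good}"

# Add a user_id label with 100k distinct users and the count explodes
users=100000
bad=$((good * users))
echo "streams after adding user_id: ${bad}"
```

Eight thousand streams is comfortable; eight hundred million is a dead cluster. Watch the loki_ingester_memory_streams metric to catch this before it happens.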
Useful LogQL Queries
Once data flows, here's how to actually use it.
# All error logs for the api service in the last hour
{app="api", level="error"}
# Error logs containing a specific trace ID
{app="api", level="error"} |= "trace_id=abc123def456"
# Parse JSON and filter by status code
{app="api"} | json | status_code >= 500
# Count errors per minute by service
sum(count_over_time({level="error"}[1m])) by (app)
# Top 10 most frequent error messages over the last hour
topk(10, sum by (message) (count_over_time({level="error"} | json [1h])))
# Detect log volume spikes (useful for anomaly detection)
sum(rate({namespace="production"}[5m])) by (app) > 2 *
sum(rate({namespace="production"}[5m] offset 1h)) by (app)
Monitoring Loki Itself
Just like any observability component, Loki needs to be monitored. It exposes Prometheus metrics.
# Ingestion rate in bytes per second
sum(rate(loki_distributor_bytes_received_total[5m]))
# Ingestion failures — should be zero
sum(rate(loki_distributor_ingester_append_failures_total[5m]))
# Query latency P99
histogram_quantile(0.99,
sum(rate(loki_request_duration_seconds_bucket{route="loki_api_v1_query_range"}[5m])) by (le)
)
# Active streams count — watch for cardinality explosions
loki_ingester_memory_streams
Wire the critical ones into Prometheus alert rules:
groups:
  - name: loki
    rules:
      - alert: LokiIngestionFailures
        expr: sum(rate(loki_distributor_ingester_append_failures_total[5m])) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Loki is failing to ingest logs"
          runbook: "https://wiki.internal/runbooks/loki-ingestion-failure"
      - alert: LokiHighStreamCount
        expr: loki_ingester_memory_streams > 100000
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Loki stream count exceeding 100k, check for label cardinality issues"
Retention and Cost Control
The biggest operational win with Loki is controlling what you store.
- Drop debug logs at the agent — covered above with Promtail pipeline stages.
- Set per-tenant retention — different namespaces may have different compliance needs.
- Use lifecycle policies on your object storage — belt and suspenders with Loki's compactor.
- Monitor ingestion rate — set alerts when a service suddenly starts logging 10x more than usual.
Logs are the highest-volume telemetry signal. A single verbose Java service can generate more bytes per day than every metric in your Prometheus instance. Control the volume at the source, not at the storage layer.
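For the object-storage lifecycle policy mentioned above, a rule that expires chunks somewhat after Loki's own 31-day retention gives you the safety net without racing the compactor. A sketch (the 45-day window is an assumption; keep it comfortably longer than retention_period so the compactor, not the bucket, does the real deletion):

```json
{
  "Rules": [
    {
      "ID": "loki-chunks-safety-net",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Expiration": { "Days": 45 }
    }
  ]
}
```

Apply it with aws s3api put-bucket-lifecycle-configuration --bucket loki-chunks --lifecycle-configuration file://lifecycle.json; MinIO accepts the same S3 lifecycle API.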
The Practical Path
Start with Promtail collecting from a single namespace. Validate that logs arrive in Grafana and queries work. Then expand namespace by namespace, adding pipeline stages to parse and filter as needed.
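One way to scope that initial rollout (the staging namespace name is a placeholder) is an extra keep rule in Promtail's relabel_configs, alongside the annotation check:

```yaml
# Hypothetical rollout guard: only collect from one namespace at first
relabel_configs:
  - source_labels: [__meta_kubernetes_namespace]
    action: keep
    regex: staging
```

Widen the regex as teams onboard, then delete the rule once every namespace is in.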
Loki won't replace Elasticsearch for every use case — full-text search across months of data is still Elasticsearch's strength. But for the 90% of log queries that are "show me errors for this service in the last hour," Loki is faster to operate, cheaper to run, and fits naturally into your existing Grafana and Prometheus ecosystem.
Troubleshooting Common Loki Issues
Loki is simpler than Elasticsearch, but it has its own failure modes. Here's what to check when things go wrong.
Logs Not Appearing in Grafana
Work backwards from Grafana to the source:
# Step 1: Verify Promtail is running and tailing logs
kubectl get pods -n observability -l app.kubernetes.io/name=promtail
kubectl logs <promtail-pod> -n observability --tail=20
# Step 2: Check Promtail targets — are your pods being discovered?
# Port-forward to Promtail's HTTP endpoint
kubectl port-forward -n observability <promtail-pod> 3101:3101
curl -s http://localhost:3101/targets | jq '.[] | select(.labels.app == "your-app")'
# Step 3: Check if Loki is receiving data
curl -s http://loki-gateway.observability/loki/api/v1/labels | jq .
# If your app label isn't listed, Promtail isn't sending data for it
# Step 4: Check for rate limiting
kubectl logs -n observability -l app.kubernetes.io/name=loki-write --tail=50 | grep "rate limit"
Common root causes:
| Symptom | Cause | Fix |
|---|---|---|
| No targets in Promtail | Missing logging.enabled: "true" annotation | Add annotation to pod spec |
| Targets exist but no logs | Promtail can't read log files | Check volume mounts on DaemonSet |
| 429 Too Many Requests | Per-stream rate limit exceeded | Increase per_stream_rate_limit or reduce log volume |
| entry out of order | Timestamps are arriving non-sequentially | Enable unordered_writes: true in Loki config |
The Out-of-Order Entries Problem
By default, Loki rejects log entries that arrive with timestamps older than the most recent entry for that stream. This happens frequently with pods that buffer logs or when Promtail restarts and replays its position file. Enable unordered writes to fix it:
# Add to values-loki.yaml under loki.limits_config
loki:
  limits_config:
    unordered_writes: true
This adds a small performance overhead but eliminates the most common source of dropped logs in production.
Multi-Tenant Loki for Team Isolation
When multiple teams share a Loki cluster, tenant isolation prevents noisy services from one team degrading query performance for everyone. Enable multi-tenancy and configure per-tenant limits:
# values-loki.yaml
loki:
  auth_enabled: true
  limits_config:
    # Default limits for all tenants
    retention_period: 744h
    ingestion_rate_mb: 4
    ingestion_burst_size_mb: 8
    per_stream_rate_limit: 3MB
  runtime_config:
    overrides:
      # Team with high log volume gets higher limits
      platform-team:
        ingestion_rate_mb: 20
        ingestion_burst_size_mb: 40
        per_stream_rate_limit: 10MB
        max_query_parallelism: 64
      # Team with lower needs gets standard limits
      frontend-team:
        ingestion_rate_mb: 4
        per_stream_rate_limit: 3MB
        max_query_parallelism: 16
Configure Promtail to set the tenant ID based on the namespace:
# values-promtail.yaml — add to scrape_configs
pipeline_stages:
  - tenant:
      source: namespace
Now each namespace's logs are isolated. The platform team's verbose debug logging can't exhaust the frontend team's query budget. Grafana passes the X-Scope-OrgID header to query a specific tenant:
# Query logs for a specific tenant
curl -H "X-Scope-OrgID: platform-team" \
"http://loki-gateway.observability/loki/api/v1/query_range" \
--data-urlencode 'query={app="api", level="error"}' \
--data-urlencode 'start=1711000000000000000' \
--data-urlencode 'end=1711100000000000000'
In Grafana, configure separate Loki data sources per tenant, each with a custom HTTP header for the org ID. This gives teams self-service access to their own logs without stepping on each other.
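Those per-tenant data sources can be provisioned declaratively instead of clicked together in the UI. A sketch of a Grafana data source provisioning file (the name and file path are up to you; the httpHeaderName1/httpHeaderValue1 pair is Grafana's convention for custom headers):

```yaml
# grafana/provisioning/datasources/loki-platform.yaml
apiVersion: 1
datasources:
  - name: Loki (platform-team)
    type: loki
    access: proxy
    url: http://loki-gateway.observability
    jsonData:
      httpHeaderName1: X-Scope-OrgID
    secureJsonData:
      httpHeaderValue1: platform-team
```

One such file per tenant keeps the isolation boundary in version control rather than in someone's browser session.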
That's the kind of trade-off an SRE should be making: optimize for the common case, not the edge case.