PromQL: Cheat Sheet

Riku Tanaka

Selectors & Matchers

http_requests_total{method="GET"}           # Exact match
http_requests_total{handler=~"/api/.*"}     # Regex match
http_requests_total{status!="200"}          # Negative match
http_requests_total{method!~"OPTIONS|HEAD"} # Negative regex

Rates & Counters

rate(http_requests_total[5m])          # Per-second rate over 5m (use for counters)
irate(http_requests_total[5m])         # Instant rate (last two points)
increase(http_requests_total[1h])      # Absolute increase over 1h
delta(temperature_celsius[5m])         # Change in gauge over 5m
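
To make the difference between rate and irate concrete, here is a minimal Python sketch of the arithmetic. It is a simplification, not Prometheus's implementation: the real rate() also extrapolates to the window boundaries, which this sketch skips. The function names and sample data are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase across all samples, corrected for counter resets.

    Assumption: no extrapolation to the window edges (Prometheus does extrapolate).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        # A drop means the counter restarted, so the new value counts from zero.
        increase += value if value < prev else value - prev
        prev = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


def simple_irate(samples: List[Sample]) -> float:
    """Same arithmetic, but only over the last two samples in the window."""
    return simple_rate(samples[-2:])


# Hypothetical scrape data: 15s interval, with a counter reset at t=30.
samples = [(0, 100.0), (15, 130.0), (30, 10.0), (45, 40.0), (60, 70.0)]
window_rate = simple_rate(samples)    # 100 requests over 60s
instant_rate = simple_irate(samples)  # 30 requests over the last 15s
```

The averaged value is why rate is usually the better fit for alerts and dashboards with long ranges, while irate reacts faster but is noisier.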

Aggregations

sum(rate(http_requests_total[5m]))                          # Total rate
sum by (status) (rate(http_requests_total[5m]))             # Group by status
avg by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))  # Mean rate of non-idle CPU series per instance
topk(5, container_memory_usage_bytes)                       # Top 5 by memory
count(up == 1)                                              # Count targets up

| Operator | Description |
| --- | --- |
| sum | Total across series |
| avg | Arithmetic mean |
| min / max | Minimum / maximum value |
| count | Number of series |
| topk / bottomk | Top / bottom K series by value |
| quantile | Quantile across series |

Histograms

# 99th percentile request duration, one result per label set (wrap the rate in sum by (le) to aggregate)
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))

# 95th percentile grouped by handler
histogram_quantile(0.95,
  sum by (le, handler) (rate(http_request_duration_seconds_bucket[5m])))

# Average duration from histogram
rate(http_request_duration_seconds_sum[5m])
  / rate(http_request_duration_seconds_count[5m])
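
histogram_quantile estimates a quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The following is a rough Python sketch of that idea for classic histograms. The function name and bucket data are hypothetical, and it assumes the lowest bucket starts at 0; edge cases that Prometheus handles (NaN, negative bounds, native histograms) are left out.

```python
import math
from typing import List, Tuple

Bucket = Tuple[float, float]  # (upper bound "le", cumulative count)


def bucket_quantile(q: float, buckets: List[Bucket]) -> float:
    """Estimate the q-quantile from cumulative buckets sorted by upper bound.

    Assumptions: 0 < q <= 1, the last bucket is +Inf, and the first bucket starts at 0.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least two buckets, the last one being +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan  # unreachable for well-formed cumulative buckets


# Hypothetical request-duration buckets, in seconds.
buckets = [(0.1, 50.0), (0.5, 90.0), (1.0, 99.0), (math.inf, 100.0)]
p95 = bucket_quantile(0.95, buckets)  # falls between 0.5s and 1.0s
```

Because the result is interpolated, its accuracy depends on how finely the buckets cover the range you care about.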

Binary Operations

# Error ratio (5xx / total)
sum(rate(http_requests_total{status=~"5.."}[5m]))
  / sum(rate(http_requests_total[5m]))

# Disk usage as a fraction (0 to 1); multiply by 100 for a percentage
1 - (node_filesystem_avail_bytes / node_filesystem_size_bytes)

# Filter series above threshold
rate(http_requests_total[5m]) > 100

Common Alert Expressions

# Error rate above 5%
sum(rate(http_requests_total{status=~"5.."}[5m]))
  / sum(rate(http_requests_total[5m])) > 0.05

# Instance down
up == 0

# Disk fills in 4 hours
predict_linear(node_filesystem_avail_bytes[1h], 4 * 3600) < 0

# Memory above 90%
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.9

# Pod restarting frequently
increase(kube_pod_container_status_restarts_total[1h]) > 5
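
The disk-fill alert above relies on predict_linear, which fits a straight line to the samples in the range by simple linear regression and evaluates it a given number of seconds ahead. Here is a small Python sketch of that calculation. The function name, the `now` parameter, and the sample data are hypothetical; this illustrates the math and is not Prometheus's code.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, gauge value)


def predict_linear_sketch(samples: List[Sample], now: float, horizon_s: float) -> float:
    """Least-squares line through the samples, evaluated at now + horizon_s."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    if variance == 0:
        raise ValueError("samples must have distinct timestamps")
    slope = covariance / variance
    intercept = mean_v - slope * mean_t
    return intercept + slope * (now + horizon_s)


# Hypothetical free space: 10,000 bytes at t=0, shrinking by 1 byte per second.
history = [(float(t), 10_000.0 - t) for t in range(0, 3601, 600)]
predicted = predict_linear_sketch(history, now=3600.0, horizon_s=4 * 3600)
will_fill = predicted < 0  # mirrors the "< 0" comparison in the alert
```

A negative prediction means the fitted trend crosses zero free bytes before the horizon, which is the condition the alert fires on.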

Useful Functions

| Function | Purpose |
| --- | --- |
| rate(m[5m]) | Per-second rate of counter |
| increase(m[1h]) | Total increase of counter |
| histogram_quantile(0.99, ...) | Percentile from histogram |
| predict_linear(m[1h], 3600) | Linear extrapolation |
| absent(up{job="api"}) | Returns 1 if no matching series exist |
| label_replace(...) | Rewrite labels |
| avg_over_time(m[1h]) | Average of gauge over time |
| max_over_time(m[6h]) | Max of gauge over time |

Subqueries

# Max of 5m rate, over 1h at 1m resolution
max_over_time(rate(http_requests_total[5m])[1h:1m])

# Fraction of successful scrapes over 24 hours (plain range selector, not a subquery)
avg_over_time(up[24h])