PromQL: Cheat Sheet

Selectors & Matchers

http_requests_total{method="GET"}           # Exact match
http_requests_total{handler=~"/api/.*"}     # Regex match
http_requests_total{status!="200"}          # Negative match
http_requests_total{method!~"OPTIONS|HEAD"} # Negative regex
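A few selector details are easy to miss, so here is a short sketch that combines them. Multiple matchers are ANDed, regex matchers are fully anchored, and `offset` shifts the evaluation time into the past. The label values and the `job="api"` matcher are illustrative assumptions, not taken from a real setup.

```promql
# Matchers are ANDed: GET requests with any 2xx status
http_requests_total{method="GET", status=~"2.."}

# Regex is fully anchored: this matches "/api" and anything under "/api/"
http_requests_total{handler=~"/api(/.*)?"}

# Same selector, evaluated as of 1 hour ago
http_requests_total{method="GET"} offset 1h

# Match on the metric name itself (can be expensive on large servers)
{__name__=~"http_request.*", job="api"}
```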

Rates & Counters

rate(http_requests_total[5m])          # Per-second average rate over 5m (use for counters)
irate(http_requests_total[5m])         # Instant rate from the last two samples in the window
increase(http_requests_total[1h])      # Increase over 1h (extrapolated, may be fractional)
delta(temperature_celsius[5m])         # Difference between first and last gauge value over 5m
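As a rough guide, `rate()` needs at least two samples inside the window, and a common rule of thumb is to make the window about four times the scrape interval. The sketch below assumes a 15s scrape interval, which is an assumption and not something stated above, and shows two related functions for counters and gauges.

```promql
# Per-second rate; with a 15s scrape interval (assumed), 1m still spans about 4 samples
rate(http_requests_total[1m])

# Number of counter resets (for example, process restarts) seen in the last hour
resets(http_requests_total[1h])

# Per-second slope of a gauge, fitted by linear regression over 15m
deriv(temperature_celsius[15m])

# increase() is rate() multiplied by the window length in seconds
rate(http_requests_total[1h]) * 3600
```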

Aggregations

sum(rate(http_requests_total[5m]))                          # Total rate
sum by (status) (rate(http_requests_total[5m]))             # Group by status
avg by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))  # Mean non-idle CPU rate per instance (averaged over cores and modes)
topk(5, container_memory_usage_bytes)                       # Top 5 by memory
count(up == 1)                                              # Count targets up

Operator	Description
`sum`	Total across series
`avg`	Arithmetic mean
`min` / `max`	Minimum / maximum value
`count`	Number of series
`topk` / `bottomk`	Top/bottom K series
`quantile`	Quantile across series
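The operators in the table combine with `by` and `without`, and the order of `rate()` and the aggregation matters. A minimal sketch, assuming the series carry `job` and `instance` labels as they typically do for scraped targets:

```promql
# Take rate() first, then aggregate, so counter resets are handled per series
sum by (job) (rate(http_requests_total[5m]))

# `without` drops the listed labels and keeps all others
sum without (instance) (rate(http_requests_total[5m]))

# Median request rate across instances
quantile(0.5, sum by (instance) (rate(http_requests_total[5m])))

# Three least-loaded instances by request rate
bottomk(3, sum by (instance) (rate(http_requests_total[5m])))
```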

Histograms

# 99th percentile request duration
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))

# 95th percentile grouped by handler
histogram_quantile(0.95,
  sum by (le, handler) (rate(http_request_duration_seconds_bucket[5m])))

# Average duration from histogram
rate(http_request_duration_seconds_sum[5m])
  / rate(http_request_duration_seconds_count[5m])
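When a service runs on several instances, the bucket rates usually need to be summed before computing a quantile, and the `le` label has to survive the aggregation. The sketch below shows that pattern, plus a latency ratio that assumes the histogram defines a bucket with `le="0.5"`.

```promql
# 99th percentile across all instances: sum the bucket rates, keeping `le`
histogram_quantile(0.99,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

# Average duration across all series: aggregate numerator and denominator separately
sum(rate(http_request_duration_seconds_sum[5m]))
  / sum(rate(http_request_duration_seconds_count[5m]))

# Fraction of requests served within 0.5s (assumes a bucket with le="0.5" exists)
sum(rate(http_request_duration_seconds_bucket{le="0.5"}[5m]))
  / sum(rate(http_request_duration_seconds_count[5m]))
```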

Binary Operations

# Error ratio (5xx / total)
sum(rate(http_requests_total{status=~"5.."}[5m]))
  / sum(rate(http_requests_total[5m]))

# Disk usage as a fraction between 0 and 1 (multiply by 100 for a percentage)
1 - (node_filesystem_avail_bytes / node_filesystem_size_bytes)

# Filter series above threshold
rate(http_requests_total[5m]) > 100
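Binary operators match series by their label sets, so the two sides must either share the same labels or be told how to match. A sketch of the common cases, assuming `http_requests_total` carries `handler` and `status` labels as in the selectors above:

```promql
# Per-handler error ratio: both sides share the same labels, so one-to-one matching works
sum by (handler) (rate(http_requests_total{status=~"5.."}[5m]))
  / sum by (handler) (rate(http_requests_total[5m]))

# Share of each status within its handler: many-to-one match, ignoring `status`
sum by (handler, status) (rate(http_requests_total[5m]))
  / ignoring (status) group_left
    sum by (handler) (rate(http_requests_total[5m]))

# `bool` returns 0 or 1 for every series instead of filtering
rate(http_requests_total[5m]) > bool 100
```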

Common Alert Expressions

# Error ratio (5xx / total) above 5%
sum(rate(http_requests_total{status=~"5.."}[5m]))
  / sum(rate(http_requests_total[5m])) > 0.05

# Instance down
up == 0

# Disk predicted to fill within 4 hours (linear fit over the last 1h)
predict_linear(node_filesystem_avail_bytes[1h], 4 * 3600) < 0

# Memory above 90%
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.9

# Pod restarting frequently
increase(kube_pod_container_status_restarts_total[1h]) > 5
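Alert expressions are often combined with `and` so that they only fire when several conditions hold at once. The following is a sketch under assumptions: the `job="api"` name, the 20% free-space threshold, and the 1 req/s traffic floor are all hypothetical values chosen for illustration.

```promql
# Fires when no `up` series exists for the job at all (job name is hypothetical)
absent(up{job="api"})

# Low free space AND predicted to fill: both must hold for the same series
(node_filesystem_avail_bytes / node_filesystem_size_bytes < 0.2)
  and
predict_linear(node_filesystem_avail_bytes[1h], 4 * 3600) < 0

# Error ratio alert that stays silent at very low traffic (1 req/s floor is assumed)
(
  sum(rate(http_requests_total{status=~"5.."}[5m]))
    / sum(rate(http_requests_total[5m])) > 0.05
)
and
sum(rate(http_requests_total[5m])) > 1
```

`up == 0` only catches targets that are still being scraped; `absent()` covers the case where the target has dropped out of service discovery entirely.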

Useful Functions

Function	Purpose
`rate(m[5m])`	Per-second rate of counter
`increase(m[1h])`	Total increase of counter
`histogram_quantile(0.99, ...)`	Percentile from histogram
`predict_linear(m[1h], 3600)`	Linear extrapolation
`absent(up{job="api"})`	Returns 1 if no series exist
`label_replace(...)`	Rewrite labels
`avg_over_time(m[1h])`	Average of gauge over time
`max_over_time(m[6h])`	Max of gauge over time
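The `label_replace(...)` entry takes five arguments: the input vector, the destination label, the replacement, the source label, and a regex. A minimal sketch, assuming `instance` values of the form `host:port`; the `host` label name is an illustrative choice:

```promql
# Copy the host part of `instance` ("10.0.0.1:9100" -> "10.0.0.1") into a new `host` label
label_replace(up, "host", "$1", "instance", "([^:]+):.*")

# Other *_over_time variants applied to a gauge
min_over_time(node_memory_MemAvailable_bytes[1h])
quantile_over_time(0.95, node_memory_MemAvailable_bytes[1h])

# Returns 1 if no matching series had any sample in the last 10m (job name is hypothetical)
absent_over_time(up{job="api"}[10m])
```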

Subqueries

# Max of 5m rate, over 1h at 1m resolution
max_over_time(rate(http_requests_total[5m])[1h:1m])

# Fraction of time the target was up over 24 hours (a plain range selector; no subquery needed)
avg_over_time(up[24h])
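The general subquery form is `<instant query>[<range>:<resolution>]`, and the resolution may be left out. A sketch of a few variations, reusing the error ratio from the alert section; the 5% threshold is carried over from there and the windows are illustrative:

```promql
# Omitting the resolution uses the global evaluation interval
min_over_time(rate(http_requests_total[5m])[30m:])

# 95th percentile of the 5m request rate over the past day, sampled every 5m
quantile_over_time(0.95, rate(http_requests_total[5m])[1d:5m])

# Fraction of the last hour in which the 5m error ratio exceeded 5%
avg_over_time(
  (
    sum(rate(http_requests_total{status=~"5.."}[5m]))
      / sum(rate(http_requests_total[5m])) > bool 0.05
  )[1h:1m]
)
```

Subqueries re-evaluate the inner expression at every step, so a recording rule is usually the cheaper option for anything shown on a frequently loaded dashboard.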

