Monitoring · Intermediate

Observability & SRE

Build a complete observability stack — Prometheus + Grafana, alerting rules, dashboard design, OpenTelemetry, SLOs, and on-call practices that prevent burnout.

Riku Tanaka · 6 chapters · 8 hours

Chapters

Prometheus + Grafana Monitoring Stack

Deploy a complete monitoring stack from scratch with Prometheus, Grafana, Node Exporter, and Alertmanager.

15 min read

Alerting Rules That Work

Write alerting rules that catch real problems without waking you up for noise at 3 AM.

9 min read

Grafana Dashboard Design

Design dashboards that SREs actually use — layout principles, variable templates, and annotation layers.

9 min read

OpenTelemetry Collector

Deploy the OpenTelemetry Collector as your unified observability pipeline for traces, metrics, and logs.

8 min read

SLOs and Error Budgets

Define Service Level Objectives, calculate error budgets, and use them to balance reliability with velocity.

9 min read

On-Call Practices That Prevent Burnout

Build sustainable on-call rotations with proper handoffs, escalation policies, and compensation.

10 min read