Observability & SRE
Build a complete observability stack — Prometheus + Grafana, alerting rules, dashboard design, OpenTelemetry, SLOs, and on-call practices that prevent burnout.
Chapters
Prometheus + Grafana Monitoring Stack
Deploy a complete monitoring stack from scratch with Prometheus, Grafana, node-exporter, and Alertmanager.
Alerting Rules That Work
Write alerting rules that catch real problems without waking you up for noise at 3 AM.
Grafana Dashboard Design
Design dashboards that SREs actually use — layout principles, variable templates, and annotation layers.
OpenTelemetry Collector
Deploy the OpenTelemetry Collector as your unified observability pipeline for traces, metrics, and logs.
SLOs and Error Budgets
Define Service Level Objectives, calculate error budgets, and use them to balance reliability with velocity.
On-Call Practices That Prevent Burnout
Build sustainable on-call rotations with proper handoffs, escalation policies, and compensation.