DevOpsil
Monitoring · Intermediate

Observability & SRE

Build a complete observability stack — Prometheus + Grafana, alerting rules, dashboard design, OpenTelemetry, SLOs, and on-call practices that prevent burnout.

Riku Tanaka · 6 chapters · 8 hours

Chapters

01

Prometheus + Grafana Monitoring Stack

Deploy a complete monitoring stack from scratch with Prometheus, Grafana, node-exporter, and alertmanager.

15 min read
02

Alerting Rules That Work

Write alerting rules that catch real problems without waking you up for noise at 3 AM.

9 min read
03

Grafana Dashboard Design

Design dashboards that SREs actually use — layout principles, variable templates, and annotation layers.

9 min read
04

OpenTelemetry Collector

Deploy the OpenTelemetry Collector as your unified observability pipeline for traces, metrics, and logs.

8 min read
05

SLOs and Error Budgets

Define Service Level Objectives, calculate error budgets, and use them to balance reliability with velocity.

9 min read
06

On-Call Practices That Prevent Burnout

Build sustainable on-call rotations with proper handoffs, escalation policies, and compensation.

10 min read