SLOs & Incident Response
From SLIs to Postmortems
Define meaningful SLOs, implement error budgets, and build systematic incident response workflows. Includes hands-on simulated incidents with real troubleshooting.
Who This Track Is For
Designed for professionals ready to level up their observability expertise
SRE teams implementing SLO-based practices
On-call engineers improving incident response
Engineering managers building reliability culture
Anyone responsible for production reliability
What You'll Learn
A structured progression through key topics, with hands-on labs at every step
- SLIs, SLOs, and SLAs explained
- Choosing the right SLIs for your service
- Error budget policies
- Sloth SLO generator workshop
- Burn rate alerting
- RED method for services
- USE method for resources
- Signal correlation for debugging
- Incident response framework
- Communication during incidents
- Blameless postmortems
- Runbook best practices
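To make the error budget topic concrete, here is a minimal sketch of the arithmetic behind it. The function names and the 30-day window are illustrative assumptions and are not taken from the course material or from any particular tool.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Downtime (in minutes) that an availability SLO allows over the window.

    slo_target is a fraction, e.g. 0.999 for "three nines".
    The 30-day default window is an assumption for illustration.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the request-based error budget still unspent.

    Returns 1.0 when nothing is spent, 0.0 when the budget is exactly
    exhausted, and a negative value when the SLO has been violated.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    # A 99.9% SLO over 30 days leaves roughly 43 minutes of downtime.
    print(round(error_budget_minutes(0.999), 1))
    # 250 failures out of 1,000,000 requests spends about a quarter of the budget.
    print(round(budget_remaining(0.999, 1_000_000, 250), 2))
```

An error budget policy then typically ties actions, such as pausing risky releases, to how much of this budget remains.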
What You'll Be Able To Do
Practical skills you can apply immediately in your work
Define Meaningful SLOs
Translate business requirements into SLIs, SLOs, and error budget policies
Implement Burn Rate Alerts
Create multi-window burn rate alerts that balance speed and accuracy
Systematic Incident Response
Apply the RED and USE methods to narrow down root causes quickly
Blameless Postmortems
Conduct effective postmortems and create actionable runbooks
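As a rough illustration of the multi-window burn rate idea mentioned above, the sketch below pages only when both a long and a short window are burning budget faster than a threshold. The long window gives confidence that the problem is significant, and the short window confirms it is still happening. The function names are hypothetical, and the 14.4 threshold is the commonly cited value for spending 2% of a 30-day budget in one hour. Treat it as an assumption to tune, not a prescribed setting.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How fast the error budget is being consumed.

    A burn rate of 1.0 spends the budget exactly over the SLO window;
    higher values exhaust it proportionally sooner.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return error_ratio / (1.0 - slo_target)


def should_page(
    long_window_error_ratio: float,
    short_window_error_ratio: float,
    slo_target: float,
    threshold: float = 14.4,  # assumed: 2% of a 30-day budget burned in 1 hour
) -> bool:
    """Fire only when BOTH windows exceed the burn rate threshold.

    The long window (e.g. 1h) filters out brief blips; the short window
    (e.g. 5m) stops the alert from firing after the issue has recovered.
    """
    long_burn = burn_rate(long_window_error_ratio, slo_target)
    short_burn = burn_rate(short_window_error_ratio, slo_target)
    return long_burn >= threshold and short_burn >= threshold


if __name__ == "__main__":
    slo = 0.999
    # Sustained 2% errors in both windows: burn rate is about 20, so page.
    print(should_page(0.02, 0.03, slo))
    # The long window is still elevated but the short window has recovered.
    print(should_page(0.02, 0.0005, slo))
```

In practice the two error ratios would come from a metrics backend. Here they are passed in directly so the logic stays self-contained.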
Team Training
Customized to your team's needs
Explore Other Tracks
Continue your observability journey with complementary training
Observability Foundations
Your Entry Point to Modern Observability
Master the three pillars of observability (metrics, logs, traces) with hands-on OpenTelemetry instrumentation. Build production-ready dashboards and understand how signals correlate.
Grafana Stack Deep Dive
Master the Complete LGTM Stack
Go beyond basics with advanced PromQL, LogQL, and TraceQL. Learn production patterns for recording rules, alerting, cost optimization, and scaling the Grafana stack.
Kubernetes Observability
Full-Stack K8s Monitoring
Deploy complete observability for Kubernetes clusters. From kube-state-metrics to custom ServiceMonitors, build production-ready monitoring for your K8s infrastructure.