Interactive demonstration of Datadog's journey from unsupervised to supervised learning
🎯 The Challenge
Google's SRE book reports that roughly 70% of outages are caused by changes to a live system, a category that includes deployments. This demo explores how Datadog developed an automated system that uses machine learning to detect faulty deployments.
📊 Key Challenges
No Labels: Lack of ground truth data
Data Imbalance: Faulty deployments are rare
Diversity: Different application profiles
Time Pressure: Need for quick detection
🔬 Definition of Faulty Deployment
A deployment is considered faulty if it exhibits:
Impact: Significant increase in error rate relative to baseline
Temporal Correlation: Error increase aligns with deployment timing
Persistence: Increased error rate is sustained over time
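The three criteria above can be sketched as simple checks over an error-rate time series. This is a minimal illustration, not Datadog's implementation: the function names, the thresholds (`min_ratio`, `min_abs_increase`, `max_lag`, `min_fraction`), and the sample data are all assumptions made for the example.

```python
from statistics import mean

# All names and thresholds below are hypothetical, chosen for illustration.

def impact_check(baseline, post, min_ratio=2.0, min_abs_increase=0.01):
    """Impact: the post-deployment error rate is well above the baseline."""
    base, after = mean(baseline), mean(post)
    return after - base >= min_abs_increase and after >= min_ratio * base

def temporal_check(baseline, post, max_lag=3, min_ratio=2.0):
    """Temporal correlation: the rise starts within a few samples of the deployment."""
    threshold = min_ratio * mean(baseline)
    first_high = next((i for i, rate in enumerate(post) if rate > threshold), None)
    return first_high is not None and first_high <= max_lag

def persistence_check(baseline, post, min_fraction=0.8, min_ratio=2.0):
    """Persistence: most post-deployment samples stay elevated."""
    threshold = min_ratio * mean(baseline)
    elevated = sum(rate > threshold for rate in post)
    return elevated / len(post) >= min_fraction

# Error rates sampled once per minute, before and after a deployment.
baseline = [0.02] * 10
sustained = [0.02, 0.08, 0.09, 0.10, 0.09, 0.08, 0.09, 0.10, 0.09, 0.08]
spike = [0.02, 0.10] + [0.02] * 8

for name, post in [("sustained", sustained), ("spike", spike)]:
    print(name,
          impact_check(baseline, post),
          temporal_check(baseline, post),
          persistence_check(baseline, post))
```

A sustained increase satisfies all three checks, while a one-minute spike satisfies only the temporal check, which is why the definition needs all three criteria rather than any single one.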
🛤️ Evolution Path
The project evolved through three main phases:
Unsupervised Approach: Rule-based statistical checks with iterative refinement
Supervised Learning: Sequential models trained on ensemble outputs
Weak Supervision: Improved label quality through multiple weak signals
📈 Simulate Deployment Monitoring
🔍 Statistical Checks
Impact Check
Temporal Check
Persistence Check
Final Decision
⚖️ Iterative Framework
The unsupervised approach uses unanimous voting: a deployment is flagged as faulty only when every check (impact, temporal correlation, and persistence) detects a problem.
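A minimal sketch of unanimous voting, assuming each check reports `True` when it detects a problem. The function name and the dictionary keys are illustrative, not taken from Datadog's system.

```python
def unanimous_vote(check_results):
    """Flag a deployment only when every check reports a problem.

    check_results maps a (hypothetical) check name to True when that
    check detected a problem.
    """
    if not check_results:
        return False  # no evidence means no flag
    return all(check_results.values())

results = {"impact": True, "temporal": True, "persistence": False}
print(unanimous_vote(results))  # False: the error increase did not persist
```

Requiring agreement from every check favors precision over recall: a single dissenting check is enough to suppress a flag.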
⏱️ Sequential Model Approach
Models are trained to predict the 60-minute ensemble result from the data available 10 and 20 minutes after a deployment.
🎯 Model Performance
10-Minute Model: 21.5% coverage; high precision, low recall
20-Minute Model: 25.9% coverage; balanced precision/recall
60-Minute Model: 62.9% coverage; high recall, slower detection
🔄 Training Process
The supervised models take features from the statistical checks, computed at early timestamps, and predict the final ensemble decision. This allows faster detection, although the coverage figures above show that the earlier models have lower coverage than the 60-minute model.
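The training setup can be sketched as follows: features computed at an early timestamp are paired with the 60-minute ensemble decision as the label. The source does not name the model family or the features, so this example assumes a plain logistic regression written with the standard library, two hypothetical 10-minute features (error-rate ratio to baseline, fraction of elevated samples), and made-up training rows.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, lr=0.3, epochs=3000):
    """Fit logistic regression with batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

def predict_proba(model, x):
    """Probability that the 60-minute ensemble would flag this deployment."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical 10-minute features: [error-rate ratio, fraction of elevated samples].
rows = [
    [1.0, 0.0], [1.1, 0.1], [0.9, 0.0], [1.2, 0.2],  # ensemble said healthy
    [3.0, 0.9], [2.5, 0.8], [4.0, 1.0], [2.8, 0.7],  # ensemble said faulty
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 60-minute ensemble decisions

model = train_logistic(rows, labels)
print(predict_proba(model, [3.5, 0.9]))  # early features that look faulty
print(predict_proba(model, [1.0, 0.0]))  # early features that look healthy
```

In practice the predicted probability would be compared against a threshold before flagging; raising that threshold trades recall for precision.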
🏷️ Weak Supervision Framework
Instead of manual labeling, we use multiple weak signals to generate high-quality labels automatically.
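One simple way to combine weak signals is a majority vote over labeling functions that may abstain. The sketch below is an assumption for illustration: the source does not say which signals are used or how they are combined, so the three signals (`rolled_back`, `incident_opened`, `error_ratio`) and the voting rule are hypothetical.

```python
FAULTY, HEALTHY, ABSTAIN = 1, 0, -1

# Hypothetical labeling functions; each looks at one weak signal.
def lf_rollback(dep):
    return FAULTY if dep.get("rolled_back") else ABSTAIN

def lf_incident(dep):
    return FAULTY if dep.get("incident_opened") else ABSTAIN

def lf_quiet(dep):
    return HEALTHY if dep.get("error_ratio", 1.0) < 1.1 else ABSTAIN

def weak_label(dep, labeling_functions):
    """Majority vote over non-abstaining labeling functions; ties abstain."""
    votes = [lf(dep) for lf in labeling_functions]
    faulty = votes.count(FAULTY)
    healthy = votes.count(HEALTHY)
    if faulty == healthy:
        return ABSTAIN
    return FAULTY if faulty > healthy else HEALTHY

lfs = [lf_rollback, lf_incident, lf_quiet]
print(weak_label({"rolled_back": True, "incident_opened": True, "error_ratio": 3.2}, lfs))  # 1
print(weak_label({"error_ratio": 1.0}, lfs))                                                # 0
print(weak_label({"rolled_back": True, "error_ratio": 1.0}, lfs))                           # -1
```

Weak supervision systems often go beyond a plain majority vote and weight each signal by its estimated accuracy; the majority vote is used here only because it is the simplest rule to read.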