TOTO Model Demonstration

Interactive visualization of Datadog's Time-Series-Optimized Transformer for Observability metrics. Explore zero-shot forecasting, probabilistic predictions, and multi-variate time series analysis.

TOTO Model Stats

2T+ Training Data Points
Zero-Shot Forecasting
Multi-Variate Support
Student-T Mixture Model

🔮 Zero-Shot Forecasting

TOTO can forecast time series without fine-tuning on your specific data. Click on the chart to add anomalies!

Control settings: 168 hours, 95%
Metrics: MAE 2.34, MAPE 4.7%, Coverage 94.2%
Chart legend: Historical Data, TOTO Forecast, Uncertainty Band
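
The three metrics reported for the forecast (MAE, MAPE, and Coverage) can be computed from a forecast and its uncertainty band. The sketch below uses their standard definitions with made-up numbers. The function names and sample values are illustrative assumptions, not part of TOTO or of this demo's code.

```python
def mae(actual, forecast):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (zero actuals are skipped)."""
    terms = [abs((a - f) / a) for a, f in zip(actual, forecast) if a != 0]
    return 100.0 * sum(terms) / len(terms)


def coverage(actual, lower, upper):
    """Percentage of actual values that fall inside the band [lower, upper]."""
    hits = sum(1 for a, lo, hi in zip(actual, lower, upper) if lo <= a <= hi)
    return 100.0 * hits / len(actual)


# Made-up numbers, for illustration only.
actual = [100.0, 110.0, 120.0, 130.0]
forecast = [98.0, 113.0, 120.0, 125.0]
lower = [90.0, 105.0, 121.0, 115.0]
upper = [106.0, 121.0, 130.0, 135.0]

print(f"MAE      {mae(actual, forecast):.2f}")
print(f"MAPE     {mape(actual, forecast):.1f}%")
print(f"Coverage {coverage(actual, lower, upper):.1f}%")
```

For a well-calibrated band, coverage should land close to the nominal level of the interval.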

📊 Multi-Variate Analysis

TOTO efficiently processes multiple variables using Proportional Factorized Space-Time Attention. Hover over lines to see correlations!
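
The idea behind factorized space-time attention can be sketched in a few lines: instead of attending over every (variable, time step) pair at once, the model alternates attention along the time axis with attention across variables. The code below is a toy illustration under strong assumptions: no learned projections, residuals, or normalization, and the ratio of time-wise to space-wise blocks (`time_blocks_per_space_block`) is a hypothetical parameter, not TOTO's actual configuration.

```python
import math


def attend(queries, keys, values, causal=False):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(queries[0])
    out = []
    for i, q in enumerate(queries):
        # With a causal mask, position i only sees positions 0..i.
        limit = i + 1 if causal else len(keys)
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in keys[:limit]
        ]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[d] for w, v in zip(weights, values[:limit])) / total
            for d in range(dim)
        ])
    return out


def time_wise(x):
    """Each variable attends causally over its own time steps."""
    return [attend(series, series, series, causal=True) for series in x]


def space_wise(x):
    """At each time step, the variables attend to one another."""
    n_var, n_time = len(x), len(x[0])
    out = [[None] * n_time for _ in range(n_var)]
    for t in range(n_time):
        column = [x[v][t] for v in range(n_var)]
        mixed = attend(column, column, column)
        for v in range(n_var):
            out[v][t] = mixed[v]
    return out


def factorized_stack(x, time_blocks_per_space_block=2, groups=1):
    """Run several time-wise blocks, then one space-wise block, per group."""
    for _ in range(groups):
        for _ in range(time_blocks_per_space_block):
            x = time_wise(x)
        x = space_wise(x)
    return x


# x[variable][time step] is a small embedding vector (toy values).
x = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]],
]
y = factorized_stack(x)
print(len(y), len(y[0]), len(y[0][0]))  # the shape is preserved
```

With V variables and T time steps, joint attention compares (V·T)² pairs, while the factorized form compares V·T² pairs per time-wise block and T·V² pairs per space-wise block.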

Control settings: 4 variables, 50%

Correlation Matrix

Variables: CPU Usage, Memory Usage, Network I/O, Disk I/O
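
A correlation matrix over these four metrics can be built from pairwise Pearson correlations. The sketch below is a minimal illustration with made-up sample values. It is not the demo's implementation, and the `pearson` helper is written here only for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Made-up samples for the four metrics, for illustration only.
series = {
    "CPU Usage": [35.0, 40.0, 55.0, 70.0, 65.0, 50.0],
    "Memory Usage": [60.0, 62.0, 66.0, 71.0, 70.0, 67.0],
    "Network I/O": [12.0, 15.0, 30.0, 42.0, 38.0, 25.0],
    "Disk I/O": [20.0, 18.0, 22.0, 19.0, 21.0, 20.0],
}

names = list(series)
matrix = [[pearson(series[a], series[b]) for b in names] for a in names]

# One row per metric, columns in the same order as `names`.
for name, row in zip(names, matrix):
    print(f"{name:<13}" + " ".join(f"{value:+.2f}" for value in row))
```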

🎮 Interactive Model Playground

Experiment with TOTO's architecture parameters and see how they affect performance.

Model Parameters

8 heads

Performance Metrics

Inference Speed: 1.2 s
Memory Usage: 2.1 GB
Accuracy: 88.3%

Model Architecture Visualization

Input Layer
Multi-Head Attention
Output Layer

🧠

Foundation Model

Pre-trained on 2+ trillion time series data points, the largest dataset for any open-weights time series model.

🎯

Observability Focus

Specifically designed for monitoring metrics covering infrastructure, networking, databases, and applications.

📈

Probabilistic Predictions

Generates both point forecasts and uncertainty estimates using a Student-T mixture model.
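
To make the idea concrete, the sketch below draws samples from a small Student-T mixture and reads a point forecast (the median) and a 95% interval off the samples. The mixture weights, locations, scales, and degrees of freedom are made-up values. In a real model they would be predicted by the network for each forecast step. All function names are hypothetical.

```python
import math
import random


def sample_student_t(rng, df, loc, scale):
    """Draw one sample from a location-scale Student-T distribution."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with `df` degrees of freedom
    return loc + scale * z / math.sqrt(chi2 / df)


def sample_mixture(rng, components):
    """Pick a component according to its weight, then sample from it."""
    weights = [c["weight"] for c in components]
    comp = rng.choices(components, weights=weights, k=1)[0]
    return sample_student_t(rng, comp["df"], comp["loc"], comp["scale"])


def quantile(sorted_values, q):
    """Nearest-rank quantile of an already sorted list."""
    idx = min(len(sorted_values) - 1, max(0, int(q * len(sorted_values))))
    return sorted_values[idx]


# Made-up mixture parameters for a single forecast step.
components = [
    {"weight": 0.8, "df": 5.0, "loc": 50.0, "scale": 2.0},
    {"weight": 0.2, "df": 3.0, "loc": 65.0, "scale": 4.0},
]

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = sorted(sample_mixture(rng, components) for _ in range(5000))

median = quantile(samples, 0.5)
lower, upper = quantile(samples, 0.025), quantile(samples, 0.975)
print(f"point forecast (median): {median:.1f}")
print(f"95% interval: [{lower:.1f}, {upper:.1f}]")
```

Heavy-tailed components let the interval widen for spiky metrics, and several components let the forecast represent more than one plausible level.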

⚡

High Performance

State-of-the-art results on the GIFT-Eval benchmark and the custom BOOM observability dataset.

🔧

Decoder-Only Architecture

Supports variable prediction horizons and context lengths for flexible forecasting.
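
A decoder-only model can serve any horizon because it forecasts autoregressively: it predicts the next patch of values, appends it to the context, and repeats until enough steps exist. The loop below sketches that mechanism. `predict_next_patch` is a hypothetical stand-in that simply repeats the most recent values, and the `patch_size` and `max_context` defaults are illustrative assumptions rather than TOTO's settings.

```python
def predict_next_patch(context, patch_size):
    """Stand-in for one decoder step: repeat the last `patch_size` values."""
    return list(context[-patch_size:])


def autoregressive_forecast(history, horizon, patch_size=4, max_context=16):
    """Roll the stand-in model forward until `horizon` steps are produced."""
    if not history:
        raise ValueError("history must contain at least one value")
    context = list(history)[-max_context:]  # any context length up to the cap
    forecast = []
    while len(forecast) < horizon:
        patch = predict_next_patch(context, patch_size)
        forecast.extend(patch)
        # Feed the prediction back in and keep only the most recent window.
        context = (context + patch)[-max_context:]
    return forecast[:horizon]  # trim the last patch to the requested horizon


history = [1, 2, 3, 4, 5, 6, 7, 8]
print(autoregressive_forecast(history, horizon=6))
print(len(autoregressive_forecast(history, horizon=10)))
```

The same function handles a 6-step and a 10-step request without any change to the model, which is the flexibility the card describes.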

🚀

Production Ready

Available on Hugging Face with optimizations for xformers and flash-attention.