Today we’re releasing Toto 2.0, a family of open-weights time series forecasting models, on Hugging Face. Spanning 4M to 2.5B parameters, Toto 2.0 is designed to answer a simple, open question: can time series foundation models (TSFMs) improve as they scale? Our results show they can. The highlights:
Scaling that works. Every size improves on the one below it, with no sign of saturation at 2.5B.
Best in class on every benchmark we tested. Toto 2.0 takes the top spots on BOOM (Datadog’s observability forecasting benchmark), GIFT-Eval (the standard general-purpose benchmark), and TIME (a new contamination-resistant zero-shot benchmark).
A generational jump from Toto 1.0. Toto 2.0 is 7× more parameter-efficient at matching quality and dramatically faster at inference time.
Trained on observability and synthetic data, generalizes broadly. Toto 2.0 does not see any public forecasting data during pretraining, yet leads the field on general-purpose benchmarks.
CRPS Rank vs. parameter count on BOOM (left) and GIFT-Eval (right) for top foundation models; lower is better. The Pareto frontier traces the best CRPS rank achievable at each parameter budget; points on or near it represent the best quality-for-size tradeoff available. Every Toto 2.0 size sits on or near the frontier on both benchmarks, and CRPS rank improves monotonically with model size across the family.
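To make the Pareto frontier in the figure concrete, here is a minimal sketch of how such a frontier could be computed from (parameter count, CRPS rank) pairs. The `Point` type, the `pareto_frontier` helper, the model names, and all numbers are hypothetical placeholders chosen for illustration; they are not real benchmark results and not necessarily how the figure was produced.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    params: float  # parameter count
    rank: float    # CRPS rank, lower is better


def pareto_frontier(points):
    """Keep each model that beats the rank of every smaller-or-equal-size model."""
    frontier = []
    best_rank = float("inf")
    # Sort by size ascending; among equal sizes, the better (lower) rank comes first.
    for p in sorted(points, key=lambda p: (p.params, p.rank)):
        if p.rank < best_rank:
            frontier.append(p)
            best_rank = p.rank
    return frontier


# Hypothetical placeholder values, NOT real benchmark results.
models = [
    Point("model_a", 4e6, 9.0),
    Point("model_b", 2e7, 7.5),
    Point("model_c", 1e8, 8.0),  # dominated: larger than model_b but ranked worse
    Point("model_d", 5e8, 5.0),
    Point("model_e", 2.5e9, 3.0),
]

frontier_names = [p.name for p in pareto_frontier(models)]
print(frontier_names)  # ['model_a', 'model_b', 'model_d', 'model_e']
```

In this toy data, `model_c` falls off the frontier because a smaller model already achieves a better rank. A family whose rank improves monotonically with size, as described for Toto 2.0, would have no two members dominating each other.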












