Back to Articles
MolmoMotion: Under the hood Introducing MolmoMotion-1M and PointMotionBench Experiments and performance 3D motion forecasting Downstream evaluation: robotics planning Downstream evaluation: video generation Limitations and what's next 🧠 Models: https://huggingface.co/collections/allenai/molmomotion | 📄 Tech Report: https://allenai.org/papers/molmomotion | 📊 Data: https://huggingface.co/datasets/allenai/molmo-motion-1m | 💻 Code: https://github.com/allenai/molmo-motion.git | 🌐 Project Page: https://molmomotion.github.io/
Machines have become remarkably good at perceiving motion. Given a video, modern models can track how objects and points move through a scene with exceptionally high confidence. But perception is inherently retrospective: it explains motion that has already happened. Many of the systems and applications we want to build need to look forward instead. A robot reaching for a cup has to anticipate how the cup will move before it touches it. A video generator has to know what realistic motion comes next if it's going to produce physically plausible frames.
Predicting motion is harder than observing it, but it's also far more useful in many scenarios.
This idea was the motivation behind MolmoMotion, a new motion forecasting model we're releasing today. Given a video frame, 3D points marked on an object, and written instructions describing the intended action (e.g., “Move and rotate the wooden bowl with fruit on the table”), MolmoMotion predicts where those points will move over the next few seconds in 3D space—achieving substantially stronger performance than existing forecasting methods.






