Published on June 26, 2026
Introduction
Most video models start watching only after a user asks a question. Real devices do not work that way: cameras keep recording, robots keep moving, screens keep changing, and queries can arrive at any time. A model for these scenarios should not wait until a query arrives before it begins to understand the scene.
VLX-Flow is designed for this online setting. It processes video as a sequence of streaming chunks, updates model memory incrementally, and answers from the maintained state instead of reprocessing the entire video history for every interaction.
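To make the contrast with offline video QA concrete, here is a minimal Python sketch of the observe-then-answer loop described above. It is an illustration under stated assumptions, not VLX-Flow's actual API: `StreamingState`, `observe`, `answer`, and `chunked` are hypothetical names, plain strings stand in for per-frame features, and a bounded deque stands in for whatever memory the model really maintains.

```python
from collections import deque
from typing import Iterator, List


class StreamingState:
    """Hypothetical bounded memory; a stand-in, not VLX-Flow's real state."""

    def __init__(self, max_chunks: int = 4) -> None:
        # Oldest summaries are evicted automatically once the deque is full.
        self.summaries: deque = deque(maxlen=max_chunks)
        self.chunks_seen = 0

    def observe(self, chunk: List[str]) -> None:
        # Incremental update: the work done here depends only on this chunk,
        # not on how much video has already been seen.
        self.summaries.append(" | ".join(chunk))
        self.chunks_seen += 1

    def answer(self, query: str) -> str:
        # Answer from the maintained state; no past frames are re-read.
        hits = [s for s in self.summaries if query.lower() in s.lower()]
        return hits[-1] if hits else "not in memory"


def chunked(frames: List[str], size: int) -> Iterator[List[str]]:
    """Split a frame sequence into consecutive fixed-size chunks."""
    for start in range(0, len(frames), size):
        yield frames[start:start + size]


# Assumed toy stream: each string plays the role of one frame's content.
frames = [
    "door closed", "door opens",
    "person enters", "person sits",
    "screen on", "screen off",
]

state = StreamingState(max_chunks=2)
for chunk in chunked(frames, size=2):
    state.observe(chunk)  # runs continuously, before any query exists

print(state.answer("person"))  # person enters | person sits
print(state.answer("door"))    # not in memory (evicted from the bounded state)
```

The point of the sketch is the cost profile: `observe` runs once per chunk as the stream arrives, and `answer` touches only the bounded state, so a query costs the same whether it arrives after ten seconds or ten hours of video. The second query also shows the trade-off of any bounded memory: content that has been evicted can no longer be recalled.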






