Most video processing is a batch job. You upload a file, a pipeline chews through it, and minutes or hours later you get an output. That model breaks completely when the goal is to publish a highlight while the match is still being played. Live sports highlight generation is one of the clearest examples of an AI workload where the architecture, not just the model, is the hard part.

The constraint that changes everything

In a batch pipeline, latency is a convenience. In a live pipeline, latency is the product. If a goal goes in and the clip is not on social within a minute or two, the moment is gone. That single constraint forces a different design at every layer.
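A little arithmetic shows why the batch model cannot meet this constraint. In a batch pipeline the clip cannot exist before the full recording does. That means the wait for the rest of the match dominates everything else. The sketch below is an illustration only. The 120-second budget is one reading of "a minute or two". The 105-minute broadcast window, the processing time, and the per-stage timings are made-up numbers. The function names are hypothetical.

```python
from typing import Dict

# Assumed budget: the text's "a minute or two", taken here as 120 seconds.
PUBLISH_BUDGET_S = 120.0


def batch_latency_s(event_time_s: float, match_length_s: float, processing_s: float) -> float:
    """Seconds from the event to a published clip when the pipeline waits for the full recording."""
    wait_for_file_s = match_length_s - event_time_s  # the file does not exist until the match ends
    return wait_for_file_s + processing_s


def streaming_latency_s(stage_latencies_s: Dict[str, float]) -> float:
    """Seconds from the event to a published clip when every stage runs on the live feed."""
    return sum(stage_latencies_s.values())


# Hypothetical numbers, for illustration only: a goal 30 minutes into a 105-minute broadcast.
batch = batch_latency_s(event_time_s=30 * 60, match_length_s=105 * 60, processing_s=10 * 60)
live = streaming_latency_s({"detect": 5.0, "cut": 10.0, "encode": 20.0, "publish": 15.0})
print(batch, batch <= PUBLISH_BUDGET_S)  # 5100 False
print(live, live <= PUBLISH_BUDGET_S)    # 50.0 True
```

In the batch case, speeding up the processing step barely helps, because the 75-minute wait for the file is fixed. In the streaming case, every stage is a line item you can measure and shrink.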

Streaming ingestion, not file uploads

A live system taps the broadcast over RTMP or HLS and processes it as a continuous stream, frame by frame, rather than waiting for a finished file. You are running inference on an open-ended input with no end-of-file to wait for.
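The shape of that processing loop can be sketched in a few lines. This is a minimal sketch under stated assumptions, not a production ingest path. Here `fake_live_source` is a hypothetical stand-in for a decoder reading the RTMP or HLS feed (in a real system, for example, something built on FFmpeg). `toy_model` is a placeholder for the actual detector. `Frame`, `Detection`, and `process_stream` are illustrative names. The point is structural: results are yielded per frame while the input is still open, and nothing ever waits for end-of-file.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass
class Frame:
    index: int
    timestamp_s: float
    pixels: bytes  # placeholder for decoded image data


@dataclass
class Detection:
    timestamp_s: float
    label: str


def fake_live_source(fps: float = 25.0) -> Iterator[Frame]:
    """Hypothetical stand-in for a decoded RTMP/HLS feed. It never ends."""
    for i in itertools.count():
        yield Frame(index=i, timestamp_s=i / fps, pixels=b"")


def process_stream(
    frames: Iterable[Frame],
    model: Callable[[Frame], Optional[str]],
) -> Iterator[Detection]:
    """Run the model on each frame as it arrives and yield detections immediately.

    Nothing here checks for end-of-file: results flow out while the input is still open.
    """
    for frame in frames:
        label = model(frame)
        if label is not None:
            yield Detection(timestamp_s=frame.timestamp_s, label=label)


def toy_model(frame: Frame) -> Optional[str]:
    """Hypothetical detector that 'sees' a goal on frame 50."""
    return "goal" if frame.index == 50 else None


# next() pulls only as many frames as needed; the infinite source is never exhausted.
first = next(process_stream(fake_live_source(), toy_model))
print(first)  # Detection(timestamp_s=2.0, label='goal')
```

Because `process_stream` is a generator, a downstream stage (clipping, encoding, publishing) can start working on the first detection two seconds into the feed instead of waiting for the broadcast to finish.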