I want to share a concrete example of using an LLM to automate a manual process in my workflow. Not chatbot stuff. An actual pipeline step that used to require a human sitting with two audio tracks for hours.

The problem

I build live speech-to-speech translation. To measure latency, I need to know which phrase in the source audio corresponds to which phrase in the translated audio, so I can measure the time gap between them. This alignment used to be done by hand. A person listens to both tracks, matches up the phrases, and logs timestamps. For a 6-minute session that's easily an afternoon of work.

The hard part isn't the math. It's the alignment. Languages reorder things. German puts verbs at the end. Arabic restructures sentences. A Spanish phrase at position 3 might map to an English phrase at position 7.

Where the LLM fits in