When I started building Hoovik, a distributed video conferencing platform, I expected WebRTC signaling and transcription pipelines to be the hardest problems.
They weren’t.
The real engineering challenge was building a production-ready, real-time, multimodal emotion inference engine that could process live video meetings under strict latency constraints.
Unlike offline ML systems, live meeting environments are unstable by default:
microphones get muted













