Capping VLM spend per CV researcher: hierarchical budgets in practice

TL;DR: Our 11-person CV team at Prophesee was burning through €3-4k a week in VLM spend on dataset annotation, with no idea which researcher caused which spike. We put Bifrost between the labelling scripts and the providers, mapped one virtual key per person with a monthly cap, and the receipt-chasing stopped. It took an afternoon to wire up. The real savings came from enforcement, not from clever routing.

So, the thing is, when you let eleven people each script their own VLM annotation passes against three different providers, you get one giant invoice at the end of the month and no idea who is responsible for the €700 Tuesday spike. We hit that exact wall in late February. Two consecutive weeks at around €4k each, and our team lead asked, very politely, over an espresso, whether maybe we should think about it.
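To make the attribution problem concrete, here is a minimal sketch of the bookkeeping a gateway does once every request carries a per-person key. It is not Bifrost's API: `SpendLedger`, `BudgetExceeded`, the `vk-...` key names, and the €800 cap are all hypothetical, chosen only to show how a spike becomes a one-line lookup and how a cap turns into a refusal instead of a surprise on the invoice.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date


class BudgetExceeded(Exception):
    """Raised when a request would push a key past its monthly cap."""


@dataclass
class SpendLedger:
    # Hypothetical monthly caps in euros, one per researcher's virtual key.
    monthly_caps: dict[str, float]
    # (virtual key, day) -> euros spent on that day.
    daily: dict[tuple[str, date], float] = field(
        default_factory=lambda: defaultdict(float)
    )

    def month_total(self, key: str, day: date) -> float:
        """Sum everything `key` has spent in the calendar month of `day`."""
        return sum(
            cost
            for (k, d), cost in self.daily.items()
            if k == key and (d.year, d.month) == (day.year, day.month)
        )

    def charge(self, key: str, day: date, cost_eur: float) -> None:
        """Record a request's cost, or refuse it if the cap would be exceeded."""
        if self.month_total(key, day) + cost_eur > self.monthly_caps[key]:
            raise BudgetExceeded(f"{key} would exceed its monthly cap")
        self.daily[(key, day)] += cost_eur

    def top_spenders(self, day: date) -> list[tuple[str, float]]:
        """Answer 'who caused the spike on this day?', biggest spender first."""
        rows = [(k, c) for (k, d), c in self.daily.items() if d == day]
        return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    ledger = SpendLedger(monthly_caps={"vk-alice": 800.0, "vk-bob": 800.0})
    spike_day = date(2025, 2, 25)  # arbitrary illustrative date
    ledger.charge("vk-alice", spike_day, 700.0)
    ledger.charge("vk-bob", spike_day, 40.0)
    print(ledger.top_spenders(spike_day))
    try:
        ledger.charge("vk-alice", spike_day, 150.0)
    except BudgetExceeded as err:
        print("refused:", err)
```

The point of the sketch is the shape of the data, not the implementation: as soon as spend is keyed by person and day, the Tuesday question has an answer, and the cap check happens before the money is spent rather than after.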

The honest answer was: yes, six months ago.

What we were actually doing

Our annotation pipeline labels event-camera frames reconstructed from recordings (indoor robotics, low-light driving, drone footage). For each scene we run a VLM pass to produce captions, bounding box suggestions, and a sanity-check description. The VLM is the teacher; a smaller distilled model is what we deploy on the edge.

Related reading

Stop paying for idle GPUs in your CI: batching LLM eval jobs

How I Cut My LLM Costs by 90% Without Changing My App Logic

My Hermes agent spent $3 before I noticed. Now it can't.

Prefix caching in vLLM under multi-tenant agent traffic

How one bad prompt burned $40 of my Claude budget in 18 minutes

I burned my Anthropic org cap and waited 3 days. Then I built llmfleet.