I charge per invoice extracted. One upload of a five-invoice PDF should cost five credits, a failed extraction should cost nothing, and a user with one credit left must never get two. That last rule is where it got interesting.

This is the billing design behind GSTExtract, a tool that reads Indian GST invoice PDFs into Excel, and the concurrency bug a second-pass audit found in it. I'm writing it up because "meter a paid API correctly" is one of those problems that looks trivial until you hold it up to the light.

The shape of the problem

Each extraction calls a vision model, which costs me real money, so the user spends a credit per invoice. The catch: I don't know how many invoices are in a PDF until after the model reads it. A single page can hold three invoices; a "10-page" file might be one. I can't charge up front because I don't know the count, and if I charge afterwards there is a window in which a user with a zero balance has already burned my API budget.
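
To make the "charge after" window concrete, here is a minimal sketch of the naive flow. Everything in it (Account, FakeVisionModel, extract_then_charge) is a hypothetical stand-in for illustration, not GSTExtract's actual code; the fake model only counts how many paid calls were made.

```python
from dataclasses import dataclass


@dataclass
class Account:
    credits: int


class FakeVisionModel:
    """Hypothetical stand-in for the paid vision model; counts billable calls."""

    def __init__(self, invoices_in_pdf: int) -> None:
        self.invoices_in_pdf = invoices_in_pdf
        self.paid_calls = 0

    def extract(self, pdf: bytes) -> int:
        # The provider bills for this call whether or not the user can pay.
        self.paid_calls += 1
        return self.invoices_in_pdf


def extract_then_charge(account: Account, model: FakeVisionModel, pdf: bytes) -> bool:
    """Naive 'charge after' flow: the invoice count is only known after the paid call."""
    count = model.extract(pdf)
    if account.credits < count:
        # Nothing is charged, but the model call has already been spent.
        return False
    account.credits -= count
    return True


account = Account(credits=0)
model = FakeVisionModel(invoices_in_pdf=5)
charged = extract_then_charge(account, model, b"%PDF-")
```

In this run the user is never charged, yet the model has been called once: the cost lands on the operator, which is exactly the window the paragraph above describes.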

The answer most billing systems land on is reserve → settle → refund: