My CV is a PDF, and PDFs do not answer questions. So I built ask.hiten.dev: a streaming chat grounded in my actual career history, where a recruiter can ask "why should I hire you over another senior frontend engineer?" and get a real answer.
The constraint that made it interesting: the total inference budget is zero. No OpenAI bill, no hosted vector DB, nothing. Here is what that actually took.
Four free providers and a failover chain
No single free tier is reliable enough to put in front of strangers. Groq's free tier caps at 100k tokens/day, and I hit that cap on day one. OpenRouter's free models come and go. Cerebras occasionally queues you out at busy times.
The fix is boring and effective: an ordered provider chain, all OpenAI-compatible, walked per-request until one answers.
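A minimal TypeScript sketch of such a chain follows. It assumes each provider accepts a standard `POST {baseURL}/chat/completions` request with a Bearer key. The names `Provider`, `Send`, `PROVIDERS` and `completeWithFailover` are hypothetical. The model IDs and keys are placeholders, the order is illustrative, and only the three providers named above appear in the list. The request function is injectable, so the walk-until-one-answers logic can be exercised without network access.

```typescript
// Hypothetical shape of one entry in the chain.
interface Provider {
  name: string;
  baseURL: string; // OpenAI-compatible base URL
  model: string;   // placeholder model ID
  apiKey: string;  // placeholder; load from your secret store
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Sends one request to one provider and returns the reply text; throws on failure.
type Send = (p: Provider, messages: ChatMessage[]) => Promise<string>;

// Illustrative ordering; base URLs are the providers' OpenAI-compatible
// endpoints as I understand them, so check each provider's docs.
const PROVIDERS: Provider[] = [
  { name: "groq", baseURL: "https://api.groq.com/openai/v1", model: "GROQ_MODEL_ID", apiKey: "" },
  { name: "cerebras", baseURL: "https://api.cerebras.ai/v1", model: "CEREBRAS_MODEL_ID", apiKey: "" },
  { name: "openrouter", baseURL: "https://openrouter.ai/api/v1", model: "OPENROUTER_MODEL_ID", apiKey: "" },
];

// Default sender: a plain, non-streaming chat completion over fetch.
const sendViaFetch: Send = async (p, messages) => {
  const res = await fetch(`${p.baseURL}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${p.apiKey}`,
    },
    body: JSON.stringify({ model: p.model, messages }),
  });
  if (!res.ok) {
    // 429 (rate limit or daily cap), 5xx, etc. all count as "did not answer".
    throw new Error(`${p.name} returned HTTP ${res.status}`);
  }
  const data = (await res.json()) as { choices: { message: { content: string } }[] };
  return data.choices[0].message.content;
};

// Walk the providers in order on every request; the first one that answers wins.
async function completeWithFailover(
  providers: Provider[],
  messages: ChatMessage[],
  send: Send = sendViaFetch,
): Promise<{ provider: string; text: string }> {
  const errors: string[] = [];
  for (const p of providers) {
    try {
      const text = await send(p, messages);
      return { provider: p.name, text };
    } catch (err) {
      // Any error (HTTP failure or network rejection): record it, try the next provider.
      errors.push(`${p.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All providers failed:\n${errors.join("\n")}`);
}
```

Because any failure just moves the request down the list, a provider that has burned through its daily cap costs one failed round trip rather than a failed answer. The sketch is non-streaming. With streaming, one reasonable rule is to fail over only until the first token has been forwarded to the client, since switching providers mid-answer would splice two different responses together.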
