My wife tracks her meals, and I watched her type "buckwheat, boiled, 100 g" into a calorie app for the hundredth time. Search, scroll, pick the wrong entry, fix the grams. Every meal, every day. At some point it's easier to teach a vision model to look at the plate.
So I built a Telegram bot. You send a photo of your food, it identifies the dishes, estimates portion weights, and replies with a card: calories, protein, fat, carbs. Text and voice work too ("2 eggs and a toast").
The borscht incident
The first version was hilariously confident about wrong answers. Borscht — a red beet soup, if you've never met one — came back as "berry compote" (a sweet berry drink). Red liquid in a bowl, what else could it be?
Adding more example dishes to the prompt made it worse: the model just got magnetized to whatever was on the list. A cod fillet became "syrniki" (cottage cheese pancakes) because syrniki were mentioned and both are pale and pan-fried.






