A while back I wrote about my terminal-inspired portfolio and the products it indexes. Two of those products lean on a language model: the portfolio terminal at smngvlkz.com that you can ask questions, and PayChasers, which generates OPTIONAL payment follow-up emails. Both started on Google's Gemini 3 Flash. Both now run on a model I host myself, with a fallback chain that keeps them alive when my hardware is not.

This is the story of that move. The experiment that started it, why I committed to it, what the architecture looks like, the night it broke, and the parts I still have not solved.

It started as an experiment

When Qwen 3.5 was announced, it made me curious about how far open models have actually come. Instead of reading benchmarks, I tested it the way I like to learn things, by running it.

It began as a small experiment on my base Mac mini. I pulled Qwen through Ollama just to see how capable the model would be running directly on a local machine. The results were far better than I expected. Good enough that I stopped thinking of it as a toy and started thinking about production.