How I Replaced Gemini with a Self-Hosted LLM for Two Production Apps

A while back I wrote about my terminal-inspired portfolio and the products it indexes. Two of those products lean on a language model: the portfolio terminal at smngvlkz.com, where you can ask questions, and PayChasers, which generates optional payment follow-up emails. Both started on Google's Gemini 3 Flash. Both now run on a model I host myself, with a fallback chain that keeps them alive when my hardware is not.

This is the story of that move: the experiment that started it, why I committed to it, what the architecture looks like, the night it broke, and the parts I still have not solved.

It started as an experiment

The Qwen 3.5 announcement made me curious about how far open models have actually come. Instead of reading benchmarks, I tested it the way I like to learn things: by running it.

It began as a small experiment on my base Mac mini. I pulled Qwen through Ollama just to see how capable the model would be running directly on a local machine. The results were far better than I expected. Good enough that I stopped thinking of it as a toy and started thinking about production.
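For anyone who wants to repeat that kind of first test, here is a minimal sketch of sending a prompt to a locally pulled model. It assumes Ollama is running on its default local port and uses its `/api/generate` endpoint in non-streaming mode. The model tag, function names, and timeout are illustrative placeholders, not details from the experiment described above.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; change it if your setup differs.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Assumption: placeholder tag, not taken from the post. Use a tag that `ollama list` shows.
MODEL_TAG = "qwen-placeholder"


def build_generate_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(prompt: str, model: str = MODEL_TAG, timeout: float = 120.0) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False, Ollama replies with one JSON object whose
    # "response" field holds the full generated text.
    return body["response"]
```

With the server running and a model pulled, calling `ask_local_model("Summarize what a fallback chain is in one sentence.")` returns the model's reply as a string. The generous timeout is a guess that leaves room for slow first-token latency on small hardware.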
