A while ago I tried to build a local coding assistant. I downloaded Qwen3, fired it up on my MacBook with 16GB of RAM, and within a day realized the output quality was nowhere close to Claude or GPT-5. The model could fit. It just couldn't compete.
So I changed the question.
If I can't make the model smarter on my hardware, can I make what I feed it smarter?
Where the tokens actually go
I started watching where my Claude / Cursor / Copilot sessions actually spent their tokens. The surprise: most of them weren't spent on reasoning. They were spent on lookup.