I Thought AI Was Slow Because It Wasn't Smart Enough. Turns Out It's Exhausted From Carrying Things.

Wednesday, May 27, 2026

I've been working on a question lately: can an AI run on a small local device without depending on the cloud?

I dug through a lot of material, and then one number stopped me cold.

A 7B-parameter model stored in 16-bit precision (2 bytes per parameter) has to move roughly 14GB of weight data from memory to the compute units every time it generates a single token. A high-end GPU's memory bandwidth is around 2TB/s. Do the math: 2TB/s ÷ 14GB works out to only about 140 tokens per second, and that's the theoretical ceiling. In practice it's even less.

I sat with that for a moment.

It's not that the compute isn't fast enough. It's that the carrying is too slow.
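The arithmetic can be reproduced as a rough back-of-envelope script. This is a minimal sketch, assuming single-stream generation (batch size 1), FP16 weights, and that every weight is read from memory exactly once per token; it ignores KV-cache traffic and other real-world overheads. For contrast it also estimates a compute-only ceiling using the common approximation of about 2 FLOPs per parameter per token and a hypothetical peak of 300 TFLOP/s, a figure chosen for illustration rather than taken from any specific chip.

```python
# Back-of-envelope decode-speed ceilings for single-stream (batch size 1) generation.
# Assumptions (illustrative, not measurements): FP16 weights (2 bytes/param),
# ~2 TB/s memory bandwidth, and a HYPOTHETICAL 300 TFLOP/s peak compute figure.


def memory_bound_tokens_per_sec(params: float, bytes_per_param: float,
                                bandwidth_bytes_per_s: float) -> float:
    """Upper bound if every token requires reading all weights from memory once."""
    bytes_per_token = params * bytes_per_param
    return bandwidth_bytes_per_s / bytes_per_token


def compute_bound_tokens_per_sec(params: float, peak_flops: float) -> float:
    """Upper bound if arithmetic were the only limit (~2 FLOPs per parameter per token)."""
    flops_per_token = 2 * params
    return peak_flops / flops_per_token


PARAMS = 7e9          # 7B-parameter model
BYTES_PER_PARAM = 2   # FP16 / BF16 storage
BANDWIDTH = 2e12      # ~2 TB/s memory bandwidth
PEAK_FLOPS = 300e12   # hypothetical peak compute, for comparison only

mem_limit = memory_bound_tokens_per_sec(PARAMS, BYTES_PER_PARAM, BANDWIDTH)
compute_limit = compute_bound_tokens_per_sec(PARAMS, PEAK_FLOPS)

print(f"weights per token:   {PARAMS * BYTES_PER_PARAM / 1e9:.0f} GB")
print(f"memory-bound limit:  {mem_limit:.0f} tokens/s")
print(f"compute-bound limit: {compute_limit:.0f} tokens/s")
```

Under these assumptions it prints 14 GB per token, about 143 tokens/s for the memory-bound limit, and over 21,000 tokens/s for the compute-bound one. The gap of roughly two orders of magnitude is the "carrying" problem in numbers: the arithmetic units spend most of their time waiting for weights to arrive.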
