Your model got smarter.

But suddenly it got slower.

Why does increasing context length explode compute?

Because self-attention is O(n²) in the sequence length n: every token is scored against every other token.
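
Here is a minimal sketch of where the n² comes from. It assumes a single attention head, no masking, and plain NumPy. The names `naive_attention` and `score_matrix_entries` are made up for illustration and are not from any library.

```python
import numpy as np

def naive_attention(q, k, v):
    """Single-head scaled dot-product attention (illustrative only).

    q, k, v: arrays of shape (n, d), where n is the sequence length.
    Returns the attention output and the shape of the score matrix.
    """
    d = q.shape[-1]
    # Every query is compared with every key: this is the (n, n) matrix.
    scores = q @ k.T / np.sqrt(d)
    # Numerically stable softmax over each row.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, scores.shape

def score_matrix_entries(n):
    """Number of query-key scores for a sequence of length n."""
    return n * n

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (1024, 2048, 4096):
        q = k = v = rng.standard_normal((n, 64))
        _, shape = naive_attention(q, k, v)
        print(n, shape, score_matrix_entries(n))
```

Double the context and the score matrix has four times as many entries.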

And at long context lengths, that quadratic term is what dominates the cost.