When people first hear about Transformers, they often encounter words like Query, Key, Value, and Attention Heads and feel confused.

But the main idea of attention is actually simple.

Attention answers one question:

While processing one word, which other words should the model pay attention to?
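
To make that question concrete, here is a minimal Python sketch. The sentence, the relevance scores, and the function name `attention_weights` are all made up for illustration. A real Transformer learns its scores from Query and Key vectors rather than having them typed in by hand. The sketch only shows the final step: turning raw scores into weights that say how much one word (here "sat") attends to each of the others.

```python
import math

def attention_weights(scores):
    """Softmax: turn raw relevance scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

words = ["the", "cat", "sat", "on", "the", "mat"]

# Hypothetical scores for how relevant each word is to "sat".
# These numbers are invented for this example, not produced by a model.
scores_for_sat = [0.1, 2.0, 1.0, 0.3, 0.1, 1.5]

weights = attention_weights(scores_for_sat)

# Print one weight per word; a larger weight means more attention.
for word, weight in zip(words, weights):
    print(f"{word:>4}: {weight:.2f}")
```

Under these assumed scores, "cat" gets the largest weight, which matches the intuition that the model should look at who is doing the sitting.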

Why Was Attention Needed?