Most language models today are built around the Transformer paradigm.

That makes sense.

Transformers work.

They scale.

They dominate modern NLP.