Orthrus: Parallel Token Generation That Doesn't Change Your Model's Output
Orthrus injects diffusion attention into each layer of a frozen autoregressive Transformer to generate 32 tokens in parallel — without altering the base model's output distribution.