C++ source code is written in order. That does not mean the processor executes it in order.
This is the first correction. It is also the one many performance discussions manage to avoid.
Modern high-performance cores use out-of-order execution. They accept a sequential instruction stream, decode it into internal micro-operations, rename registers to remove false dependencies, place work into scheduling structures, execute each operation as soon as its inputs are ready, and then retire the results in program order. The machine preserves the architecturally visible behavior of sequential execution. Internally, it is not taking attendance line by line.
For ordinary software, this is mostly invisible. For C++ intended to run in tens of nanoseconds, it is not invisible. At that scale, performance is not just about the number of instructions. It is about whether those instructions can be scheduled in parallel or whether the program quietly built a dependency chain and then acted surprised.
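A small sketch may make the distinction concrete. Both functions below add up the same array with essentially the same number of additions. The function names and the choice of four accumulators are illustrative assumptions, not a recommendation for any particular core. The first version forms one serial chain, because each addition needs the previous sum. The second splits the work into four chains that do not depend on each other, which gives the scheduler something to run in parallel.

```cpp
#include <cassert>
#include <cstddef>

// One accumulator: every addition needs the result of the previous one,
// so the additions form a single serial dependency chain.
double sum_one_chain(const double* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += data[i];
    }
    return sum;
}

// Four accumulators (an arbitrary, illustrative count): four chains with
// no dependencies between them inside the main loop.
double sum_four_chains(const double* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }
    // Leftover elements that did not fill a group of four.
    for (; i < n; ++i) {
        s0 += data[i];
    }
    // The chains only meet here, once, at the end.
    return (s0 + s1) + (s2 + s3);
}
```

Whether the second version is actually faster depends on the compiler, its flags, and the core, so it has to be measured. The two functions also add the values in a different order, and for floating-point data that can change the rounding of the result. That is one reason compilers generally do not make this transformation on their own unless given a flag such as `-ffast-math`.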
The processor is a dependency scheduler







