How Microsoft's ARTIST framework uses outcome-based RL to train LLMs that interleave tool calls inside reasoning chains — no step supervision required.