Good morning. Rarely does a 29-page scholarly paper merit the attention of top-level executives, but every business leader should be familiar with a recent study from OpenAI. It’s the best description yet of how AI can handle real-world tasks, showing which AI models are excelling, and hinting at what it all means for humans in the years ahead. The paper can be heavy going, but you can get a masterful summary from our AI Editor, Jeremy Kahn.
For leaders, three points stand out:
The study is highly realistic. It examined 44 occupations and 1,320 specialized tasks required by those occupations. For example: the final testing step in manufacturing a cable spooling truck for underground mining operations. Appropriate professionals (average experience: 14 years) vetted the tasks, all of which are elements of actual work deliverables. Previous research has almost always focused on less realistic tests. The AI results were graded by expert humans who didn’t know if they were looking at work from AI or from an expert human professional.
The best models are already nearly as good as human industry experts. The study examined seven AI models from OpenAI, Google (Gemini), xAI (Grok), and Anthropic (Claude). The clear winner was Claude Opus 4.1, which came within a few percentage points of reaching parity with human industry experts. The best models also completed tasks about 100 times faster and 100 times cheaper than the industry experts, though the comparisons ignore “the human oversight, iteration, and integration steps required in real workplace settings,” OpenAI says.