Which LLM is the best stock picker? I built a benchmark to find out.

7 frontier LLMs. $100K each. Same prompts, same tools, same data. Different brains. Here's the architecture.

Wednesday, May 20, 2026


Every other week there's a new GPT-vs-Claude-vs-Gemini benchmark on coding or math or reasoning. None of them tell you whether the model can actually make a decision under uncertainty, where the answer isn't in the training data and the result shows up two weeks later in a P&L.

So I built a different kind of eval. Seven frontier LLMs, $100,000 of paper capital each, identical tools, identical prompts, identical data. Every Monday they pick stocks. The market grades them.
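To make the setup concrete, here is a minimal sketch of how a paper-trading harness like this could track each model's account and rank the results. It is an illustration under assumptions, not the project's actual code: the names `PaperPortfolio`, `rebalance`, and `leaderboard` are hypothetical, and so are the ideas that models return target weights, that fractional shares are allowed, and that there are no fees or slippage.

```python
from dataclasses import dataclass, field

STARTING_CAPITAL = 100_000.0  # paper dollars per model, as stated in the article


@dataclass
class PaperPortfolio:
    """Hypothetical paper-trading account for a single model."""
    model: str
    cash: float = STARTING_CAPITAL
    shares: dict[str, float] = field(default_factory=dict)

    def value(self, prices: dict[str, float]) -> float:
        """Mark the account to market: cash plus holdings at current prices."""
        return self.cash + sum(qty * prices[t] for t, qty in self.shares.items())

    def rebalance(self, weights: dict[str, float], prices: dict[str, float]) -> None:
        """Apply one round of picks, given as target weights per ticker.

        Assumption: fractional shares, no fees, no slippage.
        """
        total_weight = sum(weights.values())
        if any(w < 0 for w in weights.values()) or total_weight > 1.0 + 1e-9:
            raise ValueError("weights must be non-negative and sum to at most 1")
        equity = self.value(prices)
        self.shares = {
            t: equity * w / prices[t] for t, w in weights.items() if w > 0
        }
        self.cash = equity * (1.0 - total_weight)


def leaderboard(portfolios: list[PaperPortfolio],
                prices: dict[str, float]) -> list[tuple[str, float]]:
    """Rank models by total return since the start, best first."""
    rows = [(p.model, p.value(prices) / STARTING_CAPITAL - 1.0) for p in portfolios]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Usage with made-up tickers and prices (not real market data).
monday = {"AAA": 100.0, "BBB": 50.0}
later = {"AAA": 110.0, "BBB": 40.0}

a = PaperPortfolio("model_a")
b = PaperPortfolio("model_b")
a.rebalance({"AAA": 1.0}, monday)   # all-in on AAA
b.rebalance({"BBB": 0.5}, monday)   # half in BBB, half in cash

ranking = leaderboard([a, b], later)
for model, ret in ranking:
    print(f"{model}: {ret:+.1%}")
```

Because every account starts from the same capital and is marked against the same price feed, the only thing that differs between rows of the ranking is what each model chose to buy.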

The project is 1rok. Live leaderboard: investingbench.vercel.app. The clock started January 20, 2026.

The contestants

GPT-5.5 (OpenAI)
