Hedge fund Bridgewater and Thinking Machines Lab say a fine-tuned open-weight model outperforms the strongest AI models at evaluating financial documents, at a fraction of the cost. The numbers come from their own internal evaluation.

Investors get buried in news, analysis, corporate filings, and emails every day. According to a report from Bridgewater's AIA Labs and Thinking Machines Lab, the startup founded by former OpenAI CTO Mira Murati, reading isn't the real work. The real work is the constant stream of small, repeated judgment calls about what actually matters. That's the triage the researchers wanted to automate.

They defined six tasks drawn from an investor's daily routine. One is deciding whether a financial article is relevant to an executive. Another is judging whether a central bank document signals the direction of future rate changes. Investors make these calls with ease, yet they can barely put their reasoning into words. The report gives a telling example: a headline about Trump's claim to Greenland gets flagged as irrelevant, while Trump's threat of new China tariffs is rated highly relevant, even though both touch on geopolitics and finance.

Frontier models fell short in the authors' tests. Variants of Gemini, Claude, and GPT reached only about 50 percent accuracy with a basic prompt. Expert-written instructions and a three-tier rating system ("relevant and interesting," "relevant but uninteresting," "irrelevant") pushed accuracy into the mid-70 percent range. That was still below the 80 percent threshold the authors set for trustworthy deployment.
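
To make the accuracy figures concrete, here is a minimal Python sketch of how such a score could be computed. It assumes accuracy means the share of documents where the model's label exactly matches the expert's label, which the report summary does not spell out. The function names and the four sample labels are hypothetical and purely illustrative. Only the three rating tiers and the 80 percent bar come from the report.

```python
# The three rating tiers named in the report.
LABELS = (
    "relevant and interesting",
    "relevant but uninteresting",
    "irrelevant",
)

# The 80 percent bar the authors set for trustworthy deployment.
DEPLOYMENT_THRESHOLD = 0.80


def accuracy(predicted, expected):
    """Share of items where the model label equals the expert label.

    Assumption: exact-match accuracy. The report may score differently.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("at least one labeled item is required")
    for label in (*predicted, *expected):
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def meets_threshold(predicted, expected, threshold=DEPLOYMENT_THRESHOLD):
    """True if exact-match accuracy reaches the deployment threshold."""
    return accuracy(predicted, expected) >= threshold


# Hypothetical labels for four documents, invented for illustration only.
expert_labels = [
    "irrelevant",
    "relevant and interesting",
    "relevant but uninteresting",
    "relevant and interesting",
]
model_labels = [
    "irrelevant",
    "relevant and interesting",
    "relevant and interesting",  # the one disagreement with the expert
    "relevant and interesting",
]

score = accuracy(model_labels, expert_labels)
print(f"accuracy: {score:.2f}, deployable: {meets_threshold(model_labels, expert_labels)}")
```

In this toy case the model matches the expert on three of four documents, so it lands at 75 percent and misses the bar, much like the prompted frontier models in the report.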