The AI data gold rush is here and Corporate America is ready

Photo by Sean Gallup/Getty Images

The era of free AI training data is over. Reddit charges millions for API access. The New York Times sued. Publishers are blocking scrapers. Even if AI companies could still vacuum up the public internet, they're running into a bigger problem: they need different kinds of data entirely for the next leap in abilities.

Large language models were built by scraping text and images from the web. But as AI systems move beyond chatbots, they need training data that was never publicly available in the first place. Data that's locked away, or scattered, or doesn't even exist yet.

New markets are emerging to unlock these sources. Here are three.

Most people think of personal data as Social Security numbers and health records. But nearly everything you do online generates data that platforms collect and use: your Spotify listening history, your email patterns, the documents you write in Google Docs, your conversations with ChatGPT.
