OpenAI's five-year licensing deal with News Corp is worth more than $250 million, according to the Wall Street Journal. At that figure, reported when the deal was announced in May 2024, it remains the largest disclosed AI content licensing agreement.
It is not an outlier. Reddit signed a content licensing deal worth about $60 million per year with Google to train Gemini AI models, an agreement reached in early 2024, just before Reddit's IPO filing. In its IPO prospectus, Reddit disclosed data licensing arrangements with an aggregate contract value of $203 million and terms of two to three years, according to TechCrunch. Reddit CEO Steve Huffman had argued the platform's data should not be "given to some of the largest companies in the world for free."
These numbers map a market that barely existed three years ago. The global AI training dataset market was valued at $3.59 billion in 2025 and is projected to grow to $4.44 billion in 2026, according to Fortune Business Insights. A separate estimate from MarketIntelo values the dataset licensing segment at $4.8 billion in 2025, projected to reach $22.6 billion by 2034.
The era of free AI training data is over. What is replacing it has a price list.














