Harvard Business Review, June 15, 2026
The fight over the data that trains artificial intelligence has become one of the defining economic conflicts of the decade. Publishers, authors, and visual artists argue that their work was taken without permission or payment. AI companies counter that training on available data constitutes fair use and that even if a market in data were desirable, compensating millions of creators is technically impossible: the cost of figuring out what any given piece of data is worth, researchers have argued, would swallow most of the value that data creates in the first place.
How AI Companies Can Pay Fair Rates for the Content They Need
Frontier AI models are trained on the accumulated digital output of humanity, which AI companies acquired essentially for free. This is a problem for both AI companies and content creators. AI companies will need new, high-quality human data for future training. Creators face a choice between modest one-off licensing deals and copyright litigation, neither of which offers reasonable compensation. Yet AI companies already produce the two data sets required for pricing content, as a matter of course, every time a model is trained: data mixtures and scaling laws capture, respectively, how to divide the pie and how big it is. Further, collective management organizations (CMOs), like those used in the music industry, offer a model for how to distribute payments. The technical objection that has kept both sides arguing in the dark, that data simply cannot be valued at scale, does not hold up. Together, these pieces form a framework that offers both AI companies and content creators a fair path forward.
AI firms trained frontier models on human content they acquired essentially for free, sparking conflict with publishers and creators over compensation and fair use. AI companies argue that the cost of compensating creators could exceed the data's worth; tech teams must assess legal and reputational risk in their AI training pipelines.












