Collecting robot training data is dirty, unglamorous work. Some AI labs are already paying XDOF to do it

Two weeks ago, OpenAI said it would relaunch the robotics program it shuttered in 2021, the latest signal that the biggest AI labs are racing to teach machines to operate in the physical world. But building capable robots requires something the AI industry doesn’t yet have: training data comparable to what was used for language models.

That gap is creating a new kind of infrastructure business. Unlike LLMs, which were trained on a vast sea of publicly available text, robots need data that captures physical interaction, and that kind of data barely exists. YouTube videos and footage captured by gig workers are low-fidelity and hard to reconcile with the physical world.

XDOF (pronounced “ecks-doff”), emerging from stealth today, is betting that the next great bottleneck in AI isn’t models or chips, but the data feedback loop needed to teach robots how to interact with the physical world.

The startup aims to build the data pipelines, collection tools, and annotation systems that frontier labs and robotics companies can’t easily build themselves, and it has raised $70 million from Thrive Capital, Spark Capital, a16z, Lux, and WndrCo to do it. Co-founder and CEO Philippe Wu says XDOF, which has about 60 employees, is already working with 20 customers, including several frontier AI labs, though he says he cannot name them.

Collecting robot training data is dirty, unglamorous work. Some AI labs are already paying XDOF to do it | TechCrunch
