Alex de Vigan, CEO & Founder of Physicl, building world-ready data infrastructure powering robotics, world models, and Physical AI systems.

In my last article, I wrote about the danger of building during an AI gold rush: when capital moves faster than understanding, founders can mistake momentum for foundations.

That risk is now becoming especially clear in physical AI.

The market is rightly excited about robots, world models and embodied systems that can understand and operate in real environments. The ambition is real. The investment is real. The demos are becoming more impressive every month.

But beneath that excitement, the industry is approaching a more basic constraint: Physical AI cannot scale without the right data.

AI systems are only as good as what they are trained on. For language models, the internet provided an enormous body of text, images and video. It was imperfect, but it existed. For physical AI, the equivalent training layer does not yet exist at anything close to the scale required.

Robots and spatial models do not simply need more data. They need data that reflects reality: geometry, depth, materials, lighting, physics, occlusion and the countless variations that make the real world difficult to predict.

That is the gap the industry now has to confront.

The Real Bottleneck Is Training Reality

The phrase “synthetic data” is often used too broadly in AI discussions.

For physical AI, synthetic data is only useful if it is grounded in physical consistency. It is not enough to generate visuals that look convincing to a human observer. A cup is not just a cup. It has a surface, a weight, a material, a center of gravity, a handle, a reflection, a shadow and a relationship to everything around it. Place that same cup on a glass table, inside a crowded sink, under poor lighting or beside a moving human hand, and the problem changes again. The underlying data must respect the rules of geometry, scale, motion and interaction.
If a model is trained on data that looks realistic but behaves incorrectly, it may learn the wrong lessons. That matters far more in physical AI than in purely digital domains. A chatbot can produce a weak answer and be corrected. A physical system operating in the real world has a much smaller margin for error.

This is the fundamental difference between generating an image of reality and training a system to operate within it.

A useful analogy is flight simulation. The value of a simulator does not come from making the sky look beautiful. It comes from accurately reproducing the conditions a pilot must respond to: turbulence, instrument behavior, weather, failure modes and edge cases. The point is not visual realism alone. The point is operational realism.

Physical AI requires the same shift in thinking.

The industry needs training environments and datasets that capture not only what the world looks like, but how it works. That includes structured 3D assets, physically accurate digital twins, material properties, spatial metadata and the ability to generate variations at scale. It also includes the workflows to validate whether those representations are accurate enough for training.

This is where the comparison with language AI becomes useful, but only up to a point.

The LLM era was accelerated by the availability of internet-scale data and companies that built the infrastructure to make that data usable. Physical AI will require a similar infrastructure layer, but the raw material is different. The next training frontier is not the web. It is the physical world, translated into data that machines can understand.

That translation layer does not yet exist at sufficient scale.

This is the opportunity, but also the warning. If the physical AI ecosystem focuses too heavily on models and robots while underinvesting in data quality, progress will look faster than it really is.
The industry will produce better demos before it produces better deployment.

And deployment is where the true test begins.

The Next Infrastructure Layer Will Be Built Below The Surface

In most technology waves, the companies that define the future are not always the ones that look most exciting at the beginning.

The internet needed fiber, databases and content delivery networks. Cloud computing needed storage, orchestration and reliability infrastructure. Modern AI needed data labeling, evaluation and model operations.

Physical AI will need world-ready data infrastructure.

That means systems capable of turning incomplete physical information into structured, simulation-ready representations. It means building datasets that are diverse enough to capture real-world variation and precise enough to support reliable model training. It means creating feedback loops between simulation and deployment, so that every failure in the real world can improve the next version of the training environment.

This is the shift we are focused on. The goal is not to build another robot or another foundation model. It is to help create the data infrastructure those systems will depend on: physics-aware 3D data that allows developers, researchers and AI teams to train systems with a deeper understanding of the physical world.

As physical AI matures, progress will depend more and more on the strength of the full training pipeline. The winners will be the teams that can generate better data, validate it faster and iterate more reliably between virtual and real environments.

This will also require a more disciplined conversation about what “real-world ready” actually means. It must mean the ability to operate across variation, uncertainty and edge cases without breaking down.

That is a much harder standard. It is also the only standard that matters.

For me, the next phase of AI will not be about making machines smarter.
It will be about making the world legible to them.

And that starts with the data they are trained on.

Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.
Why AI Is Only As Effective As The World On Which It Trains