Microsoft’s new Rho-alpha model brings tactile sensing to robotics - TechTalks

While large language models (LLMs) have mastered the art of processing text and images, they remain largely confined to the digital realm. Moving from generating code to folding laundry requires a fundamental shift in how AI perceives the world. Microsoft is attempting to bridge this gap with Rho-alpha (⍴ɑ), a new robotics foundation model designed to bring adaptivity to physical tasks.

Rho-alpha falls under the category of Vision-Language-Action (VLA) models. These systems ingest visual data and natural language commands to output robot arm actions. However, standard VLAs often struggle with precision tasks where vision is obstructed or insufficient, such as manipulating a slippery object or inserting a plug behind a desk. Rho-alpha addresses this by integrating tactile sensing directly into its decision-making process, a capability Microsoft refers to as “VLA+.”

The architecture of VLA+

The core innovation of Rho-alpha lies in how it processes sensory data. Most multimodal models attempt to tokenize every input, converting images and text into discrete units that a transformer can process. However, tactile feedback is a high-frequency, continuous signal that represents force and resistance and can’t be represented as discrete tokens.

The architecture of VLA+

Microsoft’s new Rho-alpha model brings tactile sensing to robotics - TechTalks

Microsoft’s new Rho-alpha model brings tactile sensing to robotics - TechTalks

Related reading

AlphaOne gives AI developers a new dial to control LLM ‘thinking’ and boost…

Researchers from UC Berkeley, Nvidia, and Stanford unveil T-Rex framework for…

Ai2’s MolmoAct model ‘thinks in 3D’ to challenge Nvidia and Google in robotics…

Robbyant Releases LingBot-VLA 2.0: An Open-Source 6B Vision-Language-Action…

Microsoft's first reasoning model is one of 7 AIs just released at Build - what…

Robbyant Unveils LingBot-Depth 2.0 and LingBot-Vision to Redefine Robotic…

Related reading

AlphaOne gives AI developers a new dial to control LLM ‘thinking’ and boost…

Researchers from UC Berkeley, Nvidia, and Stanford unveil T-Rex framework for…

Ai2’s MolmoAct model ‘thinks in 3D’ to challenge Nvidia and Google in robotics…

Robbyant Releases LingBot-VLA 2.0: An Open-Source 6B Vision-Language-Action…

Microsoft's first reasoning model is one of 7 AIs just released at Build - what…

Robbyant Unveils LingBot-Depth 2.0 and LingBot-Vision to Redefine Robotic…