Microsoft launched Rho-alpha, its first robotics-focused AI model from Microsoft Research, marking a significant step in what the company calls VLA+ (Vision-Language-Action plus) systems. Built on the Phi family architecture, Rho-alpha goes beyond traditional VLA models by prioritizing adaptability in robotic behavior rather than just visual perception and language processing.
The standout innovation is tactile sensing integration, allowing robots to physically feel objects during manipulation tasks. This means robotic systems can now make decisions based on both visual data and direct touch feedback. Rho-alpha also features online learning: it continuously improves through human corrections delivered via teleoperation or 3D mouse controls, updating its capabilities in real time even after deployment instead of relying solely on pre-training.
Microsoft Research demonstrations show Rho-alpha already controlling dual-arm robotic systems across practical applications. The model handles BusyBox manipulation through natural language commands, performs precision plug insertion, and executes coordinated bimanual movements for toolbox packing and object arrangement. This integration of language input, sensory feedback, and physical execution represents a meaningful shift toward more unified embodied AI.
Rho-alpha signals Microsoft's (MSFT) deeper commitment to robotics within the AI sector. By adding tactile input and post-deployment learning to conventional VLA frameworks, the company positions VLA+ as the next generation of adaptive robotics. This development reflects the industry's growing focus on multimodal AI systems that respond to changing environments and human guidance in real-world automation scenarios.
Victoria Bazir