⬤ Spirit AI just dropped Spirit v1.5, their newest vision-language-action robotic foundation model that turns what robots see into actual physical movements. The model jumped straight to first place on the RoboChallenge Table30 benchmark, making it the top-performing open-source option available right now. It's a pretty big deal for anyone working on robot intelligence without massive corporate budgets.
⬤ The numbers tell the story: Spirit v1.5 hit a 50.33% success rate on RoboChallenge Table30 tasks, beating out Pi0.5's 42.67% and Wall-Oss v0.1's 35.33%. That's nearly an 8-point jump over the previous leader, a real sign of progress in how reliably these models can interpret a task and then carry it out.
⬤ "Spirit v1.5 tightly integrates vision, language understanding, and action generation," according to the repository description, which highlights how the model connects seeing, thinking, and doing in one package. Spirit AI released the complete model implementation plus everything you need to run it and check the results yourself on RoboChallenge, making it easy for developers and researchers to actually use this thing instead of just reading about it.
⬤ This matters because getting robots to work reliably in the real world has always been incredibly tough. When a high-performing vision-language-action model goes open-source, it means more people can experiment with it, build on it, and apply it to actual problems without hitting paywalls or access restrictions. Spirit v1.5's benchmark win shows open-source robotics is catching up fast to what only big labs could do before.
Saad Ullah