Tencent's VLM Breakthrough: 58.9% Accuracy with 4D Reasoning

Tencent's new Vision-Language Model introduces 4D reasoning that beats baseline video understanding models by more than 20 percentage points, reaching 58.9% accuracy on DSR-Bench while maintaining strong general video comprehension.

⬤ Tencent's ARC team has developed Vision-Language Models that reason across four dimensions, combining spatial awareness with temporal understanding. Their DSR Suite generates geometric question-answer pairs directly from video footage, and a Geometry Selection Module (GSM) brings 4D reasoning into the model. The approach marks a notable shift in how AI systems interpret video content.

⬤ The team paired Qwen2.5-VL-7B, an open model from Alibaba's Qwen team, with the Geometry Selection Module (GSM) to reach 58.9% accuracy on DSR-Bench, beating baseline models by more than 20 percentage points. The system excels at tracking how objects move and change through space over time, something traditional video models struggle to do consistently.
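A gap in percentage points is not the same as a relative improvement in percent, and the two are easy to confuse when reading benchmark results. The minimal sketch below shows the arithmetic. It assumes answers are scored by exact match, which may not be how DSR-Bench is actually graded, and the baseline score is a hypothetical value chosen only for illustration (the article gives just the 58.9% figure and a gap of more than 20 points).

```python
def accuracy(predictions, answers):
    """Percentage of predictions that exactly match the reference answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one question")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


def gap_in_points(model_acc, baseline_acc):
    """Absolute difference between two accuracies, in percentage points."""
    return model_acc - baseline_acc


def relative_gain(model_acc, baseline_acc):
    """Improvement relative to the baseline score, in percent."""
    return 100.0 * (model_acc - baseline_acc) / baseline_acc


reported_acc = 58.9           # figure quoted in the article
hypothetical_baseline = 38.0  # assumption for illustration, not a reported score

print(f"gap: {gap_in_points(reported_acc, hypothetical_baseline):.1f} points")
print(f"relative gain: {relative_gain(reported_acc, hypothetical_baseline):.1f}%")
```

With a baseline this low, a gap of about 21 points corresponds to a much larger relative gain, which is why the unit matters when comparing headline numbers.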

⬤ What makes this advance particularly valuable is that the model doesn't sacrifice general video understanding to gain 4D capabilities. Models specialized for this kind of reasoning often lose ground on standard video tasks, but Tencent's system maintains broad video processing ability. That balance opens the door to practical applications in autonomous vehicles, robotics, and interactive AI systems.

⬤ The development signals real progress in video AI. Tencent has created a model that understands not just what is happening in a video, but how objects relate to each other in space as they move through time. That capability is crucial for any system that needs to navigate or interact with the physical world.

Peter Smith - web3.0 projects expert and writer exploring the intersection of blockchain, AI, and online entertainment.