Alibaba's Tongyi Lab has released Mobile-Agent-v3.5, a GUI automation system powered by the GUI-Owl-1.5 foundation model. The release brings multi-platform automation to PC, mobile, browser, and other graphical user interfaces. The platform aims to connect intelligent agents with real-world human-computer workflows, with capabilities that go beyond simple scripted interactions. The move follows broader developments in AI agent systems designed to improve reasoning and environment interaction.
Multi-Platform Automation System Features 2B to 235B Parameter Models
Mobile-Agent-v3.5 is built on GUI-Owl-1.5, a native GUI agent foundation model that comes in both instruct and thinking variants ranging from 2B to 235B parameters. The system emphasizes cloud-edge collaboration and real-time interaction, and it is trained with a hybrid data flywheel that combines simulated and sandbox environments. This design lets the platform handle complex automation tasks across multiple device types and operating systems.
The platform has evolved from earlier single-agent multimodal models for phone operation into multimodal, multi-platform GUI agents equipped with tools, memory systems, and knowledge integration.
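As a rough illustration of what cloud-edge collaboration could look like in practice, the Python sketch below routes a task to a small on-device model or a large cloud model. It is only a sketch under stated assumptions: the tier names, the step-count heuristic, the threshold, and the offline fallback are all hypothetical and are not taken from the Mobile-Agent-v3.5 release. Only the 2B and 235B endpoints of the size range come from the announcement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str       # hypothetical label, not an official checkpoint name
    params_b: int   # parameter count in billions
    location: str   # "edge" (on device) or "cloud"


# Only the two endpoints of the 2B to 235B range come from the article.
EDGE = ModelTier("small-edge-model", 2, "edge")
CLOUD = ModelTier("large-cloud-model", 235, "cloud")


def pick_tier(estimated_steps: int, online: bool, step_threshold: int = 5) -> ModelTier:
    """Route short tasks to the edge model and long ones to the cloud model.

    Assumed heuristic: tasks with more than `step_threshold` estimated GUI
    steps go to the cloud. The edge model is used whenever the device is offline.
    """
    if not online:
        return EDGE
    return CLOUD if estimated_steps > step_threshold else EDGE


if __name__ == "__main__":
    for steps, online in [(2, True), (12, True), (12, False)]:
        tier = pick_tier(steps, online)
        print(f"steps={steps} online={online} -> {tier.name} ({tier.params_b}B, {tier.location})")
```

A real deployment would likely weigh latency, privacy, and cost as well. The point here is only that a family of models in several sizes makes this kind of routing possible.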
State-of-the-Art Performance Across 20+ GUI Benchmarks
Mobile-Agent-v3.5 reports scores of 56.5 on OSWorld, 71.6 on AndroidWorld, and 80.3 on ScreenSpot-Pro. These benchmarks cover desktop task completion, Android task completion, and GUI element grounding respectively, so the results span several distinct interface environments. The platform supports desktop, mobile, and browser automation through tool use, memory integration, and multi-agent coordination.
Key innovations include Unified Enhancement of Agent Capabilities, which strengthens tool use, memory, and coordinated multi-agent behavior, and Multi-platform Environment RL Scaling, a reinforcement learning approach labeled MRPO that targets performance across different operating environments. Together, these advances are intended to make agents more accurate and flexible in real-world GUI contexts.
Impact on Enterprise and Consumer Applications
The introduction of Mobile-Agent-v3.5 reflects a shift toward native GUI automation models that can handle highly interactive workflows. As enterprise and consumer applications increasingly adopt multimodal and cross-platform AI systems, solutions that combine automation, multi-agent coordination, and large-context reasoning are positioned to change how software tasks are automated across devices and environments. Industry trends also show growing demand for language models tailored to mobile and web code assistance, part of the broader movement toward AI-powered automation. Because the GUI-Owl-1.5 family offers both instruct and thinking variants, it can support richer control flows and long-horizon task execution within graphical interfaces, which widens the range of practical deployments in everyday computing.
Eseandre Mordi