GUI-Owl-1.5 and Mobile-Agent-v3.5 Hit State-of-the-Art on 20+ GUI Benchmarks

Two open-source AI agents built for real-world software interaction just landed. GUI-Owl-1.5 and Mobile-Agent-v3.5 are designed to work across desktop, browser, and mobile environments under one unified training framework.

Built on the Qwen3-VL architecture, the models come in sizes ranging from 2B to 32B parameters. Instruct variants prioritize speed and task execution, while Thinking variants are tuned for planning and multi-step reasoning. The release follows recent Qwen3-VL efficiency improvements, which cut training compute by 75% compared to earlier versions.

1 Agent, 3 Platforms: Desktop, Mobile, and Browser

A single agent can now operate across PC environments, mobile devices, and browser apps through cloud-based sandbox systems. The model observes graphical interfaces and executes actions through ADB, Playwright, and PyAutoGUI.

That means it can read what's on screen, interpret instructions, and carry out tasks across apps, while managing tool use, memory, and knowledge retrieval in parallel. Similar multi-environment agent frameworks are emerging elsewhere, such as Google's open-source ADK for always-on Gemini agents.
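
To make the action layer more concrete, here is a minimal Python sketch of how one abstract click could be routed to each of the three backends named above. It is an illustration under stated assumptions, not code from the GUI-Owl-1.5 release: the `Click` type and the `adb_tap_command` and `execute` helpers are hypothetical names. The only real interfaces it relies on are the `adb shell input tap` command, `pyautogui.click`, and the `page.mouse.click` method of a Playwright page.

```python
import subprocess
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class Click:
    """Hypothetical abstract action: click or tap at pixel (x, y)."""
    x: int
    y: int


def adb_tap_command(action: Click, serial: Optional[str] = None) -> List[str]:
    """Build the argv for `adb shell input tap x y` (mobile backend)."""
    cmd = ["adb"]
    if serial:
        # -s selects a specific device when several are attached.
        cmd += ["-s", serial]
    return cmd + ["shell", "input", "tap", str(action.x), str(action.y)]


def execute(action: Click, platform: str, page: Optional[Any] = None) -> None:
    """Route one abstract click to the backend for the given platform."""
    if platform == "mobile":
        subprocess.run(adb_tap_command(action), check=True)
    elif platform == "desktop":
        import pyautogui  # third-party; imported lazily so other paths work without it
        pyautogui.click(action.x, action.y)
    elif platform == "browser":
        if page is None:
            raise ValueError("browser actions need a Playwright page object")
        page.mouse.click(action.x, action.y)
    else:
        raise ValueError(f"unknown platform: {platform}")
```

Keeping the action abstract and the backends thin is one plausible way for a single policy to serve all three environments: the model only has to emit coordinates, and the platform-specific plumbing stays outside it.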

Benchmark Results: 56.5 on OSWorld, 71.6 on AndroidWorld

The release reports state-of-the-art results on more than 20 GUI benchmarks. GUI-Owl-1.5-32B-Instruct scored 56.5 on OSWorld-Verified, and the 8B-Thinking variant hit 71.6 on AndroidWorld. The 32B-Thinking configuration reached 46.6 on VisualWebArena and 48.4 on WebArena. Using a two-stage crop-refinement method, the models also achieved 80.3 on ScreenSpot-Pro, with additional scores of 47.6 on OSWorld-MCP and 46.8 on MobileWorld.
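
The article does not spell out how the two-stage crop-refinement method works, so the following Python sketch shows only one common reading of the idea: locate the target roughly on the full screenshot, crop a window around that guess, locate again inside the crop, and map the result back to full-screen coordinates. The `locate` callback is a hypothetical stand-in for a model call, and `crop_box_around` and `refine_click` are illustrative names rather than anything from the release.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]
Size = Tuple[int, int]
Box = Tuple[int, int, int, int]  # left, top, right, bottom in screen pixels


def crop_box_around(point: Point, screen: Size, crop: Size) -> Box:
    """Center a crop window on `point`, clamped so it stays on screen."""
    (px, py), (sw, sh) = point, screen
    cw, ch = min(crop[0], sw), min(crop[1], sh)
    left = int(min(max(px - cw / 2, 0), sw - cw))
    top = int(min(max(py - ch / 2, 0), sh - ch))
    return (left, top, left + cw, top + ch)


def refine_click(locate: Callable[[Box], Point], screen: Size, crop: Size) -> Point:
    """Two-stage grounding sketch.

    `locate(region)` is assumed to return the target position in pixels
    relative to the top-left corner of `region`.
    """
    # Stage 1: coarse guess on the whole screen.
    coarse = locate((0, 0, screen[0], screen[1]))
    # Stage 2: look again inside a smaller window around the guess.
    box = crop_box_around(coarse, screen, crop)
    local_x, local_y = locate(box)
    # Map crop-local coordinates back to full-screen coordinates.
    return (box[0] + local_x, box[1] + local_y)
```

The motivation for a scheme like this is that small targets on high-resolution screens can be hard to pinpoint in a single pass, while a second look at a crop gives the model more pixels per element.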

On the training side, the team introduced a reinforcement learning method called MRPO for cross-platform policy learning, which stabilizes training on long tasks. Unified reasoning synthesis brings together world modeling, knowledge injection, and tool reasoning during agent execution. These advances fit a broader shift in how AI memory systems are being redesigned, as RAG architectures give way to more integrated approaches offering up to 10x efficiency gains.

Peter Smith - web3.0 projects expert and writer exploring the intersection of blockchain, AI, and online entertainment.