Tencent's LongHorizonUI Can Now Handle Tasks With 15+ Steps

Tencent and its academic partners have built LongHorizonUI, an AI framework targeting a core weakness in today's agents: falling apart mid-task when workflows get long. Here's how it works.

Most AI agents handle short tasks well but grow increasingly unreliable once a task stretches past 15 steps. That's the gap Tencent set out to close. In collaboration with Georgia Tech, Tsinghua University, and the Chinese Academy of Sciences, Tencent Turing Lab has introduced LongHorizonUI, a new AI framework built specifically for complex, multi-step GUI automation. The goal is simple: keep agents on track from start to finish, even when things go sideways.

3 Core Components That Keep AI Agents on Track

LongHorizonUI is built around three components working together. The first is a Multimodal Enhanced Perceiver, which reads on-screen elements using detection and text recognition to build structured, consistent representations of the interface. This gives the agent a reliable understanding of what it's looking at before it acts.
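To make the idea of a "structured, consistent representation" concrete, here is a minimal sketch of how detector output and text recognition output could be merged into one list of screen elements. It is an illustration only, not Tencent's implementation: the names UIElement and build_screen_state, the box format, and the rule of attaching text to the box that contains its center are all assumptions.

```python
from dataclasses import dataclass

# Assumed box format: (left, top, right, bottom) in pixels.
Box = tuple[int, int, int, int]


@dataclass(frozen=True)
class UIElement:
    """Hypothetical structured record for one on-screen element."""
    element_id: int
    kind: str
    box: Box
    text: str


def _center(box: Box) -> tuple[float, float]:
    left, top, right, bottom = box
    return ((left + right) / 2, (top + bottom) / 2)


def _contains(box: Box, point: tuple[float, float]) -> bool:
    left, top, right, bottom = box
    x, y = point
    return left <= x <= right and top <= y <= bottom


def build_screen_state(detections, ocr_results):
    """Merge (kind, box) detections with (text, box) OCR hits.

    Elements are ordered top-to-bottom, then left-to-right, so the same
    screen always yields the same element ids.
    """
    ordered = sorted(detections, key=lambda d: (d[1][1], d[1][0]))
    elements = []
    for element_id, (kind, box) in enumerate(ordered):
        # Attach every recognized string whose center falls inside this box.
        words = [text for text, text_box in ocr_results
                 if _contains(box, _center(text_box))]
        elements.append(UIElement(element_id, kind, box, " ".join(words)))
    return elements


detections = [("button", (10, 100, 110, 140)), ("input", (10, 20, 210, 60))]
ocr_results = [("Submit", (30, 110, 90, 130)), ("Email", (15, 30, 60, 50))]
screen = build_screen_state(detections, ocr_results)
```

The deterministic ordering is the point of the sketch: if the same screen always produces the same ids, later steps can refer back to "element 1" without ambiguity.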

The second is a Deep Reflection Decider, which adds multi-level validation at every step, checking that actions align with the original goal, stay consistent with what's already happened, and are actually accurate. Think of it as a built-in quality check that runs continuously throughout the task.
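The three checks described above can be pictured as a small validation pipeline that runs before each action. The sketch below is a simplified stand-in under stated assumptions: the real decider presumably relies on model-based reasoning, whereas this version uses plain set lookups, and every name in it (ProposedAction, StepContext, reflect) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    kind: str    # e.g. "tap" or "type"
    target: str  # label of the element the agent wants to act on


@dataclass
class StepContext:
    goal_targets: set[str]         # assumed stand-in for "relevant to the goal"
    history: list[ProposedAction]  # actions already taken
    visible_targets: set[str]      # labels currently on screen


def aligned_with_goal(action, ctx):
    return action.target in ctx.goal_targets


def consistent_with_history(action, ctx):
    # Toy rule: repeating the previous action verbatim suggests a loop.
    return not ctx.history or ctx.history[-1] != action


def accurate_on_screen(action, ctx):
    return action.target in ctx.visible_targets


CHECKS = [
    ("goal", aligned_with_goal),
    ("history", consistent_with_history),
    ("accuracy", accurate_on_screen),
]


def reflect(action, ctx):
    """Return the names of the checks the proposed action fails."""
    return [name for name, check in CHECKS if not check(action, ctx)]


ctx = StepContext(
    goal_targets={"search", "buy"},
    history=[ProposedAction("tap", "search")],
    visible_targets={"search", "cart"},
)
failed = reflect(ProposedAction("tap", "buy"), ctx)
```

An empty list means the action passes all three levels; anything else tells the agent which kind of mistake it is about to make before it acts.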

LongGUIBench Tests Real Performance Across Apps and Games

The third component, a Compensatory Action Executor, handles what happens when something goes wrong. Instead of failing silently or abandoning the task, the agent can roll back to a previous state and correct its course. For extended automation workflows, this kind of error recovery isn't a bonus feature; it's a requirement.
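The roll-back-and-correct behavior described above can be sketched as a checkpointing loop. This is a toy model under loud assumptions: the "screen" is a plain dictionary that can be copied and restored, whereas a real GUI agent would have to navigate back through the interface. RollbackExecutor and its methods are hypothetical names, not part of LongHorizonUI.

```python
import copy


class RollbackExecutor:
    """Toy executor: checkpoint, act, verify, and restore on failure."""

    def __init__(self, state):
        self.state = state
        self._checkpoints = []

    def _attempt(self, action, verify):
        # Save a copy of the state so a bad action can be undone.
        self._checkpoints.append(copy.deepcopy(self.state))
        action(self.state)
        if verify(self.state):
            return True  # checkpoint is kept for possible later rollbacks
        self.state = self._checkpoints.pop()
        return False

    def run_step(self, action, verify, fallback=None):
        """Try the action; on failure roll back and try the fallback."""
        if self._attempt(action, verify):
            return "ok"
        if fallback is not None and self._attempt(fallback, verify):
            return "recovered"
        return "rolled_back"


def open_settings(state):  # the mistaken action
    state["screen"] = "settings"


def open_cart(state):      # the corrective action
    state["screen"] = "cart"


def on_cart_screen(state):
    return state["screen"] == "cart"


executor = RollbackExecutor({"screen": "home", "cart": []})
outcome = executor.run_step(open_settings, on_cart_screen, fallback=open_cart)
```

Because every failed attempt restores the last checkpoint, the fallback always starts from a known-good state rather than from whatever the mistaken action left behind.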

To evaluate the framework, the team used LongGUIBench, a benchmark designed specifically for tasks that go beyond 15 steps across both application and gaming environments. Results show that LongHorizonUI improves task success rates while holding its own on existing benchmarks built around shorter tasks.

The broader implication goes beyond any single benchmark. As AI automation expands across enterprise software and consumer apps, the ability to execute long sequences reliably, recover from mistakes, and maintain goal alignment over time becomes foundational. LongHorizonUI is Tencent's answer to that challenge, and a meaningful step toward agents that can actually be trusted with real work.


Usman Salis
