When a new AI player suddenly grabs the No. 1 spot, the internet doesn't hold back. That's exactly what happened this week when Zephyr tweeted, "LOL. How the hell is Manus No. 1??" The post struck a chord with AI researchers and enthusiasts who were equally confused about how this relatively unknown system topped the rankings. So what's Manus AI, and how did it claim the throne? Here's what we know, and why the skepticism tells us something important about how AI breakthroughs get measured and marketed.
What Is Manus AI?
Manus AI is an autonomous agent built by Singapore's Butterfly Effect Pte. Ltd. and launched in early 2025. Unlike basic chatbots, it's designed to run code, conduct research, analyze data, and automate workflows across cloud tools, essentially acting as a digital worker rather than just a conversational assistant.
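To make that chatbot-versus-agent distinction concrete, here is a minimal sketch of the plan-act-observe loop that agent systems of this kind generally run. Everything in it is an assumption for illustration (the llm and tools callables, the decision format are hypothetical); Manus has not published its architecture, so this is not its actual design.

```python
# Hypothetical sketch of a generic plan-act-observe agent loop.
# This does NOT reflect Manus's real implementation, which is not public.

def run_agent(goal, llm, tools, max_steps=10):
    """Ask a model to pick actions, run the chosen tool, and feed the result back in."""
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        # The model sees everything so far and returns its next decision,
        # e.g. {"tool": "search", "input": "GAIA benchmark", "done": False}.
        decision = llm("\n".join(history))
        if decision.get("done"):
            return decision.get("answer")
        observation = tools[decision["tool"]](decision["input"])  # run the chosen tool
        history.append(f"Action: {decision}")
        history.append(f"Observation: {observation}")
    return None  # gave up after max_steps
```

A plain chatbot stops at the first model reply; the loop above is what lets an agent keep planning, calling tools, and acting until the task is done.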
The "No. 1" claim comes from reportedly stellar performance on the GAIA benchmark, which tests AI agents on real-world tasks like reasoning, planning, and using multiple tools. But here's the catch: those results mostly come from company statements and invite-only testing, not independent verification.
The hype boils down to three things:
- Bold performance claims on agent benchmarks that supposedly beat existing frameworks
- Exclusivity factor with invite-only access creating scarcity and intrigue
- Media buzz comparing it to breakthroughs like DeepSeek, positioning it as the next big leap in AI autonomy
The Problem with Benchmark Rankings
Sure, Manus posted impressive numbers on multi-step reasoning and tool use. But benchmark wins don't always mean real-world success. The issues are familiar: the company hasn't fully disclosed its architecture or data sources, the tests happened in controlled environments that don't reflect messy real-world conditions, and declaring yourself "No. 1" based on your own tests is an old marketing playbook.
As one analyst put it: "Every time someone claims a model is 'number one,' it reminds us we still don't have a standard for measuring true intelligence."
The Manus debate highlights a bigger shift—from chatbots to autonomous agents that can actually do things: plan tasks, execute workflows, and operate without constant hand-holding. That's exciting for research and business automation, but it also raises concerns about oversight, decision transparency, and bias amplification.
The Manus moment sends a clear signal: benchmarks attract attention but don't prove stability or safety. Transparency builds trust, and closed systems struggle to earn credibility without open testing. And the agent race is heating up fast—Manus is now competing with OpenAI's Assistants, Anthropic's Claude Agents, and others vying to power the next wave of AI automation.
If Manus delivers on its promises, it could push the field forward. If not, it'll be another lesson that in AI, hype usually runs ahead of proof.
Usman Salis