In late October 2025, Alibaba dropped something unexpected into the AI world: Tongyi DeepResearch, an open-source language model designed specifically for deep, autonomous research. The technical report reveals a breakthrough architecture that achieves state-of-the-art results while running on a fraction of the computing power that Western models like GPT-4o require. This isn't just another incremental improvement — it's a fundamental rethinking of how intelligent systems should be trained.
A Smarter Training Paradigm
According to AI researcher and trader God of Prompt, who shared insights about the model on social media, Tongyi DeepResearch is "a 30B-parameter agent that beats GPT-4o and DeepSeek-V3 at deep research using only 3.3B active parameters." The secret lies in Alibaba's three-phase training approach:
- Pre-training builds general language understanding and world knowledge
- Agentic mid-training teaches the model how to plan, reason, and gather information like an autonomous agent
- Post-training refines specific capabilities through reinforcement learning and supervised fine-tuning
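The three phases above run strictly in sequence, which is the point of the design. A minimal sketch of that idea, using the phase names from the report but with purely illustrative placeholder bodies (none of this is Alibaba's actual code):

```python
# Conceptual sketch of the three-phase training pipeline. Phase names
# follow the report; function bodies are illustrative placeholders.

def pretrain(model: dict) -> dict:
    """Phase 1: build general language understanding and world knowledge."""
    model["capabilities"] += ["language", "world_knowledge"]
    return model

def agentic_midtrain(model: dict) -> dict:
    """Phase 2: teach planning, reasoning, and information gathering."""
    model["capabilities"] += ["planning", "tool_use"]
    return model

def posttrain(model: dict) -> dict:
    """Phase 3: refine via reinforcement learning and supervised fine-tuning."""
    model["capabilities"] += ["rl_refined", "sft_refined"]
    return model

def train(model: dict) -> dict:
    # Each phase runs to completion before the next begins, so agentic
    # reasoning is never traded off against preference tuning mid-phase.
    for phase in (pretrain, agentic_midtrain, posttrain):
        model = phase(model)
    return model

model = train({"name": "toy-agent", "capabilities": []})
```

The sequencing, not the placeholder logic, is what matters here: separating the agentic phase from preference-oriented post-training is what the report credits for the model's coherence on long tasks.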
The technical report emphasizes that this structure solves a critical problem plaguing most modern language models. Traditional training tries to balance reasoning ability and user preferences at the same time, which often weakens logical consistency. By separating agentic reasoning into its own training phase, Tongyi DeepResearch maintains stronger coherence across complex tasks.
Fully Automatic Data Generation
Perhaps the most impressive innovation is Alibaba's synthetic data pipeline. Instead of relying on expensive human annotation, the system generates its own training examples. It creates PhD-level research questions, multi-step reasoning chains, and even synthetic uncertainty to challenge itself during training. According to the engineers, roughly 20% of training sequences exceed 32,000 tokens and involve more than 10 tool invocations — what they call "superhuman complexity." This self-improving data engine makes the entire training process scalable and cost-effective.
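A self-generating data engine of this kind can be pictured as a loop that synthesizes examples and then filters for the hardest ones. The toy sketch below is an assumption-laden illustration (the function names, step counts, and thresholds are invented for this example, not taken from Alibaba's pipeline), using the report's 32,000-token / 10-tool-call complexity bar as the filter:

```python
import random

# Toy sketch of a synthetic data engine: generate question/trace pairs,
# then flag the "superhuman complexity" subset for training.
# All names and distributions here are illustrative assumptions.

def synthesize_example(rng: random.Random) -> dict:
    """Generate one synthetic example with a multi-step reasoning trace."""
    steps = rng.randint(1, 20)                # reasoning / tool-call steps
    tokens = steps * rng.randint(500, 4000)   # rough trace length in tokens
    return {"question": f"synthetic-q-{rng.randint(0, 10**6)}",
            "tool_calls": steps,
            "tokens": tokens}

def build_corpus(n: int, seed: int = 0) -> list[dict]:
    rng = random.Random(seed)
    corpus = [synthesize_example(rng) for _ in range(n)]
    # The report's complexity bar: >32k tokens and >10 tool invocations.
    for ex in corpus:
        ex["hard"] = ex["tokens"] > 32_000 and ex["tool_calls"] > 10
    return corpus

corpus = build_corpus(1000)
hard_fraction = sum(ex["hard"] for ex in corpus) / len(corpus)
```

Because the examples are machine-generated, the hard fraction can be tuned by adjusting the generator rather than by hiring more annotators, which is where the cost advantage comes from.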
Benchmark Results That Speak for Themselves
The numbers validate Alibaba's approach. Tongyi DeepResearch achieved record-breaking scores across multiple deep-reasoning benchmarks. On Humanity's Last Exam, it scored 32.9% compared to OpenAI DeepResearch's 26.6%. For BrowseComp, it hit 43.4% while DeepSeek-V3.1 managed only 30.0%. On xBench-DeepSearch, it reached 75.0% versus GLM-4.5's 70.0%, and on FRAMES, it posted the highest overall score at 90.6%. When the model switches to Heavy Mode — activating parallel agents and cross-synthesis reasoning — performance jumps even higher, reaching 38.3% on HLE and 58.3% on BrowseComp, setting new industry records.
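The Heavy Mode pattern of parallel agents plus cross-synthesis can be sketched in a few lines. The agents below are stand-ins (a real system would run a full research loop per agent), and the majority-vote synthesis step is a simplifying assumption, not the model's actual cross-synthesis mechanism:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Sketch of a "Heavy Mode"-style setup: several agent instances tackle
# the same question in parallel, then a synthesis step merges drafts.

def agent(question: str, agent_id: int) -> str:
    """Stand-in agent: returns a canned draft instead of real research."""
    return f"draft-{agent_id % 2}"

def heavy_mode(question: str, n_agents: int = 3) -> str:
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        drafts = list(pool.map(lambda i: agent(question, i), range(n_agents)))
    # Cross-synthesis stand-in: majority vote over the parallel drafts.
    return Counter(drafts).most_common(1)[0][0]

final = heavy_mode("What drives Tongyi DeepResearch's benchmark gains?")
```

The design choice worth noting is that the parallelism buys robustness, not speed: disagreeing drafts get reconciled at synthesis time, which is consistent with the score jump the report attributes to Heavy Mode.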
Training Costs Under $500
While major AI labs burn millions of dollars training trillion-parameter models, Alibaba fine-tuned Tongyi DeepResearch for specific tasks using just two H100 GPUs over two days. Total cost? Under $500. This demonstrates that architectural intelligence and synthetic data can deliver performance gains previously thought to require massive computational budgets. It's a clear signal that bigger isn't always better.
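The sub-$500 figure passes a quick sanity check at typical cloud rental prices. The hourly rate below is an assumption (a rough market H100 rental rate, not a number from the report):

```python
# Back-of-the-envelope check on the quoted fine-tuning cost.
GPUS = 2
HOURS = 48               # two days of training
RATE_PER_GPU_HOUR = 3.0  # assumed cloud H100 rental rate, USD

total_cost = GPUS * HOURS * RATE_PER_GPU_HOUR  # 288.0, comfortably under $500
```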
Open-Source for Everyone
Alibaba made the entire system publicly available through GitHub, Hugging Face, ModelScope, and their official blog at tongyi-agent.github.io. The release includes the model itself, the training framework, and the benchmark suite. This level of transparency is rare among major AI companies and could accelerate global progress in developing more capable autonomous agents.
Saad Ullah