⬤ Two open-weight AI models have each built a working command-line tool from scratch with no human intervention. In a recent test by Kilo Code, GLM 4.7 and MiniMax M2.1 were challenged to create CLI task runners: tools that handle YAML parsing, topological sorting with cycle detection, process management, and file hashing. Both models finished in 10 to 14 minutes.
⬤ GLM 4.7 went all-in with a 741-line architecture plan, ultimately generating 1,850 lines of code across 18 files. Its output included a thorough 363-line README and complete documentation, and the run cost $0.30. MiniMax M2.1 took a leaner approach with a 284-line plan and 9 files in a flat structure. While it skipped the README, it impressed by catching and fixing its own parsing bug during testing, all for just $0.15.
"The models showcased their ability to autonomously plan, code, debug, and test—skills that once required significant human expertise."
⬤ Here's the kicker: despite their different approaches, both models nailed all 20 requirements and produced functionally identical results. GLM 4.7 delivered more modular, well-documented code, while MiniMax M2.1 focused on efficiency and cost savings. Either way, watching AI systems plan, build, and debug complex software without human intervention marks a real shift in what's possible.
⬤ This comparison shows how AI development is becoming more accessible. With tools like Kilo Code's Parallel Mode, developers can now run multiple AI models side by side to find the sweet spot between performance and budget. What used to require days of senior developer time can now be tested and deployed in minutes, opening new possibilities for businesses and developers alike.
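For readers curious what "topological sorting with cycle detection" involves in a task runner like the ones described above, here is a minimal Python sketch. It is not taken from either model's output; the function name `topo_order` and the shape of the `deps` mapping are assumptions made for illustration. It uses Kahn's algorithm: tasks with no unmet prerequisites run first, and any task that never becomes runnable must be part of a cycle.

```python
from collections import deque


def topo_order(deps):
    """Return tasks in an order that respects their prerequisites.

    `deps` is a hypothetical mapping of task name -> list of task names
    that must finish first. Raises ValueError if the graph has a cycle.
    """
    # Collect every task, including ones that appear only as prerequisites.
    tasks = set(deps)
    for prereqs in deps.values():
        tasks.update(prereqs)

    indegree = {t: 0 for t in tasks}    # number of unmet prerequisites
    dependents = {t: [] for t in tasks}  # tasks unlocked when t finishes
    for task, prereqs in deps.items():
        for p in prereqs:
            indegree[task] += 1
            dependents[p].append(task)

    # Start with tasks that have no prerequisites (sorted for stable output).
    ready = deque(sorted(t for t in tasks if indegree[t] == 0))
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for d in sorted(dependents[current]):
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)

    # Any task left unscheduled is part of, or blocked by, a cycle.
    if len(order) != len(tasks):
        stuck = sorted(tasks - set(order))
        raise ValueError(f"dependency cycle among: {stuck}")
    return order
```

Called with `{"build": ["lint"], "test": ["build"], "lint": []}`, the function schedules `lint` before `build` and `build` before `test`. A pair of tasks that depend on each other raises `ValueError` instead of looping forever.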
Usman Salis