Ollama just dropped version 0.15, and it brings some practical upgrades. The standout feature is a new launch command that lets you run multiple coding agents (Claude Code, Codex, Droid, and OpenCode) from one place in your Ollama environment, so there's no more jumping between separate setups when you want to switch tools.
The other big change involves GLM 4.7 Flash, which has been reworked to use far less memory at longer context lengths. We're talking about support for 64,000 tokens and beyond, which is a serious improvement if you're working with complex code or extensive documentation.
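If you want to try the longer context locally, here is a minimal sketch using Ollama's official Python client. The model tag glm-4.7-flash and the input file are assumptions for illustration; check the Ollama model library for the exact tag.

```python
# Minimal sketch: a long-context request against a local Ollama server,
# using the official Python client (pip install ollama).
import ollama

# Hypothetical long input, e.g. a large source file or a spec document.
with open("big_module.py") as f:
    long_document = f.read()

response = ollama.chat(
    model="glm-4.7-flash",  # assumed tag for GLM 4.7 Flash
    messages=[
        {"role": "user", "content": f"Summarize this code:\n\n{long_document}"},
    ],
    options={"num_ctx": 65536},  # ask the server for a 64K-token context window
)

print(response.message.content)
```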
For those hitting hardware limits on local machines, Ollama is also pointing users toward its cloud option. The cloud service gives you access to GLM 4.7 at full precision and full context length, basically removing the memory bottleneck that might slow you down locally.
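Switching the same request over to the cloud should then be a one-line change, since Ollama serves cloud models through the same API once you sign in. The tag below follows Ollama's existing "-cloud" naming convention and is an assumption for GLM 4.7.

```python
# Sketch: the same chat call, pointed at Ollama's cloud offering.
# Assumptions: you have signed in (ollama signin) and a cloud variant
# of GLM 4.7 is published under a "-cloud"-style tag.
import ollama

response = ollama.chat(
    model="glm-4.7:cloud",  # assumed cloud tag; check the model library
    messages=[{"role": "user", "content": "Review this diff for bugs: ..."}],
)

print(response.message.content)
```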
This update matters because it shows how AI tools are adapting to real developer needs. With support for multiple coding agents, longer contexts, and a choice between local and cloud resources, Ollama 0.15 makes the workflow noticeably more flexible. As models get bigger and contexts get longer, updates like this could shape how developers actually use these tools day to day.
Alex Dudov