⬤ Recent testing shows Mistral Devstral-Small-2 delivering solid inference performance when run locally on Apple's M3 Ultra through MLX. The model ran via Mistral Vibe CLI with 6.5-bit quantization, reaching peak speeds of roughly 36 tokens per second. Video demonstrations captured the setup's responsiveness on Apple silicon, with parts of the footage shown at accelerated playback.
⬤ The test used inferencerlabs/Devstral-Small-2-24B-Instruct-2512-MLX-6.5bit with a temperature setting of 0.2. A 4-bit quantization was also tested, but it produced noticeable errors after several conversation turns that hurt output quality. The 6.5-bit option struck a better balance, maintaining high throughput while keeping outputs reliable during extended sessions.
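⬤ For readers who want to try a comparable setup outside of Mistral Vibe CLI, below is a minimal sketch using the mlx-lm Python package. It loads the same 6.5-bit model named above and samples at temperature 0.2; the prompt is illustrative, and the sampler keyword reflects recent mlx-lm releases, so argument names may differ in older versions.

```python
# Minimal sketch: load the 6.5-bit Devstral MLX build with mlx-lm and generate
# at temperature 0.2. Assumes `pip install mlx-lm` on Apple silicon; the sampler
# API may vary between mlx-lm releases.
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load(
    "inferencerlabs/Devstral-Small-2-24B-Instruct-2512-MLX-6.5bit"
)

# Illustrative coding prompt, formatted with the model's chat template.
messages = [{"role": "user", "content": "Write a Python function that parses a CSV file."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# verbose=True makes mlx-lm print generation speed (tokens/sec) with the output.
response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=512,
    sampler=make_sampler(temp=0.2),  # temperature 0.2, matching the test above
    verbose=True,
)
```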
⬤ In a separate run on an M3 Ultra with 512GB of memory using LM Studio, average speeds tracked around 27 tokens per second on the console. The video used mixed playback speeds, normal at first and 2x afterward, and the sustained figures point to consistent performance rather than brief spikes.
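⬤ LM Studio also exposes an OpenAI-compatible local server (port 1234 by default), so throughput can be spot-checked from a script. The sketch below assumes the server is running with the Devstral model already loaded; the model identifier string is a placeholder for whatever name LM Studio reports for the loaded model.

```python
# Rough throughput check against LM Studio's local OpenAI-compatible server.
# Assumes the server is enabled on the default port with the model loaded.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # key is unused locally

messages = [{"role": "user", "content": "Implement a binary search in Python."}]

start = time.perf_counter()
resp = client.chat.completions.create(
    model="devstral-small-2",  # placeholder: use the model name shown in LM Studio
    messages=messages,
    temperature=0.2,
    max_tokens=512,
)
elapsed = time.perf_counter() - start

generated = resp.usage.completion_tokens
print(f"{generated} tokens in {elapsed:.1f}s = {generated / elapsed:.1f} tok/s")
```

Note that this measures end-to-end wall-clock time including prompt processing, so it will read somewhat below the decode-only tokens-per-second figure shown on LM Studio's console.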
⬤ These results matter because they show how optimized MLX workflows can deliver high-throughput local inference for large language models on accessible hardware. Strong performance from Mistral Devstral-Small-2 on Apple M3 Ultra shows that advanced AI models can run locally without cloud dependency. As MLX tooling, quantization methods, and local inference setups continue improving, configurations like this could reshape how developers deploy and test open-weight models.
Alex Dudov