ModelScope has announced the release of ACE-Step v1.5, an open-source music foundation model aimed at efficient, high-quality song generation on consumer hardware. The model runs locally with less than 4 GB of VRAM and generates a full song in under two seconds on an A100 GPU, or under ten seconds on an RTX 3090. The update focuses on lowering hardware requirements while maintaining competitive output quality.
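The announcement does not include a code sample, but local generation would likely follow the pattern below. This is a minimal sketch only: the `ACEStepPipeline` class, import path, checkpoint identifier, and parameter names are illustrative assumptions rather than the project's confirmed API.

```python
# Minimal sketch of local song generation with ACE-Step v1.5.
# The pipeline class, import path, checkpoint id, and parameter
# names are assumptions for illustration, not the confirmed API.
import torch
from acestep.pipeline import ACEStepPipeline  # hypothetical import path

pipe = ACEStepPipeline.from_pretrained(
    "ACE-Step/ACE-Step-v1.5",      # hypothetical checkpoint identifier
    torch_dtype=torch.float16,     # half precision helps stay under ~4 GB of VRAM
)
pipe.to("cuda")

audio = pipe(
    prompt="upbeat synth-pop, female vocals, 120 bpm",
    lyrics="[verse]\nCity lights are calling out my name",
    duration=60,                   # seconds of audio to generate
)
audio.save("song.wav")             # hypothetical convenience method
```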
The accompanying benchmark table compares ACE-Step v1.5 with a range of commercial and open-source music generation models across evaluation dimensions including audio quality, coherence, musical structure, style alignment, and lyric alignment. ACE-Step v1.5 scores at or near the top in several categories, indicating performance competitive with, and in some cases exceeding, that of established proprietary systems.
Beyond benchmark performance, ACE-Step v1.5 introduces practical features aimed at customization and usability. The model supports training personalized LoRA adapters from only a small number of audio samples, letting users adapt the system to specific musical styles with limited data, as sketched below. It is built on a hybrid architecture that combines a language model with a diffusion transformer and uses internal reinforcement learning rather than external reward models. The release also expands language coverage to more than 50 languages and adds editing capabilities such as covers, audio repainting, and vocal-to-background-music conversion.
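The few-sample LoRA workflow is not spelled out in the release notes; one plausible shape for it, using the Hugging Face `peft` library, is sketched here. The `load_ace_step_transformer` helper and the `target_modules` names are hypothetical; only the `peft` calls follow that library's actual API.

```python
# Sketch of few-sample style adaptation with a LoRA adapter.
# load_ace_step_transformer() and the target module names are
# hypothetical; the peft usage is the library's standard pattern.
from peft import LoraConfig, get_peft_model

base_model = load_ace_step_transformer()  # hypothetical loader for the DiT backbone

lora_config = LoraConfig(
    r=16,                                     # low rank suits tiny datasets
    lora_alpha=32,
    target_modules=["to_q", "to_k", "to_v"],  # assumed attention projection names
    lora_dropout=0.05,
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable

# ...train on a handful of audio samples, then save just the adapter:
model.save_pretrained("my_style_lora/")
```

Because only the low-rank adapter weights are updated, a handful of clips can be enough to capture a style without overwriting the base model.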
This release demonstrates continued progress in open-source generative audio. Faster generation speeds, lower memory requirements, and competitive benchmark results suggest that locally run models are becoming increasingly viable alternatives to cloud-based systems. As open-source tools continue to close the gap with commercial offerings, developments like ACE-Step v1.5 may influence adoption patterns and innovation across music production, research, and creative workflows.
Marina Lyubimova