11/26/2024
Compiled by Qianfang Intelligence
NVIDIA has recently launched a new AI audio model named Fugatto (full name: Foundational Generative Audio Transformer Opus 1). This model can not only generate music and sound effects based on text prompts but also modify and transform existing audio to create unprecedented sound combinations.
Image Source: NVIDIA
According to NVIDIA, Fugatto has several distinctive capabilities: it can turn piano music into vocals, adjust the accent and emotion of speech, and create surreal sound effects such as a 'screaming saxophone' or a 'barking trumpet.' The model uses a technique called ComposableART, which combines instructions that it saw only separately during training to produce entirely new sounds.
On the technical side, the research team trained the 2.5-billion-parameter model on approximately 20 million audio samples drawn from open-source datasets around the world. The project was developed jointly by researchers from India, Brazil, China, Jordan, South Korea, and other countries, a diverse team that, according to NVIDIA, strengthened the model's handling of multiple languages and accents.
Bryan Catanzaro, Vice President of Applied Deep Learning Research at NVIDIA, stated that generative AI technology will bring new creative possibilities to musicians, gamers, and creators in general. However, considering the potential risks associated with generative technology, NVIDIA currently has no plans to release this technology externally.