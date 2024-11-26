Audio/Video

Nvidia Reveals ‘Swiss Army Knife’ of AI Audio Tools: Fugatto

High-powered computer chip maker Nvidia on Monday unveiled a new AI model developed by its researchers that can generate or transform any mix of music, voices and sounds described with prompts using any combination of text and audio files.

The new AI model called Fugatto — for Foundational Generative Audio Transformer Opus — can create a music snippet based on a text prompt, remove or add instruments from an existing song, change the accent or emotion in a voice, and even produce sounds never heard before.

According to Nvidia, by supporting numerous audio generation and transformation tasks, Fugatto is the first foundational generative AI model that showcases emergent properties — capabilities that arise from the interaction of its various trained abilities — and the ability to combine free-form instructions.

“We wanted to create a model that understands and generates sound like humans do,” Rafael Valle, a manager of applied audio research at Nvidia, said in a statement.

“Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale,” he added.

Nvidia noted the model is capable of handling tasks it was not pretrained on, as well as generating sounds that change over time, such as the Doppler effect of thunder as a rainstorm passes through an area.

The company added that unlike most models, which can only recreate the training data they’ve been exposed to, Fugatto allows users to create soundscapes it’s never seen before, such as a thunderstorm easing into dawn with the sound of birds singing.

Breakthrough AI Model for Audio Transformation

“Nvidia’s introduction of Fugatto marks a significant advancement in AI-driven audio technology,” observed Kaveh Vahdat, founder and president of RiseOpp, a national CMO services company based in San Francisco.

“Unlike existing models that specialize in specific tasks — such as music composition, voice synthesis, or sound effect generation — Fugatto offers a unified framework capable of handling a diverse array of audio-related functions,” he told TechNewsWorld. “This versatility positions it as a comprehensive tool for audio synthesis and transformation.”

Vahdat explained that Fugatto distinguishes itself through its ability to generate and transform audio based on both text instructions and optional audio inputs. “This dual-input approach enables users to create complex audio outputs that seamlessly blend various elements, such as combining a saxophone’s melody with the timbre of a meowing cat,” he said.

Additionally, he continued, Fugatto’s capacity to interpolate between instructions allows for nuanced control over attributes like accent and emotion in voice synthesis, offering a level of customization not commonly found in current AI audio tools.

“Fugatto is an extraordinary step towards AI that can handle multiple modalities simultaneously,” added Benjamin Lee, a professor of engineering at the University of Pennsylvania.

“Using both text and audio inputs together may produce far more efficient or effective models than using text alone,” he told TechNewsWorld. “The technology is interesting because, looking beyond text alone, it broadens the volumes of training data and the capabilities of generative AI models.”

Nvidia at Its Best

Mark N. Vena, president and principal analyst at SmartTech Research in Las Vegas, asserted that Fugatto represents Nvidia at its best.

“The technology introduces advanced capabilities in AI audio processing by enabling the transformation of existing audio into entirely new forms,” he told TechNewsWorld. “This includes converting a piano melody into a human vocal line or altering the accent and emotional tone of spoken words, offering unprecedented flexibility in audio manipulation.”

“Unlike existing AI audio tools, Fugatto can generate novel sounds from text descriptions, such as making a trumpet sound like a barking dog,” he said. “These features provide creators in music, film, and gaming with innovative tools for sound design and audio editing.”

Fugatto deals with audio holistically — spanning sound effects, music, voice, virtually any type of audio, including sounds that have not been heard before — and precisely, added Ross Rubin, the principal analyst with Reticle Research, a consumer technology advisory firm in New York City.

He cited the example of Suno, a service that uses AI to generate songs. “They just released a new version that has improvements in how generated human voices sound and other things, but it doesn’t allow the kinds of precise, creative changes that Fugatto allows, such as adding new instruments to a mix, changing moods from happy to sad, or moving a song from a minor key to a major key,” he told TechNewsWorld.

“Its understanding of the world of audio and the flexibility that it offers goes beyond the task-specific engines that we’ve seen for things like generating a human voice or generating a song,” he said.

Opens Door for Creatives

Vahdat pointed out that Fugatto can be useful in both advertising and language learning. Agencies can create customized audio content that aligns with brand identities, including voiceovers with specific accents or emotional tones, he noted.

At the same time, in language learning, educational platforms will be able to develop personalized audio materials, such as dialogues in various accents or emotional contexts, to aid in language acquisition.

“Fugatto technology opens doors to a wide array of applications in creative industries,” Vena maintained. “Filmmakers and game developers can use it to create unique soundscapes, such as turning everyday sounds into fantastical or immersive effects,” he said. “It also holds potential for personalized audio experiences in virtual reality, assistive technologies, and education, tailoring sounds to specific emotional tones or user preferences.”

“In music production,” he added, “it can transform instruments or vocal styles to explore innovative compositions.”

Further development may be needed to get better musical results, however. “All these results are trivial, and some have been around for longer — and better,” observed Dennis Bathory-Kitsz, a musician and composer in Northfield Falls, Vt.

“The voice isolation was clumsy and unmusical,” he told TechNewsWorld. “The additional instruments were also trivial, and most of the transformations were colorless. The only advantage is that it requires no particular learning, so the development of musicality for the AI user will be minimal.”

“It may usher in some new uses — real musicians are wonderfully inventive already — but unless the developers have better musical chops to begin with, the results will be dreary,” he said. “They will be musical slop to join the visual and verbal slop from AI.”

AGI Stand-In

With artificial general intelligence (AGI) still very much in the future, Fugatto may be a model for simulating AGI, which ultimately aims to replicate or surpass human cognitive abilities across a wide range of tasks.

“Fugatto is part of a solution that uses generative AI in a collaborative bundle with other AI tools to create an AGI-like solution,” explained Rob Enderle, president and principal analyst at the Enderle Group, an advisory services firm in Bend, Ore.

“Until we get AGI working,” he told TechNewsWorld, “this approach will be the dominant way to create more complete AI projects with far higher quality and interest.”

John P. Mello Jr.

John P. Mello Jr. has been an ECT News Network reporter since 2003. His areas of focus include cybersecurity, IT issues, privacy, e-commerce, social media, artificial intelligence, big data and consumer electronics. He has written and edited for numerous publications, including the Boston Business Journal, the Boston Phoenix, Megapixel.Net and Government Security News.




