Updated on Aug 22, 2024

How to Use AudioCraft for Music Generation

Learn how to use Meta's AudioCraft to generate music for content production

Suppose you’re crafting a social media campaign for your company and need some catchy background tunes to spice up your posts. You want an original track, and you are bored of what online stock music libraries offer, having already used most of it. Your creative spirit is running high and there’s a tune buzzing in your mind. You want to pick out popular melodies from famous artists like Dua Lipa, Katy Perry, and Mark Ronson and remix them with your own background beats, perhaps with a jazz feel.

But you don’t know where to start, and you have neither the time nor the resources to hire a music studio. Wouldn’t it be simpler, you wonder, if you could just type your requirements into a console and generate the music you want? With the advent of AI, exactly that is possible: with a few lines of text, you can produce and experiment with tunes from your laptop.

One AI tool that handles this task well is AudioCraft, released by Meta. This open-source library generates audio from text descriptions, so anyone with a creative idea can turn it into sound. In this blog post, we’ll walk through a step-by-step guide to using AudioCraft to generate audio snippets for your music remixes. We will take inspiration from one of our favorite singers, Ed Sheeran, and his groovy song, 'Shape of You'.

AudioCraft by Meta

Meta's AudioCraft is an open-source codebase for generative audio that covers music, sound effects, and audio compression. The suite comprises three distinct models: MusicGen, AudioGen, and EnCodec.

1. MusicGen: For Music Composition

MusicGen generates music from scratch or reworks an existing tune. Given a text description, and optionally a reference melody, it produces a musical piece that follows the prompt. Musicians can use MusicGen to explore genres that lie beyond their own style and expertise.

2. AudioGen: Crafting Immersive Soundscapes

AudioGen is another pivotal component of Meta's AudioCraft, engineered to generate captivating sound effects. Whether you're working on a video game, film, virtual reality project, or any application that relies on immersive audio, AudioGen offers a wealth of possibilities.

Users can input textual descriptions of the desired soundscapes or effects, and AudioGen responds by generating audio that matches the specified criteria. This level of customization allows content creators to tailor their audio assets precisely to their needs.

3. EnCodec: Elevating Audio Compression and Reconstruction

EnCodec, the final component of Meta's AudioCraft, addresses high-fidelity audio compression and reconstruction. It is a neural audio codec designed to keep perceived audio quality high while shrinking the amount of data needed to represent a clip, which makes it useful wherever bandwidth or storage is constrained.

EnCodec learns an encoder that turns a waveform into a compact stream of discrete codes and a decoder that reconstructs audio from those codes. The compression is lossy, but it aims to stay perceptually close to the original at low bitrates, which is valuable for streaming platforms and content distribution. Within AudioCraft, MusicGen and AudioGen generate these codes, and EnCodec decodes them into the waveform you hear.
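
To get a feel for the scale of compression involved, the arithmetic below compares the bitrate of raw PCM audio with that of a discrete token stream. The figures used (32 kHz mono audio, 50 token frames per second, 4 codebooks of 2048 entries each) are assumptions based on the configuration reported for the EnCodec model that MusicGen uses; they are not stated in this post, so treat the result as illustrative rather than as a measured benchmark.

```python
import math

def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels

def token_bitrate(frame_rate_hz: int, num_codebooks: int, codebook_size: int) -> float:
    """Bits per second of a token stream: each frame carries one index per codebook."""
    bits_per_index = math.log2(codebook_size)
    return frame_rate_hz * num_codebooks * bits_per_index

# Assumed values (not taken from this post): 32 kHz, 16-bit mono input,
# 50 frames per second, 4 codebooks, 2048 entries per codebook.
raw_bps = pcm_bitrate(32_000, 16)
token_bps = token_bitrate(50, 4, 2048)

print(f"raw PCM:   {raw_bps / 1000:.1f} kbps")
print(f"tokens:    {token_bps / 1000:.1f} kbps")
print(f"reduction: {raw_bps / token_bps:.0f}x")
```

Under these assumptions the token stream needs about 2.2 kbps against 512 kbps for the raw signal, which is why a language model can work on the codes instead of on raw samples.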

In essence, Meta's AudioCraft is a suite of tools for applying generative AI to audio across various fields. Let’s install AudioCraft, which includes both MusicGen and AudioGen, and create our pop track from minimal text inputs.

Installation

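First, install AudioCraft. The commands below are written for a Colab or Jupyter notebook cell, where the leading ! runs a shell command.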

# Best to make sure you have torch installed first, in particular before installing xformers.
# Don't run this if you already have PyTorch installed.
!pip install 'torch==2.1.0'

# You might need the following before trying to install the packages
!pip install setuptools wheel

# Then proceed to one of the following
!pip install -U audiocraft  # stable release
!pip install -U git+https://git@github.com/facebookresearch/audiocraft#egg=audiocraft  # bleeding edge
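
Before moving on, it can help to confirm that the packages are actually importable in the current environment. The sketch below uses only the Python standard library; `check_packages` is a helper name made up for this example and is not part of AudioCraft.

```python
import importlib.util

def check_packages(names):
    """Return {package_name: True/False} depending on whether Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

# The three packages the rest of this guide imports.
status = check_packages(["torch", "torchaudio", "audiocraft"])
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

If any package shows up as missing, re-run the corresponding install command before continuing.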

We already have the melody; we want something that sounds like Ed Sheeran’s ‘Shape of You’, but in a different style. Maybe add a touch of electric guitar to make it sound a little more like rock? Or convert it into an EDM track to give it a peppy vibe? Alternatively, we can go with something safer, such as a slow and pleasant jazz touch.

Let’s take Ed Sheeran’s ‘Shape of You’ as our base melody and generate three soundtracks with the following descriptions: happy rock, energetic EDM, and sad jazz.

Download the melody of ‘Shape of You’ and place it in your Colab workspace so that it is available at /content/shape_of_you.mp3, the path the code below loads.

import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# The melody checkpoint accepts a text prompt and, optionally, a reference melody.
model = MusicGen.get_pretrained('facebook/musicgen-melody')
model.set_generation_params(duration=8)  # generate 8 seconds.

# Optional demonstrations: each call below overwrites `wav`, so only the
# melody-conditioned result at the end is saved. Skip them to save time.
wav = model.generate_unconditional(4)    # generates 4 unconditional audio samples
descriptions = ['happy rock', 'energetic EDM', 'sad jazz']
wav = model.generate(descriptions)  # generates 3 samples from text alone.

# melody has shape [channels, samples]; sr is its sample rate.
melody, sr = torchaudio.load('/content/shape_of_you.mp3')
# generates using the melody from the given audio and the provided descriptions.
# melody[None] adds a batch dimension; expand() repeats it once per description.
wav = model.generate_with_chroma(descriptions, melody[None].expand(len(descriptions), -1, -1), sr)

for idx, one_wav in enumerate(wav):
    # Will save under {idx}.wav, with loudness normalization at -14 dB LUFS.
    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
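
The loop above saves its results as 0.wav, 1.wav, and 2.wav, which makes it hard to tell the styles apart later. One option is to derive the file stem from each description, since audio_write appends the extension itself. `prompt_to_stem` is a hypothetical helper written for this example, not an AudioCraft function.

```python
import re

def prompt_to_stem(prompt: str, index: int) -> str:
    """Turn a text prompt into a filesystem-friendly file stem, e.g. '0_happy_rock'."""
    # Lowercase, then collapse every run of non-alphanumeric characters into "_".
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    return f"{index}_{slug or 'untitled'}"

descriptions = ['happy rock', 'energetic EDM', 'sad jazz']
stems = [prompt_to_stem(d, i) for i, d in enumerate(descriptions)]
print(stems)
```

Passing stems[idx] instead of f'{idx}' as the first argument of audio_write in the saving loop would then produce 0_happy_rock.wav, 1_energetic_edm.wav, and 2_sad_jazz.wav.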

AudioCraft does a good job of keeping the melody of ‘Shape of You’ while giving it the style named in each description.

Similarly, to generate sound effects, we use the other model, AudioGen, replacing the placeholder descriptions with text describing the sounds we want:

import torchaudio
from audiocraft.models import AudioGen
from audiocraft.data.audio import audio_write

model = AudioGen.get_pretrained('facebook/audiogen-medium')
model.set_generation_params(duration=5)  # generate 5 seconds.
descriptions = ['description 1', 'description 2', 'description 3']
wav = model.generate(descriptions)  # generates 3 samples.

for idx, one_wav in enumerate(wav):
    # Will save under {idx}.wav, with loudness normalization at -14 dB LUFS.
    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
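
The placeholder descriptions above need to be replaced with real prompts before the script produces anything useful. A minimal sketch follows that wraps the same calls in a function and checks the prompts first. The three example prompts and the names `validate_prompts` and `generate_effects` are illustrative assumptions, and the AudioCraft import is deferred so that the prompt check can run even where the library is not installed.

```python
def validate_prompts(prompts):
    """Strip whitespace and reject empty prompts before spending time on generation."""
    cleaned = [p.strip() for p in prompts]
    if not cleaned or any(not p for p in cleaned):
        raise ValueError("every prompt must be a non-empty string")
    return cleaned

def generate_effects(prompts, duration=5):
    """Generate one sound effect per prompt and save each as {idx}.wav."""
    # Imported here so that validate_prompts works without AudioCraft installed.
    from audiocraft.models import AudioGen
    from audiocraft.data.audio import audio_write

    prompts = validate_prompts(prompts)
    model = AudioGen.get_pretrained('facebook/audiogen-medium')
    model.set_generation_params(duration=duration)
    wav = model.generate(prompts)  # one sample per prompt
    for idx, one_wav in enumerate(wav):
        audio_write(f'{idx}', one_wav.cpu(), model.sample_rate,
                    strategy="loudness", loudness_compressor=True)
    return len(wav)

# Hypothetical prompts; describe the sound source and its setting in plain language.
effect_prompts = validate_prompts([
    'dog barking in the distance',
    'rain falling on a tin roof ',
    'footsteps in an empty corridor',
])
print(effect_prompts)
```

Calling generate_effects(effect_prompts) in the Colab session would then load the model and write one file per prompt.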

Conclusion

A technology like AudioCraft can help companies stay ahead of the curve by offering an innovative, low-effort way to create and manipulate audio. You can enhance your social media posts by adding a musical touch to them with AudioCraft. Whether you work in music, content creation, game development, or any other industry that relies on audio, AudioCraft simplifies the generation of music and sound effects. If you need expertise in setting up an advanced AI technology like AudioCraft for your workflow, please feel free to reach out to us at founders@superteams.ai.

References

https://github.com/facebookresearch/audiocraft/tree/main
