MelodyFlow
This is the demo for MelodyFlow,
a fast text-guided music generation and editing model based on a single-stage flow-matching DiT,
presented in the paper "High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching".
Use of this demo is subject to Meta's AI Terms of Service.
Duplicate this Space for longer sequences, more control and no queue.
More details
The model will generate or edit up to 30 seconds of audio based on the description you provide. The model was trained on descriptions from a stock music catalog, so descriptions work best when they include some detail on the instruments present, along with an intended use case (e.g. adding "perfect for a commercial" can help).
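As background on the generation side, single-stage flow matching samples by integrating a learned velocity field from noise (t=0) to data (t=1). The sketch below is a toy numpy illustration of that Euler-sampling idea, with a hand-written straight-path (rectified-flow) field standing in for the text-conditioned DiT; it is not MelodyFlow's actual code or API.

```python
import numpy as np

def euler_sample(velocity_fn, x0, steps=200):
    """Euler-integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data)."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)
    return x

# Toy stand-in for the text-conditioned DiT velocity field: for a straight
# path ending at a fixed target latent, v(x, t) = (x1 - x) / (1 - t).
target = np.full(8, 2.0)
v = lambda x, t: (target - x) / (1.0 - t)

rng = np.random.default_rng(0)
latent = euler_sample(v, rng.standard_normal(8))
# latent lands on target: the straight-path field transports any start there.
```

In the real model the velocity field is a DiT conditioned on the text embedding, and the resulting latent is decoded back to a waveform.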
You can optionally provide a reference audio, from which the model will produce an edited version guided by the text description, using MelodyFlow's regularized latent inversion.
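Latent inversion can be pictured as running the same flow ODE backward from the reference latent to an estimated noise, then forward again under the edited description. A toy numpy sketch of that invert-then-resample loop (illustrative linear fields standing in for the text-conditioned DiT; the "regularized" part of MelodyFlow's inversion, which stabilizes the backward pass, is omitted here):

```python
import numpy as np

def integrate(x, velocity_fn, t0, t1, steps=100):
    """Euler-integrate dx/dt = v(x, t) from t0 to t1 (t1 < t0 runs backward)."""
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        x = x + dt * velocity_fn(x, t)
        t += dt
    return x

# Toy stand-ins for the velocity field under the original and the edited
# description (the real fields come from the DiT, not these one-liners).
source = np.full(4, 1.0)
edited = np.full(4, -1.0)
v_src = lambda x, t: source - x
v_edit = lambda x, t: edited - x

audio_latent = np.array([1.5, 0.5, 1.0, 1.0])         # encoded reference audio
noise_est = integrate(audio_latent, v_src, 1.0, 0.0)  # inversion: data -> noise
restyled = integrate(noise_est, v_edit, 0.0, 1.0)     # resample under the edit
```

After the round trip, `restyled` has moved toward the attractor of the edited field while still originating from the reference audio's recovered noise, which is the mechanism that lets an edit preserve structure from the reference.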
You can access more control (longer generation, more models, etc.) by duplicating this Space (you will then need a paid GPU from HuggingFace). This Gradio demo can also be run locally (best with a GPU).
See github.com/facebookresearch/audiocraft for more details.