Stable Audio 2.0 by Stability AI is a music and audio generation model that produces full tracks with coherent musical structure, up to 3 minutes long, in 44.1 kHz stereo. It introduced audio-to-audio generation: you can upload an audio sample and transform it with a natural-language prompt. The model was trained exclusively on a licensed dataset from AudioSparx.
For music, focus the prompt on genre, instruments, mood, and production style:
Cinematic orchestral score, building tension, low strings and
brass, timpani rolls, gradually increasing intensity,
dramatic and epic, suitable for a movie trailer
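If you generate many music prompts, it can help to assemble them from the same four elements every time. The sketch below is a hypothetical helper (`build_music_prompt` is not part of any Stability AI SDK). It simply joins genre, instruments, mood, and production notes into one comma-separated description like the example above.

```python
from typing import Sequence


def build_music_prompt(
    genre: str,
    instruments: Sequence[str],
    mood: str,
    production: str,
) -> str:
    """Join the recommended prompt elements into one descriptive prompt.

    Hypothetical helper for illustration only; the model just receives
    the resulting text.
    """
    parts = [genre, ", ".join(instruments), mood, production]
    # Skip empty elements so the prompt never contains stray commas.
    return ", ".join(p.strip() for p in parts if p and p.strip())


if __name__ == "__main__":
    print(build_music_prompt(
        "Cinematic orchestral score",
        ["low strings", "brass", "timpani rolls"],
        "dramatic and epic",
        "building tension, gradually increasing intensity",
    ))
```

Keeping the elements separate makes it easy to vary one dimension (for example, mood) while holding the others fixed.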
For sound effects, describe the sounds and the space they occur in:
Thunder rolling across a mountain valley with echoes,
heavy rain on a metal roof, occasional wind gusts
For audio-to-audio, upload the source audio and describe the transformation:
Transform this acoustic guitar recording into a synthwave version
with pulsing bass, retro synth pads, and drum machine
Stable Audio generates from descriptions, not lyrics, so don't use song-structure tags such as [Verse] or [Chorus].
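Prompts copied from lyric-based music tools often still carry section tags. A minimal sketch of a pre-flight check is shown below; the regex and the list of tag names are assumptions chosen for illustration, not an official list of anything Stable Audio rejects.

```python
import re

# Matches bracketed song-section tags such as [Verse], [Chorus], [Verse 2].
# The tag names here are an illustrative assumption.
SECTION_TAG = re.compile(
    r"\[(?:intro|verse|pre-chorus|chorus|bridge|outro)(?:\s*\d+)?\]",
    re.IGNORECASE,
)


def find_section_tags(prompt: str) -> list[str]:
    """Return every section tag found in the prompt, in order."""
    return SECTION_TAG.findall(prompt)


def strip_section_tags(prompt: str) -> str:
    """Remove section tags and collapse the leftover whitespace."""
    return " ".join(SECTION_TAG.sub(" ", prompt).split())


if __name__ == "__main__":
    text = "[Verse] upbeat funk [Chorus] brass"
    print(find_section_tags(text))
    print(strip_section_tags(text))
```

Stripping the tags keeps the descriptive words, which is what the model actually uses.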
| Parameter | Description |
|---|---|
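| duration | Output length in seconds, up to 180 (3 minutes) |
| steps | Number of diffusion steps, 50–200 (default 100) |
| cfg_scale | Classifier-free guidance scale, 1–15 (default 7.0) |
| negative_prompt | Elements to keep out of the generated audio |
| seed | Random seed; reuse the same value with the same settings to reproduce a result |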
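One way to avoid sending out-of-range values is to validate settings locally against the ranges in the table above. The sketch below assumes a plain dictionary payload whose keys mirror the table's parameter names plus a `prompt` field; `GenerationParams` and `to_payload` are hypothetical names, and the actual endpoint, field names, and transport of Stability AI's service are not covered by this guide.

```python
from dataclasses import asdict, dataclass
from typing import Optional

MAX_DURATION_S = 180  # up to 3 minutes, per the table


@dataclass
class GenerationParams:
    """Hypothetical container for generation settings (ranges from the table)."""

    prompt: str
    duration: float              # seconds
    steps: int = 100             # diffusion steps, 50-200
    cfg_scale: float = 7.0       # guidance scale, 1-15
    negative_prompt: str = ""    # what to avoid in the audio
    seed: Optional[int] = None   # set for reproducible output

    def __post_init__(self) -> None:
        # Reject values outside the documented ranges before any request is made.
        if not 0 < self.duration <= MAX_DURATION_S:
            raise ValueError(f"duration must be in (0, {MAX_DURATION_S}] seconds")
        if not 50 <= self.steps <= 200:
            raise ValueError("steps must be between 50 and 200")
        if not 1.0 <= self.cfg_scale <= 15.0:
            raise ValueError("cfg_scale must be between 1 and 15")

    def to_payload(self) -> dict:
        # Drop unset optional fields (empty negative_prompt, missing seed);
        # a seed of 0 is kept because it is a valid value.
        return {k: v for k, v in asdict(self).items() if v not in (None, "")}


if __name__ == "__main__":
    params = GenerationParams(prompt="Thunder rolling across a valley", duration=30)
    print(params.to_payload())
```

Validating in `__post_init__` means an invalid combination fails at construction time, before any audio is requested.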