stable-audio-open-1.0

Description

Stable Audio Open 1.0 is a text-to-audio model developed by Stability AI that generates stereo audio up to 47 seconds long from text prompts. It combines three components: an autoencoder that compresses waveforms, a T5-based text embedding used for conditioning, and a transformer-based diffusion model that generates the audio. The model is well suited to research and experimentation in AI-driven audio and music creation. It has known limitations: it does not generate realistic vocals, and it performs less well on non-English prompts and across diverse music styles.
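As a rough illustration of the text-to-audio flow described above, here is a minimal Python sketch. It rests on several assumptions that this page does not state: that the model is loaded through the `StableAudioPipeline` class in Hugging Face `diffusers`, that the weights are published as `stabilityai/stable-audio-open-1.0`, that a CUDA GPU is available, and that output is sampled at 44.1 kHz. The helper names `clip_shape` and `generate` are hypothetical and exist only for this example.

```python
SAMPLE_RATE_HZ = 44_100  # assumption: sample rate is not stated on this page
CHANNELS = 2             # stereo, per the description
MAX_SECONDS = 47.0       # upper bound on clip length, per the description


def clip_shape(seconds: float) -> tuple[int, int]:
    """Return (channels, samples) for a clip of the requested length."""
    if not 0 < seconds <= MAX_SECONDS:
        raise ValueError(f"seconds must be in (0, {MAX_SECONDS}], got {seconds}")
    return CHANNELS, int(seconds * SAMPLE_RATE_HZ)


def generate(prompt: str, seconds: float = 10.0, steps: int = 100):
    """Generate one clip from a text prompt (hypothetical helper).

    Imports are deferred so that clip_shape() works without torch or
    diffusers installed. Calling this downloads the model weights.
    """
    import torch
    from diffusers import StableAudioPipeline

    clip_shape(seconds)  # reject durations the model cannot produce

    # Assumed model identifier and device; adjust for your environment.
    pipe = StableAudioPipeline.from_pretrained(
        "stabilityai/stable-audio-open-1.0", torch_dtype=torch.float16
    ).to("cuda")

    # The pipeline runs text conditioning, diffusion, and decoding internally.
    result = pipe(prompt, audio_end_in_s=seconds, num_inference_steps=steps)
    return result.audios[0]
```

Calling `clip_shape(47.0)` gives the size of the longest clip under the assumed sample rate, which is useful for estimating memory before running `generate("rain on a tin roof", seconds=12.0)`.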

Capabilities

txt2aud
