PixArt-Σ is an advanced Diffusion Transformer model designed for 4K resolution text-to-image generation. It significantly improves over its predecessor PixArt-α by offering higher fidelity images and better alignment with text prompts thanks to high-quality training data and efficient token compression. With a model size of just 0.6B parameters, it rivals state-of-the-art systems while being more efficient, making it ideal for producing ultra-high-resolution visual content, such as posters and wallpapers, in industries like film and gaming. The model can run on GPUs with less than 8GB VRAM when optimized.

pixart-sigma

Description

Capabilities

pixart-sigma

Description

Capabilities

Images