The model is trained on 500,000 real photographs and generates photorealistic images from text prompts. It is based on Stable Diffusion v2.1 and llama70b-v2-chat.