Did Imagen really achieve something new, or had others already published similar work, with Google simply taking credit for polishing it?


Before Imagen was released in May 2022, models such as DALL·E, GLIDE, and Latent Diffusion had already made major progress in text-to-image generation, so the overall approach was not new. What set Imagen apart was its use of a large, frozen T5 language model as the text encoder: generic text embeddings from a pretrained language model gave markedly better image-text alignment, and the authors found that scaling up the text encoder mattered more than scaling up the image diffusion model. Imagen then used a cascade of super-resolution diffusion models to produce photorealistic 1024×1024 images, setting a new state-of-the-art zero-shot FID on COCO. So Google did not invent text-to-image diffusion, but it combined a strong language model with diffusion models more effectively than prior work and raised the quality bar.
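To make the "frozen T5 encoder conditions a diffusion model" idea concrete, here is a minimal sketch, not Imagen's actual code: the prompt is embedded by a frozen pretrained T5 encoder, and the image model attends to those embeddings via cross-attention. The model name ("t5-small" instead of the much larger T5-XXL Imagen used), the projection layer, and the toy attention block are illustrative assumptions.

```python
import torch
from transformers import T5Tokenizer, T5EncoderModel

# Frozen pretrained text encoder (Imagen used T5-XXL; t5-small keeps the sketch light).
tokenizer = T5Tokenizer.from_pretrained("t5-small")
text_encoder = T5EncoderModel.from_pretrained("t5-small")
text_encoder.requires_grad_(False)  # frozen: never trained together with the image model

prompt = "a corgi riding a bicycle in Times Square"
tokens = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    text_emb = text_encoder(**tokens).last_hidden_state  # (1, seq_len, d_text)

# Inside the diffusion U-Net, image features attend to the text embeddings.
# This single layer stands in for the cross-attention blocks of the real model.
to_kv = torch.nn.Linear(text_emb.shape[-1], 512)  # project text dim -> image feature dim
cross_attn = torch.nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)

image_features = torch.randn(1, 64 * 64, 512)     # placeholder image/latent tokens
kv = to_kv(text_emb)
conditioned, _ = cross_attn(image_features, kv, kv)  # text-conditioned image features
print(conditioned.shape)  # torch.Size([1, 4096, 512])
```

The point of the sketch is the design choice the answer highlights: the text side is a general-purpose, frozen language model, so improving text understanding is as simple as swapping in a bigger T5, independent of the diffusion model.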
