Comparing DALL-E 2 and Stable Diffusion 2.0
In this post, I compare two popular text-to-image models on a single prompt
![Image of ice cream](https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8761799d-3b85-4824-b3f2-e32329e52fef_1920x1280.jpeg)
I've been writing frequently about the development of text-to-image AI models lately. This weekend, I decided to compare two popular models (DALL-E 2 and Stable Diffusion 2.0) on a single, specific prompt.
I typed “A man in his forties eating an ice cream”. Nothing too surreal; it could be the brief for a commercial stock image. This is what I got:
DALL-E 2
The skin texture looks realistic, but beyond that the “uncanny valley” effect is undeniable. The ice cream also looks rather rough.
This one is a bit better overall, but note the strange orange eyebrows and mismatched eyes.
Among all the renderings, this is the only image featuring a non-Caucasian man. Overall, it is not bad, though the eyes still look unnatural and mismatched.