Comparing DALL-E 2 and Stable Diffusion 2.0
In this post, I compare two popular text-to-image models on a single prompt
![Image of ice cream](https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8761799d-3b85-4824-b3f2-e32329e52fef_1920x1280.jpeg)
I've been writing frequently about the development of text-to-image AI models lately. This weekend, I decided to compare two popular models (DALL-E 2 and Stable Diffusion 2.0) on a single, specific prompt.
I typed “A man in his forties eating an ice cream”. Nothing too surreal; it could be the brief for a commercial stock image. This is what I got:
DALL-E 2
The skin texture looks realistic, but beyond that the “uncanny valley” effect is undeniable. The ice cream also looks rather rough.
This one is a bit better overall, but note the strange orange eyebrows and mismatched eyes.
Among all the renderings, this is the only image featuring a non-Caucasian man. Overall, it is not bad, though the eyes still look unnatural and mismatched.