Generative AI: OpenAI DALL-E-2 and Midjourney

Posted by Scott Phillips on 16th Jan 2023

With this post, we are going to compare two of the hottest new Generative AI image tools that create realistic art and graphics from natural language text and see what they can do.  The two are OpenAI’s DALL-E-2 and Midjourney, an independent research lab offering a beta product via Discord servers.  

Both were recently featured in the New York Times and are generating significant excitement.  For one article, a journalist reported on how a single line of text - ‘production still from 1976 of Alejandro Jodorowsky’s Tron’ - was used to generate images of a movie that was never made, rendered in the style of the famous Chilean-French film director Alejandro Jodorowsky.

I tested that same line of text - without adjustments or variations - on both the DALL-E-2 and Midjourney tools to see what they produced.
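
For readers who would rather script the DALL-E-2 half of a test like this, here is a minimal Python sketch. It is not how the images in this post were produced. It assumes OpenAI's image generation REST endpoint and an `OPENAI_API_KEY` environment variable, and the helper names `build_image_request` and `generate_image_url` are hypothetical. Midjourney is offered through Discord, so that half of the comparison is left as a manual step.

```python
import json
import os
import urllib.request

# Assumption: OpenAI's image generation REST endpoint.
OPENAI_IMAGES_URL = "https://api.openai.com/v1/images/generations"

# The prompt is sent exactly as written, with no adjustments or variations.
PROMPT = "production still from 1976 of Alejandro Jodorowsky’s Tron"


def build_image_request(prompt, size="1024x1024", n=1):
    """Return the JSON payload for one text-to-image request (hypothetical helper)."""
    return {"prompt": prompt, "n": n, "size": size}


def generate_image_url(prompt, api_key):
    """POST the prompt and return the URL of the first generated image."""
    body = json.dumps(build_image_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        OPENAI_IMAGES_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return result["data"][0]["url"]


if __name__ == "__main__":
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        print(generate_image_url(PROMPT, key))
    else:
        # Without a key, just show what would be sent.
        print("Set OPENAI_API_KEY to send:", json.dumps(build_image_request(PROMPT)))
```

Keeping the payload builder separate from the network call makes it easy to confirm that every engine under comparison receives the identical prompt.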

Here is the DALL-E-2 image created from ‘production still from 1976 of Alejandro Jodorowsky’s Tron’:

Here is the Midjourney image generated from that same line of text:

Because the original line of text was used on Midjourney a number of times to craft the images in the New York Times article, it’s possible that the AI was ‘trained’ on this specific line of text.  So, I tried another text entry that was more aligned with Cloud Astronauts and our theme of learning about Cloud and AI using Space as a theme.

My text was:  'Domed city on the moon, digital photo, 8k'.   I was hoping for something that I might be able to use on the cover of one of our Mission documents.

Here is the DALL-E-2 image that came back:

And here is the Midjourney image:

These comparison images are not meant to imply any judgment or rating on which tool is better or worse.  They are meant to illustrate how much variation there is between different Generative AI engines in how they react to the very same line of text and in the images they ultimately produce.  Creativity isn’t generic, but very individual and specific, even for an AI.  The dramatic difference between the images likely comes from the different libraries of images the tools were trained on, as well as the underlying AI engine and how it is constructed.

It’s also true that neither image is what I wanted, and neither is usable for the purpose I had in mind.  For my business case (an illustration I might be able to use as a cover for a Mission), the effort was interesting but not productive.  (Although it did become good grist for a blog article.)

Will these tools replace graphic artists all over the world?  I strongly doubt it.  Will they become a powerful set of augmentative tools for creating rapid concepts that humans then finalize and perfect, with or without additional AI help?  This seems almost certain.

Given how vastly different the results from the two tools evaluated here are, it also seems unlikely that a single winner-take-all Generative AI image tool will cover every use case.  More likely, different tools will specialize in different use cases.

Artists and developers will probably also need new skills to make the most productive use of these tools in the marketplace.

I can imagine a couple of different products and markets:

  1. Augmented Graphic Design/Animation.  These AI tools seem useful for generating storyboards for movies, comics, or initial graphic designs for creative art projects.  They might be able to generate animated and digital films at a much higher rate.  Think of these use cases as accelerators for graphic artists and digital artists, like playing augmented human-AI chess.  To be successful, they would need to generate images within a set of rules, stay consistent between frames, and produce an animation (live action or still) faster, better, and cheaper.  You’ll know this is real when Disney becomes a customer.
  2. Entertainment channels.  These tools could also be powerful for entertainment.  People can create endless variations of images that pique their curiosity or build on what goes by in a stream of like-minded people.  There could be a big benefit to communities that want to create visions on a common theme.  There could be ‘rooms’ or ‘servers’ for very different thematic content.  Unicorns and fairies.  Space rockets and moon bases.  Ceramic teapot designs.  Some of these channels could be aligned with existing content to create entire new worlds and foster new communities.  I can imagine a Star Wars channel.  Again, if Disney buys into this, we’ll know it is serious.
  3. Urban design and architecture.  Can these Generative AI tools be used to paint a picture to very specific design specs and then run design iterations with specific rules in mind?  Can you start with an existing image or map and then build on it using specific instructions?  Call this one the Augmented Urban Planner/Architect.

Capabilities I would like to see: 

a. Rule sets to define a theme, tone, or environment for consistent design use in a dedicated channel.  An animated film has different rules from an architectural design job.

b. Image-to-image design flow for consistency in frame-by-frame production.  Can you make it easy for anyone to be a comic book illustrator?

c. Integration with existing design/art tools.  Can the output images be easily and quickly ingested and used in existing design tools?

d. Upload and extend.  I would love to see a capability where you can upload existing images (in cases where you own the copyright for them) and then build on them to create unique and original creations in a blend of styles.
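
As a rough illustration of item (a), a rule set could be as simple as a small data structure that attaches a channel's theme, tone, and style modifiers to every prompt before it reaches the image engine. The Python sketch below is purely hypothetical: `RuleSet`, its fields, and the example values are assumptions for illustration, not features of DALL-E-2 or Midjourney.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleSet:
    """Hypothetical per-channel rules that get attached to every prompt."""
    theme: str
    tone: str
    style_tags: tuple = ()

    def apply(self, prompt: str) -> str:
        """Append the channel's rules to a user prompt as comma-separated modifiers."""
        parts = [prompt.strip(), self.theme, self.tone, *self.style_tags]
        # Skip empty fields so unset rules do not leave stray commas.
        return ", ".join(p for p in parts if p)


# Example channel: every prompt gets the same theme, tone, and style modifiers.
moon_base = RuleSet(
    theme="lunar colony",
    tone="optimistic",
    style_tags=("digital photo", "8k"),
)
print(moon_base.apply("Domed city on the moon"))
# prints: Domed city on the moon, lunar colony, optimistic, digital photo, 8k
```

An animated-film channel and an architectural-design channel would simply carry different `RuleSet` values, so everyone in the channel works within the same constraints.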

The possibilities are endless. The real question is whether the economics are practical and real products can be built out of them rather than mere curiosities. On that one, the jury is still out. 

Speaking of juries, Midjourney, OpenAI, and others are being sued for copyright infringement by a legal team representing artists.  The claim is that the artists’ work (billions of images) was used to train these Generative AI engines, which then produce art that copies what the artists created and uses their styles illegally.  It will be interesting to see how these legal cases develop.  Artists today can view works online, take inspiration, print off copies, and then use these inspirations to create their own work, some of which can be very close copies or represent the influences of multiple different artists.  What the AI is doing is largely different in scale and quality, but not necessarily in substance.  To me, the copyright claim based on general model training does not seem valid, but the legal arguments will have to play out in court.