Prompt Engineering

AI breakthroughs for visual media have blindsided the world, causing a mix of excitement and worry among creators. Illustration, concept art, UX design, and more are poised to be impacted by AI’s ability to generate imagery from the written word.
How we create images will change forever.
This guide introduces the world of generative image AI. The scope is intentionally broad to be relevant for a wide range of readers – from professionals across diverse industries to aspiring creators.
You will learn about:
- The three foundational models
- Picking one to start with
- How to use it
This is a living, breathing document – it will be continuously updated to stay relevant with the latest advances in the field.
New opportunities await. Let’s get started.
The Three Foundational Models
There are three independent organizations that develop and publicly release generative visual AI models. Every app or feature that uses image AI is derived from one of these core models.
- DALL-E 2 – created by OpenAI
- Midjourney – created by Midjourney (David Holz et al.)
- Stable Diffusion – created by Stability AI
Each model has its own advantages and disadvantages. We will go in depth on how to decide which to choose in an upcoming section. For now, we briefly introduce each of the three AI models below, and provide a comparison chart to see pricing and differentiation at a glance.
DALL-E 2

This was the first of the three models released to the public, and it opened the world’s eyes to the creative possibilities that generative AI wields.
DALL-E 2 is generally considered the easiest of the three to use, but also the most expensive. Each image generation costs $0.13 – far more than typical Midjourney and Stable Diffusion pricing.
It is ‘closed source’ – meaning the general public does not have access to the code; it is all proprietary. Because of that, there is no way to train the model to be better at specific styles or use cases.
It does have a unique combination of features over its competition:
- Outpainting – which allows you to expand an image from its borders.
- Inpainting – which allows you to change and fix select parts of an image.
- An API, which allows the integration of DALL-E 2 in other software services and workflows.
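To give a feel for what that API integration looks like, here is a minimal sketch that builds a request to OpenAI’s image generation endpoint using only Python’s standard library. The endpoint URL and parameters reflect OpenAI’s public API as of this writing; the key is a placeholder, and the request is constructed but not sent.

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/images/generations"

def build_image_request(prompt: str, api_key: str,
                        n: int = 1, size: str = "1024x1024"):
    """Construct (but do not send) a DALL-E 2 image generation request."""
    body = json.dumps({"prompt": prompt, "n": n, "size": size}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # your secret API key
        },
        method="POST",
    )

# With a real key, sending it is one more line:
#   with urllib.request.urlopen(req) as resp: result = json.load(resp)
req = build_image_request("A medieval dragon reading an ancient book",
                          "YOUR_API_KEY")
print(json.loads(req.data)["prompt"])
```

The response contains URLs to the generated images, which you can then download or pass along to the rest of your workflow.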
The latest Midjourney model (version 4) is considered to be the best in terms of photorealistic quality.
It is more of a middle ground between the three in terms of difficulty – if you are familiar with Discord servers, it will be easy to start. If you are not, expect a little learning curve.
Unfortunately, like DALL-E 2, there is no way to customize Midjourney – it is closed source.
And for those who want to go deeper, like incorporating it into an advanced workflow or your own application, you’ll be bummed to learn that there is no API.
Stable Diffusion has an ace in the hole over its competition: it is open source. This means anyone can download the AI model for themselves to customize, control, or use it in unique ways.
Because of this, there are many apps that run on Stable Diffusion. So it’s likely that there’s an app out there that has the look, features, and pricing terms you like better than DALL-E 2 and Midjourney.
And the latest version (2.1) approaches Midjourney in terms of quality.
For beginners who want to use SD, we recommend sticking to one of the popular apps that use it. We provide more info on this in the Stable Diffusion – Getting Started section.
But remember, there is a volcano-sized rabbit hole of advanced use-cases with Stable Diffusion if you are willing to get technical with it. Beginners who have aspirations to be power users of generative AI will want to consider that in their choice of AI. This also provides one of the few ways to use generative image AI for practically free and with complete control (when using a local instance).
Price and Features at a Glance
All prices in United States dollars
* These scores are based on the writer’s interpretation; quality is a subjective matter.
** Assumes you will have high usage, so you won’t be coasting on free credits.
This chart is provided to help you make a decision on an AI model to start with.
Once you’ve decided, skip to the ‘Getting Started’ section for the AI model of your choice.
Getting Started: DALL-E 2
To start, create an account.
New accounts start off with 50 free credits. A ‘credit’ is used whenever you generate imagery. You get 15 free credits every month thereafter.
Note: you can only use the first 50 free credits in your first month; after that, you get 15 free credits per month, but any unused credits of the initial 50 will be gone.
When you run out of credits, you will have to top up with a purchase of at least $15, which gives you 115 credits.
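Those numbers imply the per-image price quoted earlier – a quick sanity check:

```python
# DALL-E 2 credit math: a $15 top-up buys 115 credits,
# and one credit is spent per generation.
top_up_usd = 15
credits_per_top_up = 115

cost_per_image = top_up_usd / credits_per_top_up
print(f"${cost_per_image:.2f} per generation")  # ~$0.13, as quoted above
```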
Once you’ve created an account and logged in, you’re greeted with the following screen:
DALL-E 2 initial user screen on desktop
All you need to do now is type up a description of something you’d like to see (1), then hit the generate button (2).
My prompt: A well drawn illustration of a medieval dragon, reading a large, ancient book, inside a grand library. Great composition.
The results are then presented – you get four variations at a time.
That’s it, you’ve summoned your first creations from artificial intelligence!
You can click through the results to see which you like best. Here is my pick.
You now know enough to start experimenting in DALL-E 2.
There are more tips and features to discuss, so this section will be updated soon.
Getting Started: Midjourney
You don’t need to actively use Discord, but you do need a Discord account to log in.
Once you have your Discord account, you can either:
- Log in to the Midjourney app.
- Or follow this link for an invitation to the official Midjourney Discord server.
But you want us to hold your hand, don’t you? So read on below.
Ways to use
There are four ways to use Midjourney.
- Midjourney.com – Their official web app, available at https://www.midjourney.com. Note you still need a Discord account to log in.
- Midjourney Discord channels – You can command the Midjourney bot in the official Midjourney Discord channels.
- Your own Discord server – You can invite the Midjourney Bot to your own server and use it from there. To do this, invite the bot to your server and give it elevated rights. Voila, you’re ready to issue commands to it directly in your server. For more instructions on how to do this, check their official guide.
- Direct message the bot (on Discord) – You can direct message the Midjourney Bot, forgoing the need for any server use. Instructions are in the FAQ.
Midjourney has an entire guide going over their billing – you can follow that for the direct source and all possible info.
If you took a peek and groaned, we understand. Midjourney isn’t making this straightforward. So we tried our best to demystify what’s important here.
Starting off, you get 25 free images to generate. Think of this as your free trial.
After that, regular users will need to subscribe to one of three tiers: basic, standard, and pro.
You can see the subscription page for the comparison of features between plans. We will explain the key features and differences below without the marketing speak.
Subscription Comparison Chart
* Yearly pricing discounts 20% from the monthly price.
‘Fast’ generation
This means your work gets high priority on the limited GPU resources Midjourney has. As of this writing, this equates to about one minute per generation, which leads to the approximate number of images per month in the chart.
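As a rough illustration, the monthly image estimate falls out of a plan’s fast GPU hours. The 3.3-hour figure below is our reading of the basic plan’s allotment at the time of writing – check the subscription page for current numbers.

```python
# Estimate monthly fast generations from a plan's fast GPU hours,
# assuming roughly one GPU-minute per image (the figure quoted above).
MINUTES_PER_IMAGE = 1.0  # approximate; varies with settings

def images_per_month(fast_gpu_hours: float) -> int:
    return round(fast_gpu_hours * 60 / MINUTES_PER_IMAGE)

# e.g. ~3.3 fast hours on the basic plan works out to roughly 200 images
print(images_per_month(3.3))
```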
If you run out of your allotted fast minutes, you can buy ‘credit top-ups’, which is like buying more fast generations. And if you have the standard or pro plan, you have a backup called ‘relaxed’ generations.
‘Slow’ / relaxed generation
Run out of fast generations? You fall back to unlimited slow generations (they call it ‘relaxed’ instead of slow).
Unfortunately for basic users, this does not apply. When they run out of fast usage, they need to wait for the next month or buy more credits.
The pro plan gets a unique perk called stealth. It’s an optional setting that hides your images and prompts from other Midjourney users.
And yeah, that means by default the work you generate in Midjourney automatically gets posted to the Midjourney feed (think of it as a social media feed with AI images), along with the prompt used.
Other Midjourney users can even download your precious images from the feed.
So if you’re a genius at writing prompts and really want to protect them, you’ll have to fork over the dough for the pro plan. (More on writing prompts later).
So, which plan?
Easy peasy – since you’re reading this guide, we’ll assume you’re a beginner.
Our advice: just start with the basic plan and see if you even use up the 200 fast generations in a month. Just upgrade the plan later if needed.
Getting Started: Stable Diffusion
Diving into Stable Diffusion can be like getting lost in a cave with no end in sight. So we tried to keep this section as concise and clear as possible.
There are two primary ways to use Stable Diffusion: web apps and self-hosted.
Web apps

This is the simplest path, so we recommend most beginners start here.
There are many Stable Diffusion-based apps in the wild. And they usually offer credit-based systems for using them, meaning you have to pay per image you create (like DALL-E 2). There are ways to get free use though.
Here’s a super short list of the most popular ones:
- Dreamstudio – The official app run by the creators of SD. You’ll get the latest updates and official support here. On default settings, the per-image cost beats DALL-E 2 and Midjourney by a lot.
- Nightcafe Studio – 5 free credits per day. A third-party app that’s easy to use and well designed. They provide multiple versions of SD to choose from.
- The Hugging Face demo – Free to use, but very basic in functionality. The latest version is hosted here by the creators of SD (as with Dreamstudio). Unfortunately, it doesn’t always work due to high demand.
Self-hosted

The self-hosted options are advanced, and require either some tech-savviness or a willingness to slog through a learning curve.
Despite the tech hurdle, there are some considerable advantages with self-hosted options.
- No waiting in queues for GPU resources. This means you can expect to generate images faster compared to web apps on equivalent settings and GPUs.
- There are virtual GPU rental services that provide fast GPU speeds for a consistent experience in terms of generation time. Many apps have varying GPU speeds, sometimes trapping you with slow speeds.
- Per-image costs on virtual GPU services are usually more cost-effective than apps. Web apps themselves run on these services to generate, so they need to charge you more to make money.
- On a local instance, it’s free if you already have a GPU powerful enough to run generations in a reasonable time. And if you buy a GPU, spreading the upfront cost over all the images you generate means the per-image cost approaches zero (given enough usage).
- Users with coding backgrounds can build their own custom workflows.
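To make the “approaches zero” point concrete, here is a toy amortization calculation. The $800 GPU price is a hypothetical round number, not a recommendation, and electricity is ignored by default.

```python
# Amortized per-image cost of a locally owned GPU: spread the upfront
# price over every image generated, plus an optional electricity cost.
def amortized_cost(gpu_price_usd: float, images: int,
                   electricity_per_image_usd: float = 0.0) -> float:
    return gpu_price_usd / images + electricity_per_image_usd

for n in (1_000, 10_000, 100_000):  # hypothetical $800 card
    print(f"{n:>7} images -> ${amortized_cost(800, n):.4f} per image")
```

At 100,000 lifetime generations, the hardware cost is under a penny per image – far below any credit-based web app.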