When you type a prompt into Midjourney, it breaks the prompt down into tokens, which you can think of as a series of ideas. For example, the prompt "a photorealistic portrait of a cat" would be interpreted as "photorealistic", "portrait", and "cat".
The tokens are then transformed into a mathematical representation and fed into a machine learning model that creates the image.
Midjourney tends to place more emphasis on the tokens at the beginning of a prompt. The longer the prompt, the less emphasis is placed on each token. Starting with a simple prompt of 3-5 words (or fewer) is a great way to build a foundation for your final prompt.
Additive Prompt Structure
Nick St. Pierre has created a technique for structuring prompts that he calls "Additive Prompting". The idea is that you start with your basic idea (for example, "photorealistic portrait of a cat"), then slowly add more tokens or details once Midjourney has given you an output resembling what you expect to see. Below are the types of tokens, in the order they are added, that transform your prompt from simple to advanced:
General Style (street style, editorial style, food photography, 1990s Punk style)
Composition (Off-center closeup, medium-full side angle, full body shot, two-shot)
Medium (photo, film still, illustration, sketch, sculpture)
Film Type (Kodak Gold, Agfa Vista, Kodachrome)
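Subject Description (a woman walking, a man talking, a dog running)
Subject Styling (red wool suit, dark green slicked-back hair, clown makeup)
Environment (1970s new york city)
Lighting (overcast)
Atmosphere (foggy)
Mood (depressing)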
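To make the ordering concrete, here is a minimal Python sketch that assembles a prompt from whichever categories you have filled in. The category names come from the list above and the step headings of the tutorial in this article. The function name build_prompt, the dictionary keys, and the choice to join tokens with ", " are our own assumptions for illustration; Midjourney itself only receives the final text.

```python
# Category order as described in this article. The key names are hypothetical
# labels chosen for this sketch, not anything Midjourney defines.
CATEGORY_ORDER = [
    "general_style",
    "composition",
    "medium",
    "film_type",
    "subject_description",
    "subject_styling",
    "environment",
    "lighting",
    "atmosphere",
    "mood",
]


def build_prompt(parts: dict, parameters: str = "") -> str:
    """Join the filled-in categories in additive order.

    `parameters` holds trailing flags such as "--ar 16:9" and is appended
    after the tokens. Joining with ", " is an assumed convention.
    """
    unknown = set(parts) - set(CATEGORY_ORDER)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    # Keep only categories that have a value, in the fixed order.
    tokens = [parts[name] for name in CATEGORY_ORDER if parts.get(name)]
    return f"{', '.join(tokens)} {parameters}".strip()


# The dictionary can be written in any order; the output follows CATEGORY_ORDER.
example = build_prompt(
    {
        "subject_description": "a dog running",
        "medium": "photo",
        "general_style": "street style",
    }
)
print(example)  # street style, photo, a dog running
```

Keeping the order in one list means you can add a category later without rearranging the rest of the prompt by hand.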
Before we hop into the tutorial below, it's important to note that this technique is not the only way to structure your prompt. Sometimes the order doesn't matter and you still get the result you want. Even so, this structure is a solid foundation that usually gets you close to the outcome you're looking for.
The Joker Tutorial
We're big fans of The Joker and all things Batman, so we decided to start with a basic prompt of Joaquin Phoenix and transform him into The Joker without using "The Joker" in the prompt. Using a specific artist or aesthetic gets you close to the endpoint faster, but Additive Prompting gives you much more control over each element in the piece. Let's take a look at how this works.
Step 1: General Style
Our starting point is a basic prompt that gives us a street style photo of Joaquin Phoenix. We wanted to start as simple as possible, almost as if we are taking him from the street to the movie set where we then start applying all the details that make him The Joker.
Step 2: Composition
Next, we apply "medium shot" to the prompt because we want to see him from the waist up. We could also have used other compositions, such as an off-center closeup, a full body shot, or a two-shot.
Step 3: Medium
Then we apply "film still" to the prompt so that the image has a more cinematic feel to it.
Step 4: Film Type
Next, we add "Kodak Gold" to the prompt to give it a more vintage look. Kodak Gold is great for portraits because it adds warm color and has medium contrast properties.
Step 5: Subject Description
Then, we add "walking" to the prompt so that there is a sense of motion. You could use "dancing" to get an output that resembles the movie trailer scenes. Scroll all the way to the last step to see our final version where we used "dancing" and added "--ar 16:9" to give the image a more cinematic composition.
Step 6: Subject Styling
Now comes the fun part. Step 6 is a two-part step because we applied the styling slowly. For the first part, we simply added "red wool suit".
Step 6: Subject Styling (Part Two)
Then, we added "dark green slicked-back hair, clown makeup" to the prompt. We could have applied those separately but we got excited.
Step 7: Environment
In this step, we added "1970s new york city". The environment doesn't change much, but his suit definitely does! The suit in the previous step looks much more modern.
Step 8: Lighting
This is a subtle step where we added "overcast" and the image became a bit more gray. The suit is much less vibrant.
Step 9: Atmosphere
Next, we add "foggy" to the prompt. This creates a much more dramatic scene and the makeup is looking better somehow.
Step 10: Mood
Finally, we added "depressing", replaced "walking" with "dancing", and added "--ar 16:9" so that we could try to replicate a scene from the movie trailer. We're really happy with how it turned out.
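The ten steps above can be replayed as a small Python sketch that prints the prompt as it grows. The quoted tokens are taken from the steps in this tutorial, but the exact wording and punctuation of the prompts we typed are not reproduced here, so the comma-separated layout, the placement of "Joaquin Phoenix", and the names ORDER, STEPS, render, and replay are assumptions made for illustration.

```python
# Order in which categories appear in the rendered prompt (from this article).
ORDER = [
    "general_style", "composition", "medium", "film_type", "subject",
    "styling", "environment", "lighting", "atmosphere", "mood",
]

# Each step adds or replaces categories. Token text is quoted from the
# tutorial; how the tokens are laid out in the prompt is assumed.
STEPS = [
    ("Step 1", {"general_style": "street style", "subject": "Joaquin Phoenix"}),
    ("Step 2", {"composition": "medium shot"}),
    ("Step 3", {"medium": "film still"}),
    ("Step 4", {"film_type": "Kodak Gold"}),
    ("Step 5", {"subject": "Joaquin Phoenix walking"}),
    ("Step 6", {"styling": "red wool suit"}),
    ("Step 6 (Part Two)", {
        "styling": "red wool suit, dark green slicked-back hair, clown makeup",
    }),
    ("Step 7", {"environment": "1970s new york city"}),
    ("Step 8", {"lighting": "overcast"}),
    ("Step 9", {"atmosphere": "foggy"}),
    ("Step 10", {"mood": "depressing", "subject": "Joaquin Phoenix dancing"}),
]


def render(parts, parameters=""):
    """Join the current categories in ORDER and append any trailing flags."""
    text = ", ".join(parts[key] for key in ORDER if key in parts)
    return f"{text} {parameters}".strip()


def replay(steps, final_parameters=""):
    """Return (label, prompt) pairs; flags are applied to the last step only."""
    parts = {}
    prompts = []
    for index, (label, updates) in enumerate(steps):
        parts.update(updates)
        is_last = index == len(steps) - 1
        prompts.append((label, render(parts, final_parameters if is_last else "")))
    return prompts


history = replay(STEPS, final_parameters="--ar 16:9")
for label, prompt in history:
    print(f"{label}: {prompt}")
```

Printing every intermediate prompt mirrors the way we worked: change one thing, generate, and only then move on to the next category.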
What do you think?