
Updated Formula for Consistent Great ‘Photos’ with AI

Exactly a year ago, I published my proven ChatGPT formula to achieve consistent great ‘photos’ with AI (Midjourney). It has helped me produce many very good-looking, realistic images. But with the pace at which improvements are released, does the formula stand the test of time? Or is it due an update? This is what we aim to find out today.

Before we dive in, as I did in the original post, I want to clarify that whatever you achieve with this formula is not a photo: it's an image. And I don't highlight this difference only for semantics.
I hope you will not use the formulas I publish on this blog to try to fool everyone into thinking you have taken a photo. Perhaps you can get away with it on social media and avoid the backlash of being spotted, but you won't land a job at National Geographic with it.

My goal with all I share here is simply to help you create beautiful images you can enjoy. Please refrain from using this to cheat.
And with this disclaimer out of the way, let's begin.

A Different AI Scenario

As already mentioned, my formula helped me consistently achieve great ‘photos’ with AI. However, the scenario was quite different a year ago. ChatGPT was still an unchallenged force in the world of large language models (LLMs), while Midjourney had some stiff competition in image generation.
Today, their reality seems to be the exact opposite, or at least that's how I feel about it. Midjourney reigns supreme (quality-wise, at least), while ChatGPT's top spot is challenged daily.

Because of this shift, I will examine how many different language models respond to the formula and then let Midjourney do its magic.

So, let's start by seeing if different LLMs output different prompts. I will keep using the free versions for this test, to keep this usable for everyone. For now, I will be using the same set of instructions and ask for a single result:

A photograph of a [subject, one or more] [engaged in an action scene] with [background context] during [time of day] with [type of lighting] and shot with a [type of camera and lens: brand, focal length and aperture] using [type of composition] and captured on a [type of film or film simulation]
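To make the bracketed slots concrete, here is a minimal Python sketch of filling the formula by hand rather than asking an LLM. The slot names (`subject`, `action`, and so on), the `build_prompt` helper and the sample values are my own illustrative choices; nothing here is defined by Midjourney or by any of the language models.

```python
# The formula with each bracketed variable turned into a named slot.
# The slot names are illustrative labels, not part of the original formula.
FORMULA = (
    "A photograph of a {subject} {action} with {background} "
    "during {time_of_day} with {lighting} and shot with a {camera} "
    "using {composition} and captured on a {film}"
)


def build_prompt(**slots: str) -> str:
    """Fill every slot of the formula; a missing slot raises KeyError."""
    return FORMULA.format(**slots)


# Sample values, made up for illustration.
example = build_prompt(
    subject="street musician",
    action="playing the violin",
    background="a rain-soaked square in the background",
    time_of_day="blue hour",
    lighting="soft street-lamp lighting",
    camera="Fujifilm X-T4 and a 23mm f/1.4 lens",
    composition="a rule-of-thirds composition",
    film="Fujifilm Classic Chrome film simulation",
)
print(example)
```

Leaving a slot out fails loudly instead of sending a half-filled prompt to Midjourney, which is the main reason to use named slots here.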

All the details about this formula are written in last year's article, so I won't write them down again.
I'm not going to digress into each LLM's offer because that is already extensively covered online, and it would take way too long (and a few different posts). Here, I am sticking to the purpose of this post, which is to provide you with a proven formula to achieve consistent great images with AI generation.

By the way, please mention this blog as your source if you re-use the formula or present it in your social media content… Let me know if I was able to help you!

Now, before we start exploring the different LLMs, let's talk about Midjourney's new version.

Midjourney v6

A few months ago, Midjourney released version 6. One of the most exciting new features is the ability to directly incorporate text into your generated images, which may play a part in our image generation if the scene includes signs, billboards, newspapers, etc.
Version 6 also focuses on generating images with:

  • better coherence and consistency;
  • smoother transitions between elements;
  • more logical compositions;
  • a stronger sense of unity throughout the artwork;
  • greater visual diversity, exploring a wider range of styles and techniques.

More importantly for our test, Midjourney v6 has improved its prompt following and can handle longer prompts. When I first created the formula, my outputs were longer than the maximum allowed number of tokens (72 tokens, or 40+ words), though they still produced great images. Now, it seems that the limit has been pushed almost sixfold higher (350+ words).

In theory, we should avoid complex prompts. The longer they are, the more tasks Midjourney needs to perform and the more relationships between keywords it needs to figure out, which gives less statistical weight to each keyword.
But it has never been an issue with the formula. And even less now with v6.

There are also new prompting methods. You can now use hashtags, brackets, and slashes to modify your prompts and get different effects (still in an Alpha stage). This should work well with AI responses that have brackets included, particularly Gemini. In fact, it seems that Google keeps its language model updated to the most recent resources, at least more than its competitors (some of which are actually disconnected from the live Internet).
You may think Google has an unfair advantage in this regard, but so should Microsoft, and yet you will see that its model failed once.

Testing LLMs

OK, so let's dive in! I asked each LLM the same question: “Please give me 1 example for a Midjourney prompt based on this formula; they can be any genre. [formula]”. What follows are their answers and the corresponding images generated with Midjourney v6.

I am also interested in seeing what kind of scenario the AI “thinks” of when reading this formula for the first time. I am not going to provide extra inputs, like a photography genre, for example. Will the AI give me a wide-angle landscape, or will it go close-up and be personal? How much context can it create without me adding data?

To test with a specific set of data, I could simply change one or more variables. Say I wanted Batman in all images: I would swap [subject, one or more] with Batman. If I wanted to focus on Street Photography, I would specify that. And maybe this will be material for a future article…
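Swapping a variable before sending the formula to the LLM is a plain text replacement. Here is a small sketch of that idea; the `pin_slot` helper is a hypothetical name of mine, and the formula string is copied from above.

```python
# The formula exactly as it is pasted into the LLM, brackets included.
FORMULA = (
    "A photograph of a [subject, one or more] [engaged in an action scene] "
    "with [background context] during [time of day] with [type of lighting] "
    "and shot with a [type of camera and lens: brand, focal length and aperture] "
    "using [type of composition] and captured on a "
    "[type of film or film simulation]"
)


def pin_slot(formula: str, slot: str, value: str) -> str:
    """Replace one bracketed variable with a fixed value, leaving the rest."""
    placeholder = f"[{slot}]"
    if placeholder not in formula:
        raise ValueError(f"no such variable in the formula: {placeholder}")
    return formula.replace(placeholder, value)


pinned = pin_slot(FORMULA, "subject, one or more", "Batman")
print(pinned)
```

The result still reads "a Batman", so in practice you would tidy the article by hand or let the LLM smooth it out when it fills the remaining variables.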

Anyway, it's time to look at some prompts and images. All of the pictures are generated in a 2:1 aspect ratio (using the --ar 2:1 parameter).
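Midjourney parameters are written with two hyphens and go at the end of the prompt. If you are pasting many prompts, a tiny helper like the one below keeps the suffix consistent; `with_aspect_ratio` is just an illustrative name of mine, not a Midjourney tool.

```python
def with_aspect_ratio(prompt: str, width: int = 2, height: int = 1) -> str:
    """Append Midjourney's --ar parameter to the end of a prompt."""
    return f"{prompt.rstrip()} --ar {width}:{height}"


print(with_aspect_ratio("A photograph of a bustling street market in Marrakech"))
# prints: A photograph of a bustling street market in Marrakech --ar 2:1
```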

OpenAI ChatGPT

Of course, we begin with the model we first used. A year on, ChatGPT 3.5 is still the base engine for the free version. But now, you get a limited number of daily requests you can pass to the new ChatGPT 4o. Let's try both to see if there is any difference.

Tip: while you are using your 4o daily allowance, you can switch to 3.5 to ask for a different result. Just click on the lightning-bolt icon at the bottom left of the reply.

3.5

A photograph of a bustling street market in Marrakech, with vendors haggling over fragrant spices amidst a labyrinth of vibrant stalls during the early morning hours, illuminated by the soft glow of dawn, shot with a Fujifilm X-T4 and a 23mm f/1.4 lens using a street photography composition and captured on Fujifilm Classic Chrome film simulation.

You can probably tell how well these depict Marrakech. ChatGPT's precise context makes a difference in the final image.

4o

A photograph of a street artist spray-painting a mural on a gritty alley wall in downtown Tokyo during nighttime with vibrant, neon lighting and shot with a Sony A7R IV and 35mm f/1.4 lens using a wide-angle, rule-of-thirds composition and captured on a bold, high-contrast black and white film.

Annoyingly, one of the images doesn't abide by our instructions and is generated in colour. But you can see from the signs that, while they mean nothing, they make us imagine this is Japan.

At first glance, there doesn't seem to be much difference in the prompts. Maybe 3.5 gives more details, which is surprising, but they're both similar.
Photography-wise, 4o uses film on a digital camera, where film simulation would have been more correct.
On the other hand, 3.5 seems to understand the Fujifilm ecosystem very well and recommends using a lens well suited for Street Photography.

But in the end, these are merely instructions to create consistent, great ‘photos’ with AI. And I think that using this combination of ChatGPT and Midjourney still works.

Anthropic's Claude AI

Next, let's have a look at ClaudeAI from Anthropic, arguably the first LLM to rival ChatGPT. When asking ClaudeAI to use the formula, this is its answer:

A photograph of a professional skateboarder [subject] [engaged in an action scene] performing a mid-air trick, with [background context] a gritty urban landscape in the background during [time of day] the golden hour with [type of lighting] warm, diffused lighting, and shot with a [type of camera and lens] Sony Alpha 7R IV with a 24-70mm f/2.8 lens at f/4 using [type of composition] a low-angle, rule-of-thirds composition and captured on a [type of film or film simulation] Kodak Portra 400 film simulation.

Annoyingly, ClaudeAI writes down our bracketed variables in the answer. The text in brackets comes from the formula and is too generic to make a difference with the new enhanced prompting in Midjourney v6. So, we will have to clean this up before we can generate the images.
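The clean-up is mechanical enough to script. Below is a minimal sketch that removes every square-bracketed chunk and tidies the leftover spaces; `strip_placeholders` is a hypothetical helper name, and the sample text is the opening of Claude's answer above.

```python
import re


def strip_placeholders(answer: str) -> str:
    """Remove [bracketed] chunks (and the space before them) from an answer."""
    without_brackets = re.sub(r"\s*\[[^\]]*\]", "", answer)
    # Collapse any double spaces left behind by the removal.
    return re.sub(r"\s{2,}", " ", without_brackets).strip()


raw = (
    "A photograph of a professional skateboarder [subject] "
    "[engaged in an action scene] performing a mid-air trick, "
    "with [background context] a gritty urban landscape in the background"
)
print(strip_placeholders(raw))
# prints: A photograph of a professional skateboarder performing a mid-air trick, with a gritty urban landscape in the background
```

Be careful with answers where the brackets hold real content: this would also delete “[smoking a cigarette]” from the Gemini prompt further down, so it only suits answers that echo the formula's variable names.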

I really like the fourth one (bottom right). Again, the Sony A7R IV is mentioned, which we know is quite popular.
Perhaps Midjourney should have fully utilised the 24-70mm zoom capabilities, but it's a start.

By the way, ClaudeAI, at least in its free version, does not save your conversations. So, you will have to restart every time you refresh the page or visit again.

Google Gemini

Now, let's have a look at how Gemini, the AI from Google, responds.

A photograph of a trench coat-clad detective (male) [smoking a cigarette] hunched over a dead body sprawled on the rain-slicked pavement of a deserted city street. Neon signs cast harsh shadows from towering skyscrapers in the background. The scene is shrouded in fog during the early morning hours with only a single street lamp providing light. Shot with a wide-angle lens (Leica M3 with a 35mm Summicron lens, f/5.6) using a leading lines composition to draw the eye towards the body. Captured on digital with a classic black and white film simulation emphasizing grain and contrast.

Gemini produced a longer and more descriptive prompt, which is in line with what Midjourney requires in terms of prompts these days. I assume this is due to having direct and updated access to online resources (as expected from Google).
Aside from the brackets, it feels more natural and creative, as if it were telling a story. However, some details are incoherent, like having “neon signs cast harsh shadows” together with “only a single street lamp providing light”.

Enhanced prompting test

This is a good time to test the new prompting methods in the Alpha stage. If we modify the first sentence, we can try to get different results using a single prompt.
Like so: A photograph of a trench coat-clad detective (male) [smoking a cigarette/writing notes/holding an umbrella/with a gun in his left hand]

You can see why this is still in the Alpha stage. The variations produced some differences in the images, but they did not follow our instructions (and are in a different order). The gun is in the right hand; the umbrella is held by someone else; while drunk from the whiskey, he lost the glass; and the notebook is not there.
But this has lots of potential, so let's hope the Midjourney developers will refine the feature.
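Until the feature matures, one workaround is to expand the slash-separated options yourself and run each variation as its own prompt. The sketch below does that locally in Python; it does not use the Alpha feature at all, `expand_options` is a hypothetical helper of mine, and it only handles the first bracket group that contains a slash.

```python
import re


def expand_options(prompt: str) -> list[str]:
    """Turn 'text [a/b/c] more' into one prompt per slash-separated option."""
    match = re.search(r"\[([^\]]*/[^\]]*)\]", prompt)
    if match is None:
        return [prompt]  # nothing to expand
    options = [option.strip() for option in match.group(1).split("/")]
    return [
        prompt[: match.start()] + option + prompt[match.end():]
        for option in options
    ]


base = (
    "A photograph of a trench coat-clad detective (male) "
    "[smoking a cigarette/writing notes/holding an umbrella/"
    "with a gun in his left hand]"
)
for variation in expand_options(base):
    print(variation)
```

This gives four separate prompts, so each image grid maps to exactly one instruction and you can tell which variation Midjourney ignored.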

As for our ‘consistent great photos with AI’ task… They're pretty good, right?
The red accent colour feels a lot like Sin City, so have we just found out one source used to train the model?

Microsoft Copilot

So, let's see Microsoft…

With Copilot, you have access to GPT-4 without paying for the subscription, which is probably the main reason to choose this model.
Out of the box, you can choose between 3 different conversation styles: Creative, Balanced, or Precise. So, let's see how each performs.

Creative

A photograph of a group of friends laughing engaged in an action scene of throwing colorful powder at each other with a festival atmosphere in the background during midday with bright, natural sunlight and shot with a Nikon D850 and an 85mm f/1.4 lens using a centered composition and captured on a Fujifilm Velvia 50 film simulation.

Balanced

  • Style: Epic fantasy painting
  • Subject: A dragon and a knight
  • Setting: A mist-covered mountaintop
  • Composition: Dramatic close-up
  • Lighting: Moonlight piercing through clouds
  • Additional Info: Ancient ruins in the background
  • Prompt: “An intense battle between a fierce dragon and a valiant knight atop a mist-shrouded mountain peak under the moonlight, with crumbling ruins in the distance.”

Precise

A photograph of a mother and child sharing a moment of laughter in their rustic kitchen during late afternoon with warm, natural lighting streaming in from the window. The shot was taken with a Sony Alpha a7 III and an 85mm f/1.8 lens using a frame within a frame composition and captured on a Ilford HP5 Plus 400 film simulation.

Now, I don't know how the “Balanced” version decided to go so far off-piste. The list it returned as an answer is unusable for our purposes.
But the other 2 examples seem totally fine and in line with the rest.

Also, the number of answers you can generate seems to be limited (remember, I'm using the free version), so you may need to restart after a while.

Creative

A photograph of a group of friends laughing engaged in an action scene…

Precise

A photograph of a mother and child sharing a moment of laughter…

OK, there's a lot to process here. To begin with, these indeed replicate wide-aperture 85mm shots (you'd probably capture the mother-child from a longer distance). The composition and light are spot-on.
The Nikon D850 is also a favourite among portrait photographers, so it made sense in the prompt. The Sony is not the best, so you would mask the poor skin tones with monochrome (though I doubt it's Ilford HP5-like).

What is really evident here is how good Midjourney has become with portraits. Though looking at the Precise style generation, it may still need to work on ethnic diversity.

Perplexity AI

I access Perplexity mostly on the go from its iPhone app, but it is also available on desktop browsers. I think the answer it gives is very interesting:

Prompt: A photograph of a couple walking hand in hand with a graffiti-covered alleyway in the background during the late afternoon with natural, diffused lighting and shot with a Canon RF 50mm f/1.8 STM lens using leading lines composition and captured on Kodak Portra 400 film.
Explanation: This prompt highlights a candid moment of a couple in an urban setting, with the graffiti adding a vibrant, artistic element. The late afternoon lighting provides a natural, soft illumination, and the Canon RF 50mm f/1.8 STM lens is perfect for capturing intimate, detailed shots. The leading lines composition draws the viewer's eye through the image, and the Kodak Portra 400 film is known for its excellent color reproduction and fine grain, enhancing the overall aesthetic.

I left the entire answer because it's peculiar: Perplexity adds the reasons why it gave its answer.
This is fascinating as an insight into how the AI worked its magic, but it also provides more context that could be used to refine the input into Midjourney.

Speaking of context, and based on my travels, these feel like photos taken in Toronto, don't they? Except maybe the third (bottom left) which reminds me more of Belfast.

Mistral AI

Lastly, let's have a look at the French LLM, Mistral. Lately, this has been in the news for its impressive capabilities, proving to be a very serious contender in the AI space.

A photograph of a group of mountain climbers suspended on a steep cliffside, with snow-capped peaks towering in the distance during the early morning, with soft golden light illuminating the scene, shot with a Nikon D850 and a 24-70mm f/2.8 lens using the rule of thirds composition and captured on a Kodak Portra 400 film simulation.

There is incredible detail in the landscape, that is for sure. But without providing specific input, would I choose a scene like this as my default AI-generated image? I don't know. But this is the only AI that chose a natural landscape out of the box rather than a city.


Wrapping Up

Each of the AI Large Language Models seems to perform the requested task very well, with one exception: the Microsoft Copilot “Balanced” style. So, in terms of passing the test of time, the formula to generate consistent great photos with AI seems to have fared well.
Ultimately, we want to find which of the tested models gives us the best prompts to produce consistent great photos with AI.

How Did Each LLM Perform?

Only ChatGPT gave us context in the form of a well-known city, meaning the resulting image should feel more familiar to the viewer. In general, ChatGPT seems to favour context over subject when crafting the prompt. So, this might be something to keep in mind if we want to achieve that.

Google Gemini seems to be using the very latest resources available from Midjourney to craft the prompt: more descriptive, using brackets, etc. Midjourney v6 wants prompts that feel more natural, as if you were conversing with it. So, as things stand, this may now be the preferred choice.

Microsoft Copilot went very personal, putting people and emotions front and centre (when it worked). It's the only one that decided to go so intimate on its very first attempt. Perhaps if you want to create portraits, this may be the way to go?

Claude AI is unusable for this specific purpose, as it writes a prompt that requires further polishing and is not so mind-blowingly good that it is worth the hassle.

Perplexity adds a full explanation of how it made its choice. And Mistral seems to prefer landscape as its first output.

How Was Their Output, From a Photographer's Perspective?

In terms of gear, there seem to be favourites among all AI models: the Sony A7 series and the Nikon D850 are definitely the most popular cameras, with Fujifilm and Kodak being favourites as film simulations. There seems to be more variety with lenses, though a wide aperture is a constant (and Midjourney handles depth of field well).
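As a rough check on the camera tally, here is a small sketch that counts brands across the prompts quoted in this post. The dictionary is transcribed by hand from those prompts; Copilot's “Balanced” answer named no camera, and Perplexity is left out because it only named a Canon lens, not a body. Treating the first word of each camera name as the brand is a simplifying assumption.

```python
from collections import Counter

# Camera named in each prompt, copied by hand from the answers in this post.
CAMERAS = {
    "ChatGPT 3.5": "Fujifilm X-T4",
    "ChatGPT 4o": "Sony A7R IV",
    "Claude": "Sony Alpha 7R IV",
    "Gemini": "Leica M3",
    "Copilot (Creative)": "Nikon D850",
    "Copilot (Precise)": "Sony Alpha a7 III",
    "Mistral": "Nikon D850",
}

# Assumption: the first word of the camera name is the brand.
brand_counts = Counter(camera.split()[0] for camera in CAMERAS.values())

for brand, count in brand_counts.most_common():
    print(f"{brand}: {count}")
```

With only seven prompts this is an anecdote rather than a statistic, but it does back up the impression that Sony and Nikon come up most often.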

The composition and light varied every time, which was great because it allowed us to produce more creative outputs.
I noticed that the subjects tend to be positioned in the left half of the image and look or move towards the right. It would be interesting to know why that is and if this has anything to do with the developers being a majority of left-to-right readers.

Interestingly (or annoyingly), Midjourney doesn't always give us four straight black-and-white images when instructed to do so. It seemed to do a better job when ChatGPT asked for “black-and-white film” rather than “black-and-white film simulation” in the other tests. Maybe it's something to consider.

Is There a Winner?

There is nothing wrong with any of the AI models except for one hallucination. But to me, there are two that stand out: Copilot for the choice of going more intimate and Gemini for the prompt crafting. And since our goal is always to find the best tool for the job, I'd say Gemini comes out as our winner today because it gives us the best, most up-to-date prompt with our formula.

So…

Now, the initial question was, “can we still create consistent great photos with AI” in the new 2024 scenario, using the proven formula? I suppose we can.
Does the formula need an update? Perhaps not just yet. But when the new improvements to Midjourney exit the Alpha stage, it probably will. So come back here to see where this will take us.

In the meantime, here is my favourite image from this test, produced by the Copilot-Midjourney combo.

The runners-up are Claude (for its use of low-angle composition, though the hand and face details are poor) and Gemini (for the storytelling and cinematic vibe).

So, here we are at the end. Please share your thoughts. And let me know in the comments (further down on this page) if you would like me to elaborate further on the topic.
Cheers!



Disclosure — Please know that some of the links in this blog are affiliate links and if you go through them to make a purchase I will earn a small commission. Always keep in mind that I link companies and their products because of their quality and not because of the commission I receive from your purchases.
The decision is yours, and whether or not you decide to buy something is completely up to you. Purchasing via these links will make no difference to the cost to you (if anything, you might even get a discount) but the commission I receive will help me pay a percentage of the costs for hosting and maintaining this blog.
Thank you!

fabienb
https://fabienb.blog
Creative. Nomad. Photographer. (he/him) /// formerly: Creative Director, UX Lead, DesignOps Manager, Web/Graphic Designer, Photographer, YouTuber, DJ, Public Speaker, Content Creator, AI-enthusiast, Food-Blogger... /// Award-winning Designer and Photographer, published and exhibited worldwide /// also known as Koan (DJ, Design)
