Prompting AI Image Generators: A Practical Guide That Isn't Magic Words

How to write image prompts that work — a practical, mechanism-based guide. Structure a prompt like a brief, use style and lighting vocabulary deliberately, set seed, steps and guidance like you know what they do, and iterate like a director instead of a slot-machine player.

You now know the machinery: words become embeddings, every region of the canvas consults them at every denoising step, and sliders like guidance and steps have precise meanings. This chapter turns that knowledge into technique. None of it is magic words — it's mechanism, applied.

If you've read our prompt engineering guide for text models, the philosophy transfers directly: be specific, iterate deliberately, and understand what the model actually receives.

Write a brief, not a spell

The internet is full of "secret Midjourney phrases." Ignore them. A prompt is compressed art direction — the model fills every decision you don't make, so a good prompt simply makes the decisions that matter. Cover what a director would:

Slot	Question it answers	Example
Subject	Who/what, doing what?	"an elderly clockmaker inspecting a pocket watch"
Setting	Where, when?	"in a cluttered workshop, dusk"
Medium/style	Photo? Painting? Whose sensibility?	"35mm documentary photograph"
Lighting	The single biggest mood lever	"single warm desk lamp, deep shadows"
Framing	Camera distance and angle	"close-up, shallow depth of field"

Assembled: "An elderly clockmaker inspecting a pocket watch, cluttered workshop at dusk, 35mm documentary photograph, single warm desk lamp with deep shadows, close-up with shallow depth of field."
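The five slots in the table above can be treated as fields of a small record that is joined into one prompt. The sketch below is only an illustration of that idea: the class name `PromptBrief` and its `assemble` method are made up for this example and belong to no particular tool.

```python
from dataclasses import dataclass


@dataclass
class PromptBrief:
    """One field per slot in the table above (names are illustrative)."""
    subject: str
    setting: str
    medium: str
    lighting: str
    framing: str

    def assemble(self) -> str:
        # Subject first, then the remaining decisions in table order.
        parts = [self.subject, self.setting, self.medium, self.lighting, self.framing]
        # Skip any slot left empty so the prompt has no stray commas.
        filled = [p.strip() for p in parts if p.strip()]
        return ", ".join(filled)


brief = PromptBrief(
    subject="an elderly clockmaker inspecting a pocket watch",
    setting="cluttered workshop at dusk",
    medium="35mm documentary photograph",
    lighting="single warm desk lamp with deep shadows",
    framing="close-up with shallow depth of field",
)
print(brief.assemble())
```

Keeping the slots separate makes it easy to see which decisions have been left to the model: an empty field is a decision you did not make.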

That's it. No incantations — five decisions, stated plainly. The corgi-astronaut version of this from chapter 1 worked for exactly the same reason.

Style vocabulary: a few strong levers

Because the encoder was trained on captioned images from the real world, words that photographers and artists actually use map to strong, well-learned patterns. A small vocabulary goes far:

Lighting: golden hour, overcast, soft studio light, harsh noon sun, neon, candlelit, backlit, volumetric light
Medium: oil painting, watercolor, charcoal sketch, 3D render, film photograph, risograph, ukiyo-e
Camera: macro, wide-angle, telephoto compression, drone shot, fisheye, tilt-shift
Texture/finish: film grain, glossy, matte, weathered, pristine

Precision beats volume. "Cinematic, epic, stunning, masterpiece, 8K, best quality" is mostly noise: vague superlatives map to nothing specific, while one concrete term like "backlit through fog" redirects the whole image. (Some older Stable Diffusion checkpoints did respond to quality-spam, because the captions and tags they were trained on included such terms; modern models largely don't need it.)
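As a rough way to spot superlative-spam in your own prompts, the sketch below splits a comma-separated prompt and flags phrases that appear on a filler list. The list is an assumption drawn only from the examples in the paragraph above, and the function name `split_filler` is hypothetical; treat it as a reminder to look, not a rule about what any model ignores.

```python
# Hypothetical filler list, taken from the examples in the paragraph above.
FILLER_TERMS = {"cinematic", "epic", "stunning", "masterpiece", "8k", "best quality"}


def split_filler(prompt: str) -> tuple[list[str], list[str]]:
    """Split a comma-separated prompt into (kept phrases, flagged filler)."""
    kept, flagged = [], []
    for phrase in prompt.split(","):
        phrase = phrase.strip()
        if not phrase:
            continue
        # Only whole-phrase matches are flagged, so "cinematic rim light" survives.
        if phrase.lower() in FILLER_TERMS:
            flagged.append(phrase)
        else:
            kept.append(phrase)
    return kept, flagged


kept, flagged = split_filler("a lighthouse, backlit through fog, epic, 8K, masterpiece")
print(kept)
print(flagged)
```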

The director's loop: fix the seed

Here's the workflow upgrade that separates deliberate work from slot-machine pulls, straight from chapter 2: the seed determines the "marble block," so freeze it.

[Figure: Iterating like a director: one variable at a time against a frozen seed. Steps shown: generate a few seeds, pick the best composition, freeze that seed, change ONE thing per regen.]

Generate a small batch with random seeds — you're auditioning compositions.
Pick the most promising one and note its seed.
Now iterate with the seed fixed: adjust the lighting phrase, swap the medium, nudge guidance. Each regen changes only what you changed, so you can see cause and effect.

Without a fixed seed, every regeneration reshuffles everything and you can't tell whether your prompt edit helped. With one, you're directing.
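The loop above can be sketched as two small helpers that only build job descriptions: one for the audition phase, one for the frozen-seed phase. Everything here is an assumption made for illustration: the settings dictionary, its key names, and the helper names `audition` and `variants` are hypothetical, and no real generator is called. You would pass each job to whatever tool you use.

```python
import random
from typing import Any


def audition(base: dict[str, Any], count: int, rng: random.Random) -> list[dict[str, Any]]:
    """Phase 1: identical settings, a different random seed per job."""
    return [{**base, "seed": rng.randrange(2**32)} for _ in range(count)]


def variants(frozen: dict[str, Any], changes: list[tuple[str, Any]]) -> list[dict[str, Any]]:
    """Phase 2: one job per change, each differing from `frozen` in one key."""
    jobs = []
    for key, value in changes:
        if key == "seed":
            raise ValueError("the seed stays frozen while iterating")
        jobs.append({**frozen, key: value})
    return jobs


base = {
    "subject": "an elderly clockmaker inspecting a pocket watch",
    "lighting": "single warm desk lamp, deep shadows",
    "medium": "35mm documentary photograph",
    "guidance": 7.0,
}
candidates = audition(base, 4, random.Random(0))
chosen = candidates[2]  # stand-in for picking the best composition by eye
jobs = variants(chosen, [
    ("lighting", "overcast window light"),
    ("medium", "charcoal sketch"),
    ("guidance", 9.0),
])
```

Because each job in `jobs` differs from `chosen` in exactly one key, any difference between two outputs can be attributed to that one change.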

Settings, now that you know what they mean

Every one of these maps to machinery from earlier chapters:

Setting	What it really is	Sensible default
Steps	Number of denoising passes	20–30; more is mostly slower
Guidance (CFG)	How hard the prompt-direction is exaggerated	5–9; raise for literalism, expect frying past ~12
Seed	The starting noise — the marble block	Random to explore, fixed to iterate
Aspect ratio	Shape of the latent canvas	Match intent — it changes composition, not just crop
Negative prompt	The steer-away baseline	Sparingly: name observed problems, don't paste rituals

Two notes. Aspect ratio is compositional: a portrait canvas doesn't crop a landscape image, it composes a different image — models learned that tall frames hold portraits and towers, wide frames hold panoramas. And negative prompts work best reactively: add "watermark" when you see watermarks, rather than opening with a fifty-term pasted ritual that muddies the guidance signal.
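The guidance and negative-prompt rows can be made concrete with the usual classifier-free guidance arithmetic: at each step the model predicts noise twice, once for a baseline (the empty or negative prompt) and once for your prompt, and the scale extrapolates along the difference between the two. The sketch below assumes that standard formulation and uses plain lists of floats as stand-ins for noise predictions; specific tools may implement variants, and the function name `apply_guidance` is hypothetical.

```python
def apply_guidance(baseline: list[float], prompted: list[float], scale: float) -> list[float]:
    """Classifier-free guidance: baseline + scale * (prompted - baseline)."""
    if len(baseline) != len(prompted):
        raise ValueError("predictions must have the same shape")
    return [b + scale * (p - b) for b, p in zip(baseline, prompted)]


baseline = [0.0, 1.0]  # toy prediction for the empty or negative prompt
prompted = [1.0, 3.0]  # toy prediction for the actual prompt

print(apply_guidance(baseline, prompted, 1.0))  # reproduces the prompted prediction
print(apply_guidance(baseline, prompted, 7.0))  # pushed well past it
```

A scale of 1 simply returns the prompted prediction; larger values exaggerate whatever separates it from the baseline, which is why very high guidance overshoots. It also shows why a long pasted negative prompt matters: it changes the baseline that every step is measured against.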

Know what prompting can't fix

The most practical skill is recognising failures that are mechanical limits, not prompt problems:

Swapped or bleeding attributes ("red cube on blue sphere" comes back recoloured) — the encoder's meaning-soup, from chapter 4. Workarounds: simplify the scene, or generate elements separately and composite.
Garbled small text — latent compression, from chapter 3. Workaround: use a recent model with strong text rendering, keep text large and short, or add it in an editor.
Anatomy at the margins (hands, crowds, distant faces) — see the limits chapter. Workarounds: inpainting/regional edits, or a different seed.

Rewriting your prompt for the ninth time won't fix what the architecture can't represent. Knowing when to stop prompting and switch tools — inpaint, upscale, edit — is the skill.

Recap

A prompt is a brief, not a spell: decide subject, setting, medium, lighting, framing — the model fills everything you leave open.
Lead with what matters: early tokens weigh more and long prompts truncate.
Use concrete style vocabulary (lighting and medium words especially); skip vague superlative-spam.
Freeze the seed to iterate — one variable per regeneration turns generation into direction.
Settings are machinery you now understand: steps ≈ 20–30, CFG ≈ 5–9, aspect ratio shapes composition, and negative prompts work best reactively.
Recognise mechanical limits (attribute swaps, tiny text, hands) and switch to editing tools instead of prompt-thrashing.

Still images are one denoising loop. Video asks the same machinery to hold the world steady across hundreds of frames — a much harder trick. Continue to How AI video generation works.