Chapter 90·Beginner·10 min read
Prompting AI Image Generators: A Practical Guide That Isn't Magic Words
How to write image prompts that work — a practical, mechanism-based guide. Structure a prompt like a brief, use style and lighting vocabulary deliberately, set seed, steps and guidance like you know what they do, and iterate like a director instead of a slot-machine player.
July 19, 2026
You now know the machinery: words become embeddings, every region of the canvas consults them at every denoising step, and sliders like guidance and steps have precise meanings. This chapter turns that knowledge into technique. None of it is magic words — it's mechanism, applied.
If you've read our prompt engineering guide for text models, the philosophy transfers directly: be specific, iterate deliberately, and understand what the model actually receives.
Write a brief, not a spell
The internet is full of "secret Midjourney phrases." Ignore them. A prompt is compressed art direction — the model fills every decision you don't make, so a good prompt simply makes the decisions that matter. Cover what a director would:
| Slot | Question it answers | Example |
|---|---|---|
| Subject | Who/what, doing what? | "an elderly clockmaker inspecting a pocket watch" |
| Setting | Where, when? | "in a cluttered workshop, dusk" |
| Medium/style | Photo? Painting? Whose sensibility? | "35mm documentary photograph" |
| Lighting | The single biggest mood lever | "single warm desk lamp, deep shadows" |
| Framing | Camera distance and angle | "close-up, shallow depth of field" |
Assembled: "An elderly clockmaker inspecting a pocket watch, cluttered workshop at dusk, 35mm documentary photograph, single warm desk lamp with deep shadows, close-up with shallow depth of field."
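If you write many prompts, it can help to keep the five slots separate and assemble them only at the end. The following is a minimal sketch of that idea: the `Brief` class and its field names are hypothetical, chosen to mirror the table, and it does nothing beyond joining the slots in order.

```python
from dataclasses import dataclass


@dataclass
class Brief:
    """The five slots from the table; names are illustrative, not a standard."""
    subject: str
    setting: str
    medium: str
    lighting: str
    framing: str

    def to_prompt(self) -> str:
        # Join the non-empty slots in table order, comma-separated.
        parts = [self.subject, self.setting, self.medium, self.lighting, self.framing]
        return ", ".join(p.strip() for p in parts if p.strip())


brief = Brief(
    subject="an elderly clockmaker inspecting a pocket watch",
    setting="cluttered workshop at dusk",
    medium="35mm documentary photograph",
    lighting="single warm desk lamp with deep shadows",
    framing="close-up with shallow depth of field",
)
prompt = brief.to_prompt()
print(prompt)
```

Keeping the slots separate makes it easy to change one decision, such as the lighting, while leaving the rest of the brief untouched.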
That's it. No incantations — five decisions, stated plainly. The corgi-astronaut version of this from chapter 1 worked for exactly the same reason.
Style vocabulary: a few strong levers
Because the encoder was trained on captioned images from the real world, words that photographers and artists actually use map to strong, well-learned patterns. A small vocabulary goes far:
- Lighting: golden hour, overcast, soft studio light, harsh noon sun, neon, candlelit, backlit, volumetric light
- Medium: oil painting, watercolor, charcoal sketch, 3D render, film photograph, risograph, ukiyo-e
- Camera: macro, wide-angle, telephoto compression, drone shot, fisheye, tilt-shift
- Texture/finish: film grain, glossy, matte, weathered, pristine
Precision beats volume. "Cinematic, epic, stunning, masterpiece, 8K, best quality" is mostly noise: vague superlatives map to nothing specific, while one concrete term like "backlit through fog" redirects the whole image. (Some older Stable Diffusion checkpoints did respond to quality-spam because their training captions included those tags; modern models largely don't need it.)
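One way to apply this is to flag filler phrases before you generate. The sketch below assumes a comma-separated prompt; the `FILLER` set is just the example list from this section, not an authoritative blocklist, and the lighthouse prompt is made up for illustration.

```python
# Illustrative list taken from the example in this section; extend or trim to taste.
FILLER = {"cinematic", "epic", "stunning", "masterpiece", "8k", "best quality"}


def split_filler(prompt: str) -> tuple[list[str], list[str]]:
    """Split a comma-separated prompt into (kept phrases, flagged filler phrases)."""
    kept, flagged = [], []
    for phrase in prompt.split(","):
        phrase = phrase.strip()
        if not phrase:
            continue
        # Only exact matches are flagged, so "cinematic backlight" would be kept.
        (flagged if phrase.lower() in FILLER else kept).append(phrase)
    return kept, flagged


kept, flagged = split_filler(
    "lighthouse on a cliff, backlit through fog, masterpiece, 8K, best quality"
)
print(", ".join(kept))
print("flagged:", flagged)
```

The function only reports; whether to drop a flagged phrase is still your call, since some checkpoints do respond to these tags.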
The director's loop: fix the seed
Here's the workflow upgrade that separates deliberate work from slot-machine pulls, straight from chapter 2: the seed determines the "marble block," so freeze it.
- Generate a small batch with random seeds — you're auditioning compositions.
- Pick the most promising one and note its seed.
- Now iterate with the seed fixed: adjust the lighting phrase, swap the medium, or nudge guidance, one change at a time. Each regeneration then differs only by what you changed, so you can see cause and effect.
Without a fixed seed, every regeneration reshuffles everything and you can't tell whether your prompt edit helped. With one, you're directing.
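The loop can be sketched without any image model at all. In the toy code below, `starting_noise` stands in for the initial latent (real generators draw a much larger tensor), and `audition` and `iterate` are hypothetical helper names. What it shows is that a seeded random generator reproduces the same noise, so a prompt edit is the only thing that differs between runs.

```python
import random


def starting_noise(seed: int, size: int = 4) -> list[float]:
    """Toy stand-in for the initial latent: Gaussian noise from a seeded RNG."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def audition(prompt: str, count: int, rng: random.Random) -> list[dict]:
    """Explore: a small batch of jobs, each with its own random seed."""
    return [{"prompt": prompt, "seed": rng.randrange(2**32)} for _ in range(count)]


def iterate(chosen: dict, new_prompt: str) -> dict:
    """Direct: change the prompt only and keep the chosen seed."""
    return {**chosen, "prompt": new_prompt}


batch = audition("clockmaker, desk lamp", count=4, rng=random.Random())
chosen = batch[2]  # pretend the third composition looked best
revised = iterate(chosen, "clockmaker, candlelit")

# Same seed, same starting noise: the prompt edit is the only variable.
assert starting_noise(revised["seed"]) == starting_noise(chosen["seed"])
```

In a real tool the equivalent is usually a seed field or a seeded generator object passed to the sampler; the exact parameter name depends on the tool.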
Settings, now that you know what they mean
Every one of these maps to machinery from earlier chapters:
| Setting | What it really is | Sensible default |
|---|---|---|
| Steps | Number of denoising passes | 20–30; more is mostly slower |
| Guidance (CFG) | How hard the prompt-direction is exaggerated | 5–9; raise for literalism, expect oversaturated, over-contrasted ("fried") images past ~12 |
| Seed | The starting noise — the marble block | Random to explore, fixed to iterate |
| Aspect ratio | Shape of the latent canvas | Match intent — it changes composition, not just crop |
| Negative prompt | The steer-away baseline | Sparingly: name observed problems, don't paste rituals |
Two notes. Aspect ratio is compositional: a portrait canvas doesn't crop a landscape image; it composes a different one, because models learned that tall frames hold portraits and towers while wide frames hold panoramas. And negative prompts work best reactively: add "watermark" when you see watermarks, rather than opening with a fifty-term pasted ritual that muddies the guidance signal.
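To make "exaggerated" and "steer-away baseline" concrete, here is a minimal sketch of the usual classifier-free guidance rule, assuming the common formulation in which the final prediction starts at a baseline and is pushed along the direction toward the prompted prediction. The function name and the two-number "predictions" are illustrative; real predictions are latent-sized tensors, and in many implementations a negative prompt simply replaces the empty-prompt baseline.

```python
def apply_guidance(baseline: list[float], prompted: list[float], scale: float) -> list[float]:
    """Classifier-free guidance: start at the baseline prediction and move
    `scale` times along the direction from baseline to prompted."""
    return [b + scale * (p - b) for b, p in zip(baseline, prompted)]


# Toy noise predictions for a single denoising step.
baseline = [0.0, 1.0]  # prediction for the empty (or negative) prompt
prompted = [1.0, 1.5]  # prediction for your prompt

at_one = apply_guidance(baseline, prompted, 1.0)    # lands exactly on the prompted prediction
at_seven = apply_guidance(baseline, prompted, 7.0)  # the same difference, amplified 7x
print(at_one, at_seven)
```

Seen this way, a long pasted negative prompt moves the baseline somewhere arbitrary, which changes the direction being amplified rather than just removing a few unwanted features.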
Know what prompting can't fix
The most practical skill is recognising failures that are mechanical limits, not prompt problems:
- Swapped or bleeding attributes ("red cube on blue sphere" comes back recoloured) — the encoder's meaning-soup, from chapter 4. Workarounds: simplify the scene, or generate elements separately and composite.
- Garbled small text — latent compression, from chapter 3. Workaround: use a recent model with strong text rendering, keep text large and short, or add it in an editor.
- Anatomy at the margins (hands, crowds, distant faces) — see the limits chapter. Workarounds: inpainting/regional edits, or a different seed.
Rewriting your prompt for the ninth time won't fix what the architecture can't represent. Knowing when to stop prompting and switch tools — inpaint, upscale, edit — is the skill.
Recap
- A prompt is a brief, not a spell: decide subject, setting, medium, lighting, framing — the model fills everything you leave open.
- Lead with what matters: earlier phrases tend to carry more weight, and some text encoders truncate long prompts (CLIP-based ones work within a 77-token window).
- Use concrete style vocabulary (lighting and medium words especially); skip vague superlative-spam.
- Freeze the seed to iterate — one variable per regeneration turns generation into direction.
- Settings are machinery you now understand: steps ≈ 20–30, CFG ≈ 5–9, aspect ratio shapes composition, and negative prompts are best added reactively.
- Recognise mechanical limits (attribute swaps, tiny text, hands) and switch to editing tools instead of prompt-thrashing.
Still images are one denoising loop. Video asks the same machinery to hold the world steady across hundreds of frames — a much harder trick. Continue to How AI video generation works.