The Limits of AI Image Generation: Artifacts, Copyright & Deepfakes

An honest accounting of AI image generation's limits — why hands and text still glitch, where the copyright fight actually stands, what deepfakes and provenance standards like C2PA mean, and which limits are engineering problems versus permanent trade-offs.

Every guide on this site ends the same way: with the honest accounting. The LLM guide closes on hallucination and brittleness; this one closes on melted hands, courtrooms, and forged faces.

The frame to carry through: some limits below are engineering problems — they're being fixed release by release. Others are structural — they follow from what a diffusion model is, and no amount of scale makes them go away. Telling the two apart is most of the wisdom.

The artifact taxonomy

By now you have the tools to diagnose every classic AI-image glitch by the layer that produces it:

Artifact	Layer responsible	Status
Six fingers, impossible joints	Diffusion's local plausibility — each patch of hand looks fine; nothing counts fingers	Much improved with scale; not solved
Garbled small text	Latent compression discards glyph detail	Largely fixed in newer models
Swapped/bleeding attributes	Text encoder blends the prompt's meaning	Improving with LLM-grade encoders
Melted faces in crowds	Too few latent values per distant face	Improving with bigger latents
Teleporting objects in video	No object registry — local plausibility over time	The frontier problem
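
The "too few latent values per distant face" row can be made concrete with some rough arithmetic. The sketch below assumes a Stable Diffusion-style autoencoder that shrinks each image side by a factor of 8 and keeps 4 channels per latent position; both numbers are assumptions for illustration, and `latent_values` is a hypothetical helper, not part of any library.

```python
def latent_values(face_px: int, downsample: int = 8, channels: int = 4) -> int:
    """Rough count of latent numbers covering a square face that is face_px pixels wide.

    downsample and channels are assumed values (typical of SD-style VAEs),
    not figures taken from this guide.
    """
    side = max(1, face_px // downsample)  # latent grid cells along one side
    return side * side * channels


for face_px in (256, 64, 24):
    print(f"{face_px}px face -> {latent_values(face_px)} latent values")
# 256px face -> 4096 latent values
# 64px face -> 256 latent values
# 24px face -> 36 latent values
```

Under these assumptions, a portrait-sized face gets thousands of numbers to describe it, while a face in the back of a crowd gets a few dozen, which is roughly why distant faces smear first.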

Why are hands the famous failure? They're the perfect storm for a statistical sampler: extremely articulated (dozens of valid configurations), frequently occluded in training photos, and small in frame — so the model learned "hand-ness" as a texture more than a structure. Nothing in the denoising loop counts fingers, because nothing in it counts anything.
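
A toy analogy may help show why "locally fine" and "globally correct" are different tests. This is not a diffusion model: the hand is just a list of labels, and `ALLOWED_NEIGHBOURS`, `locally_plausible`, and `globally_valid` are hypothetical names invented for the illustration.

```python
# Toy analogy only: a "hand" is a left-to-right list of parts.
ALLOWED_NEIGHBOURS = {
    ("thumb", "finger"),
    ("finger", "finger"),
}


def locally_plausible(hand):
    """True if every adjacent pair looks fine on its own (a purely local check)."""
    return all(pair in ALLOWED_NEIGHBOURS for pair in zip(hand, hand[1:]))


def globally_valid(hand):
    """True only if the whole hand has one thumb and four fingers (a global count)."""
    return hand.count("thumb") == 1 and hand.count("finger") == 4


normal_hand = ["thumb"] + ["finger"] * 4
six_digit_hand = ["thumb"] + ["finger"] * 5

print(locally_plausible(normal_hand), globally_valid(normal_hand))        # True True
print(locally_plausible(six_digit_hand), globally_valid(six_digit_hand))  # True False
```

The six-digit hand passes every neighbour-by-neighbour check and only fails once something counts. A sampler that is rewarded for local plausibility has no step that plays the role of `globally_valid`.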

Copyright: the input-side fight

The models in this guide learned from billions of scraped image–caption pairs — overwhelmingly copyrighted, overwhelmingly unlicensed. Whether that ingestion is lawful is the biggest open question in generative AI, and it's being fought in court right now:

Artists and stock agencies (most prominently Getty Images, and artist class actions) argue training on their work without licence or payment is infringement at industrial scale.
Model makers argue training is transformative fair use: the model extracts statistical patterns, stores no copies (as chapter 1 explained mechanically), and outputs new images.
Complicating both positions: models can sometimes memorise and regurgitate near-copies of images that appeared many times in training data — rare, but demonstrated, and legally potent.

There is no settled answer yet, and rulings so far have split along fact-specific lines. The most significant image-side decision to date, the UK High Court's November 2025 judgment in Getty Images v. Stability AI, mostly went against Getty: the court held that because the trained model doesn't store the training images, it isn't itself an "infringing copy." But the judgment settled less than the headlines suggested: mid-trial, Getty had dropped the core question of whether training itself infringes, and its appeal was cleared to proceed in 2026.

Meanwhile, jurisdictions diverge: the EU's AI Act pushes toward training-data transparency and opt-outs, while several US rulings have leaned toward fair use for training, with carve-outs such as for pirated source libraries. The pragmatic consequences are already visible: licensed-data models marketed as "commercially safe" (Adobe Firefly's pitch), opt-out registries, and indemnification offers from big vendors to their paying customers.

Copyright: the output side

A separate question with a clearer current answer: who owns what you generate?

Under present US Copyright Office doctrine, copyright requires human authorship — a purely AI-generated image gets none. Prompting alone hasn't been enough; protection attaches to the human contributions: selection and arrangement, substantial editing, composite work. Practically:

Your raw Midjourney output starts life without copyright protection: as far as copyright law goes, anyone can reuse it.
Your curated, edited, composited final piece can be protected as to your contributions.
Platform terms layer on top: most tools grant you broad usage rights to outputs, which is a contract question, not a copyright one.

None of this is legal advice, and it's a fast-moving area — but "the picture I generated is automatically mine" is currently wrong in an important way.

Deepfakes and the provenance turn

The same machinery that paints corgi astronauts generates photorealistic people doing things they never did. That capability is here, cheap, and improving — non-consensual imagery, fraud (the CFO-on-a-video-call scam is no longer hypothetical), and political disinformation are the sharp edges.

The instructive part is how the response is evolving:

[Figure: The industry's shift from detecting fakes to proving provenance. Stages shown: detection arms race (losing); invisible watermarks (SynthID-style); signed provenance metadata (C2PA); verify origin, not vibes.]

Detection is a losing arms race: every detector becomes a training signal for the next generator, and detection accuracy on frontier models keeps decaying. So the industry's weight has shifted to provenance:

Invisible watermarks (like Google's SynthID) embedded in generated pixels, robust to resizing and light editing — flagging this was generated.
C2PA / Content Credentials: cryptographically signed metadata recording how media was created and edited, adopted by camera makers, Adobe, and the major AI vendors — proving where this came from.

Neither is complete (open-weight models needn't watermark; metadata can be stripped), but the strategic shift is the story: the future of trust in images is verifying origin, not eyeballing artifacts. Teaching people to "spot the six fingers" is already obsolete advice.
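
The idea behind signed provenance, and the "metadata can be stripped" gap, can be sketched in a few lines. This is a simplified stand-in, not the C2PA format: real Content Credentials use certificate-based public-key signatures, whereas this sketch uses an HMAC with a shared secret to stay within Python's standard library. `SIGNING_KEY`, `make_manifest`, `verify`, and the tool name are all hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret; real systems sign with a private key tied to a certificate.
SIGNING_KEY = b"demo-key-not-a-real-credential"


def make_manifest(image_bytes: bytes, tool: str) -> dict:
    """Bind a claim about how the image was made to a hash of its bytes, then sign the claim."""
    claim = {"tool": tool, "sha256": hashlib.sha256(image_bytes).hexdigest()}
    payload = json.dumps(claim, sort_keys=True).encode()
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"claim": claim, "signature": signature}


def verify(image_bytes: bytes, manifest) -> str:
    """Report whether the manifest is present, untampered, and matches the image bytes."""
    if manifest is None:
        return "no provenance"  # stripped metadata proves nothing either way
    payload = json.dumps(manifest["claim"], sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, manifest["signature"]):
        return "manifest tampered"
    if hashlib.sha256(image_bytes).hexdigest() != manifest["claim"]["sha256"]:
        return "image altered"
    return "verified"


image = b"stand-in for image bytes"
manifest = make_manifest(image, tool="hypothetical-generator")

print(verify(image, manifest))            # verified
print(verify(image + b"edit", manifest))  # image altered
print(verify(image, None))                # no provenance
```

Note what the third case shows: a missing manifest is not evidence of a fake, only an absence of evidence. That is why provenance works as a way to prove origin for media that carries credentials, not as a detector for media that doesn't.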

The structural limits

Strip away everything fixable and three limits remain, because they are the design:

Plausible over true, forever. A diffusion model samples from "what images look like." It can always render confident nonsense — the visual twin of LLM hallucination. Scale shrinks the error rate; nothing sets it to zero.
The training distribution is the box. Models render what cameras photographed and artists drew. Genuinely novel viewpoints, underrepresented subjects, tomorrow's aesthetics — outside the distribution, quality falls off a cliff, and biases in the data (whose faces, whose beauty standards, whose defaults) come along invisibly.
No intent, no accountability. The model doesn't know it made an image, let alone why. Every consequential decision — what to depict, whether it's appropriate, whether it's honest — remains exactly where it always was: with the human publishing it.
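
The first limit, a shrinking error rate that never reaches zero, has a simple arithmetic consequence at volume. The sketch below assumes each generated image fails independently with the same probability; the rates are made-up illustrations, not measurements of any model, and `p_at_least_one_glitch` is a hypothetical helper.

```python
def p_at_least_one_glitch(per_image_rate: float, n_images: int) -> float:
    """Chance that at least one of n_images contains a glitch.

    Assumes independent images with an identical per-image failure rate.
    """
    return 1.0 - (1.0 - per_image_rate) ** n_images


# Illustrative rates only: a tenfold better model still glitches at volume.
for rate in (0.01, 0.001):
    for n in (100, 10_000):
        print(f"rate={rate}, n={n}: {p_at_least_one_glitch(rate, n):.4f}")
```

With a hypothetical one-in-a-thousand failure rate, a single image is almost certainly fine, but a batch of a thousand has roughly a 63% chance of containing at least one failure. Only a rate of exactly zero escapes this, and sampling from "what images look like" does not offer one.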

Recap

Every classic artifact has a mechanical address: hands fail in sampling, text in compression, attributes in the encoder. Engineering limits get fixed release by release; structural limits only shrink.
The training-data copyright fight is unresolved: the UK's Getty v. Stability ruling (Nov 2025) went mostly against Getty but left the core training question undecided and under appeal; memorisation edge cases and jurisdiction differences keep it genuinely open.
Purely AI-generated images currently get no US copyright — protection attaches to human contribution; platform terms are a separate, contractual layer.
On deepfakes, the field has pivoted from detection (a losing race) to provenance — SynthID-style watermarks and C2PA signed credentials.
The permanent limits: plausible-over-true sampling, the training distribution as a box, and zero intent — the judgment stays with the human.

That's the complete guide — from static to Sora to the courtroom. For the machinery underneath the text side of these systems, continue to How Large Language Models Work; for steering any generative model well, Prompt Engineering from Beginner to Advanced.