Expedition 06 · Intermediate · 12 min read
The Honest Limits: What Generative AI Is Still Bad At
Generative AI is genuinely powerful — and genuinely limited. An honest, no-hype look at what large language models are still bad at, why those weaknesses exist, and how to use AI well by working with the grain instead of against it.
June 15, 2026
We'll end this guide where the hype usually doesn't: with an honest accounting of what generative AI is still bad at. Not to dismiss it — these models are genuinely transformative — but because using a tool well means knowing its edges.
The encouraging part: by now, none of these limits should be mysterious. Every one of them follows from the mechanics we've already covered — next-token prediction, training on past data, the context window. The weaknesses and the strengths are two sides of the same design.
Strengths and weaknesses come from one place
It's tempting to imagine the flaws will simply be patched away in the next version. Some will shrink. But the core limitations are structural — they're the flip side of what makes the technology work at all.
| What makes it powerful | The same thing makes it... |
|---|---|
| Predicts plausible text | ...indifferent to whether text is true |
| Learns patterns from data | ...biased and frozen at its training cutoff |
| Generalises across tasks | ...unreliable on precise, rule-bound tasks |
| Samples for natural variety | ...inconsistent and non-repeatable |
Keep that table in mind. It reframes "limits" from "bugs to wait out" into "trade-offs to design around."
Limit 1: it doesn't guarantee truth
The deepest limit, covered in full in Why AI hallucinates: a model is optimised to produce likely text, not true text. It has no internal fact-checker and no dependable sense of when it is likely to be wrong.
That means confident wrongness is built in. It's not a defect of a particular model; it's a property of the objective. You can reduce it with grounding and tools, but you can't assume any unverified output is correct — especially on specifics, obscure topics, or recent events.
Limit 2: it imitates reasoning rather than doing it
Models can produce step-by-step "reasoning" and often land on the right answer. But under the hood, they're still continuing patterns of text that look like reasoning — not running a logic engine with guarantees.
This shows up clearly when problems leave the well-trodden path:
- Precise arithmetic on unfamiliar numbers: the model reads text as tokens rather than individual digits (see tokens), so exact digits are shaky.
- Novel multi-step logic that isn't similar to anything in training.
- Strict constraints — "exactly 7 words," "never use the letter E" — which it routinely violates.
- Counting and precise structure, for the same reasons.
Techniques like "think step by step" and tool use (handing math to a calculator) genuinely help. But the lesson stands: it's pattern-matched reasoning, powerful but not dependable on its own for rigorous logic.
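To make "handing math to a calculator" concrete, here is a minimal sketch of what such a tool might look like on the system side. The function name `calculate` and the choice of supported operators are assumptions for illustration, not any product's actual tool. The idea is that the model only writes the expression, and ordinary code computes the exact answer.

```python
import ast
import operator

# Whitelist of arithmetic operators this hypothetical calculator tool accepts.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def calculate(expression: str):
    """Evaluate a plain arithmetic expression exactly, rejecting anything else."""

    def evaluate(node):
        # Numeric literals are returned as-is.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        # Binary operations are allowed only if the operator is whitelisted.
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](evaluate(node.operand))
        # Names, function calls, attribute access and so on are refused.
        raise ValueError("unsupported expression")

    return evaluate(ast.parse(expression, mode="eval").body)


print(calculate("48193 * 7027"))
```

Parsing the expression and walking the syntax tree, rather than calling `eval`, means the tool computes arithmetic and nothing else, which matters when the expression comes from model output.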
Limit 3: it's frozen in time and blind to the world
A base model knows only what was in its training data, up to a knowledge cutoff (from What's inside a model). On its own it can't tell you today's news, prices, or weather — and it can't perceive anything. It has no eyes, no clock, no access to your files or the live internet.
Modern AI products paper over this with tools: web search, code execution, file access. That's the right fix — but it means the model isn't doing the knowing; the surrounding system is fetching it. Without those tools, the model is a brilliant, well-read mind locked in a room with no windows and an out-of-date newspaper.
| Without tools | With tools |
|---|---|
| Frozen at training cutoff | Can look up current info |
| Can't act in the world | Can run code, call APIs |
| Answers from memory | Answers from fetched sources |
| Higher hallucination risk | Grounded, more reliable |
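
The "With tools" column can be sketched in a few lines. This is a simplified illustration, assuming a hypothetical `build_grounded_prompt` helper and a single made-up `today` tool. Real products wire tools in through their own interfaces, but the division of labour is the same: the surrounding system fetches the fact, and the model only reads it.

```python
from datetime import date
from typing import Callable, Dict


def build_grounded_prompt(question: str, tools: Dict[str, Callable[[], str]]) -> str:
    """Run each tool and place its result in the prompt ahead of the question."""
    facts = [f"- {name}: {fetch()}" for name, fetch in tools.items()]
    return (
        "Use only the facts below for anything time-sensitive.\n"
        + "\n".join(facts)
        + f"\n\nQuestion: {question}"
    )


# Hypothetical tool: the system, not the model, reads the clock.
tools = {"today": lambda: date.today().isoformat()}
prompt = build_grounded_prompt("How many days are left in this month?", tools)
print(prompt)
```

The model never learns the date here. It is handed the date as text, which is why removing the tool puts it straight back in the windowless room.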
Limit 4: it isn't consistent
Because models sample their output (the temperature idea from chapter 1), the same prompt can produce different answers. Worse, small and seemingly meaningless changes to wording can shift results, and the order of information can change conclusions.
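A toy version of that sampling step shows where the variability comes from. The scores below are made up, `sample_token` is a hypothetical helper, and treating a temperature of zero as "always take the top token" is a simplifying assumption. Real models choose among many thousands of tokens, but the mechanism has the same shape.

```python
import math
import random


def sample_token(scores: dict, temperature: float, rng: random.Random) -> str:
    """Pick one token from raw scores using temperature-scaled softmax sampling."""
    if temperature <= 0:
        # Assumption: treat zero temperature as greedy, always the top-scoring token.
        return max(scores, key=scores.get)
    scaled = {token: score / temperature for token, score in scores.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = [math.exp(value - peak) for value in scaled.values()]
    return rng.choices(list(scaled), weights=weights, k=1)[0]


# Made-up scores for the next token after "The sky is".
scores = {"blue": 2.0, "clear": 1.5, "falling": 0.5}
rng = random.Random()
print([sample_token(scores, 1.0, rng) for _ in range(5)])  # varies between runs
print([sample_token(scores, 0.0, rng) for _ in range(5)])  # "blue" every time
```

Higher temperatures flatten the weights, so unlikely tokens are picked more often. Lower temperatures sharpen them, which reduces variation, although real deployments may still not be perfectly repeatable.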
For creativity, that variability is a feature. For anything that needs to be reliable and repeatable — automated pipelines, consistent formatting, reproducible decisions — it's a real engineering challenge. You design around it with lower temperature, strict output formats, validation, and retries, not by assuming the model will behave identically twice.
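The strict-format, validation, and retry part of that advice might look like the sketch below. The `generate` argument stands in for whatever model call you actually use, and the `sentiment` field with its three allowed values is an invented example schema. Nothing here is a specific vendor's API.

```python
import json
from typing import Callable


def generate_validated(generate: Callable[[str], str], prompt: str, max_attempts: int = 3) -> dict:
    """Call the model until its reply parses as JSON with the required field."""
    last_error = None
    for _ in range(max_attempts):
        reply = generate(prompt)
        try:
            data = json.loads(reply)
        except json.JSONDecodeError as error:
            last_error = error  # not JSON at all: try again
            continue
        if (isinstance(data, dict)
                and data.get("sentiment") in {"positive", "negative", "neutral"}):
            return data
        last_error = ValueError(f"unexpected shape: {reply!r}")
    raise RuntimeError(f"no valid reply after {max_attempts} attempts") from last_error


# Stand-in for a real model call: fails once, then returns valid JSON.
replies = iter(["Sure! The sentiment is positive.", '{"sentiment": "positive"}'])
result = generate_validated(lambda prompt: next(replies), "Classify: 'Great product'")
print(result)  # {'sentiment': 'positive'}
```

The caller either gets data in the expected shape or an explicit error, never a silently malformed reply, which is what lets an unpredictable component sit inside a predictable pipeline.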
Limit 5: the subtler human gaps
Beyond the mechanical limits are quieter ones worth naming:
- No genuine understanding or intent. It models patterns of language about the world, not the world itself. It doesn't want anything or mean what it says.
- Inherited bias. Trained on human text, it absorbs human biases, which can surface in its output.
- Shallow long-horizon coherence. Over very long or complex tasks, it can drift, lose the thread, or quietly contradict earlier parts.
- Sycophancy. Tuned to be agreeable, it often tells you what you seem to want to hear rather than pushing back.
- No accountability. It can't be responsible for a decision. A human always has to own the outcome.
How AI fails — at a glance
A rough sense of where reliability drops, drawn from everything above: AI is strongest where there's no single right answer and the source material is in front of it, and weakest where it must be precise, current, or verifiably correct from memory.
Using AI well: work with the grain
Put it all together and a simple strategy emerges — play to the strengths, guard the weaknesses.
| Lean on AI for... | Keep humans / tools for... |
|---|---|
| First drafts and rewrites | Final accuracy and sign-off |
| Summarising and reformatting | Facts, numbers, citations |
| Brainstorming and options | Judgment and consequential decisions |
| Explaining and translating | Precise math and strict rules |
| Transforming text you provide | Anything legal, medical, or safety-critical |
The whole guide, in one view
You now have a complete, honest mental model of generative AI:
- It works by predicting the next token, over and over (chapter 1).
- It got here through the Transformer, the GPT recipe, scaling, and alignment (chapter 2).
- Inside, it's billions of parameters tuned by training — knowledge as numbers, not rules (chapter 3).
- It hallucinates because it optimises for plausible, not true (chapter 4).
- It reads in tokens and "forgets" because of the context window (chapter 5).
- And its limits are structural — the price of the same design that makes it powerful (this chapter).
That mental model is the real goal. With it, you can predict how these tools will behave, trust them where they're strong, check them where they're weak, and cut straight through the hype in either direction. No mystique, no doom — just a clear picture of what's actually happening under the hood.