Chapter 81 · Beginner · 9 min read
Claude Fable 5 Capabilities: Benchmarks, Coding, Vision & Long-Horizon Work
What can Claude Fable 5 actually do? A plain-English tour of its capabilities — the Stripe 50-million-line migration, frontier coding benchmarks, vision that reads scientific figures, million-token focus, and the life-sciences results that surprised researchers.
July 10, 2026
The previous chapter established what Claude Fable 5 is: Anthropic's most capable widely released model, in a new tier above Opus. This chapter answers the obvious follow-up — capable of what, exactly?
Everything below comes from Anthropic's launch announcement and the customer evaluations it cites. As always with vendor benchmarks, treat exact rankings as claims rather than settled facts. The overall shape of the capabilities, though, is consistent across the testers the announcement cites.
The headline story: Stripe's migration
The most concrete capability claim comes from Stripe. Their engineering team pointed Fable 5 at a 50-million-line codebase migration — the kind of grinding, months-long project that large teams dread.
Fable 5 completed it in one day. Stripe's estimate for the same work done by an engineering team: two months. Their summary: "Fable 5 compressed months of engineering into days."
Coding and knowledge-work benchmarks
Three named evaluations from the launch materials:
| Evaluation | Who ran it | Result |
|---|---|---|
| FrontierCode | Cognition | "Fable 5 scores highest among frontier models" |
| Finance Benchmark | Hebbia | "The highest score of any model" |
| Trading analysis | IMC | Fable 5 "aced their trading-analysis evaluations" |
The pattern: it's not only a coding model. Finance analysis and trading evaluations are dense, judgment-heavy knowledge work — the same tier of difficulty as engineering, in a different domain.
Vision: reading, not just seeing
Earlier models could describe an image. Fable 5's vision is closer to working from an image:
- It can extract precise numbers from detailed scientific figures — reading values off dense charts the way an analyst would.
- It can rebuild a web app's source code from screenshots of the running app.
- On a hard visual benchmark, "Fable 5 beat FireRed with a minimal, vision-only harness" — where earlier models struggled even with helper tools bolted on. The scaffolding got simpler and the results got better at the same time.
Long-context focus
Fable 5's 1M-token context window isn't new territory on paper — recent Opus and Sonnet models have one too. What's new is how well it uses it. Anthropic's claim: "Fable 5 stays focused across millions of tokens in long-running tasks."
That word focused is the key. Big context windows have historically come with a catch — models lose the thread, forget early instructions, or drift as the session grows (we cover the mechanics in the context window chapter of the LLM guide). Fable 5's differentiator is holding attention across the whole horizon: hours of tool calls, millions of tokens, one coherent task.
Life sciences: the Mythos result
One result in the launch materials came from the unrestricted Mythos 5 rather than Fable 5, and it points at research rather than engineering: in blind comparisons, scientists preferred Mythos 5's novel molecular-biology hypotheses roughly 80% of the time over comparable models'.
Generating hypotheses researchers actually prefer is a step beyond summarising papers — it's the model participating in the creative end of science. It's also exactly the category of capability (biology, at the frontier) that explains why Fable 5's safety classifiers treat biology requests specially, as we'll see in the next chapter.
How to think about the jump
A useful mental model: Fable 5's gains are concentrated in work that earlier models could not do at all, rather than spread evenly over what they already did well.
| Task type | Opus-tier models | Fable 5 |
|---|---|---|
| Everyday chat, summaries, short code | Excellent — and cheaper | Excellent, but overkill |
| Hard single questions | Very good | Better, at higher cost |
| Multi-hour autonomous engineering | Hit-or-miss, needs babysitting | The design target |
| Million-token reasoning jobs | Degrades over the horizon | Stays focused |
That's also the honest guidance on cost: at $10/$50 per million tokens, double the price of Opus 4.8, Fable 5 earns its price on the bottom rows of that table (multi-hour autonomous engineering and million-token reasoning jobs), not the top ones.
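To make the pricing concrete, here is a minimal sketch of the arithmetic. It assumes the $10/$50 figure means input/output price per million tokens (the usual convention, though the text above does not spell it out) and derives the Opus 4.8 rates as half of that from the "double" comparison. The names `Pricing`, `FABLE_5`, `OPUS_4_8`, and `job_cost` are illustrative only, and the token counts in the example are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per million tokens. Assumption: the first number is input, the second output."""
    input_per_mtok: float
    output_per_mtok: float


# From the chapter: $10/$50 per million tokens for Fable 5.
FABLE_5 = Pricing(input_per_mtok=10.0, output_per_mtok=50.0)
# Derived, not quoted: the chapter says Fable 5 is double Opus 4.8.
OPUS_4_8 = Pricing(input_per_mtok=5.0, output_per_mtok=25.0)


def job_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one job at the given per-million-token rates."""
    total = (input_tokens * pricing.input_per_mtok
             + output_tokens * pricing.output_per_mtok)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical long-running job: 2M input tokens, 200k output tokens.
    for name, pricing in (("Fable 5", FABLE_5), ("Opus 4.8", OPUS_4_8)):
        print(f"{name}: ${job_cost(pricing, 2_000_000, 200_000):.2f}")
```

For that hypothetical job the sketch gives $30.00 on Fable 5 against $15.00 on Opus 4.8. The ratio is always 2x under these assumptions, so the real question is whether the task is one the cheaper model could finish at all.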
Recap
- Stripe's 50M-line migration in one day (vs an estimated two months for an engineering team) is the defining demo: long-horizon autonomous engineering on production code.
- Fable 5 topped Cognition's FrontierCode and Hebbia's finance benchmark, and aced IMC's trading-analysis evaluations.
- Vision works at analyst level: precise numbers from scientific figures, apps rebuilt from screenshots, and benchmark wins with a vision-only harness.
- The long-context story is focus — staying coherent across millions of tokens, not just accepting them.
- Scientists preferred Mythos 5's molecular-biology hypotheses ~80% of the time — frontier capability in research, and exactly why the safety story matters.
Those capabilities are also the risk surface. Next: the guardrails Anthropic built around them. Continue to How Claude Fable 5's safety system works.