Code Safari

Chapter 81·Beginner·9 min read

Claude Fable 5 Capabilities: Benchmarks, Coding, Vision & Long-Horizon Work

What can Claude Fable 5 actually do? A plain-English tour of its capabilities — the Stripe 50-million-line migration, frontier coding benchmarks, vision that reads scientific figures, million-token focus, and the life-sciences results that surprised researchers.

July 10, 2026

The previous chapter established what Claude Fable 5 is: Anthropic's most capable widely released model, in a new tier above Opus. This chapter answers the obvious follow-up — capable of what, exactly?

Everything below comes from Anthropic's launch announcement and the customer evaluations it cites. As always with vendor benchmarks, treat exact rankings as claims rather than settled facts. The overall shape of the capabilities, though, is consistent across the customers who tested the model.

The headline story: Stripe's migration

The most concrete capability claim comes from Stripe. Their engineering team pointed Fable 5 at a 50-million-line codebase migration — the kind of grinding, months-long project that large teams dread.

Fable 5 completed it in one day. Stripe's estimate for the same work done by an engineering team: two months. Their summary: "Fable 5 compressed months of engineering into days."

Coding and knowledge-work benchmarks

Three named evaluations from the launch materials:

Evaluation | Who ran it | Result
FrontierCode | Cognition | "Fable 5 scores highest among frontier models"
Finance Benchmark | Hebbia | "The highest score of any model"
Trading analysis | IMC | Fable 5 "aced their trading-analysis evaluations"

The pattern: it's not only a coding model. Finance analysis and trading evaluations are dense, judgment-heavy knowledge work — the same tier of difficulty as engineering, in a different domain.

Vision: reading, not just seeing

Earlier models could describe an image. Fable 5's vision is closer to working from an image:

  • It can extract precise numbers from detailed scientific figures — reading values off dense charts the way an analyst would.
  • It can rebuild a web app's source code from screenshots of the running app.
  • On a hard visual benchmark, "Fable 5 beat FireRed with a minimal, vision-only harness" — where earlier models struggled even with helper tools bolted on. The scaffolding got simpler and the results got better at the same time.

Long-context focus

Fable 5's 1M-token context window isn't new territory on paper — recent Opus and Sonnet models have one too. What's new is how well it uses it. Anthropic's claim: "Fable 5 stays focused across millions of tokens in long-running tasks."

That word, "focused," is the key. Big context windows have historically come with a catch: models lose the thread, forget early instructions, or drift as the session grows (we cover the mechanics in the context window chapter of the LLM guide). Fable 5's differentiator is holding attention across the whole horizon: hours of tool calls, millions of tokens, one coherent task.

Figure: The long-horizon loop Fable 5 is built for, sustained over hours, not minutes.

  • One well-specified goal
  • Plan · act · verify, repeated
  • Self-correct along the way
  • Deliver a finished result
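To make the loop above concrete, here is a minimal Python sketch of a plan, act, verify cycle with retry on failure. It is a toy illustration only, not how Fable 5 or any Anthropic API works internally: `run_loop`, `plan`, `act`, `verify`, `done`, and the three-item task list are all hypothetical names invented for this example, and the "flaky" first attempt on item `"b"` is simulated so there is something to self-correct.

```python
from typing import Callable, Dict, List, Set, Tuple

# One log entry per attempt: (step, result, verified)
Entry = Tuple[str, str, bool]


def run_loop(plan: Callable[[List[Entry]], str],
             act: Callable[[str], str],
             verify: Callable[[str, str], bool],
             done: Callable[[List[Entry]], bool],
             max_steps: int = 50) -> List[Entry]:
    """Repeat plan -> act -> verify until done() is true or the budget runs out."""
    log: List[Entry] = []
    for _ in range(max_steps):
        if done(log):
            break
        step = plan(log)                 # choose the next step from what has happened so far
        result = act(step)               # carry it out
        log.append((step, result, verify(step, result)))  # record whether it checked out
    return log


# --- Hypothetical toy task: upper-case three items, one of which fails once ---
ITEMS = ["a", "b", "c"]
attempts: Dict[str, int] = {}


def succeeded(log: List[Entry]) -> Set[str]:
    """Steps that have at least one verified result."""
    return {step for step, _, ok in log if ok}


def plan(log: List[Entry]) -> str:
    # Pick the first item without a verified result, so failures are retried.
    return next(item for item in ITEMS if item not in succeeded(log))


def act(step: str) -> str:
    attempts[step] = attempts.get(step, 0) + 1
    if step == "b" and attempts[step] == 1:
        return step                      # simulated failure: item left unchanged
    return step.upper()


def verify(step: str, result: str) -> bool:
    return result == step.upper()


def done(log: List[Entry]) -> bool:
    return succeeded(log) == set(ITEMS)


log = run_loop(plan, act, verify, done)
```

The design point is that verification feeds back into planning: the failed attempt on `"b"` stays in the log, and because `plan` only skips verified steps, the loop retries it before moving on. The `max_steps` budget keeps a loop that never verifies from running forever.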

Life sciences: the Mythos result

One result in the launch materials came from the unrestricted Mythos 5 rather than Fable 5, and it points at research rather than engineering: in blind comparisons, scientists preferred Mythos 5's novel molecular-biology hypotheses roughly 80% of the time over comparable models'.

Generating hypotheses researchers actually prefer is a step beyond summarising papers — it's the model participating in the creative end of science. It's also exactly the category of capability (biology, at the frontier) that explains why Fable 5's safety classifiers treat biology requests specially, as we'll see in the next chapter.

How to think about the jump

A useful mental model: Fable 5's gains are concentrated above what earlier models could do at all, rather than spread evenly over what they already did well.

Task type | Opus-tier models | Fable 5
Everyday chat, summaries, short code | Excellent, and cheaper | Excellent, but overkill
Hard single questions | Very good | Better, at higher cost
Multi-hour autonomous engineering | Hit-or-miss, needs babysitting | The design target
Million-token reasoning jobs | Degrades over the horizon | Stays focused

That's also the honest guidance on cost: at $10/$50 per million tokens, double Opus 4.8, Fable 5 earns its price on the bottom rows of that table, not the top ones.
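The cost trade-off is easy to put into numbers. The sketch below rests on two assumptions that the text does not spell out: that "$10/$50" means $10 per million input tokens and $50 per million output tokens, and that "double Opus 4.8" puts Opus 4.8 at $5/$25. The token counts in the usage note are hypothetical, and `Pricing` and `job_cost` are names made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_mtok: float   # USD per million input tokens (assumed meaning of the first figure)
    output_per_mtok: float  # USD per million output tokens (assumed meaning of the second figure)


FABLE_5 = Pricing(input_per_mtok=10.0, output_per_mtok=50.0)
# Assumption: derived from "double Opus 4.8", not quoted directly in the text.
OPUS_4_8 = Pricing(input_per_mtok=5.0, output_per_mtok=25.0)


def job_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one job."""
    total = (input_tokens * pricing.input_per_mtok
             + output_tokens * pricing.output_per_mtok)
    return total / 1_000_000


# Hypothetical long-running job: 2M input tokens, 200k output tokens.
fable_cost = job_cost(FABLE_5, 2_000_000, 200_000)
opus_cost = job_cost(OPUS_4_8, 2_000_000, 200_000)
```

Under those assumptions the hypothetical job costs $30 on Fable 5 and $15 on Opus 4.8. The ratio is always 2x, so the question is whether the task is one where the cheaper model would need babysitting or degrade partway through.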

Recap

  • Stripe's 50M-line migration in one day (vs an estimated two team-months) is the defining demo: long-horizon autonomous engineering on production code.
  • Fable 5 topped Cognition's FrontierCode, Hebbia's finance benchmark, and IMC's trading-analysis evaluations.
  • Vision works at analyst level: precise numbers from scientific figures, apps rebuilt from screenshots, and benchmark wins with a vision-only harness.
  • The long-context story is focus: staying coherent across millions of tokens, not just accepting them.
  • Scientists preferred Mythos 5's molecular-biology hypotheses ~80% of the time — frontier capability in research, and exactly why the safety story matters.

Those capabilities are also the risk surface. Next: the guardrails Anthropic built around them. Continue to How Claude Fable 5's safety system works.
