Morten Laske AI × Business Central
← Writing

Claude Fable 5: What the New Top Tier Changes for Your Code

Claude Fable 5: What the New Top Tier Changes for Your Code

A new top model dropped and your team is asking the usual two questions: do we switch, and will our code even run on it. The second one bites first — a request that runs fine on Opus 4.8 returns a 400 on Fable 5, not because of anything exotic, but because you explicitly turned thinking off.

Claude Fable 5 shipped on June 9 as the first model of Anthropic’s Mythos class — a tier that now sits above Opus, the way Opus sits above Sonnet. That framing matters more than any single benchmark number, because it tells you how to route it: not as “the new default,” but as a tier you reach for deliberately.

CAPABILITY one call multi-step task hours of autonomy Fable 5 · $10/$50 Opus 4.8 · $5/$25 ≈ interchangeable here the gap you pay 2× for
Haiku 4.5$1 / $5
classify · route
Sonnet 4.6$3 / $15
production default
Opus 4.8$5 / $25
hard single tasks
Fable 5$10 / $50
long-horizon agents
The lead grows with task length, not task difficulty. On a single call the top tiers are nearly interchangeable; over a long autonomous run the curves diverge — that widening band is what Fable 5's 2× price buys, and the only place it pays for itself.

A tier above Opus, not a new Opus

The shape of the model is familiar: 1M-token context window, 128K max output, adaptive thinking, the full effort range up to max. The price is not: $10 input / $50 output per million tokens — exactly double Opus 4.8’s $5/$25. The model ID is claude-fable-5, no date suffix.

ModelInput $/MTokOutput $/MTokContextMax output
Haiku 4.51.005.00200K64K
Sonnet 4.63.0015.001M64K
Opus 4.85.0025.001M128K
Fable 510.0050.001M128K

Doubling the price for the same context window and output ceiling only makes sense if the model buys you something the cheaper tiers can’t. So what does it buy?

Read the benchmarks by task length, not by score

The published numbers all point in one direction. On SWE-bench Pro, Fable 5 scores 80.3% against Opus 4.8’s 69.2% — a solid 11-point lead. On the much harder FrontierCode Diamond split, the lead stops being incremental: 29.3% vs 13.4%, more than double. And it’s the first model to break 90% on Anthropic’s benchmark for long-running analytical tasks, a ten-point jump over Opus.

Notice the pattern: the gap is smallest on bounded, single-shot work and widest on the longest, most autonomous tasks. That’s not “10% smarter across the board.” It’s a model that holds a plan together over horizons where the previous tier starts to drift — multi-hour agent runs, large migrations, deep analytical work that spans hundreds of tool calls.

Which gives you the routing rule directly: the value of Fable 5 scales with the length of the task you hand it. For a classification call or a one-file fix, Opus 4.8 — or Sonnet 4.6 — gets you essentially the same answer at half the price or less. For an overnight refactor that has to still be coherent at step 400, the 2× per token can come back as fewer wasted turns, fewer wrong branches, and less human correction at the end.

The API keeps getting stricter — on purpose

Fable 5 keeps the request surface of Opus 4.7/4.8, which already removed temperature, top_p, top_k and the fixed thinking budget (budget_tokens) — all of those return a 400. Assistant-turn prefills are gone too. Fable 5 adds exactly one new rejection, and it’s the one that will catch real code:

# Ran fine on Opus 4.8 — returns 400 on Fable 5
client.messages.create(
    model="claude-fable-5",
    max_tokens=16000,
    thinking={"type": "disabled"},   # explicit disable is rejected
    messages=[...],
)

On Opus 4.8, thinking: {"type": "disabled"} and omitting the parameter are equivalent. On Fable 5 the explicit disable is invalid — if you don’t want thinking, you say nothing:

# Fable 5: omit thinking entirely, or opt into adaptive and steer with effort
client.messages.create(
    model="claude-fable-5",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},   # low | medium | high | xhigh | max
    messages=[...],
)

If your codebase builds requests from a shared config object, this is the line to grep for: any code path that writes thinking: disabled as an explicit default will pass on every current model and fail the day someone flips the model string to claude-fable-5.

There’s a direction in this, and it’s worth internalizing rather than patching around. Every release in this family has traded mechanism knobs for intent knobs. You no longer say “sample at temperature 0.3 with an 8,192-token thinking budget” — you say “this task deserves xhigh effort” and let the model decide how to spend it. Code that survives model upgrades is code that expresses intent (effort, adaptive thinking, structured output schemas) instead of mechanism (sampling parameters, token budgets, prefilled assistant turns).

Two smaller notes for the migration checklist: thinking text is omitted from responses by default (set thinking.display: "summarized" if you surface reasoning to users), and the minimum cacheable prompt prefix is 2,048 tokens — half of Opus 4.8’s 4,096, so some mid-sized system prompts that silently failed to cache on Opus will cache on Fable 5.

The speed you feel on every turn

Capability is only half of what you experience at the keyboard. The other half is latency, and here the published numbers say something blunt: Fable 5 is not a model you want in an interactive loop. Once it starts streaming, throughput is unremarkable — roughly 60 tokens/second, the same ballpark as Opus 4.8. The difference is in time-to-first-token. A typical Opus 4.8 coding turn comes back in 3–15 seconds. The same turn on Fable 5 can take 60 seconds to several minutes, because before it emits a single token it reasons, runs tools, and checks its own output. That’s not a defect — it’s the entire design intent: hand it a complete task and walk away.

Same 4-turn session — identical output, very different wall-clock Opus 4.8 Fable 5 wait — time-to-first-token output stream (~60 t/s, both)
Same work, very different wait. Both models stream at roughly the same speed once they start — the gap is time-to-first-token. Opus answers a coding turn in 3–15s; Fable 5 reasons, runs tools and checks itself for 60s+ before the first token. In an interactive session you pay that wait on every turn, which is exactly why the lower tier feels better day-to-day. Walk away from a multi-hour run and the same wait amortises to nothing.

Which is exactly why your first instinct is right: for everyday development, Opus 4.8 (often Sonnet 4.6) simply feels better. Not despite being the lower tier — because everyday development is interactive by definition, and interactive work is the one regime where Fable 5’s strength is dead weight. You’re in the loop: you read a diff, you nudge, you re-run. Every one of those turns pays the first-token tax, and you pay it with your own attention. A minute of upfront reasoning is invisible on an overnight migration and intolerable on the fourth round of a debugging session.

So this isn’t a separate consideration from the capability curve — it’s the same divergence felt from the other side. The left of that curve is where you’re in the loop, latency dominates, and the capability gap is near zero, so the fast model wins outright. The right is where you’ve walked away, the wait amortises over hours, and coherence is all that’s left to optimise. The honest read after a week: Fable 5 didn’t make Opus 4.8 obsolete for daily work. It made it the correct default and carved out a new lane above it for the runs you don’t sit through.

Where the 2× is cheap, and where it’s a tax

Per-token price is the wrong unit for agentic work; price per completed task is the right one. A model that finishes a long migration in one coherent run, instead of two runs plus an afternoon of human cleanup, is allowed to cost double per token and still be the cheaper option. That’s the honest case for Fable 5, and it’s also its limit: nothing in the published numbers justifies routing your default traffic to it.

My routing as of this week:

  • Haiku 4.5 — classification, extraction, routing decisions. Anything with a rubric.
  • Sonnet 4.6 — the production default. Most features should still land here.
  • Opus 4.8 — hard interactive work: tricky debugging, design discussions, single complex tasks.
  • Fable 5 — long-horizon autonomous runs: overnight refactors, large codebase migrations, research tasks that chain hundreds of tool calls. Hand it the full task spec up front and a high effort level; its strength is exactly the part where you’re not in the loop.

If switching models is more work for you than editing one config value, that’s the actual problem to fix first — it’s the whole argument of route, don’t call. A new tier appearing above your current top model is precisely the event a gateway exists for: you add one route for the long-horizon workloads and touch zero call sites.

The mental model

Stop asking “is the new model better” — at the top of the range, the answer is always yes and always useless. Ask two questions instead. How long is the task horizon? That decides whether Fable 5’s premium is an investment or a tax, because its lead grows with task length, not task difficulty — and the minute of latency it spends before the first token only amortises once you’ve stepped out of the loop. And does my code express intent or mechanism? Every release of this family removes another mechanism knob, and Fable 5’s rejection of an explicit thinking: disabled is just the latest installment. Write requests that state what the task deserves and let the model allocate — that code upgrades with a one-line model swap. Code that micromanages the sampler gets a 400 with every new tier.

Found this useful? Share on LinkedIn · email me a correction or follow-up.

Related