Claude Opus 4.7 costs 20–30% more per session

While Anthropic claims a 20-30% increase in token usage for Claude Opus 4.7, real-world tests on technical documentation show spikes up to 47%. This trade-off aims for better instruction following but significantly impacts user budgets and rate limits.

Anthropic's Claude Opus 4.7 migration guide says the new tokenizer uses "roughly 1.0 to 1.35x as many tokens" as 4.6. I measured 1.47x on technical docs. 1.45x on a real CLAUDE.md file. The top of Anthropic's range is where most Claude Code content actually sits, not the middle.

Same sticker price. Same quota. More tokens per prompt. Your Max window burns through faster. Your cached prefix costs more per turn. Your rate limit hits sooner.

So Anthropic must be trading this for something. What? And is it worth it?

I ran two experiments. The first measured the cost. The second measured what Anthropic claimed you'd get back. Here's where it nets out.

What does it cost?

To measure the cost, I used POST /v1/messages/count_tokens — Anthropic's free, no-inference token counter. Same content, both models, one number each per model. The difference is purely the tokenizer.

Two batches of samples.

First: seven samples of real content a Claude Code user actually sends — a CLAUDE.md file, a user prompt, a blog post, a git log, terminal output, a stack trace, a code diff.

Second: twelve synthetic samples spanning content types — English prose, code, structured data, CJK, emoji, math symbols — to see how the ratio varies by kind.

Real-world Claude Code content

Weighted ratio across all seven: 1.325x (8,254 → 10,937 tokens).

Content-type baseline (12 synthetic samples)

| Content type | chars | 4.6 | 4.7 | ratio | |---|---|---|---|---| | Technical docs (English) | 2,541 | 478 | 704 | 1.47 | | Shell script | 2,632 | 1,033 | 1,436 | 1.39 | | TypeScript code | 4,418 | 1,208 | 1.640 | 1.35 | | Spanish prose | 2,529 | 733 | 986 | 1.34 | | Markdown with code blocks | 2,378 | 604 | 812 | 1.34 | | Python code | 3,182 | 864 | 1,112 | 1.29 | | English prose | 2,202 | 508 | 611 | 1.20 | | JSON (dense) | 48,067 | 13,939 | 15,706 | 1.13 | | Japanese/Chinese prose | ~900 | ~800 | ~810 | 1.01 |

What changed in the tokenizer

English and code moved 1.20–1.47x on natural content. Chars-per-token on English dropped from 4.33 to 3.60. TypeScript dropped from 3.66 to 2.69. The vocabulary is representing the same text in smaller pieces.

Smaller tokens force attention over individual words. That's a documented mechanism for tighter instruction following, character-level tasks, and tool-call precision.

Does 4.7 actually follow instructions better?

I ran a direct test using IFEval. The results showed a small but directionally consistent improvement on strict instruction following (85% to 90%). The extra tokens bought something measurable: +5pp on strict instruction-following. Small, but real.

Source: Hacker News