Opus 4.7 Survival

Claude Code has just been 35% more expensive since Opus 4.7 dropped. Same prompt, same answer — just more tokens. Here's why and five settings that kill it fast.

What happened

Anthropic shipped Opus 4.7. Sounds like a normal upgrade, but they rolled out a new tokenizer with it. Result: same prompt as last week, same answer — you just burn 35% more tokens.

The limit was already a problem. A lot of us — me included — grabbed the $200 plan just to keep working. Now it's worse. If Claude Code feels off right now, that's probably why.

Setting 1 — Haiku for the small stuff

Claude Code runs the most expensive model by default — even for tasks where Haiku would crush it: reading files, small edits, grep passes, research.

The trick: let Claude decide when Haiku is enough. You need a rule in your CLAUDE.md that tells Claude: "Flag simple tasks so I can switch to Haiku first."

From that moment on, every small task starts with: "Small task — switch to /model haiku before I start." You run the command, the task runs on Haiku, then /model opus to switch back.

Option 1 — add it to CLAUDE.md manually:

Add this to your CLAUDE.md (project root or global in ~/.claude/CLAUDE.md):

## Model switching
If you classify a task as simple (reading files, grep, small edits, research without architectural impact), tell me BEFORE you start: "Small task — switch to Haiku with /model haiku."

After the task, remind me: "Back to Opus with /model opus."

For complex tasks (architecture, debug, multi-file refactor) stay on Opus and start directly.

Option 2 — prompt Claude Code (it adds the rule for you):

Add a rule to my CLAUDE.md: if you classify a task as simple (reading files, grep, small edits, research without architectural impact), tell me FIRST so I switch to Haiku with /model haiku. After the task, remind me to switch back to Opus. For complex tasks you stay on Opus.

Savings: ~15% — if you actually follow through on the switches.

Setting 2 — Kill unused MCPs

Every MCP you have installed ships its entire tool schema with EVERY message. Even if you never call it in the session. That's 5–10k tokens of baseline load per message easy — Playwright, Figma, Notion, Supabase add up fast.

Option 1 — Terminal:

/mcp

Shows every active MCP server. Anything you don't need right now — disable it.

Option 2 — Prompt Claude Code:

Show me every active MCP server in my current project (/mcp) and tell me which ones I don't need for my current workflow. Then help me disable the ones I don't need.

Pro tip: When you install new MCPs, use --scope project instead of --scope user — they only load in the project where you actually need them, not everywhere globally.

Savings: ~10-15%

Setting 3 — Cap MAX_THINKING_TOKENS

The most hidden cost sink. Default is unlimited. Opus can burn 50,000+ thinking tokens on a single answer — you never see them, but you pay full price.

Option 1 — Prompt Claude Code:

Cap my thinking token budget at 8000. Set MAX_THINKING_TOKENS in my .claude/settings.json under env — if the file doesn't exist yet, create it. Then verify it was applied correctly.

Option 2 — Manually:

Add to .claude/settings.json:

{
  "env": {
    "MAX_THINKING_TOKENS": "8000"
  }
}

8,000 tokens covers most tasks. For heavy architecture calls you can bump it higher one-off.

Savings: ~20-30% on complex tasks

Setting 4 — /model opusplan

The best quality/cost compromise. Opus plans (expensive, but good), Sonnet executes (cheap).

Option 1 — Terminal:

/model opusplan

From here on: Plan Mode runs on Opus — where the real architecture calls happen. The second it moves to implementation, Claude switches to Sonnet automatically.

Option 2 — Prompt Claude Code:

Turn on /model opusplan for me. Explain briefly when Opus runs and when Sonnet takes over so I know what it means for cost and output quality.

You only pay Opus pricing for the planning. The entire implementation — generating code, editing files, writing tests — runs on Sonnet.

Savings: up to 68% on planning-heavy tasks

Setting 5 — /effort per task

Default is high thinking budget for everything. For "refactor this one function" that's total overkill. Most users leave it at default (max) — and pay 3x more than they need to for simple tasks.

Option 1 — Terminal:

/effort minimal    # single files, simple edits
/effort medium     # multiple files, logic changes
/effort max        # architecture, debug, design

Option 2 — Prompt Claude Code:

Adjust the /effort level to my task automatically: minimal for small edits, medium for multi-file changes, max only when I say architecture or debug. Ask me if you're not sure which level fits.

A conscious switch before every bigger action saves massive amounts.

Savings: ~20-30%

What it adds up to

Setting	Savings
Haiku for small stuff	15%
Kill unused MCPs	10-15%
MAX_THINKING_TOKENS	20-30%
opusplan instead of pure Opus	up to 68%
/effort on purpose	20-30%

Stacked, you realistically land at 40-70% fewer tokens — without losing quality on the tasks that matter.

💸 These are the settings I've been running since Opus 4.7. If you want to go deeper on token optimization — inside the community I share which MCPs are actually worth it, how I build CLAUDE.md rules that force Claude to be concise, and which hook setups keep bash output tiny automatically. → Join the Claude Code Mastery Community