Save Tokens
The biggest levers for cheaper, faster sessions.
The biggest levers for cheaper, faster sessions.
What it is
Claude Code burns tokens on every message, every tool call, every file read. The more context loaded, the more expensive AND slower each response gets. These are the levers that actually move the number โ no micro-optimizations, just the stuff that makes sessions measurably cheaper.
Keep CLAUDE.md small
It ships with every single message. Stay under 100 lines. Only rules that apply to every interaction belong here. Project-specific stuff goes in a project CLAUDE.md inside the project folder.
One CLAUDE.md per project
Claude Code auto-loads the CLAUDE.md from the current directory. That way Claude only gets the context that's actually relevant โ not everything at once.
Structure your memory system
Memory files do NOT load automatically โ only MEMORY.md as the index. Keep the index short (under 200 lines), detail files separate. No duplicates. Clean it out regularly.
Use /compact and /clear often
/compactโ compresses your context so far. Rule of thumb: run it after every 30+ messages./clearโ fresh start. Use it when switching topics.
Subagents instead of main context
Every piece of research, exploration, or parallel analysis goes into a subagent. The subagent has its own context and only returns the result. Your main window stays clean.
Plan Mode before code
A short plan up front saves 3โ5x the tokens the plan itself costs. Claude without a plan writes, corrects, rewrites โ all tokens.
Grep before Read
Never load an entire file when you need one specific spot. Grep first, then read with offset/limit. On a 500-line file you save 90%.
Sonnet vs Opus vs Haiku, on purpose
- Haiku โ mini tasks, quick summaries.
- Sonnet โ 90% of everything. Code, edits, research.
- Opus โ only for deep architecture calls, nasty bugs, or when Sonnet fails multiple times.
Switch with /model in chat.
Pro tip
The biggest lever usually isn't model choice โ it's context hygiene: CLAUDE.md under 100 lines, subagents instead of everything-in-main, /compact before it gets sluggish. Skip that and you pay 3โ5x more, no matter which model you pick.
๐ Inside the community I share my full setup including prompt strategies that save tokens without hurting output. โ Join the Claude Code Mastery Community
Updated regularly โ follow @vine.codes for more.
Want more?