demarily.dev

Your CLAUDE.md Is Costing You Thousands of Tokens Per Turn

Your CLAUDE.md Is Costing You Thousands of Tokens Per Turn

I hit the wall. Conversations dying mid-feature, responses slowing down, costs climbing. I was getting great work out of Claude Code — but I was burning through tokens like I had infinite context.

Turns out, most of the damage was self-inflicted.

The hidden tax you're paying every turn

Claude Code loads your CLAUDE.md files on every single turn. Not once per session — every turn. The loading chain walks from your working directory up to ~/.claude/CLAUDE.md, concatenating every CLAUDE.md it finds along the way.

I ran the numbers on mine:

Source Size ~Tokens Loaded when?
Project CLAUDE.md 18.7 KB ~6,000 Every turn
Parent dir CLAUDE.md 2.9 KB ~1,000 Every turn
Global CLAUDE.md 4.7 KB ~1,500 Every turn
System prompt + tools ~20 KB ~6,000 Every turn
Fixed floor ~14,500 Before you say a word

That's ~14,500 tokens of overhead before Claude even reads your message. Over a 50-turn conversation, that's ~725K tokens just on context that never changes.

My project CLAUDE.md alone — 18.7KB of stack descriptions, command references, code organization maps, and domain conventions — was eating 6,000 tokens per turn. Most of that information was only relevant 10% of the time.

The fix: slim router + on-demand references

The core insight is simple: not everything in CLAUDE.md needs to be there.

CLAUDE.md should contain only what Claude needs to know on every single turn — the stuff that prevents catastrophic mistakes. Everything else should live in files Claude reads on demand when the task requires it.

What stays in CLAUDE.md

What moves out

I moved all of this into per-component CLAUDE.md files that Claude reads once at the start of a task — not 50 times per conversation.

Result: 18.7KB → 2.6KB. An 86% reduction in per-turn overhead.

The memory file trap

Claude Code's auto-memory system is great for preserving context across sessions. But memory files accumulate. Mine had grown to 78KB across 13 files — and several of them were 10-12KB each, packed with full code examples and war stories.

The audit:

  1. Delete duplicates. Two of my memory files contained information already in CLAUDE.md. Gone.
  2. Condense to rules. A 12KB file about framework gotchas became 2KB of key rules without the code examples and commit-by-commit narratives. The rule is what matters, not the story of how you learned it.
  3. Move reference material. Detailed API endpoint maps and algorithm documentation belong in project docs, not memory. Memory should be a pointer: "the algorithm is documented at path/to/file."

Result: 78KB → 21KB across 12 files. 73% reduction.

Workflow discipline

Beyond the files, three habits were burning tokens:

Oversized plan documents. One of my implementation plans was 54KB — a small novel. Plans should be 15KB max: task list, key decisions, edge cases. Reference existing patterns by name instead of inlining code.

Reading entire files. Claude Code's Read tool accepts offset and limit parameters. If you need 20 lines from a 500-line file, grep for the line range first, then read just that section. Don't dump entire files into context.

Unnecessary sub-agents. Every sub-agent starts with a fresh context window and gets its own full briefing. For tasks that take fewer than 5 tool calls, just do them in the main context.

How to audit yours

Want to check your overhead? Look at your CLAUDE.md file sizes:

ls -lh ~/.claude/CLAUDE.md ./CLAUDE.md

If your project CLAUDE.md is over 5KB, you're carrying dead weight on every turn. Rough math: 1KB ≈ 300 tokens, multiplied by every turn in your conversation.

The results

Source Before After Per-session savings
CLAUDE.md (×50 turns) ~300K tokens ~40K tokens 260K
Memory files ~25K tokens ~7K tokens 18K
Leaner plans + reads Variable ~30% less ~40K

Conversations last longer before hitting context limits. Responses come back faster. Costs go down. And the quality of work hasn't changed — Claude still has access to all the same information, it just loads it when it needs it instead of carrying it every turn.

Thirty minutes of auditing. No code changes. No feature loss. Just less waste.