
15.6 Cost Management

Course: Claude Code - Enterprise Development
Section: Enterprise Deployment
Video Length: 2-5 minutes
Presenter: Daniel Treasure


Opening Hook

AI usage costs money — every token in and every token out. When you're running Claude Code across a team or in automation, those costs can add up quickly if you're not intentional about it. The good news is Claude Code gives you multiple levers to control spend without sacrificing productivity.


Key Talking Points

1. Understanding Token Economics

  • Every interaction has a cost: input tokens (your prompt + context) and output tokens (Claude's response)
  • Larger contexts = more expensive (CLAUDE.md, file contents, conversation history all count)
  • Different models have different price points: Haiku < Sonnet < Opus
  • Automation can amplify costs — a CI job running on every commit adds up

What to say: "The first step is understanding where your tokens go. It's not just your prompt — it's everything Claude reads to understand your question. File contents, CLAUDE.md, conversation history, tool outputs. A single 'review this PR' can consume thousands of tokens just in context."

What to show on screen: Show /usage to display token counts for a session. Highlight input vs output breakdown.
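
To make the arithmetic concrete, here is a back-of-the-envelope cost sketch. The per-million-token prices are illustrative placeholders, not current rates; check the Claude API pricing page before quoting numbers.

```shell
# Rough per-call cost estimate. Prices below are PLACEHOLDERS
# (check the current Claude API pricing page for real rates):
#   input:  $3  per million tokens
#   output: $15 per million tokens
INPUT_TOKENS=4200   # prompt + CLAUDE.md + file contents + history
OUTPUT_TOKENS=800   # Claude's response
awk -v in_t="$INPUT_TOKENS" -v out_t="$OUTPUT_TOKENS" \
  'BEGIN { printf "estimated cost: $%.4f\n", (in_t * 3 + out_t * 15) / 1e6 }'
```

With these placeholder numbers the estimate comes out to $0.0246: small per call, but multiplied across a team and an every-commit CI pipeline it compounds quickly.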

2. Model Selection Strategy

  • Use the right model for the job — don't default to the most powerful
  • Haiku: Fast, cheap. Great for simple queries, formatting, boilerplate generation
  • Sonnet: Balanced. Good for most development tasks, code review, analysis
  • Opus: Most capable, most expensive. Reserve for complex reasoning, architecture decisions
  • Set defaults per-project or per-agent to enforce this automatically

What to say: "The single biggest cost lever is model selection. If a subagent is just running a linter check, it doesn't need Opus. Set it to Haiku and save 90% on that operation. Reserve Opus for the tasks that actually need deep reasoning."

What to show on screen: Show setting model in an AGENT.md frontmatter: model: haiku

3. Budget Controls

  • --max-budget-usd flag caps spend per invocation
  • --max-turns limits agentic loops (prevents runaway tool use)
  • CLAUDE_CODE_MAX_OUTPUT_TOKENS caps response length (default 32000, max 64000)
  • Combine these for defense-in-depth cost control

What to say: "Always set a budget cap in automation. --max-budget-usd 1.00 means if something goes wrong — an infinite loop, a massive file read — it stops before it gets expensive. Think of it as a circuit breaker."

What to show on screen: claude -p "review PR" --max-budget-usd 1.00 --max-turns 5

4. Context Management

  • /compact reduces conversation history (saves tokens on long sessions)
  • Auto-compact setting handles this automatically
  • Be selective about what goes in CLAUDE.md — every line is read every turn
  • Use @file references strategically rather than including everything

What to say: "Your CLAUDE.md is read on every single turn. If it's 500 lines of project rules, that's 500 lines of tokens on every interaction. Keep it tight. Use imports for section-specific rules that only load when needed."

What to show on screen: Show a lean CLAUDE.md with imports vs a bloated one.
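
A lean version might look like the sketch below. It uses the same @path import syntax mentioned in the Gotchas tips; the file names are hypothetical examples.

```markdown
# Project rules (read on every turn -- keep this short)
- Use TypeScript strict mode
- Run tests before committing

# Section-specific rules, pulled in via imports
@rules/frontend-rules.md
@rules/api-rules.md
```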

5. Team-Level Cost Governance

  • Track usage by user, team, and automation pipeline
  • Set quotas and rate limits at the organizational level
  • Review weekly usage reports to identify optimization opportunities
  • Allocate budgets per-team using API key separation or monitoring

What to say: "At the org level, you want visibility and limits. Know which teams are spending what, set reasonable budgets, and review weekly. The 15.5 monitoring setup feeds directly into this — you can't manage what you can't measure."

What to show on screen: A usage report or dashboard showing per-team token consumption.
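
If you are saving per-run JSON results (as in the --output-format json example under Code Examples), a small helper can aggregate them. This is a sketch assuming each file has the .usage.input_tokens / .usage.output_tokens shape shown there; verify against your actual output before relying on it.

```shell
# Sum token usage across saved JSON result files, e.g. one file per
# CI run or per team member. Assumes each file contains a top-level
# `usage` object with `input_tokens` and `output_tokens`.
sum_usage() {
  jq -s 'map(.usage)
         | { input_tokens:  map(.input_tokens)  | add,
             output_tokens: map(.output_tokens) | add }' "$@"
}

# Example: sum_usage logs/*.json
```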


Demo Plan

  1. Show /usage — Display current session token consumption (~15 seconds)

  2. Demonstrate model selection — Run the same prompt on Haiku vs Sonnet and compare token costs (~30 seconds)

     claude -p "explain this function" --model haiku
     claude -p "explain this function" --model sonnet

  3. Show budget cap — Run with --max-budget-usd (~20 seconds)

     claude -p "review this codebase" --max-budget-usd 0.50 --max-turns 3

  4. Show /compact — Before and after context size (~15 seconds)

  5. Show agent model configuration — AGENT.md with model: haiku for a simple task (~15 seconds)

  6. Wrap up — Summarize the cost control stack (~15 seconds)

Code Examples & Commands

Budget-capped automation:

claude -p "review changes" --max-budget-usd 1.00 --max-turns 5

Model selection in CLI:

claude --model haiku    # Fast, cheapest
claude --model sonnet   # Balanced (default)
claude --model opus     # Most capable, most expensive

Model in agent definition:

---
name: linter
description: Run style checks on code
model: haiku
tools:
  - Read
  - Grep
---
Check code style and report issues.

Token limit:

export CLAUDE_CODE_MAX_OUTPUT_TOKENS=16000  # Half the default

Compact context:

/compact

Auto-compact in settings.json:

{
  "autoCompact": true
}

Cost tracking in JSON output:

claude -p "analyze code" --output-format json | jq '.usage'
# Returns: { "input_tokens": 1234, "output_tokens": 567 }

Fallback model for overload:

claude --model sonnet --fallback-model haiku

Gotchas & Tips

  1. --max-budget-usd is per-invocation, not per-session. If you call Claude 10 times with a $1 cap, that's up to $10 total. For session-wide caps, use monitoring alerts.

  2. Auto-compact is your friend. Enable it in settings to automatically compress context when it gets large. This saves tokens on every subsequent turn in a long session.

  3. CLAUDE.md imports save tokens. Instead of a 500-line CLAUDE.md, use imports: @rules/frontend-rules.md only loads when the context requires it. Glob patterns help too: @rules/*.md.

  4. Haiku is surprisingly capable. Many developers default to Sonnet for everything. Try Haiku first for formatting, boilerplate, and simple analysis — you may be surprised how well it works at a fraction of the cost.

  5. Monitor automation especially closely. A CI job that runs on every push to every branch can generate significant costs. Use --max-turns 1 for simple analysis tasks and --max-budget-usd for everything automated.
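
For automation, these guardrails can live directly in the pipeline definition. The sketch below is a hypothetical GitHub Actions-style step; the step name, prompt, and budget values are illustrative, not a recommended baseline.

```yaml
# Hypothetical CI step combining the automation guardrails above.
- name: AI review (budget-capped)
  run: |
    claude -p "review the diff" \
      --model haiku \
      --max-turns 1 \
      --max-budget-usd 0.50
```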

  6. /usage shows weekly reset times. Usage limits reset on a weekly cycle. Check /usage to see when your next reset is if you're approaching limits.


Lead-out

"Cost management is about being intentional, not restrictive. Use the right model for the job, set budget guardrails, keep your context lean, and monitor what's happening. That wraps up Enterprise Deployment — next we're moving into Advanced MCP Integrations where we'll connect Claude Code to databases, error monitoring, and more."


Prep Reading

  • Review Claude API pricing page for current token costs per model
  • Check /usage output format and what metrics it shows
  • Test --max-budget-usd to see how it reports when budget is reached
  • Review auto-compact behavior to demo before/after context size

Notes for Daniel: This is a practical, money-saving video. Enterprise viewers care deeply about cost control. Lead with the model selection strategy since that's the biggest lever — most teams are probably using Sonnet for everything when Haiku would work for half their tasks. The budget cap demo should show what happens when the cap is hit (Claude reports it reached the limit). Keep the tone pragmatic, not preachy.