15.6 Cost Management
Course: Claude Code - Enterprise Development
Section: Enterprise Deployment
Video Length: 2-5 minutes
Presenter: Daniel Treasure
Opening Hook
AI usage costs money — every token in and every token out. When you're running Claude Code across a team or in automation, those costs can add up quickly if you're not intentional about it. The good news is Claude Code gives you multiple levers to control spend without sacrificing productivity.
Key Talking Points
1. Understanding Token Economics
- Every interaction has a cost: input tokens (your prompt + context) and output tokens (Claude's response)
- Larger contexts = more expensive (CLAUDE.md, file contents, conversation history all count)
- Different models have different price points: Haiku < Sonnet < Opus
- Automation can amplify costs — a CI job running on every commit adds up
What to say: "The first step is understanding where your tokens go. It's not just your prompt — it's everything Claude reads to understand your question. File contents, CLAUDE.md, conversation history, tool outputs. A single 'review this PR' can consume thousands of tokens just in context."
What to show on screen: Show /usage to display token counts for a session. Highlight input vs output breakdown.
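To make the token economics concrete, a quick back-of-the-envelope estimator helps. The per-million-token rates below are placeholders for illustration only — check the current Claude API pricing page for real numbers (the `estimate_cost` helper is ours, not part of any SDK):

```python
# Rough per-request cost estimator. The per-million-token rates below are
# PLACEHOLDERS for illustration -- check the Claude API pricing page for
# current numbers.
RATES_PER_MTOK = {            # (input_rate, output_rate) in USD per 1M tokens
    "haiku": (1.00, 5.00),    # hypothetical rates
    "sonnet": (3.00, 15.00),
    "opus": (15.00, 75.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return an estimated USD cost for a single request."""
    in_rate, out_rate = RATES_PER_MTOK[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A "review this PR" with 20k tokens of context and a 1k-token reply:
print(f"sonnet: ${estimate_cost('sonnet', 20_000, 1_000):.4f}")
print(f"haiku:  ${estimate_cost('haiku', 20_000, 1_000):.4f}")
```

The point of the exercise: context dominates. At 20k input tokens, the prompt you typed is a rounding error next to what Claude reads.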
2. Model Selection Strategy
- Use the right model for the job — don't default to the most powerful
- Haiku: Fast, cheap. Great for simple queries, formatting, boilerplate generation
- Sonnet: Balanced. Good for most development tasks, code review, analysis
- Opus: Most capable, most expensive. Reserve for complex reasoning, architecture decisions
- Set defaults per-project or per-agent to enforce this automatically
What to say: "The single biggest cost lever is model selection. If a subagent is just running a linter check, it doesn't need Opus. Set it to Haiku and save 90% on that operation. Reserve Opus for the tasks that actually need deep reasoning."
What to show on screen: Show setting the model in an AGENT.md frontmatter: `model: haiku`
3. Budget Controls
- `--max-budget-usd` flag caps spend per invocation
- `--max-turns` limits agentic loops (prevents runaway tool use)
- `CLAUDE_CODE_MAX_OUTPUT_TOKENS` caps response length (default 32000, max 64000)
- Combine these for defense-in-depth cost control
What to say: "Always set a budget cap in automation. --max-budget-usd 1.00 means if something goes wrong — an infinite loop, a massive file read — it stops before it gets expensive. Think of it as a circuit breaker."
What to show on screen: claude -p "review PR" --max-budget-usd 1.00 --max-turns 5
4. Context Management
- `/compact` reduces conversation history (saves tokens on long sessions)
- Auto-compact setting handles this automatically
- Be selective about what goes in CLAUDE.md — every line is read every turn
- Use `@file` references strategically rather than including everything
What to say: "Your CLAUDE.md is read on every single turn. If it's 500 lines of project rules, that's 500 lines of tokens on every interaction. Keep it tight. Use imports for section-specific rules that only load when needed."
What to show on screen: Show a lean CLAUDE.md with imports vs a bloated one.
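For the on-screen comparison, a lean CLAUDE.md might look like the sketch below. The rule contents and import paths are hypothetical stand-ins; the `@path` import syntax is the one discussed in the Gotchas section:

```markdown
# CLAUDE.md — kept lean, because every line is read on every turn
- Run the test suite before committing.
- Use the project's existing error-handling helpers; do not add new ones.

# Section-specific rules live in imports (hypothetical paths):
@rules/frontend-rules.md
@rules/api-rules.md
```

Contrast this on screen with a 500-line monolith to make the token cost of a bloated CLAUDE.md visible.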
5. Team-Level Cost Governance
- Track usage by user, team, and automation pipeline
- Set quotas and rate limits at the organizational level
- Review weekly usage reports to identify optimization opportunities
- Allocate budgets per-team using API key separation or monitoring
What to say: "At the org level, you want visibility and limits. Know which teams are spending what, set reasonable budgets, and review weekly. The 15.5 monitoring setup feeds directly into this — you can't manage what you can't measure."
What to show on screen: A usage report or dashboard showing per-team token consumption.
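One lightweight way to build that per-team visibility is to collect the JSON usage output from automated invocations (shown later under "Cost tracking in JSON output") into a log and aggregate it. The record shape and the `team` field below are assumptions for illustration, not a Claude Code log format:

```python
import json
from collections import defaultdict

# Hypothetical usage log: one JSON record per invocation, e.g. collected by
# each pipeline from `claude -p ... --output-format json`. The "team" field
# and overall record shape are assumptions for this sketch.
LOG_LINES = [
    '{"team": "frontend", "usage": {"input_tokens": 12000, "output_tokens": 800}}',
    '{"team": "frontend", "usage": {"input_tokens": 9000, "output_tokens": 600}}',
    '{"team": "platform", "usage": {"input_tokens": 40000, "output_tokens": 2500}}',
]

def tokens_by_team(lines):
    """Sum input + output tokens per team from JSON-lines usage records."""
    totals = defaultdict(int)
    for line in lines:
        rec = json.loads(line)
        usage = rec["usage"]
        totals[rec["team"]] += usage["input_tokens"] + usage["output_tokens"]
    return dict(totals)

print(tokens_by_team(LOG_LINES))
# {'frontend': 22400, 'platform': 42500}
```

Feeding totals like these into the weekly review is the "you can't manage what you can't measure" loop in practice.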
Demo Plan
- Show `/usage` — Display current session token consumption (~15 seconds)
- Demonstrate model selection — Run the same prompt on Haiku vs Sonnet, compare token costs (~30 seconds)

  ```bash
  claude -p "explain this function" --model haiku
  claude -p "explain this function" --model sonnet
  ```

- Show budget cap — Run with `--max-budget-usd` (~20 seconds)

  ```bash
  claude -p "review this codebase" --max-budget-usd 0.50 --max-turns 3
  ```

- Show `/compact` — Before and after context size (~15 seconds)
- Show agent model configuration — AGENT.md with `model: haiku` for a simple task (~15 seconds)
- Wrap up — Summarize the cost control stack (~15 seconds)
Code Examples & Commands
Budget-capped automation:

```bash
claude -p "review changes" --max-budget-usd 1.00 --max-turns 5
```

Model selection in CLI:

```bash
claude --model haiku   # Fast, cheapest
claude --model sonnet  # Balanced (default)
claude --model opus    # Most capable, most expensive
```

Model in agent definition:

```markdown
---
name: linter
description: Run style checks on code
model: haiku
tools:
  - Read
  - Grep
---
Check code style and report issues.
```

Token limit:

```bash
export CLAUDE_CODE_MAX_OUTPUT_TOKENS=16000  # Half the default
```

Compact context:

```
/compact
```

Auto-compact in settings.json:

```json
{
  "autoCompact": true
}
```

Cost tracking in JSON output:

```bash
claude -p "analyze code" --output-format json | jq '.usage'
# Returns: { "input_tokens": 1234, "output_tokens": 567 }
```

Fallback model for overload:

```bash
claude --model sonnet --fallback-model haiku
```
Gotchas & Tips
- `--max-budget-usd` is per-invocation, not per-session. If you call Claude 10 times with a $1 cap, that's up to $10 total. For session-wide caps, use monitoring alerts.
- Auto-compact is your friend. Enable it in settings to automatically compress context when it gets large. This saves tokens on every subsequent turn in a long session.
- CLAUDE.md imports save tokens. Instead of a 500-line CLAUDE.md, use imports: `@rules/frontend-rules.md` only loads when the context requires it. Glob patterns help too: `@rules/*.md`.
- Haiku is surprisingly capable. Many developers default to Sonnet for everything. Try Haiku first for formatting, boilerplate, and simple analysis — you may be surprised how well it works at a fraction of the cost.
- Monitor automation especially closely. A CI job that runs on every push to every branch can generate significant costs. Use `--max-turns 1` for simple analysis tasks and `--max-budget-usd` for everything automated.
- `/usage` shows weekly reset times. Usage limits reset on a weekly cycle. Check `/usage` to see when your next reset is if you're approaching limits.
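The automation gotcha above is worth showing concretely. A hypothetical CI step sketch (GitHub Actions syntax; the workflow wiring is illustrative, the flags are the ones covered in this video):

```yaml
# Hypothetical CI step -- budget guardrails on every automated invocation.
- name: AI review (budget-capped)
  run: |
    claude -p "review the changes in this PR" \
      --max-turns 3 \
      --max-budget-usd 0.50
  env:
    CLAUDE_CODE_MAX_OUTPUT_TOKENS: "16000"
```

With the cap in place, a runaway loop or a massive file read fails cheaply instead of quietly burning budget on every push.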
Lead-out
"Cost management is about being intentional, not restrictive. Use the right model for the job, set budget guardrails, keep your context lean, and monitor what's happening. That wraps up Enterprise Deployment — next we're moving into Advanced MCP Integrations where we'll connect Claude Code to databases, error monitoring, and more."
Reference URLs
Prep Reading
- Review Claude API pricing page for current token costs per model
- Check `/usage` output format and what metrics it shows
- Test `--max-budget-usd` to see how it reports when budget is reached
- Review auto-compact behavior to demo before/after context size
Notes for Daniel: This is a practical, money-saving video. Enterprise viewers care deeply about cost control. Lead with the model selection strategy since that's the biggest lever — most teams are probably using Sonnet for everything when Haiku would work for half their tasks. The budget cap demo should show what happens when the cap is hit (Claude reports it reached the limit). Keep the tone pragmatic, not preachy.