Video 23.3: Cost Management for Multi-Agent
Course: Claude Code - Parallel Agent Development (Course 4)
Section: 23: Orchestration and Best Practices
Video Length: 4–5 minutes
Presenter: Daniel Treasure
Opening Hook
Here's the uncomfortable truth: multi-agent teams can cost 4–15 times more than a single agent. Each agent has its own context window, its own tokens, its own thinking time. Before you spin up 10 parallel agents, you need to understand the cost curve and know how to keep it under control. Today, we're learning cost-aware parallelization.
Key Talking Points
What to say:
Token Scaling Reality
- Single agent on a large project: ~1M tokens (depending on codebase size and interactions).
- Four-agent team on the same project: 4–6M tokens (not a clean 4x, because of summarization overhead, but still much more).
- Eight-agent team: 8–15M tokens (coordination gets expensive).
- Rule of thumb (sketched in code below): multiply the single-agent cost by the number of agents, then add 10–20% for coordination.
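In code, the rule of thumb is a one-liner; it's the same formula that `multi_agent_cost` implements in Example 1 below, shown here with the demo numbers from Step 1:

```python
# Rule-of-thumb team cost: single-agent cost × agent count, plus coordination.
def team_cost(single_cost_usd, num_agents, coordination=0.15):
    return single_cost_usd * num_agents * (1 + coordination)

print(f"4 agents: ${team_cost(10, 4, coordination=0.20):.2f}")  # -> $48.00, matching Step 1
```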
When Multi-Agent Justifies the Cost
- The speed gain must be worth the cost multiplication (a break-even sketch follows this list). If you save 2 hours of wall-clock time but spend 4x more money, that's only worth it if your time is extremely valuable.
- Good use cases:
  - High-value projects (time-sensitive, expensive delays).
  - Large codebases where parallelization truly cuts wall-clock time in half.
  - Teams with many independent, well-scoped tasks (less coordination overhead).
- Bad use cases:
  - Simple projects (a single agent finishes in 1 hour; a multi-agent team finishes in 30 minutes but costs several times more).
  - Highly interdependent tasks (agents wait on each other; cost multiplies but time doesn't improve).
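To make "worth it" concrete, here is a minimal break-even sketch; the numbers are the demo estimates from Step 1 below, not benchmarks:

```python
# Break-even sketch: multi-agent pays off when an hour of saved wall-clock
# time is worth more than the extra spend divided by the hours saved.
def break_even_hourly_rate(single_cost_usd, multi_cost_usd, hours_saved):
    return (multi_cost_usd - single_cost_usd) / hours_saved

rate = break_even_hourly_rate(single_cost_usd=10.0, multi_cost_usd=48.0, hours_saved=5.0)
print(f"Multi-agent pays off if your time is worth more than ${rate:.2f}/hour")  # $7.60/hour
```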
Budget Caps: --max-budget-usd
- Claude Code supports a budget flag: claude-code --team backend-team --max-budget-usd 50
- This caps the session at $50 USD. Any agent in the team can spend up to that budget.
- Once you hit the budget, agents stop working (no new tokens).
- Useful for: experimentation, preventing runaway costs, controlling R&D spend.
Model Selection Strategy
- Haiku (cheapest): roughly 19x cheaper than Opus per token at the list prices used in Example 1 (the arithmetic is sketched below). Use for simple, well-scoped tasks (unit tests, docs, refactoring a single function).
- Sonnet (balanced): roughly 5x cheaper than Opus. Use for medium-complexity tasks (implement a feature, write a module).
- Opus (expensive but powerful): use for complex reasoning, architectural decisions, or tasks that need multiple rounds of thinking.
- For multi-agent: use Haiku and Sonnet agents for most tasks; reserve Opus for the orchestrator or the most complex agent.
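The multiples above fall out of the per-token list prices (the same 2024 constants used in Example 1 below; verify against the pricing page before quoting them):

```python
# Input-token list prices per 1K tokens (2024 figures from Example 1 below).
PRICE_PER_1K_INPUT = {"claude-opus": 0.015, "claude-sonnet": 0.003, "claude-haiku": 0.0008}

opus_price = PRICE_PER_1K_INPUT["claude-opus"]
for model, price in PRICE_PER_1K_INPUT.items():
    print(f"{model:<14} {opus_price / price:>5.1f}x cheaper than Opus on input tokens")
# claude-opus 1.0x, claude-sonnet 5.0x, claude-haiku 18.8x
```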
Subagents vs. Agent Teams Cost Comparison
- Subagent (runs inside the parent agent's session): shares the parent's session and budget; only a summary of its work flows back into the parent's context, so coordination overhead is minimal.
- Agent team: each agent has its own context window and token budget, plus coordination overhead.
- Cost comparison:
  - Subagents: cheaper for simple delegation (1–2 subagent calls).
  - Agent teams: worth the cost only if you need true parallelization (wall-clock time matters).
Monitoring Approaches
- OpenTelemetry: Claude Code can emit token usage and cost metrics.
- Usage tracking: keep logs of token count per agent, per task.
- Cost dashboards: if you're running many projects, aggregate costs to see where the money is going.
What to show on screen:
- Token cost calculator (spreadsheet or simple script)
  - Show single-agent baseline cost.
  - Show 4-agent cost (multiply by 4, add ~20% overhead).
  - Show 8-agent cost (multiply by 8, add ~25% overhead).
  - Demonstrate the break-even point: when is multi-agent worth it?
- Budget cap in action
  - Show the Claude Code command with --max-budget-usd 50.
  - Show a team starting up with the budget cap.
  - Show the team stopping when the budget is exhausted.
- Model selection comparison
  - Show cost/performance trade-offs: Haiku vs. Sonnet vs. Opus.
  - Show example task assignments by model (Haiku = tests, Sonnet = features, Opus = architecture).
- OpenTelemetry dashboard (if available)
  - Show token usage per agent.
  - Show cumulative cost per session.
  - Highlight agents that used more tokens than expected.
- Usage logs
  - Show a log file tracking token consumption per task, per agent.
  - Explain how to interpret the logs.
Demo Plan
Scenario: You're managing a backend team working on a Python service. You want to:
1. Calculate whether multi-agent justifies the cost.
2. Set a budget cap to prevent overspend.
3. Assign agents different models based on task complexity.
4. Monitor token usage as the team works.
Timing: ~4 minutes
Step 1: Calculate the Cost (45 seconds)
- Show a cost analysis. Example:

```
PROJECT: Python API Service
Estimated codebase size: 50 KLOC
Estimated single-agent cost: ~$10 (1M tokens at Claude's pricing)

MULTI-AGENT COST ANALYSIS:
2-agent team: $10 × 2 + 15% overhead = ~$23
4-agent team: $10 × 4 + 20% overhead = ~$48
8-agent team: $10 × 8 + 25% overhead = ~$100

EXPECTED TIME SAVINGS:
Single agent: ~8 hours (sequential)
2-agent team: ~5 hours (savings: 3 hours, cost increase: $13)
4-agent team: ~3 hours (savings: 5 hours, cost increase: $38)
```
- Ask: "Is saving 5 hours worth $38 extra?" That's a break-even of $38 ÷ 5 hours = $7.60 per hour of saved time; the answer depends on what an hour is worth to your project.
Step 2: Show Budget Configuration (45 seconds)
- Open the team configuration (settings.json or team definition):

```json
{
  "team": {
    "name": "backend-api-team",
    "max_budget_usd": 50,
    "agents": [
      { "name": "auth_agent", "model": "claude-haiku" },
      { "name": "database_agent", "model": "claude-sonnet" },
      { "name": "api_agent", "model": "claude-sonnet" },
      { "name": "orchestrator", "model": "claude-opus" }
    ]
  }
}
```

- Explain:
  - max_budget_usd: 50: the entire team budget is capped at $50.
  - Different models for different agents: simple tasks use Haiku, complex ones use Opus.
- Show the CLI command:

```bash
claude-code --team backend-api-team --max-budget-usd 50
```
Step 3: Assign Tasks by Model (45 seconds)
- Show a task assignment table:

```
TASK ASSIGNMENT BY MODEL COST-EFFECTIVENESS:

Task                        Complexity  Assigned Agent   Model          Est. Cost
──────────────────────────────────────────────────────────────────────────────────
Unit tests for auth module  Low         auth_agent       claude-haiku   $0.50
Database schema + tests     Medium      database_agent   claude-sonnet  $3.00
API endpoints + tests       Medium      api_agent        claude-sonnet  $3.50
Arch review + integration   High        orchestrator     claude-opus    $5.00
──────────────────────────────────────────────────────────────────────────────────
Total estimated cost: $12.00 (well under the $50 budget)
```
- Explain the logic (see the sketch after this list):
- Haiku for routine tasks (tests, docs, simple refactoring).
- Sonnet for feature work (default, good balance).
- Opus for complex decision-making (rarely).
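To make the assignment rule concrete, here is a minimal sketch. `assign_model` is a hypothetical helper for the demo, not a Claude Code API; the tiers mirror the table above:

```python
# Hypothetical helper: map task complexity to the cheapest adequate model.
MODEL_BY_COMPLEXITY = {
    "low": "claude-haiku",      # routine: tests, docs, small refactors
    "medium": "claude-sonnet",  # feature work: good cost/quality balance
    "high": "claude-opus",      # architecture, multi-step reasoning
}

def assign_model(complexity):
    # Default to Sonnet when unsure (see Gotcha 5).
    return MODEL_BY_COMPLEXITY.get(complexity, "claude-sonnet")

for task, complexity in [
    ("Unit tests for auth module", "low"),
    ("Database schema + tests", "medium"),
    ("API endpoints + tests", "medium"),
    ("Arch review + integration", "high"),
]:
    print(f"{task:<28} -> {assign_model(complexity)}")
```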
Step 4: Monitor Token Usage in Real-Time (45 seconds)
- Show a monitoring dashboard or log output as the team works:

```
TEAM: backend-api-team
BUDGET: $50 USD
SPENT SO FAR: $8.42 (16.8%)

AGENT BREAKDOWN:
auth_agent      tokens:    85,000   cost: $0.26   tasks: 3/3 (IDLE)
database_agent  tokens:   425,000   cost: $1.28   tasks: 2/3 (IN PROGRESS)
api_agent       tokens:   320,000   cost: $0.96   tasks: 1/2 (IN PROGRESS)
orchestrator    tokens:   120,000   cost: $3.60   tasks: 1/1 (IDLE)
coordination    tokens:   100,000   cost: $2.32   (context sharing, summaries)
─────────────────────────────────────────────────────────────────────
TOTAL           tokens: 1,050,000   cost: $8.42   budget remaining: $41.58

PREDICTION: Finish with ~$15 total spend (70% under budget)
```
- Point out: "Database agent used more tokens than expected (425K vs. estimated 300K). But we're still well under budget, so this is fine."
Step 5: Show Budget Exhaustion Scenario (45 seconds)
- Simulate what happens when the budget is reached (the guard logic is sketched below):
  - Show the team at 95% budget ($47.50 spent).
  - A new task comes in (it would cost $5).
  - An agent tries to claim the task but gets a warning: "Budget exhausted. Remaining: $2.50. This task would exceed budget."
  - The agent stops and idles.
- Explain: "The budget cap protects you from runaway costs. When you hit the limit, work stops. You can increase the budget and restart, or declare the project done."
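As an illustration of the behavior above (a sketch, not Claude Code's actual internals), a budget guard boils down to a pre-claim check:

```python
# Illustrative budget guard, not Claude Code internals: before an agent claims
# a task, compare the task's estimated cost to the remaining budget.
def can_claim(task_cost_usd, spent_usd, budget_usd):
    remaining = budget_usd - spent_usd
    if task_cost_usd > remaining:
        print(f"Budget exhausted. Remaining: ${remaining:.2f}. This task would exceed budget.")
        return False
    return True

# The Step 5 scenario: $47.50 spent of a $50 cap, incoming $5 task.
if not can_claim(task_cost_usd=5.00, spent_usd=47.50, budget_usd=50.00):
    print("Agent idles until the budget is raised or the project is declared done.")
```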
Step 6: Cost Report and Lessons (45 seconds)
- Show a final cost report:

```
PROJECT COMPLETION SUMMARY
─────────────────────────────
Wall-clock time: 3.5 hours
Total tokens: 1.2M
Total cost: $11.70

Cost per agent (average): $2.93
Cost per hour: $3.34

BREAKDOWN BY TASK:
- Auth module:    $0.26 (Haiku, 3 tasks)
- Database layer: $1.28 (Sonnet, 2 tasks)
- API endpoints:  $0.96 (Sonnet, 2 tasks)
- Orchestration:  $3.60 (Opus, 1 task)
- Overhead:       $5.60 (context sharing, coordination)
─────────────────────────────
COMPARISON:
Single-agent cost estimate = $10
Multi-agent cost = $11.70
Difference: +$1.70 (17%)

TIME SAVED: ~4.5 hours (vs. sequential approach)
ROI: Worth it? YES - saved time > cost increase
     (break-even: $1.70 ÷ 4.5 h ≈ $0.38 per hour of saved time)
```
Code Examples & Commands
Example 1: Cost Calculator (Python)
```python
#!/usr/bin/env python3
"""
Cost calculator for multi-agent Claude Code projects.
Estimates token usage and financial cost.
"""

# Pricing as of 2024 (update as needed)
PRICING = {
    "claude-opus": {
        "input": 0.015 / 1000,    # $0.015 per 1K input tokens
        "output": 0.075 / 1000,   # $0.075 per 1K output tokens
    },
    "claude-sonnet": {
        "input": 0.003 / 1000,    # $0.003 per 1K input tokens
        "output": 0.015 / 1000,   # $0.015 per 1K output tokens
    },
    "claude-haiku": {
        "input": 0.00080 / 1000,  # $0.00080 per 1K input tokens
        "output": 0.004 / 1000,   # $0.004 per 1K output tokens
    },
}

def estimate_cost(tokens_input, tokens_output, model="claude-opus"):
    """Calculate cost for a given token usage."""
    pricing = PRICING[model]
    input_cost = tokens_input * pricing["input"]
    output_cost = tokens_output * pricing["output"]
    return input_cost + output_cost

def multi_agent_cost(single_agent_cost, num_agents, coordination_overhead=0.15):
    """Estimate multi-agent cost as a multiple of single-agent cost."""
    return single_agent_cost * num_agents * (1 + coordination_overhead)

# Example calculation
single_agent_tokens_in = 500000
single_agent_tokens_out = 200000
single_agent_cost = estimate_cost(single_agent_tokens_in, single_agent_tokens_out, "claude-opus")

print("COST ANALYSIS: Python API Backend\n")
print(f"Single-agent cost: ${single_agent_cost:.2f}")
print(f"  Input tokens: {single_agent_tokens_in:,}")
print(f"  Output tokens: {single_agent_tokens_out:,}")
print("  Model: claude-opus\n")

print("Multi-agent scenarios:")
for num_agents in [2, 4, 8]:
    cost = multi_agent_cost(single_agent_cost, num_agents, 0.15)
    overhead_cost = cost - (single_agent_cost * num_agents)
    print(f"  {num_agents} agents: ${cost:.2f} "
          f"(base: ${single_agent_cost * num_agents:.2f}, overhead: ${overhead_cost:.2f})")

print("\nRECOMMENDATION:")
print("Use multi-agent if wall-clock time savings justify the cost increase.")
```
Example 2: Team Configuration with Model Assignment
```json
{
  "team": {
    "name": "backend-team",
    "max_budget_usd": 100,
    "agents": [
      {
        "name": "auth_agent",
        "model": "claude-haiku",
        "role": "Authentication module (simple, scoped)",
        "budget_allocation": 5
      },
      {
        "name": "database_agent",
        "model": "claude-sonnet",
        "role": "Database layer (medium complexity)",
        "budget_allocation": 20
      },
      {
        "name": "api_agent",
        "model": "claude-sonnet",
        "role": "REST API (medium complexity)",
        "budget_allocation": 25
      },
      {
        "name": "tests_agent",
        "model": "claude-haiku",
        "role": "Integration tests (straightforward)",
        "budget_allocation": 10
      },
      {
        "name": "orchestrator",
        "model": "claude-opus",
        "role": "Architecture, coordination, synthesis",
        "budget_allocation": 40
      }
    ]
  },
  "hooks": {
    "taskCompleted": ".claude/hooks/check-tests.sh"
  }
}
```
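A quick pre-launch sanity check can catch allocation mistakes in a config like this. This is a hypothetical helper, not a Claude Code feature; it assumes the JSON above is saved as team.json:

```python
# Hypothetical pre-launch check: verify per-agent budget allocations
# don't exceed the team's max_budget_usd cap.
import json

with open("team.json") as f:  # the configuration shown above
    team = json.load(f)["team"]

allocated = sum(agent.get("budget_allocation", 0) for agent in team["agents"])
cap = team["max_budget_usd"]
print(f"Allocated ${allocated} of ${cap} cap")
if allocated > cap:
    raise SystemExit("Allocations exceed the budget cap; rebalance before launch.")
```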
Example 3: Budget-Aware CLI Launch
```bash
#!/bin/bash
# launch-team.sh - Start multi-agent team with budget controls

TEAM_NAME="backend-team"
BUDGET_USD=50
MAX_DURATION_HOURS=4

echo "Starting multi-agent team: $TEAM_NAME"
echo "Budget cap: \$$BUDGET_USD USD"
echo "Max duration: $MAX_DURATION_HOURS hours"
echo ""

# Start Claude Code with budget cap
claude-code \
  --team "$TEAM_NAME" \
  --max-budget-usd "$BUDGET_USD" \
  --max-duration-hours "$MAX_DURATION_HOURS" \
  --verbose

# Check exit code
if [ $? -eq 0 ]; then
  echo "✓ Team completed successfully."
else
  echo "✗ Team exited (possibly due to budget/time limit)."
fi
```
Example 4: Token Usage Monitoring Script
```python
#!/usr/bin/env python3
"""Monitor token usage across a team."""
import json
from datetime import datetime

def parse_logs(log_file):
    """Parse Claude Code logs to extract token usage."""
    usage_by_agent = {}
    with open(log_file, 'r') as f:
        for line in f:
            # Example log format:
            # [14:23:45] agent_name: token_usage={"input": 1000, "output": 500, "cost": 0.50}
            if "token_usage" in line:
                parts = line.split("token_usage=")
                if len(parts) == 2:
                    try:
                        # The "[HH:MM:SS]" prefix contains colons, so strip the
                        # timestamp first, then take the agent name before its colon.
                        after_ts = line.split("] ", 1)[-1]
                        agent_name = after_ts.split(":", 1)[0].strip()
                        usage = json.loads(parts[1].rstrip())
                        if agent_name not in usage_by_agent:
                            usage_by_agent[agent_name] = {
                                "input": 0,
                                "output": 0,
                                "cost": 0.0,
                                "tasks": 0
                            }
                        usage_by_agent[agent_name]["input"] += usage.get("input", 0)
                        usage_by_agent[agent_name]["output"] += usage.get("output", 0)
                        usage_by_agent[agent_name]["cost"] += usage.get("cost", 0)
                        usage_by_agent[agent_name]["tasks"] += 1
                    except json.JSONDecodeError:
                        pass
    return usage_by_agent

def print_report(usage_by_agent, budget):
    """Print a formatted cost report."""
    total_cost = sum(agent["cost"] for agent in usage_by_agent.values())
    total_tokens = sum(agent["input"] + agent["output"] for agent in usage_by_agent.values())

    print("\n" + "=" * 70)
    print("MULTI-AGENT COST REPORT")
    print("=" * 70)
    print(f"Report generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
    print(f"BUDGET:    ${budget:.2f} USD")
    print(f"SPENT:     ${total_cost:.2f} USD ({100 * total_cost / budget:.1f}%)")
    print(f"REMAINING: ${budget - total_cost:.2f} USD\n")
    print("AGENT BREAKDOWN:")
    print("-" * 70)
    print(f"{'Agent':<20} {'Tasks':<8} {'Tokens':<15} {'Cost':>10}")
    print("-" * 70)
    for agent_name, stats in sorted(usage_by_agent.items(), key=lambda x: x[1]["cost"], reverse=True):
        tokens = stats["input"] + stats["output"]
        print(f"{agent_name:<20} {stats['tasks']:<8} {tokens:>14,} ${stats['cost']:>9.2f}")
    print("-" * 70)
    print(f"{'TOTAL':<20} {sum(s['tasks'] for s in usage_by_agent.values()):<8} "
          f"{total_tokens:>14,} ${total_cost:>9.2f}")
    print("=" * 70 + "\n")

# Example usage
if __name__ == "__main__":
    import sys
    if len(sys.argv) < 2:
        print("Usage: monitor-costs.py <log_file> [budget_usd]")
        sys.exit(1)
    log_file = sys.argv[1]
    budget = float(sys.argv[2]) if len(sys.argv) > 2 else 100.0
    usage = parse_logs(log_file)
    print_report(usage, budget)
```
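To sanity-check the parser, feed it a synthetic log. The log-line format here is the one assumed in the script's comments, not verified Claude Code output, and the import assumes the script above was saved as monitor_costs.py:

```python
# Smoke test with a synthetic log file (assumed format, hypothetical filename).
from monitor_costs import parse_logs, print_report

with open("sample.log", "w") as f:
    f.write('[14:23:45] auth_agent: token_usage={"input": 1000, "output": 500, "cost": 0.50}\n')
    f.write('[14:24:10] api_agent: token_usage={"input": 4000, "output": 1200, "cost": 1.10}\n')

usage = parse_logs("sample.log")
print_report(usage, budget=50.0)  # expect SPENT: $1.60 USD (3.2%)
```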
Example 5: Cost Comparison: Single vs. Multi-Agent
```bash
#!/bin/bash
# compare-costs.sh - Show cost difference between approaches

cat << 'EOF'
PROJECT: Build a Python REST API with auth, database, tests
CODEBASE SIZE: 50K lines of code
ESTIMATED TOKENS: 1.2M (input + output)

APPROACH 1: SINGLE AGENT (SEQUENTIAL)
────────────────────────────────────
Model: claude-opus
Tokens: 1.2M
Cost: ~$18.00
Time: 8 hours (wall-clock)

APPROACH 2: 4-AGENT TEAM (PARALLEL)
────────────────────────────────────
Models:
  - 1x claude-opus (orchestrator)
  - 2x claude-sonnet (features)
  - 1x claude-haiku (tests)
Tokens: 5.2M (4.3x multiplier for coordination)
Cost: ~$52.00
Time: 2.5 hours (wall-clock)

ANALYSIS
────────────────────────────────────
Speed improvement: 3.2x faster (8h → 2.5h, 5.5 hours saved)
Cost increase: 2.9x ($18 → $52, $34 extra)
Break-even: if an hour of saved time is worth more than
~$6.20 ($34 ÷ 5.5 h), multi-agent saves money.

RECOMMENDATION:
✓ Use multi-agent if:
  - Time-sensitive project
  - High-value deliverable (delay costs > $34)
  - Team parallelization is clear (not many dependencies)

✗ Skip multi-agent if:
  - Budget is tight
  - Project not time-critical
  - Tasks highly interdependent (low parallelization benefit)
EOF
```
Gotchas & Tips
Gotcha 1: Hidden Coordination Costs
- You budget $10 for a task, but the orchestrator needs to synthesize results from 4 agents. Total cost: $30.
- Tip: Account for coordination overhead (10–25%) in your cost estimates.
Gotcha 2: Over-Splitting Tasks
- You break a 2-hour task into ten 30-minute micro-tasks for 10 agents. Now agents spend half their time waiting for each other.
- Tip: Fewer, larger tasks = better parallelization and lower coordination cost.
Gotcha 3: Budget Cap Too Tight
- You set --max-budget-usd 10 for a project that costs $12. Agents stop mid-project.
- Tip: Estimate conservatively and set budget 20% higher than estimate.
Gotcha 4: Ignoring Baseline Cost
- You focus on the multi-agent cost ($50) but forget the single-agent cost was $8. Multi-agent costs over 6x more!
- Tip: Always calculate the break-even point (the time savings required to justify the cost increase).
Gotcha 5: Wrong Model for the Task
- You assign Haiku to a complex architectural decision. It fails, and you retry with Opus. Wasted money.
- Tip: Match model complexity to task complexity. Start with Sonnet if unsure.
Lead-out
You now understand the cost curve of multi-agent work. Next video, we're studying one of the most ambitious multi-agent projects made public so far: Anthropic's C compiler, built by 16 parallel agents for roughly 100K lines of code and about $20K in cost, with incredible results. We'll look at how Anthropic's engineering team managed cost, parallelization, and quality across a truly massive project.
Reference URLs
- Claude API Pricing: https://www.anthropic.com/pricing
- Claude Code Budget Configuration: https://claude.ai/docs/claude-code/budget
- OpenTelemetry for Token Tracking: https://opentelemetry.io/
- Cost Monitoring Best Practices: https://aws.amazon.com/blogs/engineering/cost-optimization-best-practices/
- Amdahl's Law (parallelization limits): https://en.wikipedia.org/wiki/Amdahl%27s_law
Prep Reading
- Anthropic pricing page: Know the exact token costs for each model.
- "Amdahl's Law" in systems design: Understand parallelization limits mathematically.
- Cost optimization strategies: AWS or cloud cost management guides are applicable.
- Token counting: Review how Anthropic counts tokens (affects cost calculations).
Notes for Daniel
This video is about making multi-agent work financially viable. Use real numbers from your experience or research. The cost calculator is the centerpiece—walk through it carefully so viewers can do the math themselves.
The key insight is: multi-agent isn't always cheaper or faster. It's a trade-off. Your job is to help viewers make that trade-off consciously, with data.
Show a practical example where multi-agent pays off (time-sensitive project) and one where it doesn't (simple project). That contrast is powerful.
The budget cap (--max-budget-usd) is a safety feature. Emphasize it as protection against runaway costs, especially for experimentation.