
17.10 Hosting & Deployment

Course: Claude Code - Enterprise Development

Section: Claude Agent SDK

Video Length: 3-5 minutes

Presenter: Daniel Treasure


Opening Hook

"You've built powerful agents in Python and TypeScript. Now, how do you run them in production? Cloud VMs, containers, serverless—each has trade-offs. We'll cover deployment architecture, environment config, health checks, and failover strategies."


Key Talking Points

What to say:

  • "Agent SDK runs anywhere Python or Node.js runs: VMs, containers, serverless, hybrid."
  • "Production agents need: resilience, monitoring, isolation, secret management, scalability."
  • "Each deployment model has pros/cons: VMs (control, overhead), containers (consistency), serverless (simplicity, cost)."
  • "We'll show practical patterns for each, plus cross-cutting concerns (logging, metrics, health checks)."

What to show on screen:

  • Agent running locally (baseline)
  • Agent in Docker container
  • Agent deployed to cloud (AWS Lambda, Google Cloud Run, Azure Functions, etc.)
  • Monitoring dashboard showing agent metrics
  • Failover/retry logic in action

Demo Plan

[00:00 - 01:00] Deployment Models Overview
  1. Show three model options:
     - VMs (EC2, GCE, etc.): full control, but more overhead (you manage the OS and updates)
     - Containers (Docker + Kubernetes): consistency, orchestration, scaling
     - Serverless (Lambda, Cloud Run, etc.): auto-scaling, no ops overhead, limited execution duration
  2. Decision matrix: when to use each
  3. Show: same agent code, different deployment targets

[01:00 - 02:00] Docker Container Deployment
  1. Create a Dockerfile for the agent application
  2. Show: install dependencies, copy agent code, expose port
  3. Build image: docker build -t my-agent:latest .
  4. Run container: docker run my-agent:latest
  5. Show: container isolation, environment variables for config

[02:00 - 03:00] Cloud Deployment (e.g., Kubernetes)
  1. Show Kubernetes deployment YAML
  2. Explain: replicas (multiple agent instances), resource limits, health probes
  3. Deploy to cluster: kubectl apply -f agent-deployment.yaml
  4. Show: auto-scaling based on metrics
  5. Mention: readiness/liveness probes (health checks)

[03:00 - 04:00] Monitoring & Observability
  1. Show: structured logging (JSON output)
  2. Show: metrics collection (Prometheus, CloudWatch, etc.)
  3. Metrics to track: agent success rate, latency, error rate, MCP failures
  4. Show: dashboard with these metrics
  5. Mention: alerts (PagerDuty, etc.)

[04:00 - 04:45] Secrets & Configuration
  1. Show: never hardcode credentials
  2. Use a secrets manager: AWS Secrets Manager, HashiCorp Vault, Kubernetes Secrets
  3. Environment variables injected at runtime
  4. Demonstrate: configuration file for environment-specific settings (dev, staging, prod)


Code Examples & Commands

Dockerfile (Python agent):

FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy agent code
COPY . .

# Non-secret defaults only; the API key is injected at runtime
# (docker run -e / Kubernetes Secret), never baked into the image
ENV LOG_LEVEL=INFO

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import requests; requests.get('http://localhost:8000/health')"

# Run agent service
CMD ["python", "-m", "uvicorn", "agent_service:app", "--host", "0.0.0.0", "--port", "8000"]

Agent service (FastAPI wrapper):

from fastapi import FastAPI
from pydantic import BaseModel
import logging
from datetime import datetime, timezone
from claude_code import Agent

# Setup logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

app = FastAPI(title="Claude Agent Service")

# Initialize agent
agent = Agent(
    model="claude-sonnet-4-5-20250929",
    system_prompt="You are a helpful code assistant."
)

@app.get("/health")
async def health():
    """Health check endpoint."""
    return {
        "status": "healthy",
        "timestamp": datetime.utcnow().isoformat(),
        "service": "claude-agent"
    }

@app.post("/agent/run")
async def run_agent(task: str):
    """Run agent with given task."""
    try:
        logger.info(f"Starting agent task: {task[:50]}...")

        result = await agent.run(task)

        logger.info(f"Task completed successfully")

        return {
            "status": "success",
            "output": result.output,
            "tool_calls": len(result.tool_calls) if hasattr(result, 'tool_calls') else 0
        }

    except Exception as e:
        logger.error(f"Agent task failed: {str(e)}")
        return {
            "status": "error",
            "error": str(e)
        }

@app.get("/metrics")
async def metrics():
    """Return agent metrics."""
    return {
        "uptime_seconds": 0,  # Would track actual uptime
        "tasks_completed": 0,  # Would increment
        "errors": 0
    }

Build and run:

# Build Docker image
docker build -t claude-agent:latest .

# Run locally
docker run -e ANTHROPIC_API_KEY=your_key \
           -p 8000:8000 \
           claude-agent:latest

# Test
curl -X POST http://localhost:8000/agent/run \
  -H "Content-Type: application/json" \
  -d '{"task": "What is 2+2?"}'

Kubernetes Deployment YAML:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: claude-agent
spec:
  replicas: 3
  selector:
    matchLabels:
      app: claude-agent
  template:
    metadata:
      labels:
        app: claude-agent
    spec:
      containers:
      - name: agent
        image: claude-agent:latest
        ports:
        - containerPort: 8000

        env:
        - name: ANTHROPIC_API_KEY
          valueFrom:
            secretKeyRef:
              name: claude-secrets
              key: api-key
        - name: LOG_LEVEL
          value: "INFO"
        - name: ENVIRONMENT
          value: "production"

        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "512Mi"
            cpu: "500m"

        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10

        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: claude-agent-service
spec:
  selector:
    app: claude-agent
  ports:
  - port: 80
    targetPort: 8000
  type: LoadBalancer

Structured Logging (JSON):

import json
import logging

class JSONFormatter(logging.Formatter):
    def format(self, record):
        log_data = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "service": "claude-agent",
            "message": record.getMessage(),
            "module": record.module,
            "function": record.funcName,
            "line": record.lineno
        }

        if record.exc_info:
            log_data["exception"] = self.formatException(record.exc_info)

        return json.dumps(log_data)

handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logger = logging.getLogger()
logger.addHandler(handler)

Environment Configuration (config.py):

import os
from dataclasses import dataclass

@dataclass
class Config:
    # API Configuration
    anthropic_api_key: str = os.getenv("ANTHROPIC_API_KEY")

    # Agent Configuration
    model: str = os.getenv("MODEL", "claude-sonnet-4-5-20250929")
    max_tokens: int = int(os.getenv("MAX_TOKENS", "2048"))

    # Deployment Configuration
    environment: str = os.getenv("ENVIRONMENT", "development")
    log_level: str = os.getenv("LOG_LEVEL", "INFO")

    # Service Configuration
    host: str = os.getenv("HOST", "0.0.0.0")
    port: int = int(os.getenv("PORT", "8000"))

    # Resilience Configuration
    max_retries: int = int(os.getenv("MAX_RETRIES", "3"))
    timeout_seconds: int = int(os.getenv("TIMEOUT", "30"))

    def is_production(self) -> bool:
        return self.environment == "production"

config = Config()
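
Retry with backoff (resilience sketch):

The resilience settings above only matter if something consumes them. A minimal sketch of a retry wrapper with exponential backoff; the agent.run call mirrors the hypothetical API used in the service example, so adapt it to whatever your agent object actually exposes.

import asyncio
import logging

logger = logging.getLogger(__name__)

async def run_with_retries(agent, task: str, config):
    """Run an agent task, retrying transient failures with exponential backoff."""
    last_error = None
    for attempt in range(1, config.max_retries + 1):
        try:
            # Bound each attempt by the configured timeout
            return await asyncio.wait_for(agent.run(task), timeout=config.timeout_seconds)
        except Exception as e:
            last_error = e
            if attempt == config.max_retries:
                break
            delay = min(2 ** attempt, 30)  # 2s, 4s, 8s... capped at 30s
            logger.warning(f"Attempt {attempt}/{config.max_retries} failed: {e}; retrying in {delay}s")
            await asyncio.sleep(delay)
    raise RuntimeError(f"Agent task failed after {config.max_retries} attempts") from last_error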

Gotchas & Tips

Gotcha 1: Cold Starts (Serverless)
  • Lambda/Cloud Run cold starts can add 5-30 seconds of latency, depending on image size and dependencies
  • Problematic if agents need sub-second response
  • Solution: reserved/provisioned concurrency, keep instances warm with scheduled pings

Gotcha 2: Memory Limits
  • Agent + models + MCP servers can use significant memory
  • Serverless has strict memory limits (Lambda: 128MB default, 10GB max)
  • Solution: profile memory usage, optimize, or use VMs for memory-heavy work (see the sketch below)
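
A quick way to measure what the agent process actually needs before picking limits; a minimal stdlib-only sketch (note that ru_maxrss is kilobytes on Linux, bytes on macOS):

import resource
import tracemalloc

tracemalloc.start()

# ... run a representative agent task here ...

current, peak = tracemalloc.get_traced_memory()            # Python-level allocations
rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss   # whole-process peak (KB on Linux)
print(f"heap peak: {peak / 1e6:.1f} MB, process peak RSS: {rss / 1024:.0f} MB")
tracemalloc.stop()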

Gotcha 3: Long-Running Tasks (Serverless)
  • Serverless functions time out (Lambda: 15 minutes max)
  • Agent tasks that exceed the timeout simply fail
  • Solution: break the task into chunks or use a job queue pattern (sketch below)
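
One way to sketch the job queue pattern on top of the FastAPI service above: accept the task, return a job ID immediately, and let the caller poll for the result. The in-memory dict is for illustration only; a production version would back it with a durable queue (SQS, Pub/Sub, Celery, etc.). Assumes the app, agent, and TaskRequest objects from the service example.

import uuid
from fastapi import BackgroundTasks

jobs: dict[str, dict] = {}  # illustration only; use a durable store in production

async def _execute(job_id: str, task: str):
    """Run the agent task in the background and record the outcome."""
    try:
        result = await agent.run(task)
        jobs[job_id] = {"status": "done", "output": result.output}
    except Exception as e:
        jobs[job_id] = {"status": "error", "error": str(e)}

@app.post("/agent/jobs")
async def submit_job(request: TaskRequest, background_tasks: BackgroundTasks):
    """Accept a task and return immediately with a job ID."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "running"}
    background_tasks.add_task(_execute, job_id, request.task)
    return {"job_id": job_id}

@app.get("/agent/jobs/{job_id}")
async def job_status(job_id: str):
    """Poll for the result of a previously submitted job."""
    return jobs.get(job_id, {"status": "unknown"})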

Gotcha 4: Credential Management
  • Never hardcode API keys in images or code
  • Use a secrets manager (AWS Secrets Manager, Vault, Kubernetes Secrets)
  • Rotate credentials regularly
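
For example, pulling the key from AWS Secrets Manager at startup instead of an environment variable; a minimal sketch assuming boto3 is installed and a secret named claude/agent/api-key (illustrative name) exists:

import boto3

def load_api_key(secret_id: str = "claude/agent/api-key") -> str:
    """Fetch the Anthropic API key from AWS Secrets Manager.
    AWS credentials come from the instance/pod IAM role, not from the image."""
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    return response["SecretString"]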

Gotcha 5: Network Isolation
  • If the agent needs private network access (internal APIs, databases), configure networking explicitly
  • Cloud VPC/subnet configuration is required

Tip 1: Health Checks
  • Always implement a /health endpoint
  • Return: status, timestamp, dependency status (are MCP servers reachable?)
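
A slightly richer version of the /health endpoint from the service example, reporting dependency status alongside the basics; assumes httpx is installed, and the MCP server URL is a placeholder for whatever the agent actually depends on:

import httpx
from datetime import datetime, timezone

@app.get("/health")
async def health():
    """Health check that also probes downstream dependencies."""
    checks = {}
    try:
        async with httpx.AsyncClient(timeout=2.0) as client:
            resp = await client.get("http://mcp-server:9000/health")  # placeholder dependency
        checks["mcp_server"] = "up" if resp.status_code == 200 else "degraded"
    except Exception:
        checks["mcp_server"] = "down"

    healthy = all(v == "up" for v in checks.values())
    return {
        "status": "healthy" if healthy else "degraded",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "service": "claude-agent",
        "dependencies": checks,
    }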

Tip 2: Structured Logging
  • JSON logs are easier to parse, search, and alert on
  • Include: timestamp, level, service, message, context (user_id, request_id)
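
To get request_id into every log line, one common pattern is a contextvars-based logging filter; a minimal sketch that pairs with the JSONFormatter above (also add request_id to its log_data dict, e.g. getattr(record, "request_id", "-")):

import contextvars
import logging
import uuid

request_id_var = contextvars.ContextVar("request_id", default="-")

class RequestContextFilter(logging.Filter):
    """Attach the current request_id to every record the handler emits."""
    def filter(self, record):
        record.request_id = request_id_var.get()
        return True

handler.addFilter(RequestContextFilter())  # the handler from the structured-logging example

# In the FastAPI service, set the ID once per request:
@app.middleware("http")
async def add_request_id(request, call_next):
    request_id_var.set(request.headers.get("x-request-id", str(uuid.uuid4())))
    return await call_next(request)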

Tip 3: Graceful Shutdown
  • Handle SIGTERM: close connections and flush logs before exit
  • Important for rolling deployments
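
A minimal sketch of graceful shutdown for the FastAPI service above: uvicorn turns SIGTERM into the framework's shutdown hook, so cleanup can live there (newer FastAPI versions also offer a lifespan context manager for the same purpose):

import logging

@app.on_event("shutdown")
async def on_shutdown():
    """Runs when uvicorn receives SIGTERM/SIGINT, e.g. during a rolling deployment."""
    logger.info("Shutting down: closing connections and flushing logs")
    # Close any long-lived clients or MCP connections the agent holds here
    logging.shutdown()  # flush and close all logging handlers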

Tip 4: Resource Requests/Limits
  • Set appropriate CPU/memory requests and limits
  • Prevents resource starvation on shared clusters

Tip 5: Monitoring Stack
  • Metrics: Prometheus, CloudWatch, Datadog
  • Logs: ELK, Splunk, CloudWatch
  • Tracing: Jaeger, Datadog
  • Alerts: PagerDuty, Opsgenie
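
On the metrics side, a minimal Prometheus sketch that could replace the placeholder /metrics endpoint in the service example, assuming the prometheus-client package is installed:

from fastapi import Response
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, generate_latest

TASKS_TOTAL = Counter("agent_tasks_total", "Agent tasks processed", ["status"])
TASK_LATENCY = Histogram("agent_task_seconds", "Agent task latency in seconds")

# Inside run_agent(), record each call:
#     with TASK_LATENCY.time():
#         result = await agent.run(task)
#     TASKS_TOTAL.labels(status="success").inc()

@app.get("/metrics")
async def metrics():
    """Expose metrics in Prometheus text format for scraping."""
    return Response(content=generate_latest(), media_type=CONTENT_TYPE_LATEST)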

Tip 6: Cost Optimization
  • VMs: reserved instances (save 30-70%)
  • Containers: right-size resources, consider spot instances
  • Serverless: monitor invocations, optimize cold starts


Lead-out

"You've learned deployment patterns for production agents: containers, Kubernetes, serverless, with monitoring and resilience. The SDK is flexible—run agents anywhere. Final section: best practices. We'll synthesize everything and show enterprise patterns for effective agent use."


Reference URLs

  • Docker Documentation: https://docs.docker.com/
  • Kubernetes Basics: https://kubernetes.io/docs/
  • AWS Lambda: https://aws.amazon.com/lambda/
  • Google Cloud Run: https://cloud.google.com/run
  • Azure Functions: https://azure.microsoft.com/en-us/services/functions/
  • Prometheus Metrics: https://prometheus.io/
  • Structured Logging: https://www.kartar.net/2015/12/structured-logging/

Prep Reading

  • Review your company's infrastructure (where do services run?)
  • Understand your deployment pipeline (CI/CD system)
  • Review secret management practices in your org
  • Understand monitoring/alerting setup

Notes for Daniel

  • Demo flexibility: Can show one deployment model or multiple. Adjust depth based on audience.
  • Enterprise focus: Emphasize security, resilience, observability. Production isn't just code running—it's monitored, scaled, secured.
  • Cost awareness: Mention that cloud costs vary—help teams choose right model for their use case.
  • Tone: Position deployment as the final step that makes agents production-ready. Earlier videos (CLI, SDK features) are tools; this video is the destination.
  • Wrap-up momentum: Use lead-out to transition to final section (best practices).