17.10 Hosting & Deployment
Course: Claude Code - Enterprise Development
Section: Claude Agent SDK
Video Length: 3-5 minutes
Presenter: Daniel Treasure
Opening Hook
"You've built powerful agents in Python and TypeScript. Now, how do you run them in production? Cloud VMs, containers, serverless—each has trade-offs. We'll cover deployment architecture, environment config, health checks, and failover strategies."
Key Talking Points
What to say:
- "Agent SDK runs anywhere Python or Node.js runs: VMs, containers, serverless, hybrid."
- "Production agents need: resilience, monitoring, isolation, secret management, scalability."
- "Each deployment model has pros/cons: VMs (control, overhead), containers (consistency), serverless (simplicity, cost)."
- "We'll show practical patterns for each, plus cross-cutting concerns (logging, metrics, health checks)."
What to show on screen:
- Agent running locally (baseline)
- Agent in Docker container
- Agent deployed to cloud (AWS Lambda, Google Cloud Run, Azure Functions, etc.)
- Monitoring dashboard showing agent metrics
- Failover/retry logic in action
Demo Plan
[00:00 - 01:00] Deployment Models Overview
1. Show three model options:
   - VMs (EC2, GCE, etc.): full control, overhead, need to manage OS/updates
   - Containers (Docker + Kubernetes): consistency, orchestration, scaling
   - Serverless (Lambda, Cloud Run, etc.): auto-scaling, no ops overhead, limited duration
2. Decision matrix: when to use each
3. Show: same agent code, different deployment targets
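A minimal sketch of how the decision matrix from step 2 could be expressed in code. The Workload fields, the 15-minute threshold, and the rules themselves are illustrative assumptions for the demo, not SDK guidance.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of an agent workload (all fields are assumptions)."""
    max_task_minutes: float   # longest expected single task
    needs_os_control: bool    # e.g. custom OS packages or host-level tuning
    steady_traffic: bool      # True if load is roughly constant


def suggest_deployment(w: Workload, serverless_limit_minutes: float = 15.0) -> str:
    """Return 'vm', 'containers', or 'serverless' using simple illustrative rules."""
    if w.needs_os_control:
        return "vm"            # full control over the OS
    if w.max_task_minutes >= serverless_limit_minutes or w.steady_traffic:
        return "containers"    # long-running or steady load
    return "serverless"        # short, bursty tasks
```

On screen, calling suggest_deployment(Workload(2, False, False)) for a short, bursty task lands on serverless, while a 45-minute task lands on containers.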
[01:00 - 02:00] Docker Container Deployment
1. Create Dockerfile for agent application
2. Show: install dependencies, copy agent code, expose port
3. Build image: docker build -t my-agent:latest .
4. Run container: docker run my-agent:latest
5. Show: container isolation, environment variables for config
[02:00 - 03:00] Cloud Deployment (e.g., Kubernetes)
1. Show Kubernetes deployment YAML
2. Explain: replicas (multiple agent instances), resource limits, health probes
3. Deploy to cluster: kubectl apply -f agent-deployment.yaml
4. Show: auto-scaling based on metrics
5. Mention: readiness/liveness probes (health checks)
[03:00 - 04:00] Monitoring & Observability
1. Show: structured logging (JSON output)
2. Show: metrics collection (Prometheus, CloudWatch, etc.)
3. Metrics to track: agent success rate, latency, error rate, MCP failures
4. Show: dashboard with these metrics
5. Mention: alerts (PagerDuty, etc.)
[04:00 - 04:45] Secrets & Configuration
1. Show: never hardcode credentials
2. Use secrets manager: AWS Secrets Manager, HashiCorp Vault, Kubernetes Secrets
3. Environment variables injected at runtime
4. Demonstrate: configuration file for environment-specific settings (dev, staging, prod)
Code Examples & Commands
Dockerfile (Python agent):
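FROM python:3.11-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy agent code
COPY . .
# Non-secret defaults only. ANTHROPIC_API_KEY is injected at runtime
# (docker run -e, Kubernetes Secret) and is never baked into the image.
ENV LOG_LEVEL=INFO
# Document the port the service listens on
EXPOSE 8000
# Health check: urlopen raises on connection errors and HTTP error responses,
# so the command exits non-zero when the service is unhealthy.
# Uses only the standard library, so it works even if requests is not installed.
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=5)"
# Run agent service
CMD ["python", "-m", "uvicorn", "agent_service:app", "--host", "0.0.0.0", "--port", "8000"]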
Agent service (FastAPI wrapper):
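import logging
from datetime import datetime, timezone

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
# Python Agent SDK package (pip install claude-agent-sdk); check these names
# against the SDK version pinned in requirements.txt
from claude_agent_sdk import ClaudeAgentOptions, ResultMessage, query

# Setup logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

app = FastAPI(title="Claude Agent Service")

# Agent options shared by every request
options = ClaudeAgentOptions(
    model="claude-sonnet-4-5-20250929",
    system_prompt="You are a helpful code assistant."
)


class TaskRequest(BaseModel):
    """JSON request body: {"task": "..."}."""
    task: str


@app.get("/health")
async def health():
    """Health check endpoint."""
    return {
        "status": "healthy",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "service": "claude-agent"
    }


@app.post("/agent/run")
async def run_agent(request: TaskRequest):
    """Run agent with given task."""
    logger.info("Starting agent task: %s...", request.task[:50])
    try:
        result = None
        # query() streams messages; the final ResultMessage carries the outcome
        async for message in query(prompt=request.task, options=options):
            if isinstance(message, ResultMessage):
                result = message
    except Exception as e:
        logger.exception("Agent task failed")
        # Return a real 5xx so load balancers and error-rate metrics see the failure
        raise HTTPException(status_code=500, detail=str(e))
    if result is None or result.is_error:
        raise HTTPException(status_code=502, detail="Agent did not return a successful result")
    logger.info("Task completed successfully")
    return {
        "status": "success",
        "output": result.result,
        "turns": result.num_turns
    }


@app.get("/metrics")
async def metrics():
    """Return agent metrics."""
    return {
        "uptime_seconds": 0,  # Would track actual uptime
        "tasks_completed": 0,  # Would increment
        "errors": 0
    }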
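The /metrics endpoint above returns placeholder zeros. A minimal sketch of an in-process tracker that could back it is shown here; AgentMetrics and its field names are hypothetical, and a production setup would more likely export counters through a Prometheus client or CloudWatch rather than hand-rolled state.

```python
import time
from dataclasses import dataclass, field


@dataclass
class AgentMetrics:
    """In-process counters for the metrics named in the demo plan (illustrative only)."""
    started_at: float = field(default_factory=time.monotonic)
    tasks_completed: int = 0
    errors: int = 0
    total_latency_seconds: float = 0.0

    def record(self, latency_seconds: float, ok: bool) -> None:
        """Record one finished task and how long it took."""
        if ok:
            self.tasks_completed += 1
        else:
            self.errors += 1
        self.total_latency_seconds += latency_seconds

    def snapshot(self) -> dict:
        """Return a JSON-serializable view suitable for a /metrics response."""
        total = self.tasks_completed + self.errors
        return {
            "uptime_seconds": round(time.monotonic() - self.started_at, 3),
            "tasks_completed": self.tasks_completed,
            "errors": self.errors,
            "success_rate": self.tasks_completed / total if total else None,
            "avg_latency_seconds": self.total_latency_seconds / total if total else None,
        }
```

The state lives in one process, so with several replicas each pod reports its own numbers; aggregation is left to the monitoring stack.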
Build and run:
# Build Docker image
docker build -t claude-agent:latest .
# Run locally (passes the key from your shell environment, so it stays out of shell history)
docker run -e ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" \
  -p 8000:8000 \
  claude-agent:latest
# Test
curl -X POST http://localhost:8000/agent/run \
  -H "Content-Type: application/json" \
  -d '{"task": "What is 2+2?"}'
Kubernetes Deployment YAML:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: claude-agent
spec:
  replicas: 3
  selector:
    matchLabels:
      app: claude-agent
  template:
    metadata:
      labels:
        app: claude-agent
    spec:
      containers:
        - name: agent
          image: claude-agent:latest
          ports:
            - containerPort: 8000
          env:
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  name: claude-secrets
                  key: api-key
            - name: LOG_LEVEL
              value: "INFO"
            - name: ENVIRONMENT
              value: "production"
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: claude-agent-service
spec:
  selector:
    app: claude-agent
  ports:
    - port: 80
      targetPort: 8000
  type: LoadBalancer
Structured Logging (JSON):
import json
import logging


class JSONFormatter(logging.Formatter):
    """Format each log record as one JSON object per line."""

    def format(self, record):
        log_data = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "service": "claude-agent",
            "message": record.getMessage(),
            "module": record.module,
            "function": record.funcName,
            "line": record.lineno
        }
        if record.exc_info:
            log_data["exception"] = self.formatException(record.exc_info)
        return json.dumps(log_data)


handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logger = logging.getLogger()
logger.addHandler(handler)
# The root logger defaults to WARNING; lower it so INFO records are emitted
logger.setLevel(logging.INFO)
Environment Configuration (config.py):
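import os
from dataclasses import dataclass, field


def _env_int(name: str, default: int) -> int:
    """Read an integer environment variable, falling back to default."""
    return int(os.getenv(name, str(default)))


@dataclass
class Config:
    # default_factory reads the environment when Config() is created,
    # not once when the class body is evaluated.
    # API Configuration (repr=False keeps the key out of logs and tracebacks)
    anthropic_api_key: str = field(
        default_factory=lambda: os.getenv("ANTHROPIC_API_KEY", ""), repr=False
    )
    # Agent Configuration
    model: str = field(default_factory=lambda: os.getenv("MODEL", "claude-sonnet-4-5-20250929"))
    max_tokens: int = field(default_factory=lambda: _env_int("MAX_TOKENS", 2048))
    # Deployment Configuration
    environment: str = field(default_factory=lambda: os.getenv("ENVIRONMENT", "development"))
    log_level: str = field(default_factory=lambda: os.getenv("LOG_LEVEL", "INFO"))
    # Service Configuration
    host: str = field(default_factory=lambda: os.getenv("HOST", "0.0.0.0"))
    port: int = field(default_factory=lambda: _env_int("PORT", 8000))
    # Resilience Configuration
    max_retries: int = field(default_factory=lambda: _env_int("MAX_RETRIES", 3))
    timeout_seconds: int = field(default_factory=lambda: _env_int("TIMEOUT", 30))

    def is_production(self) -> bool:
        return self.environment == "production"


config = Config()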
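The max_retries and timeout_seconds settings above are declared but never used in the listings. A minimal sketch of how they might drive retry logic with exponential backoff follows; run_with_retries, make_call, and base_delay are hypothetical names, and catching every Exception is a simplification that real code would narrow to transient errors.

```python
import asyncio
import random


async def run_with_retries(make_call, max_retries=3, timeout_seconds=30.0, base_delay=0.5):
    """Await make_call() with a per-attempt timeout, retrying on failure.

    make_call is a zero-argument function that returns a fresh awaitable
    on every call (an awaitable cannot be awaited twice).
    """
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return await asyncio.wait_for(make_call(), timeout=timeout_seconds)
        except Exception as exc:  # narrow to transient errors in real code
            last_error = exc
            if attempt == max_retries:
                break
            # Exponential backoff (base, 2x, 4x, ...) plus up to 10% jitter
            delay = base_delay * (2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay * 0.1))
    raise last_error
```

With the Config object, a call could look like run_with_retries(lambda: do_task(task), config.max_retries, config.timeout_seconds), where do_task stands in for whatever coroutine runs the agent.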
Gotchas & Tips
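Gotcha 1: Cold Starts (Serverless)
- Lambda/Cloud Run cold starts can add anywhere from a few hundred milliseconds to several seconds of latency, and more for large images or heavy imports
- Problematic if agents need sub-second response
- Solution: keep instances warm with provisioned concurrency (Lambda) or minimum instances (Cloud Run), or with scheduled pings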
Gotcha 2: Memory Limits
- The agent process plus MCP servers can use significant memory (the model itself runs behind the API, not in your container)
- Serverless has strict memory limits (for example, AWS Lambda defaults to 128 MB and allows up to 10 GB)
- Solution: Profile memory usage, optimize, or use VMs for memory-heavy work
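A minimal sketch of the "profile memory usage" step using the standard-library tracemalloc module. The helper name peak_python_memory_mb is hypothetical. One caveat to state on camera: tracemalloc only sees allocations made by the Python interpreter in this process, so memory used by MCP servers running as separate processes needs OS-level or container-level measurement instead.

```python
import tracemalloc


def peak_python_memory_mb(fn, *args, **kwargs):
    """Run fn and return (result, peak MiB allocated by Python during the call)."""
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    finally:
        tracemalloc.stop()
    return result, peak / (1024 * 1024)
```

Comparing the reported peak against the container's memory limit gives a first estimate of headroom before choosing a serverless memory size.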
Gotcha 3: Long-Running Tasks (Serverless)
- Serverless functions time out (Lambda: 15 minutes max)
- Agent tasks that exceed the timeout are terminated mid-run and fail
- Solution: Break task into chunks, use job queue pattern
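A minimal sketch of the job queue pattern: the request handler enqueues the task and returns a job id right away, and a background worker does the slow work. JobQueue and its methods are hypothetical, and the in-memory dict is an assumption for the demo; a real deployment would use a durable queue so jobs survive restarts.

```python
import asyncio
import uuid


class JobQueue:
    """Minimal in-memory job queue (illustrative; not durable across restarts)."""

    def __init__(self):
        self._queue = asyncio.Queue()
        self.jobs = {}  # job_id -> {"status": ..., "result": ...}

    def submit(self, task: str) -> str:
        """Enqueue a task and return a job id immediately."""
        job_id = uuid.uuid4().hex
        self.jobs[job_id] = {"status": "queued", "result": None}
        self._queue.put_nowait((job_id, task))
        return job_id

    async def worker(self, handler):
        """Process jobs forever; handler is an async function taking the task string."""
        while True:
            job_id, task = await self._queue.get()
            self.jobs[job_id]["status"] = "running"
            try:
                self.jobs[job_id]["result"] = await handler(task)
                self.jobs[job_id]["status"] = "done"
            except Exception as exc:
                self.jobs[job_id].update(status="failed", result=str(exc))
            finally:
                self._queue.task_done()

    async def join(self):
        """Wait until every submitted job has been processed."""
        await self._queue.join()
```

Clients then poll a status endpoint with the job id instead of holding an HTTP request open for the whole agent run.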
Gotcha 4: Credential Management
- Never hardcode API keys in images or code
- Use secrets manager (AWS Secrets Manager, Vault, K8s Secrets)
- Rotate credentials regularly
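A minimal sketch of reading a credential at runtime instead of hardcoding it. It checks the environment first and falls back to a file, since secrets are often mounted as files. The load_secret and mask helpers and the /run/secrets default path are assumptions for illustration; adjust the path to wherever your platform mounts secrets.

```python
import os
from pathlib import Path


def load_secret(name: str, secrets_dir: str = "/run/secrets") -> str:
    """Return a secret from the environment, falling back to a mounted file.

    secrets_dir is a hypothetical mount point, not a fixed standard.
    """
    value = os.environ.get(name)
    if value:
        return value
    path = Path(secrets_dir) / name
    if path.is_file():
        return path.read_text(encoding="utf-8").strip()
    raise RuntimeError(f"Secret {name!r} not found in environment or {secrets_dir}")


def mask(secret: str, visible: int = 4) -> str:
    """Return a log-safe form that shows only the last few characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

Logging mask(key) rather than the key itself lets operators confirm which credential is loaded after a rotation without exposing it.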
Gotcha 5: Network Isolation
- If agent needs private network access (to internal APIs, databases), configure networking
- Cloud VPC/subnet configuration required
Tip 1: Health Checks
- Always implement /health endpoint
- Return: status, timestamp, dependencies (MCP servers reachable?)
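A minimal sketch of a health report that includes dependency status, as Tip 1 suggests. The build_health_report function and the shape of the checks mapping are hypothetical; each check is assumed to be an async function that returns normally when its dependency (for example an MCP server) is reachable and raises otherwise.

```python
import asyncio
from datetime import datetime, timezone


async def build_health_report(checks: dict, timeout_seconds: float = 2.0) -> dict:
    """Run each async dependency check concurrently and summarise the results."""

    async def run_one(check):
        try:
            await asyncio.wait_for(check(), timeout=timeout_seconds)
            return "ok"
        except Exception as exc:
            return f"failed: {type(exc).__name__}"

    names = list(checks)
    results = await asyncio.gather(*(run_one(checks[n]) for n in names))
    healthy = all(r == "ok" for r in results)
    return {
        "status": "healthy" if healthy else "degraded",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "dependencies": dict(zip(names, results)),
    }
```

One design consideration: wiring dependency checks into a liveness probe can cause restarts when a downstream service is down, so a report like this may fit better behind the readiness probe, with liveness kept to a cheap "process is responding" check.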
Tip 2: Structured Logging
- JSON logs → easier parsing, alerting, searching
- Include: timestamp, level, service, message, context (user_id, request_id)
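A minimal sketch of attaching request context to JSON logs with the standard library. The request_id_var context variable, RequestIdFilter, and ContextJSONFormatter are hypothetical names; the sketch assumes middleware or the request handler sets the request id at the start of each request.

```python
import contextvars
import json
import logging

# Hypothetical context variable holding the current request's id
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Attach the current request_id to every log record."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True  # never drop the record, only annotate it


class ContextJSONFormatter(logging.Formatter):
    """Emit one JSON object per record, including the request_id."""

    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
            "request_id": getattr(record, "request_id", "-"),
        })
```

Because contextvars values are scoped per asyncio task, concurrent requests keep separate ids without passing them through every function call.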
Tip 3: Graceful Shutdown
- Handle SIGTERM, close connections, flush logs before exit
- Important for rolling deployments
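A minimal sketch of SIGTERM handling for a standalone asyncio worker. The run_until_signalled function and the worker signature are hypothetical, and loop.add_signal_handler is only available on Unix event loops. A service run under uvicorn already handles SIGTERM itself, so for the FastAPI wrapper the cleanup would more naturally live in the app's shutdown or lifespan hooks.

```python
import asyncio
import signal


async def run_until_signalled(worker, shutdown_timeout: float = 10.0):
    """Run worker(stop) until it finishes or SIGTERM/SIGINT arrives.

    worker is an async function that receives an asyncio.Event and should
    finish its cleanup (close connections, flush logs) once the event is set.
    """
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()
    signals = (signal.SIGTERM, signal.SIGINT)
    for sig in signals:
        loop.add_signal_handler(sig, stop.set)  # Unix event loops only

    task = asyncio.create_task(worker(stop))
    stopper = asyncio.create_task(stop.wait())
    try:
        await asyncio.wait({task, stopper}, return_when=asyncio.FIRST_COMPLETED)
        # Bounded cleanup window; wait_for cancels the worker on timeout
        return await asyncio.wait_for(task, timeout=shutdown_timeout)
    except asyncio.TimeoutError:
        return None
    finally:
        stopper.cancel()
        for sig in signals:
            loop.remove_signal_handler(sig)
```

The shutdown_timeout should stay below the orchestrator's termination grace period so cleanup finishes before the process is killed.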
Tip 4: Resource Requests/Limits
- Set appropriate CPU/memory requests and limits
- Prevents resource starvation on shared clusters
Tip 5: Monitoring Stack
- Metrics: Prometheus, CloudWatch, Datadog
- Logs: ELK, Splunk, CloudWatch
- Tracing: Jaeger, Datadog
- Alerts: PagerDuty, Opsgenie
Tip 6: Cost Optimization
- VMs: reserved instances (save 30-70%)
- Containers: right-size resources, spot instances
- Serverless: monitor invocations, optimize cold starts
Lead-out
"You've learned deployment patterns for production agents: containers, Kubernetes, serverless, with monitoring and resilience. The SDK is flexible—run agents anywhere. Final section: best practices. We'll synthesize everything and show enterprise patterns for effective agent use."
Reference URLs
- Docker Documentation: https://docs.docker.com/
- Kubernetes Basics: https://kubernetes.io/docs/
- AWS Lambda: https://aws.amazon.com/lambda/
- Google Cloud Run: https://cloud.google.com/run
- Azure Functions: https://azure.microsoft.com/en-us/services/functions/
- Prometheus Metrics: https://prometheus.io/
- Structured Logging: https://www.kartar.net/2015/12/structured-logging/
Prep Reading
- Review your company's infrastructure (where do services run?)
- Understand your deployment pipeline (CI/CD system)
- Review secret management practices in your org
- Understand monitoring/alerting setup
Notes for Daniel
- Demo flexibility: Can show one deployment model or multiple. Adjust depth based on audience.
- Enterprise focus: Emphasize security, resilience, observability. Production isn't just code running—it's monitored, scaled, secured.
- Cost awareness: Mention that cloud costs vary—help teams choose right model for their use case.
- Tone: Position deployment as the final step that makes agents production-ready. Earlier videos (CLI, SDK features) are tools; this video is the destination.
- Wrap-up momentum: Use lead-out to transition to final section (best practices).