Multi-agent systems let you break complex tasks into specialized sub-tasks, each handled by a focused AI agent. This guide covers the architecture patterns and implementation for building reliable multi-agent systems with Claude.
When Multi-Agent Makes Sense
Use multi-agent systems when:
- Long tasks exceed a single context window
- Parallelization is possible and would speed things up
- Specialized roles benefit from dedicated context/instructions
- Verification requires an independent agent checking work
A single agent with a good system prompt often suffices. Add multi-agent complexity only when the benefits outweigh the coordination overhead.
Core Architecture Patterns
Orchestrator-Worker Pattern
The most common pattern: an orchestrator agent plans and delegates; worker agents execute specific tasks.
import anthropic
from typing import Callable
client = anthropic.Anthropic()
def create_agent(name: str, system_prompt: str) -> Callable:
"""Factory for creating specialized agents."""
def agent(task: str, context: str = "") -> str:
messages = [{"role": "user", "content": f"{context}\n\nTask: {task}" if context else f"Task: {task}"}]
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=4096,
system=system_prompt,
messages=messages
)
return response.content[0].text
agent.__name__ = name
return agent
# Create specialized agents
researcher = create_agent(
"researcher",
"You are a research specialist. Find and synthesize information on given topics. Be thorough and cite sources when possible."
)
writer = create_agent(
"writer",
"You are a professional writer. Transform research and outlines into polished, engaging content."
)
editor = create_agent(
"editor",
"You are an editor. Review content for clarity, accuracy, and quality. Provide specific improvements."
)
orchestrator = create_agent(
"orchestrator",
"""You are a project orchestrator. Given a goal, create a step-by-step plan.
Available agents: researcher, writer, editor.
Output your plan as JSON with format:
{"steps": [{"agent": "name", "task": "description", "depends_on": []}]}"""
)
Pipeline Pattern
Linear chain of agents, each building on the previous output:
def run_pipeline(initial_input: str, stages: list[tuple[str, Callable]]) -> dict:
"""Run a sequential pipeline of agents."""
results = {}
current_input = initial_input
for stage_name, agent in stages:
print(f"Running stage: {stage_name}")
result = agent(current_input)
results[stage_name] = result
current_input = result
return results
# Example: blog post pipeline
results = run_pipeline(
"Write about the impact of AI on software development in 2026",
[
("research", researcher),
("outline", lambda x: orchestrator(f"Create detailed outline based on: {x}")),
("draft", writer),
("edit", editor),
]
)
Tool-Using Agents
Agents become powerful when they can use tools:
def research_agent_with_tools(query: str) -> str:
"""Agent that can search and read web pages."""
tools = [
{
"name": "web_search",
"description": "Search the web for current information",
"input_schema": {
"type": "object",
"properties": {
"query": {"type": "string"}
},
"required": ["query"]
}
},
{
"name": "read_url",
"description": "Read the content of a URL",
"input_schema": {
"type": "object",
"properties": {
"url": {"type": "string"}
},
"required": ["url"]
}
}
]
messages = [{"role": "user", "content": query}]
while True:
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=4096,
system="You are a research agent. Use tools to find accurate, current information.",
tools=tools,
messages=messages
)
if response.stop_reason == "end_turn":
return response.content[0].text
# Handle tool calls
messages.append({"role": "assistant", "content": response.content})
tool_results = []
for block in response.content:
if block.type == "tool_use":
# Execute the tool
result = execute_tool(block.name, block.input)
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": result
})
messages.append({"role": "user", "content": tool_results})
def execute_tool(name: str, inputs: dict) -> str:
"""Execute tool calls — replace with real implementations."""
if name == "web_search":
# Replace with actual search implementation
return f"Search results for: {inputs['query']}"
elif name == "read_url":
# Replace with actual URL reading
return f"Content from: {inputs['url']}"
return "Tool not found"
State Management
Multi-agent systems need shared state:
from dataclasses import dataclass, field
from typing import Any
import json
@dataclass
class AgentState:
"""Shared state across agents."""
goal: str
completed_tasks: list[str] = field(default_factory=list)
artifacts: dict[str, Any] = field(default_factory=dict)
messages: list[dict] = field(default_factory=list)
def add_artifact(self, key: str, value: Any):
self.artifacts[key] = value
self.completed_tasks.append(key)
def get_context_summary(self) -> str:
return f"""
Goal: {self.goal}
Completed: {', '.join(self.completed_tasks)}
Available artifacts: {list(self.artifacts.keys())}
"""
class MultiAgentOrchestrator:
def __init__(self, goal: str):
self.state = AgentState(goal=goal)
self.agents = {}
def register_agent(self, name: str, agent: Callable):
self.agents[name] = agent
def run_agent(self, agent_name: str, task: str) -> str:
agent = self.agents[agent_name]
context = self.state.get_context_summary()
# Include relevant artifacts in context
relevant_artifacts = "\n".join([
f"{k}: {str(v)[:500]}"
for k, v in self.state.artifacts.items()
])
full_context = f"{context}\n\nPrevious work:\n{relevant_artifacts}"
result = agent(task, full_context)
self.state.add_artifact(f"{agent_name}_{task[:20]}", result)
return result
Parallel Agent Execution
Run independent tasks concurrently:
import asyncio
import anthropic
async_client = anthropic.AsyncAnthropic()
async def run_agent_async(name: str, system: str, task: str) -> tuple[str, str]:
"""Run a single agent asynchronously."""
response = await async_client.messages.create(
model="claude-3-haiku-20240307", # Use cheaper model for parallel workers
max_tokens=2048,
system=system,
messages=[{"role": "user", "content": task}]
)
return name, response.content[0].text
async def run_parallel_research(topics: list[str]) -> dict[str, str]:
"""Research multiple topics simultaneously."""
tasks = [
run_agent_async(
topic,
"You are a research specialist. Be concise and factual.",
f"Research and summarize: {topic}"
)
for topic in topics
]
results = await asyncio.gather(*tasks)
return dict(results)
# Usage
async def main():
topics = ["AI in healthcare", "AI in finance", "AI in education"]
results = await run_parallel_research(topics)
for topic, research in results.items():
print(f"\n=== {topic} ===")
print(research[:200])
asyncio.run(main())
Error Handling and Retry Logic
import time
from anthropic import RateLimitError, APIError
def resilient_agent_call(agent_fn: Callable, task: str, max_retries: int = 3) -> str:
"""Call an agent with retry logic."""
for attempt in range(max_retries):
try:
return agent_fn(task)
except RateLimitError:
wait = 2 ** attempt
print(f"Rate limited. Waiting {wait}s...")
time.sleep(wait)
except APIError as e:
if attempt == max_retries - 1:
raise
print(f"API error: {e}. Retrying...")
time.sleep(1)
raise Exception(f"Agent failed after {max_retries} attempts")
Production Considerations
Token costs: Each agent call has its own cost. Use cheaper models (Haiku) for worker agents and more capable models (Sonnet) for orchestrators.
Latency: Sequential agents compound latency. Parallelize where possible.
Context contamination: Each agent should get only the context it needs. Avoid passing everything to every agent.
Logging: Log all agent calls, inputs, outputs, and token usage. You need observability to debug multi-agent systems.
Timeouts: Set explicit timeouts. Hanging agents will block your entire pipeline.
Model Selection for Multi-Agent
| Role | Model | Reason |
|---|---|---|
| Orchestrator | claude-3-5-sonnet | Complex planning needs best model |
| Research workers | claude-3-haiku | Fast, cheap, good enough for factual tasks |
| Writing workers | claude-3-5-sonnet | Writing quality matters |
| Verification agents | claude-3-5-sonnet | Accuracy is critical |
| Classification/routing | claude-3-haiku | Simple decision, maximize speed |
Using the right model per agent tier significantly reduces cost without sacrificing overall quality.