A common misconception about AI agents is that making them more capable means making a single agent smarter. In practice, an architecture that scales much better for complex tasks is multi-agent coordination: breaking a problem into specialized subtasks and assigning each to a dedicated agent with its own role, context window, and set of tools.
A single LLM agent asked to "research a topic, write a blog post, check its SEO, add code examples, and proofread it" tends to do all of these things passably and none of them well. Four specialized agents (Researcher, Writer, Code Reviewer, and Editor), each focused on one task and passing its output to the next, will often outperform a single generalist.
This post covers three common multi-agent architecture patterns, when each is appropriate, and how to implement a practical pipeline.
Why Single Agents Hit Limits
SINGLE AGENT LIMITATIONS:
┌─────────────────────────────────────────┐
│              Single Agent               │
│                                         │
│    Research → Write → Review → Edit     │
│  ─────────────────────────────────────  │
│  Issues:                                │
│  • Context window fills up quickly      │
│  • No verification of own output        │
│  • Cannot run tasks in parallel         │
│  • All errors cascade without recovery  │
└─────────────────────────────────────────┘
MULTI-AGENT SOLUTION:
┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│Researcher│───►│  Writer  │───►│ Reviewer │───►│  Editor  │
│  Agent   │    │  Agent   │    │  Agent   │    │  Agent   │
└──────────┘    └──────────┘    └──────────┘    └──────────┘
     │               │               │               │
   Clean           Fresh         Dedicated         Final
  context         context        critique         polish
Multi-Agent Architecture Patterns
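All three patterns below call an agent the same way: `agent.run({ task, context })`, then read the text from `.output`. The sketch below shows that contract in minimal form. It also adds a hypothetical `createStubAgent` that returns canned text instead of calling a model, which is handy for testing pipeline wiring offline before spending tokens. The interface names are assumptions that mirror the `lib/agent.ts` wrapper later in the post. They are not SDK types.

```typescript
// Assumed minimal contract: an agent is anything with an async run() returning text.
interface AgentRunOptions {
  task: string;
  context?: string;
  maxTokens?: number;
}

interface AgentResult {
  output: string;
}

interface Agent {
  run(options: AgentRunOptions): Promise<AgentResult>;
}

// Hypothetical stub for offline tests: reports its role, the task, and how much
// context it received, instead of calling a model.
function createStubAgent(role: string): Agent {
  return {
    async run({ task, context }: AgentRunOptions): Promise<AgentResult> {
      const contextChars = context?.length ?? 0;
      return { output: `[${role}] ${task} (context: ${contextChars} chars)` };
    },
  };
}

// Usage: chain two stubs the way a sequential pipeline passes output forward.
async function demo(): Promise<string> {
  const researcher = createStubAgent('researcher');
  const writer = createStubAgent('writer');
  const research = await researcher.run({ task: 'Research X' });
  const draft = await writer.run({ task: 'Write about X', context: research.output });
  return draft.output;
}
```

Because the stub reports how much context it received, you can check that each stage actually receives the previous stage's output before swapping in real agents.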
Pattern 1: Sequential Pipeline (Most Common)
Agents run in a fixed sequence, each consuming the previous agent's output:
// Sequential multi-agent pipeline
// Agents are created with createAgent() in lib/agent.ts (see the SDK section below).
import { researchAgent, writerAgent, reviewerAgent, editorAgent } from './lib/agent';
async function contentCreationPipeline(topic: string): Promise<string> {
  // Agent 1: Research
  const research = await researchAgent.run({
    task: `Research the topic: "${topic}". Find 5 key insights with sources.`,
    maxTokens: 2000,
  });
  // Agent 2: Write (uses the research as context)
  const draft = await writerAgent.run({
    task: `Write a technical blog post about "${topic}".`,
    context: research.output,
    maxTokens: 3000,
  });
  // Agent 3: Code review (sees only the draft, not the research)
  const codeReview = await reviewerAgent.run({
    task: 'Review all code examples in this draft. Identify bugs and inaccuracies.',
    context: draft.output,
    maxTokens: 1500,
  });
  // Agent 4: Edit (uses the draft plus the code review)
  const final = await editorAgent.run({
    task: 'Apply the code review feedback and polish the draft.',
    context: `DRAFT:\n${draft.output}\n\nCODE REVIEW:\n${codeReview.output}`,
    maxTokens: 3500,
  });
  return final.output;
}
Pattern 2: Parallel Fan-Out
Multiple agents run simultaneously and their outputs are merged:
// Parallel multi-agent analysis
// securityAgent, performanceAgent, a11yAgent, and reportAgent are assumed to be
// built with createAgent() like the agents in lib/agent.ts, each with its own system prompt.
async function parallelAnalysisPipeline(codebase: string): Promise<string> {
  // Run three specialized reviews concurrently; wall-clock time is roughly that of the slowest one
  const [securityReview, performanceReview, accessibilityReview] = await Promise.all([
    securityAgent.run({ task: 'Find security vulnerabilities', context: codebase }),
    performanceAgent.run({ task: 'Identify performance bottlenecks', context: codebase }),
    a11yAgent.run({ task: 'Find accessibility issues', context: codebase }),
  ]);
  // Merge and prioritize findings
  const mergedReport = await reportAgent.run({
    task: 'Consolidate these three reviews into a prioritized action plan.',
    context: [
      `SECURITY: ${securityReview.output}`,
      `PERFORMANCE: ${performanceReview.output}`,
      `ACCESSIBILITY: ${accessibilityReview.output}`,
    ].join('\n\n'),
  });
  return mergedReport.output;
}
Pattern 3: Supervisor-Worker (Most Flexible)
A supervisor agent decomposes the task and routes subtasks to specialized workers:
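// Supervisor-worker pattern
// supervisorAgent, topologicalSort(), and getAgent() are assumed to be supplied by you:
// getAgent maps an agent name from the plan to an agent created with createAgent().
interface SubTask {
  taskId: string;
  agent: string;
  description: string;
  dependsOn: string[];
}
async function supervisedPipeline(userRequest: string): Promise<string> {
  // Supervisor decomposes the request. Any tools it needs (e.g. one that lists
  // available agents) are attached when the agent is created, not per run().
  const plan = await supervisorAgent.run({
    task:
      `Decompose this request into subtasks: "${userRequest}"\n` +
      'Return only a JSON array of: { taskId, agent, description, dependsOn }',
  });
  // JSON.parse throws if the model wraps the array in prose or code fences;
  // production code should strip and validate the plan first.
  const tasks: SubTask[] = JSON.parse(plan.output);
  const results: Record<string, string> = {};
  // Execute tasks in dependency order
  for (const task of topologicalSort(tasks)) {
    const dependencyContext = task.dependsOn
      .map(depId => `${depId}: ${results[depId]}`)
      .join('\n');
    const result = await getAgent(task.agent).run({
      task: task.description,
      context: dependencyContext,
    });
    // run() returns { output, usage }, so store the text, not the whole object
    results[task.taskId] = result.output;
  }
  // Supervisor synthesizes the final result
  const final = await supervisorAgent.run({
    task: 'Synthesize these subtask results into a final answer.',
    context: Object.entries(results)
      .map(([id, result]) => `${id}:\n${result}`)
      .join('\n\n'),
  });
  return final.output;
}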
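The supervisor loop above relies on a `topologicalSort` helper that the post does not define. One way to write it is Kahn's algorithm. The sketch below assumes the plan has the `{ taskId, agent, description, dependsOn }` shape that the prompt requests. Because the plan comes from a model, the sketch also rejects duplicate IDs, references to unknown tasks, and cycles, rather than silently running a broken plan.

```typescript
// Shape of one subtask in the supervisor's plan (assumed to match the prompt).
interface SubTask {
  taskId: string;
  agent: string;
  description: string;
  dependsOn: string[];
}

// Kahn's algorithm: repeatedly emit tasks whose dependencies have all been emitted.
function topologicalSort(tasks: SubTask[]): SubTask[] {
  const byId = new Map<string, SubTask>();
  for (const task of tasks) {
    if (byId.has(task.taskId)) throw new Error(`Duplicate taskId: ${task.taskId}`);
    byId.set(task.taskId, task);
  }

  // remaining: unmet dependency count per task; dependents: tasks waiting on each task.
  const remaining = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const task of tasks) {
    remaining.set(task.taskId, task.dependsOn.length);
    for (const dep of task.dependsOn) {
      if (!byId.has(dep)) throw new Error(`Task ${task.taskId} depends on unknown task ${dep}`);
      const waiting = dependents.get(dep) ?? [];
      waiting.push(task.taskId);
      dependents.set(dep, waiting);
    }
  }

  // Start from tasks with no dependencies, keeping the plan's original order.
  const queue = tasks.filter(t => t.dependsOn.length === 0).map(t => t.taskId);
  const ordered: SubTask[] = [];
  while (queue.length > 0) {
    const id = queue.shift()!;
    ordered.push(byId.get(id)!);
    for (const next of dependents.get(id) ?? []) {
      const left = remaining.get(next)! - 1;
      remaining.set(next, left);
      if (left === 0) queue.push(next);
    }
  }

  // Any task never reaching zero unmet dependencies is part of a cycle.
  if (ordered.length !== tasks.length) {
    throw new Error('Plan contains a dependency cycle');
  }
  return ordered;
}
```

Tasks with no path between them (for example, a research task and an independent security scan) could also run concurrently with `Promise.all`. This sketch keeps the one-at-a-time order that the loop above expects.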
Building Agents with the Anthropic Claude SDK
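The `createAgent` wrapper in this section passes every tool request to an `executeToolCall(name, input)` function that you provide yourself; it is not part of the SDK. The sketch below shows a minimal version built on a hypothetical registry of handlers keyed by tool name. The `get_word_count` tool is purely illustrative. For the model to call it, it would also need a matching entry (name, description, `input_schema`) in the agent's `tools` config.

```typescript
// Hypothetical tool handler: receives the model-supplied input, returns text.
type ToolHandler = (input: unknown) => Promise<string>;

// Hypothetical registry; tool names and behavior are illustrative only.
const toolHandlers: Record<string, ToolHandler> = {
  async get_word_count(input: unknown): Promise<string> {
    const text = (input as { text?: unknown } | null)?.text;
    if (typeof text !== 'string') throw new Error('get_word_count expects { text: string }');
    const words = text.trim().split(/\s+/).filter(w => w.length > 0);
    return String(words.length);
  },
};

// Dispatch a tool call by name. Failures are returned as text so the model can
// see what went wrong instead of the whole agent run crashing.
async function executeToolCall(name: string, input: unknown): Promise<string> {
  const handler = toolHandlers[name];
  if (!handler) return `Error: unknown tool "${name}"`;
  try {
    return await handler(input);
  } catch (err) {
    return `Error: ${err instanceof Error ? err.message : String(err)}`;
  }
}
```

Returning errors as text lets the model notice the failure and adjust its next request. The Messages API also accepts an `is_error` flag on `tool_result` blocks if you prefer to signal failures explicitly.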
// lib/agent.ts
import Anthropic from '@anthropic-ai/sdk';
// Hypothetical module: your own dispatcher that runs a named tool and returns text.
import { executeToolCall } from './tools';
// Reads ANTHROPIC_API_KEY from the environment by default.
const client = new Anthropic();
interface AgentConfig {
  name: string;
  systemPrompt: string;
  model?: string;
  maxTokens?: number;
  tools?: Anthropic.Tool[];
}
interface AgentRunOptions {
  task: string;
  context?: string;
  maxTokens?: number;
}
// Guard against a model that keeps requesting tools indefinitely.
const MAX_TOOL_TURNS = 10;
export function createAgent(config: AgentConfig) {
  return {
    async run(options: AgentRunOptions): Promise<{ output: string; usage: Anthropic.Usage }> {
      const messages: Anthropic.MessageParam[] = [
        {
          role: 'user',
          content: options.context
            ? `CONTEXT:\n${options.context}\n\nTASK:\n${options.task}`
            : options.task,
        },
      ];
      // Same request parameters on every turn; only the message history grows.
      const callModel = () =>
        client.messages.create({
          model: config.model ?? 'claude-sonnet-4-5',
          max_tokens: options.maxTokens ?? config.maxTokens ?? 2048,
          system: config.systemPrompt,
          messages,
          tools: config.tools,
        });
      let response = await callModel();
      // Agentic loop: run every requested tool, send all results back, repeat.
      for (let turn = 0; response.stop_reason === 'tool_use'; turn++) {
        if (turn >= MAX_TOOL_TURNS) {
          throw new Error(`${config.name}: exceeded ${MAX_TOOL_TURNS} tool turns`);
        }
        // One response can contain several tool_use blocks, and each needs a
        // matching tool_result in the next user message.
        const toolUses = response.content.filter(
          (b): b is Anthropic.ToolUseBlock => b.type === 'tool_use',
        );
        const toolResults: Anthropic.ToolResultBlockParam[] = await Promise.all(
          toolUses.map(async b => ({
            type: 'tool_result' as const,
            tool_use_id: b.id,
            content: await executeToolCall(b.name, b.input),
          })),
        );
        messages.push({ role: 'assistant', content: response.content });
        messages.push({ role: 'user', content: toolResults });
        response = await callModel();
      }
      // Join all text blocks instead of assuming exactly one exists.
      const output = response.content
        .filter((b): b is Anthropic.TextBlock => b.type === 'text')
        .map(b => b.text)
        .join('\n');
      // usage reflects only the final model call
      return { output, usage: response.usage };
    },
  };
}
// Specialized agents
export const researchAgent = createAgent({
  name: 'researcher',
  systemPrompt: 'You are a precise technical researcher. Extract key facts with citations. Return structured JSON.',
  maxTokens: 2000,
});
export const writerAgent = createAgent({
  name: 'writer',
  systemPrompt: 'You are a senior technical writer. Write clear, engaging, developer-focused content.',
  maxTokens: 4000,
});
export const reviewerAgent = createAgent({
  name: 'reviewer',
  systemPrompt: 'You are a critical code reviewer. Identify bugs, security issues, and inaccuracies. Be specific.',
  maxTokens: 2000,
});
export const editorAgent = createAgent({
  name: 'editor',
  systemPrompt: 'You are a meticulous technical editor. Apply review feedback and tighten prose without changing meaning.',
  maxTokens: 4000,
});
When NOT to Use Multi-Agent Pipelines
Multi-agent pipelines add latency and cost: every agent is a separate model call, and each sequential stage adds its full response time. Don't use them when:
- A single, well-prompted LLM call is sufficient.
- Latency is critical (real-time user interactions).
- The task is simple enough to fit comfortably in one context window.
- You don't have a clear decomposition of subtasks.
Use multi-agent pipelines when:
- The task requires conflicting roles (write + critique).
- Subtasks can run in parallel to save wall-clock time.
- The full task exceeds the effective context window of a single agent.
- You need specialized expertise at different stages (research vs. writing vs. security review).
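When parallelism is the reason for splitting work across agents, keep in mind that `Promise.all` in Pattern 2 rejects as soon as any one agent fails, and the reviews that did succeed are lost. The sketch below is a more tolerant fan-out built on `Promise.allSettled`. The `fanOut` helper and its types are hypothetical. The `Agent` shape is an assumption that matches how the patterns above call agents.

```typescript
// Assumed agent shape, matching how the patterns call run() (not an SDK type).
interface Agent {
  run(options: { task: string; context?: string }): Promise<{ output: string }>;
}

interface FanOutJob {
  agent: Agent;
  task: string;
}

interface FanOutResult {
  name: string;
  ok: boolean;
  output: string; // agent output on success, error message on failure
}

// Run every job concurrently; unlike Promise.all, one rejection does not
// discard the results of the jobs that succeeded.
async function fanOut(jobs: Record<string, FanOutJob>, context: string): Promise<FanOutResult[]> {
  const names = Object.keys(jobs);
  const settled = await Promise.allSettled(
    names.map(name => jobs[name].agent.run({ task: jobs[name].task, context })),
  );
  return settled.map((result, i) =>
    result.status === 'fulfilled'
      ? { name: names[i], ok: true, output: result.value.output }
      : {
          name: names[i],
          ok: false,
          output: result.reason instanceof Error ? result.reason.message : String(result.reason),
        },
  );
}
```

The merge step can then pass only the successful outputs to the report agent and list the failed reviews explicitly. That way, a failed security scan is reported as a gap instead of aborting the whole analysis.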
Conclusion
Multi-agent pipelines turn an LLM from a single general-purpose call into a system you can engineer, test, and scale stage by stage. By decomposing complex tasks, assigning a specialized agent to each subtask, and passing structured output between them, you can mitigate the main limitations of single-agent systems: context overflow, role conflict, and lack of verification. Start with a simple sequential pipeline, measure its output quality against a single-agent baseline, and add parallelism and supervision only as your use case demands.