Building a Multi-Agent Pipeline: When One AI Isn't Enough

A common misconception about AI agents is that making them more powerful means making a single agent smarter. In practice, complex AI tasks tend to scale better with multi-agent coordination: breaking a problem into specialized subtasks and assigning each to a dedicated agent with its own role, context window, and set of tools.

A single LLM agent asked to "research a topic, write a blog post, check its SEO, add code examples, and proofread it" tends to do each of these things passably at best. Four specialized agents (Researcher, Writer, Code Reviewer, and Editor), each focused on one task and passing structured output to the next, will often outperform a single generalist on this kind of work.

This post explains the multi-agent architecture patterns, when they're appropriate, and how to implement a practical pipeline.

Why Single Agents Hit Limits

SINGLE AGENT LIMITATIONS:
┌─────────────────────────────────────────┐
│           Single Agent                  │
│                                         │
│  Research → Write → Review → Edit       │
│  ─────────────────────────────────────  │
│  Issues:                                │
│  • Context window fills up quickly      │
│  • No verification of own output        │
│  • Cannot run tasks in parallel         │
│  • All errors cascade without recovery  │
└─────────────────────────────────────────┘

MULTI-AGENT SOLUTION:
┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│Researcher│───►│  Writer  │───►│Reviewer  │───►│  Editor  │
│ Agent    │    │  Agent   │    │  Agent   │    │  Agent   │
└──────────┘    └──────────┘    └──────────┘    └──────────┘
    │                │               │               │
  Clean           Fresh           Dedicated       Final
  context         context         critique        polish

Multi-Agent Architecture Patterns

Pattern 1: Sequential Pipeline (Most Common)

Agents run in a fixed sequence, each consuming the previous agent's output:

// Sequential multi-agent pipeline
async function contentCreationPipeline(topic: string) {
  // Agent 1: Research
  const research = await researchAgent.run({
    task: `Research the topic: "${topic}". Find 5 key insights with sources.`,
    maxTokens: 2000,
  });

  // Agent 2: Write (uses research as context)
  const draft = await writerAgent.run({
    task: `Write a technical blog post about "${topic}".`,
    context: research.output,
    maxTokens: 3000,
  });

  // Agent 3: Code Review (only sees the draft)
  // Uses reviewerAgent, the code-review agent defined in lib/agent.ts
  const codeReview = await reviewerAgent.run({
    task: 'Review all code examples in this draft. Identify bugs and inaccuracies.',
    context: draft.output,
    maxTokens: 1500,
  });

  // Agent 4: Edit (uses draft + code review)
  const final = await editorAgent.run({
    task: 'Apply the code review feedback and polish the draft.',
    context: `DRAFT:\n${draft.output}\n\nCODE REVIEW:\n${codeReview.output}`,
    maxTokens: 3500,
  });

  return final.output;
}
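
The hand-off between agents is easier to trust when it is validated rather than passed along as raw text. The following is a minimal sketch, assuming the researcher returns a JSON array of objects with `insight` and `source` string fields. That shape and the names `ResearchInsight`, `parseResearch`, and `toWriterContext` are illustrative assumptions, not part of the pipeline above.

```typescript
// Hypothetical shape for the researcher's structured output (an assumption).
interface ResearchInsight {
  insight: string;
  source: string;
}

// Parse and validate the researcher's raw output before handing it on.
// Throws a descriptive error instead of passing malformed text downstream.
function parseResearch(raw: string): ResearchInsight[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error('Research output is not valid JSON');
  }
  if (!Array.isArray(parsed)) {
    throw new Error('Research output must be a JSON array');
  }
  return parsed.map((item, index) => {
    const candidate = item as Partial<ResearchInsight> | null;
    if (
      candidate === null ||
      typeof candidate !== 'object' ||
      typeof candidate.insight !== 'string' ||
      typeof candidate.source !== 'string'
    ) {
      throw new Error(`Research item ${index} is missing "insight" or "source"`);
    }
    return { insight: candidate.insight, source: candidate.source };
  });
}

// Render validated insights as a compact context string for the writer.
function toWriterContext(insights: ResearchInsight[]): string {
  return insights
    .map((entry, index) => `${index + 1}. ${entry.insight} (source: ${entry.source})`)
    .join('\n');
}
```

In the pipeline, the writer's `context` could then be `toWriterContext(parseResearch(research.output))` instead of the raw `research.output`, so a malformed research result fails early rather than degrading the draft.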

Pattern 2: Parallel Fan-Out

Multiple agents run simultaneously and their outputs are merged:

// Parallel multi-agent analysis
async function parallelAnalysisPipeline(codebase: string) {
  // Run three specialized reviews simultaneously
  const [securityReview, performanceReview, accessibilityReview] = await Promise.all([
    securityAgent.run({ task: 'Find security vulnerabilities', context: codebase }),
    performanceAgent.run({ task: 'Identify performance bottlenecks', context: codebase }),
    a11yAgent.run({ task: 'Find accessibility issues', context: codebase }),
  ]);

  // Merge and prioritize findings
  const mergedReport = await reportAgent.run({
    task: 'Consolidate these three reviews into a prioritized action plan.',
    context: [
      `SECURITY: ${securityReview.output}`,
      `PERFORMANCE: ${performanceReview.output}`,
      `ACCESSIBILITY: ${accessibilityReview.output}`,
    ].join('\n\n'),
  });

  return mergedReport.output;
}
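
`Promise.all` rejects as soon as any one agent call fails, which discards the reviews that did succeed. A possible variation, sketched below under the assumption that a partial report is preferable to none, captures each outcome individually. The names `ReviewOutcome`, `settle`, and `formatOutcomes` are hypothetical.

```typescript
// Outcome of one reviewer: either its text output or an error message.
type ReviewOutcome =
  | { label: string; ok: true; output: string }
  | { label: string; ok: false; error: string };

// Run one agent call and convert a rejection into a failed outcome,
// so one failing reviewer does not reject the whole fan-out.
async function settle(
  label: string,
  run: () => Promise<{ output: string }>,
): Promise<ReviewOutcome> {
  try {
    const result = await run();
    return { label, ok: true, output: result.output };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { label, ok: false, error: message };
  }
}

// Build the merged context, marking reviews that are unavailable.
function formatOutcomes(outcomes: ReviewOutcome[]): string {
  return outcomes
    .map((o) => (o.ok ? `${o.label}: ${o.output}` : `${o.label}: UNAVAILABLE (${o.error})`))
    .join('\n\n');
}
```

Each entry in the fan-out would be wrapped as `settle('SECURITY', () => securityAgent.run({ ... }))`, and the report agent would receive `formatOutcomes(outcomes)` as its context. Whether a missing review should block the report is a policy decision this sketch leaves open.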

Pattern 3: Supervisor-Worker (Most Flexible)

A supervisor agent decomposes the task and routes subtasks to specialized workers:

// Supervisor-worker pattern
async function supervisedPipeline(userRequest: string) {
  // Supervisor decomposes the request
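  // Tools such as list_available_agents are configured on the agent itself
  // (AgentConfig.tools); run() options only carry task, context, and maxTokens.
  const plan = await supervisorAgent.run({
    task: `Decompose this request into subtasks: "${userRequest}"
    Return only a JSON array of: { taskId, agent, description, dependsOn }`,
  });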

  const tasks = JSON.parse(plan.output);
  const results: Record<string, string> = {};

  // Execute tasks respecting dependencies
  for (const task of topologicalSort(tasks)) {
    const dependencyContext = task.dependsOn
      .map((depId: string) => `${depId}: ${results[depId]}`)
      .join('\n');

    // run() resolves to { output, usage }; store only the text output
    const result = await getAgent(task.agent).run({
      task: task.description,
      context: dependencyContext,
    });
    results[task.taskId] = result.output;
  }

  // Supervisor synthesizes the final result
  const final = await supervisorAgent.run({
    task: 'Synthesize these subtask results into a final answer.',
    context: Object.entries(results)
      .map(([id, result]) => `${id}:\n${result}`)
      .join('\n\n'),
  });

  return final.output;
}
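
The pipeline above relies on a `topologicalSort` helper that the post does not define. One way to write it is Kahn's algorithm, sketched below. The `PlannedTask` type mirrors the fields requested from the supervisor, and the error handling for unknown or cyclic dependencies is an assumption about how you would want a malformed plan to fail.

```typescript
// Mirrors the JSON fields the supervisor is asked to return.
interface PlannedTask {
  taskId: string;
  agent: string;
  description: string;
  dependsOn: string[];
}

// Order tasks so that every task appears after all tasks it depends on
// (Kahn's algorithm). Throws on unknown dependencies and on cycles.
function topologicalSort(tasks: PlannedTask[]): PlannedTask[] {
  const byId = new Map<string, PlannedTask>();
  const remainingDeps = new Map<string, number>();
  const dependents = new Map<string, string[]>();

  for (const task of tasks) {
    byId.set(task.taskId, task);
    remainingDeps.set(task.taskId, task.dependsOn.length);
    dependents.set(task.taskId, []);
  }
  for (const task of tasks) {
    for (const depId of task.dependsOn) {
      const list = dependents.get(depId);
      if (list === undefined) {
        throw new Error(`Task "${task.taskId}" depends on unknown task "${depId}"`);
      }
      list.push(task.taskId);
    }
  }

  // Start with tasks that have no dependencies.
  const ready = tasks.filter((t) => t.dependsOn.length === 0).map((t) => t.taskId);
  const ordered: PlannedTask[] = [];

  while (ready.length > 0) {
    const id = ready.shift() as string;
    ordered.push(byId.get(id) as PlannedTask);
    // Completing this task unblocks tasks whose last open dependency it was.
    for (const dependentId of dependents.get(id) ?? []) {
      const left = (remainingDeps.get(dependentId) ?? 0) - 1;
      remainingDeps.set(dependentId, left);
      if (left === 0) ready.push(dependentId);
    }
  }

  if (ordered.length !== tasks.length) {
    throw new Error('Plan contains a dependency cycle');
  }
  return ordered;
}
```

Because the plan comes from a model, failing loudly on a cycle or a dangling `dependsOn` reference is usually safer than silently skipping tasks.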

Building Agents with the Anthropic Claude SDK

// lib/agent.ts
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

interface AgentConfig {
  name: string;
  systemPrompt: string;
  model?: string;
  maxTokens?: number;
  tools?: Anthropic.Tool[];
}

interface AgentRunOptions {
  task: string;
  context?: string;
  maxTokens?: number;
}

export function createAgent(config: AgentConfig) {
  return {
    async run(options: AgentRunOptions): Promise<{ output: string; usage: object }> {
      const messages: Anthropic.MessageParam[] = [
        {
          role: 'user',
          content: options.context
            ? `CONTEXT:\n${options.context}\n\nTASK:\n${options.task}`
            : options.task,
        },
      ];

      let response = await client.messages.create({
        model: config.model ?? 'claude-sonnet-4-5',
        max_tokens: options.maxTokens ?? config.maxTokens ?? 2048,
        system: config.systemPrompt,
        messages,
        tools: config.tools,
      });

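      // Handle tool use in an agentic loop.
      // executeToolCall is an assumed helper that runs a tool and returns its result as a string.
      while (response.stop_reason === 'tool_use') {
        // One response can contain several tool_use blocks; each needs its own tool_result
        const toolUseBlocks = response.content.filter(
          (b): b is Anthropic.ToolUseBlock => b.type === 'tool_use',
        );
        const toolResults: Anthropic.ToolResultBlockParam[] = await Promise.all(
          toolUseBlocks.map(async (block) => ({
            type: 'tool_result' as const,
            tool_use_id: block.id,
            content: await executeToolCall(block.name, block.input),
          })),
        );

        messages.push({ role: 'assistant', content: response.content });
        messages.push({ role: 'user', content: toolResults });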

        response = await client.messages.create({
          model: config.model ?? 'claude-sonnet-4-5',
          max_tokens: options.maxTokens ?? config.maxTokens ?? 2048,
          system: config.systemPrompt,
          messages,
          tools: config.tools,
        });
      }

      // Join all text blocks; a response may contain several, or none at all
      const text = response.content
        .filter((b): b is Anthropic.TextBlock => b.type === 'text')
        .map(b => b.text)
        .join('');
      return { output: text, usage: response.usage };
    },
  };
}

// Specialized agents
export const researchAgent = createAgent({
  name: 'researcher',
  systemPrompt: 'You are a precise technical researcher. Extract key facts with citations. Return structured JSON.',
  maxTokens: 2000,
});

export const writerAgent = createAgent({
  name: 'writer',
  systemPrompt: 'You are a senior technical writer. Write clear, engaging, developer-focused content.',
  maxTokens: 4000,
});

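export const reviewerAgent = createAgent({
  name: 'reviewer',
  systemPrompt: 'You are a critical code reviewer. Identify bugs, security issues, and inaccuracies. Be specific.',
  maxTokens: 2000,
});

// Editor agent used by the sequential pipeline (the prompt is illustrative)
export const editorAgent = createAgent({
  name: 'editor',
  systemPrompt: 'You are a meticulous technical editor. Apply review feedback and polish the draft without changing its meaning.',
  maxTokens: 4000,
});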
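
`createAgent` calls an `executeToolCall` helper that is not shown. A minimal sketch of one possible implementation is a registry that maps tool names to handler functions. Everything here is hypothetical, including the `list_available_agents` handler and its return value; the one constraint taken from the code above is that the result is a string suitable for a `tool_result` block.

```typescript
// A tool handler receives the model-supplied input and returns a string result.
type ToolHandler = (input: unknown) => Promise<string> | string;

// Hypothetical registry; real handlers would call your own services.
const toolRegistry = new Map<string, ToolHandler>([
  ['list_available_agents', () => JSON.stringify(['researcher', 'writer', 'reviewer'])],
]);

// Look up and run a tool. Failures are returned as text rather than thrown,
// so the model can see the error and decide how to proceed.
async function executeToolCall(name: string, input: unknown): Promise<string> {
  const handler = toolRegistry.get(name);
  if (handler === undefined) {
    return `Error: unknown tool "${name}"`;
  }
  try {
    return await handler(input);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return `Error: tool "${name}" failed: ${message}`;
  }
}
```

Returning errors as text keeps the agentic loop alive when a tool fails. The Messages API also accepts an `is_error` flag on `tool_result` blocks, which may be a better fit if you want the failure marked explicitly.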

When NOT to Use Multi-Agent Pipelines

Multi-agent pipelines add latency and cost. Don't use them when:

A single, well-prompted LLM call is sufficient.
Latency is critical (real-time user interactions).
The task is simple enough to fit comfortably in one context window.
You don't have a clear decomposition of subtasks.

Use multi-agent pipelines when:

The task requires conflicting roles (write + critique).
Subtasks can run in parallel to save wall-clock time.
The full task exceeds the effective context window of a single agent.
You need specialized expertise at different stages (research vs. writing vs. security review).
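
The latency trade-off in these lists can be made concrete with simple arithmetic: stages that run in sequence add their latencies, while stages that run in parallel cost only as much as the slowest one. The sketch below uses made-up per-agent latencies purely for illustration; measure your own before deciding.

```typescript
// Wall-clock latency of stages run one after another: the sum.
function sequentialLatencyMs(stageLatencies: number[]): number {
  return stageLatencies.reduce((total, ms) => total + ms, 0);
}

// Wall-clock latency of stages run concurrently: the slowest stage.
function parallelLatencyMs(stageLatencies: number[]): number {
  return stageLatencies.length === 0 ? 0 : Math.max(...stageLatencies);
}

// Hypothetical latencies (ms) for three reviewers plus one merge step.
const reviewers = [4000, 6000, 5000];
const mergeStep = 3000;

const allSequential = sequentialLatencyMs([...reviewers, mergeStep]); // 18000
const fanOutThenMerge = parallelLatencyMs(reviewers) + mergeStep; // 9000

console.log({ allSequential, fanOutThenMerge });
```

Parallelism shortens wall-clock time only. The number of model calls, and therefore the token cost, is the same in both arrangements.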

Conclusion

Multi-agent pipelines are an architectural pattern that helps take AI from a useful tool to a scalable engineering system. By decomposing complex tasks, assigning a specialized agent to each subtask, and passing structured output between them, you mitigate the main limitations of single-agent systems: context overflow, role conflict, and lack of verification. Start with a simple sequential pipeline, measure the output quality against a single-agent baseline, and add parallelism and supervision as your use case demands.