Building AI Agents with Claude Agent SDK

An AI agent is a system that uses a language model to decide what actions to take, executes those actions, observes the results, and repeats until the task is done. Unlike a simple prompt-response interaction, agents maintain state across multiple steps and can use tools to interact with the outside world.

The Claude Agent SDK gives you the building blocks to create these agents in TypeScript. This post walks through the architecture, core concepts, and a working example you can adapt for your own projects.

What Makes an Agent Different from a Chatbot

A chatbot takes input, generates output, done. An agent runs in a loop:

1. Receive a task or observation
2. Decide what to do next (reason)
3. Execute an action using a tool
4. Observe the result
5. Repeat from step 2 until the task is complete

The key difference is autonomy. You give an agent a goal, not step-by-step instructions. It figures out the path.
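
The loop above can be sketched in a few lines. This is a minimal illustration, not the SDK's implementation: `Decision`, `Observation`, `runAgentLoop`, `decide`, and `runTool` are hypothetical names, and `decide` stands in for the model call.

```typescript
// Hypothetical types for illustration; none of these come from the SDK.
type Decision =
  | { kind: "tool"; name: string; input: unknown }
  | { kind: "done"; output: string };

type Observation = { tool: string; result: unknown };

async function runAgentLoop(
  task: string,
  decide: (task: string, history: Observation[]) => Promise<Decision>,
  runTool: (name: string, input: unknown) => Promise<unknown>,
  maxSteps = 10
): Promise<string> {
  const history: Observation[] = [];
  for (let step = 0; step < maxSteps; step++) {
    // Reason: ask the model what to do given everything observed so far
    const decision = await decide(task, history);
    if (decision.kind === "done") return decision.output;
    // Act, then record the observation for the next iteration
    const result = await runTool(decision.name, decision.input);
    history.push({ tool: decision.name, result });
  }
  throw new Error(`Stopped after ${maxSteps} steps without finishing`);
}
```

The step cap is there so that a model that never decides it is done cannot loop forever.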

Claude Agent SDK Overview

The SDK provides:

Agent class — The core runtime that manages the reasoning loop
Tool definitions — Typed functions the agent can call
Message history — Automatic conversation management
Streaming — Real-time output as the agent works
Guardrails — Input and output validation

Install it:

npm install @anthropic-ai/agent-sdk

Building a Code Review Agent

Let's build something practical: an agent that reviews pull requests. Given a diff, it analyzes code quality, identifies potential bugs, checks for security issues, and provides actionable feedback.

Step 1: Define the Tools

Tools are functions the agent can call. Each tool has a name, description, input schema, and an execute function.

import { Tool } from "@anthropic-ai/agent-sdk";
import { z } from "zod";
import { execSync } from "child_process";
import { readFileSync } from "fs";

const readFileTool: Tool = {
  name: "read_file",
  description: "Read the contents of a file at the given path",
  inputSchema: z.object({
    path: z.string().describe("Absolute file path to read"),
  }),
  async execute({ path }) {
    try {
      const content = readFileSync(path, "utf-8");
      return { content };
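    } catch (error) {
      // In strict TypeScript a caught value is `unknown`, so narrow before reading .message
      const message = error instanceof Error ? error.message : String(error);
      return { error: `Failed to read ${path}: ${message}` };
    }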
  },
};

const gitDiffTool: Tool = {
  name: "git_diff",
  description: "Get the git diff for a branch compared to main",
  inputSchema: z.object({
    branch: z.string().describe("Branch name to diff against main"),
  }),
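  async execute({ branch }) {
    // The branch name is interpolated into a shell command, so accept only plain ref names
    if (!/^[A-Za-z0-9][A-Za-z0-9._\/-]*$/.test(branch)) {
      return { error: `Invalid branch name: ${branch}` };
    }
    try {
      const diff = execSync(`git diff main...${branch}`, {
        encoding: "utf-8",
        maxBuffer: 1024 * 1024 * 10,
      });
      return { diff };
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      return { error: `Failed to get diff: ${message}` };
    }
  },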
};

const listChangedFilesTool: Tool = {
  name: "list_changed_files",
  description: "List files changed in a branch compared to main",
  inputSchema: z.object({
    branch: z.string().describe("Branch name"),
  }),
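  async execute({ branch }) {
    // The branch name is interpolated into a shell command, so accept only plain ref names
    if (!/^[A-Za-z0-9][A-Za-z0-9._\/-]*$/.test(branch)) {
      return { error: `Invalid branch name: ${branch}` };
    }
    try {
      const files = execSync(`git diff --name-only main...${branch}`, {
        encoding: "utf-8",
      });
      // filter(Boolean) drops the empty entry produced when no files changed
      return { files: files.trim().split("\n").filter(Boolean) };
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      return { error: `Failed to list files: ${message}` };
    }
  },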
};

const searchCodeTool: Tool = {
  name: "search_code",
  description: "Search for a pattern across the codebase using grep",
  inputSchema: z.object({
    pattern: z.string().describe("Regex pattern to search for"),
    fileGlob: z.string().optional().describe("File glob to filter (e.g. *.ts)"),
  }),
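  async execute({ pattern, fileGlob }) {
    // Single-quote a value for the shell, escaping any embedded single quotes
    const quote = (s: string) => `'${s.replace(/'/g, `'\\''`)}'`;
    // grep's --include does not expand braces, so pass one flag per glob
    const globs = fileGlob ? [fileGlob] : ["*.ts", "*.tsx", "*.js", "*.jsx"];
    const includes = globs.map((g) => `--include=${quote(g)}`).join(" ");
    try {
      const results = execSync(
        `grep -rn ${includes} -e ${quote(pattern)} .`,
        { encoding: "utf-8", maxBuffer: 1024 * 1024 }
      );
      return { results };
    } catch (error) {
      // grep exits with status 1 when nothing matches; anything else is a real failure
      if ((error as { status?: number }).status === 1) {
        return { results: "No matches found" };
      }
      const message = error instanceof Error ? error.message : String(error);
      return { error: `Search failed: ${message}` };
    }
  },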
};
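
To make the shape of a tool concrete, here is a hedged sketch of how a runtime might dispatch a tool call by name. The `SimpleTool` type, the `dispatch` function, and the hand-written `validate` check are assumptions for illustration: they stand in for the SDK's `Tool` type and the zod schema, and are not the SDK's actual dispatch logic.

```typescript
// Hypothetical stand-in for the SDK's Tool type, with a hand-rolled validator
// in place of a zod schema so the sketch has no dependencies.
interface SimpleTool {
  name: string;
  description: string;
  validate(input: unknown): string | null; // error message, or null if valid
  execute(input: Record<string, unknown>): Promise<unknown>;
}

async function dispatch(
  tools: SimpleTool[],
  name: string,
  input: unknown
): Promise<unknown> {
  const tool = tools.find((t) => t.name === name);
  // Unknown tools and bad inputs come back as data so the model can react
  if (!tool) return { error: `Unknown tool: ${name}` };
  const problem = tool.validate(input);
  if (problem) return { error: `Invalid input for ${name}: ${problem}` };
  return tool.execute(input as Record<string, unknown>);
}

const echoTool: SimpleTool = {
  name: "echo",
  description: "Return the text it was given",
  validate(input) {
    const text = (input as { text?: unknown } | null)?.text;
    return typeof text === "string" ? null : "text must be a string";
  },
  async execute(input) {
    return { text: input.text };
  },
};
```

Validating before executing means a malformed call from the model produces a readable error instead of a crash inside the tool body.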

Step 2: Create the Agent

import { Agent } from "@anthropic-ai/agent-sdk";

const codeReviewAgent = new Agent({
  name: "Code Review Agent",
  model: "claude-sonnet-4-20250514",
  instructions: `You are a senior code reviewer. Given a branch name, you will:
1. List all changed files
2. Read the diff to understand the changes
3. Read full files when you need more context
4. Search the codebase for related patterns when needed
5. Provide a structured review with:
   - Summary of changes
   - Potential bugs or logic errors
   - Security concerns
   - Performance issues
   - Style/convention violations
   - Specific suggestions with code examples

Be direct and actionable. Don't praise obvious things. Focus on what could break or be improved.`,
  tools: [readFileTool, gitDiffTool, listChangedFilesTool, searchCodeTool],
});

Step 3: Run the Agent

import { Runner } from "@anthropic-ai/agent-sdk";

async function reviewPR(branch: string) {
  const runner = new Runner();

  const result = await runner.run(codeReviewAgent, {
    messages: [
      {
        role: "user",
        content: `Review the code changes on branch "${branch}" compared to main. Be thorough but concise.`,
      },
    ],
  });

  console.log(result.finalOutput);
}

reviewPR("feature/add-user-search").catch(console.error);

When you run this, the agent will typically:

Call list_changed_files to see what's been modified
Call git_diff to read the actual changes
Call read_file on specific files for full context
Call search_code to check for related patterns or potential impacts
Synthesize everything into a structured review

Each tool call is a decision the agent makes based on what it's learned so far.

Multi-Step Reasoning

The agent loop is what makes this work. After each tool call, Claude sees the result and decides what to do next, so the agent adapts its strategy to what it finds rather than following a fixed script.

For example, if the diff shows a change to a database query, the agent might:

Read the full model file for context
Search for other queries that use the same table
Check if there's a migration that needs updating
Look for tests that cover this code path

You didn't program this sequence. The agent figured it out because its instructions say to be thorough and it has the tools to investigate.

Error Handling

Agents will encounter errors — files that don't exist, commands that fail, unexpected data formats. Build resilience into your tools:

const robustTool: Tool = {
  name: "read_file",
  description: "Read a file's contents",
  inputSchema: z.object({ path: z.string() }),
  async execute({ path }) {
    try {
      const content = readFileSync(path, "utf-8");

      // Truncate very large files to avoid context limits
      if (content.length > 50000) {
        return {
          content: content.slice(0, 50000),
          truncated: true,
          totalLength: content.length,
          note: "File was truncated to the first 50,000 characters.",
        };
      }

      return { content };
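    } catch (error) {
      // Node's fs errors carry a `code` property; cast so strict TypeScript accepts the access
      const err = error as NodeJS.ErrnoException;
      if (err.code === "ENOENT") {
        return { error: `File not found: ${path}` };
      }
      if (err.code === "EACCES") {
        return { error: `Permission denied: ${path}` };
      }
      return { error: `Unexpected error reading ${path}: ${err.message}` };
    }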
  },
};

Return errors as data, not exceptions. The agent can read error messages and adjust its approach — maybe it'll try a different file path or ask for clarification.
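
One way to apply this consistently is a small wrapper that converts any thrown exception into an error result. This is a sketch under assumptions: `withErrorsAsData` and `ToolResult` are hypothetical helpers, not part of the SDK.

```typescript
// Hypothetical helper: wraps a tool body so failures come back as data.
type ToolResult<T> = T | { error: string };

function withErrorsAsData<I, O>(
  label: string,
  fn: (input: I) => Promise<O>
): (input: I) => Promise<ToolResult<O>> {
  return async (input) => {
    try {
      return await fn(input);
    } catch (err) {
      // Thrown values are not guaranteed to be Error instances
      const message = err instanceof Error ? err.message : String(err);
      return { error: `${label} failed: ${message}` };
    }
  };
}

// Usage: the wrapped function resolves with { error } instead of rejecting
const parseJson = withErrorsAsData("parse_json", async (text: string) => ({
  value: JSON.parse(text) as unknown,
}));
```

Wrapping at the boundary keeps each tool body free of repetitive try/catch blocks while still guaranteeing the agent sees a message it can act on.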

Guardrails

The SDK supports input and output guardrails to keep agents on track:

const agent = new Agent({
  name: "Code Review Agent",
  model: "claude-sonnet-4-20250514",
  instructions: "...",
  tools: [...],
  inputGuardrails: [
    {
      name: "branch_validation",
      async execute({ input }) {
        // Don't allow reviewing main directly
        if (input.includes("main") && !input.includes("compared to main")) {
          return { blocked: true, reason: "Cannot review the main branch itself." };
        }
        return { blocked: false };
      },
    },
  ],
  outputGuardrails: [
    {
      name: "no_secrets",
      async execute({ output }) {
        const secretPatterns = [/API_KEY\s*=\s*\S+/, /password\s*[:=]\s*\S+/i];
        for (const pattern of secretPatterns) {
          if (pattern.test(output)) {
            return { blocked: true, reason: "Output contains potential secrets." };
          }
        }
        return { blocked: false };
      },
    },
  ],
});
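
How the SDK evaluates guardrails internally is not shown here. As a rough mental model, assume they run in order and the first one that blocks stops the run. `Guardrail`, `GuardrailResult`, and `checkGuardrails` below are hypothetical names used only for this sketch.

```typescript
// Hypothetical types mirroring the { blocked, reason } shape used above.
type GuardrailResult = { blocked: boolean; reason?: string };

interface Guardrail {
  name: string;
  check(text: string): Promise<GuardrailResult>;
}

// Runs guardrails in order and returns the first block, tagged with its name.
async function checkGuardrails(
  guardrails: Guardrail[],
  text: string
): Promise<GuardrailResult & { guardrail?: string }> {
  for (const g of guardrails) {
    const result = await g.check(text);
    if (result.blocked) return { ...result, guardrail: g.name };
  }
  return { blocked: false };
}

const noSecrets: Guardrail = {
  name: "no_secrets",
  async check(text) {
    const patterns = [/API_KEY\s*=\s*\S+/, /password\s*[:=]\s*\S+/i];
    return patterns.some((p) => p.test(text))
      ? { blocked: true, reason: "Output contains potential secrets." }
      : { blocked: false };
  },
};
```

Returning the name of the guardrail that fired makes blocked runs much easier to diagnose from logs.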

Handoffs Between Agents

For complex workflows, you can chain specialized agents. A triage agent decides what kind of review is needed, then hands off to specialized agents:

const securityReviewer = new Agent({
  name: "Security Reviewer",
  model: "claude-sonnet-4-20250514",
  instructions: "Focus exclusively on security vulnerabilities...",
  tools: [...],
});

const performanceReviewer = new Agent({
  name: "Performance Reviewer",
  model: "claude-sonnet-4-20250514",
  instructions: "Focus exclusively on performance issues...",
  tools: [...],
});

const triageAgent = new Agent({
  name: "Review Triage",
  model: "claude-3-5-haiku-20241022",
  instructions: "Analyze the changed files and delegate to the appropriate reviewer.",
  tools: [...],
  handoffs: [securityReviewer, performanceReviewer],
});

The triage agent uses a cheaper, faster model to decide routing, then hands off to specialized agents that do the deep analysis. Routing stays cheap, and each reviewer works from instructions narrow enough to keep it focused.

Deployment Tips

Keep tools focused. Each tool should do one thing well. An agent with 5 focused tools tends to outperform one with 20 kitchen-sink tools because the model makes better decisions with clear options.

Log everything. Record every tool call, input, output, and the agent's reasoning. You'll need this for debugging when the agent does something unexpected.
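
A lightweight way to capture tool activity is to wrap each tool's execute function. The sketch below assumes a hypothetical `withLogging` helper and an in-memory `ToolLogEntry` array; a real deployment would likely send these records to a structured logger instead.

```typescript
// Hypothetical log record; adapt the fields to your logging backend.
interface ToolLogEntry {
  tool: string;
  input: unknown;
  output: unknown;
  durationMs: number;
}

// Wraps an execute function so every completed call is appended to `log`.
// Assumes the tool returns errors as data; a rejected promise is not logged here.
function withLogging<I, O>(
  tool: string,
  execute: (input: I) => Promise<O>,
  log: ToolLogEntry[]
): (input: I) => Promise<O> {
  return async (input) => {
    const started = Date.now();
    const output = await execute(input);
    log.push({ tool, input, output, durationMs: Date.now() - started });
    return output;
  };
}
```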

Set token limits. Agents can get into loops. Set a maximum token budget or step count to prevent runaway sessions.
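
If you manage the loop yourself, a budget can be as simple as a counter checked on every iteration. `RunBudget` below is a hypothetical helper for illustration; the SDK may expose its own limit settings, which would be preferable where available.

```typescript
// Hypothetical budget tracker for a single agent run.
class RunBudget {
  private steps = 0;
  private tokens = 0;
  private readonly maxSteps: number;
  private readonly maxTokens: number;

  constructor(maxSteps: number, maxTokens: number) {
    this.maxSteps = maxSteps;
    this.maxTokens = maxTokens;
  }

  // Record one loop iteration and the tokens it consumed.
  // Throws once either limit is exceeded so the caller can stop the run.
  charge(tokensUsed: number): void {
    this.steps += 1;
    this.tokens += tokensUsed;
    if (this.steps > this.maxSteps) {
      throw new Error(`Step limit exceeded (${this.maxSteps})`);
    }
    if (this.tokens > this.maxTokens) {
      throw new Error(`Token limit exceeded (${this.maxTokens})`);
    }
  }
}
```

Calling `charge` once per model response gives a hard stop that does not depend on the agent deciding it is finished.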

Test with diverse inputs. Agents are non-deterministic. The same input might produce different tool call sequences. Test with varied scenarios and edge cases.
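
Because runs vary, it can help to record the tool call sequence of each run and compare them. The `countSequences` helper and the sample data below are hypothetical, intended only to show the idea.

```typescript
// Hypothetical helper: groups runs by the ordered list of tools they called.
function countSequences(runs: string[][]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const run of runs) {
    const key = run.join(" > ");
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Illustrative input: three recorded runs of the same review task
const observed = countSequences([
  ["list_changed_files", "git_diff", "read_file"],
  ["list_changed_files", "git_diff", "read_file"],
  ["git_diff", "search_code"],
]);
```

A sequence that appears only rarely is a good candidate for a closer look at that run's logs.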

Start with Claude Sonnet. It's the best balance of capability and cost for agent workloads. Use Opus for tasks requiring deep reasoning. Use Haiku for triage and simple routing.

AI agents are among the most practical applications of language models today. They turn AI from a suggestion engine into a collaborator that takes action. The Claude Agent SDK makes building them straightforward: define your tools, write clear instructions, and let the reasoning loop handle the rest.

For a broader look at the agentic AI landscape — including Claude Code, MCP, and multi-agent systems — read Agentic AI: When Your Code Writes, Tests, and Deploys Itself.