
Build a Production AI Agent with the Claude Agent SDK

Stop wiring agents together with LangChain and duct tape. The Claude Agent SDK gives you tool use, subagents, file system access, and hooks in a few dozen lines. Here's a full working example.

By EgoistAI

Why the Agent SDK Exists

For most of 2024 and 2025, building an LLM agent meant gluing together LangChain, a vector store, a prompt template framework, a custom tool runner, and a retry loop. Everyone did it slightly differently. Everyone’s agent was slightly broken in slightly different ways.

Anthropic released the Claude Agent SDK (originally the engine behind Claude Code) to replace that whole stack with one opinionated library. It ships with tool use, subagents, file system operations, hooks, permission prompts, and a streaming event loop — all production-tested, because it’s literally the same code that powers Claude Code’s terminal agent.

This tutorial builds a real agent: a repo-auditor that clones a Git repository, scans it for security issues, and writes a Markdown report. You’ll end up with well under 100 lines of Python, a fraction of what the same agent takes when you hand-roll the tool runner, retry loop, and permission checks yourself.


Prerequisites

  • Python 3.10+
  • An Anthropic API key (export ANTHROPIC_API_KEY=...)
  • pip install claude-agent-sdk
  • Git installed on the machine running the agent

Step 1: The Minimal Loop

Here’s the smallest possible working Agent SDK call:

import anyio
from claude_agent_sdk import query, ClaudeAgentOptions

async def main():
    options = ClaudeAgentOptions(
        system_prompt="You are a concise coding assistant.",
        allowed_tools=["Bash", "Read", "Glob", "Grep"],
    )
    async for message in query(
        prompt="List the top-level Python files in this directory and describe each.",
        options=options,
    ):
        print(message)

anyio.run(main)

Run this in a Python project and you’ll see the message objects stream past as Claude works: typically a Glob for *.py, a Read on each file, then a final text response and a closing result message. No boilerplate. No tool-registration dance.

The SDK handles:

  • Sending the prompt with proper tool definitions
  • Executing tool calls locally, gated by allowed_tools and your permission settings
  • Feeding results back to the model
  • Looping until Claude decides it’s done
  • Streaming each message to your async for loop
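For intuition, the loop those bullets describe can be sketched in plain Python with a scripted fake model standing in for Claude. ToolCall, FinalAnswer, and run_agent_loop are hypothetical names for illustration, not SDK APIs; the real SDK also handles streaming, permissions, and the API protocol.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToolCall:
    """Hypothetical: the model asks for a tool to be run."""
    name: str
    args: dict


@dataclass
class FinalAnswer:
    """Hypothetical: the model says it is done."""
    text: str


def run_agent_loop(model: Callable, tools: Dict[str, Callable[[dict], str]],
                   prompt: str, max_turns: int = 10) -> str:
    """Toy agent loop: ask the model, run the tool it picks, feed the result back."""
    transcript: List[tuple] = [("user", prompt)]
    for _ in range(max_turns):
        step = model(transcript)
        if isinstance(step, FinalAnswer):
            return step.text  # the model decided it's done
        tool = tools.get(step.name)
        result = tool(step.args) if tool else f"Error: unknown tool {step.name!r}"
        transcript.append(("tool_call", step.name, step.args))
        transcript.append(("tool_result", result))  # visible to the next model call
    raise RuntimeError(f"no final answer after {max_turns} turns")


def fake_model(transcript):
    """Scripted stand-in for Claude: glob once, then answer from the result."""
    results = [entry[1] for entry in transcript if entry[0] == "tool_result"]
    if not results:
        return ToolCall("Glob", {"pattern": "*.py"})
    return FinalAnswer(f"Found: {results[-1]}")


tools = {"Glob": lambda args: "main.py, utils.py"}
print(run_agent_loop(fake_model, tools, "List the Python files."))
```

The max_turns guard plays the same role as the SDK’s max_turns option used in Step 4: without it, a model that never produces a final answer would loop forever.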

Step 2: Defining Custom Tools

Built-in tools cover filesystems, shells, and basic web fetching. For anything else you define your own:

from claude_agent_sdk import tool, create_sdk_mcp_server

@tool("clone_repo", "Clone a git repository into a temp directory.", {"url": str})
async def clone_repo(args):
    import subprocess, tempfile
    tmp = tempfile.mkdtemp()
    subprocess.run(["git", "clone", "--depth", "1", args["url"], tmp], check=True)
    return {
        "content": [
            {"type": "text", "text": f"Cloned to {tmp}"}
        ]
    }

repo_server = create_sdk_mcp_server(
    name="repo-tools",
    version="1.0.0",
    tools=[clone_repo],
)

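The SDK runs these tools as an in-process Model Context Protocol (MCP) server: you register it through the mcp_servers option, and the model calls each tool as mcp__<server>__<tool> (Step 4 shows the wiring). The same option also accepts external MCP servers, so any MCP server you’ve built for Claude Code works here with zero changes. That’s a genuinely useful property: tools move freely between your IDE, your production agent, and your command line.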
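A tool handler is just an async function that takes an args dict and returns a content dict, so you can harden it and test it without a model in the loop. Below is a sketch of a stricter clone handler. The URL policy (REPO_URL) and the names text_result and clone_repo_handler are hypothetical choices for illustration, and it uses plain asyncio so the git call doesn’t block the event loop.

```python
import asyncio
import re
import tempfile

# Hypothetical policy: only https:// or git@ repository URLs. Anything starting
# with "-" fails the prefix check, so git can never read the value as an option.
REPO_URL = re.compile(r"^(https://[\w.-]+/[\w./-]+|git@[\w.-]+:[\w./-]+)$")


def text_result(text: str) -> dict:
    # Same {"content": [{"type": "text", ...}]} shape the @tool example returns
    return {"content": [{"type": "text", "text": text}]}


async def clone_repo_handler(args: dict) -> dict:
    url = str(args.get("url", "")).strip()
    if not REPO_URL.match(url):
        # Actionable error: Claude reads this text and can fix its input
        return text_result(
            f"Error: {url!r} is not a repository URL. "
            "Pass an https:// or git@ URL, e.g. https://github.com/owner/repo."
        )
    dest = tempfile.mkdtemp(prefix="audit-")
    proc = await asyncio.create_subprocess_exec(
        "git", "clone", "--depth", "1", "--", url, dest,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    _, stderr = await proc.communicate()  # waits without blocking the event loop
    if proc.returncode != 0:
        return text_result(f"Error: git clone failed: {stderr.decode(errors='replace').strip()}")
    return text_result(dest)
```

To expose it, apply the decorator as a plain call: clone_repo = tool("clone_repo", "Clone a git repository into a temp directory.", {"url": str})(clone_repo_handler). Keeping the undecorated function around is what makes it easy to unit-test.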


Step 3: Permission Hooks and Guardrails

Production agents need guardrails. The SDK’s hook system lets you intercept every tool call before it executes:

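import re
from claude_agent_sdk import ClaudeAgentOptions, HookMatcher

# Regexes instead of plain substrings: the substring "curl | sh" never matches a real
# "curl https://... | sh". A deny-list is a tripwire, not a sandbox.
DANGEROUS = [
    re.compile(r"\brm\b[^;&|]*\s(-\w*[rR]|--recursive)"),  # any recursive rm
    re.compile(r"\bsudo\b"),
    re.compile(r"\b(curl|wget)\b[^|]*\|\s*(ba|z)?sh\b"),    # pipe-to-shell installs
    re.compile(re.escape(":(){ :|:& };:")),                 # classic fork bomb
]

async def block_destructive(input_data, tool_use_id, context):
    cmd = input_data.get("tool_input", {}).get("command", "")
    if any(pattern.search(cmd) for pattern in DANGEROUS):
        return {
            "hookSpecificOutput": {
                "hookEventName": "PreToolUse",
                "permissionDecision": "deny",
                "permissionDecisionReason": f"Blocked dangerous command: {cmd}",
            }
        }
    return {}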

options = ClaudeAgentOptions(
    allowed_tools=["Bash", "Read", "Glob", "Grep", "Write"],
    hooks={
        "PreToolUse": [
            HookMatcher(matcher="Bash", hooks=[block_destructive]),
        ],
    },
)

The hook runs before any Bash tool call and can deny it with a reason that Claude sees. Claude will then typically try a different approach. Hooks also fire on PostToolUse, UserPromptSubmit, and other lifecycle events — enough to implement audit logging, rate limiting, output filtering, or cost caps.
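Rate limiting fits the same PreToolUse shape. A minimal sketch, assuming you want a hard cap on Bash calls per run: make_bash_rate_limit is a hypothetical helper that returns a hook with its own counter, using the same deny payload as block_destructive.

```python
import asyncio


def make_bash_rate_limit(max_calls: int):
    """Hypothetical helper: build a PreToolUse hook that denies Bash after max_calls."""
    calls = 0

    async def limit_bash(input_data, tool_use_id, context):
        nonlocal calls
        calls += 1
        if calls > max_calls:
            return {
                "hookSpecificOutput": {
                    "hookEventName": "PreToolUse",
                    "permissionDecision": "deny",
                    "permissionDecisionReason": (
                        f"Bash call limit ({max_calls}) reached for this run; "
                        "summarize what you have instead of running more commands."
                    ),
                }
            }
        return {}  # no decision: let the call proceed

    return limit_bash


# Quick local check, calling the hook the way the SDK would
hook = make_bash_rate_limit(2)
for _ in range(3):
    print(asyncio.run(hook({"tool_name": "Bash", "tool_input": {"command": "ls"}}, "id", None)))
```

Register it next to the deny-list, e.g. HookMatcher(matcher="Bash", hooks=[block_destructive, make_bash_rate_limit(25)]). Build a fresh hook for each run, since the counter lives in the closure.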


Step 4: The Repo Auditor

Putting it together. This agent clones a repo, scans for a few classes of issue, and writes a Markdown report:

import anyio
from claude_agent_sdk import (
    query, ClaudeAgentOptions, tool,
    create_sdk_mcp_server, HookMatcher,
    AssistantMessage, TextBlock,
)

@tool("clone_repo", "Clone a git repository to a fresh temp directory.", {"url": str})
async def clone_repo(args):
    import tempfile
    tmp = tempfile.mkdtemp(prefix="audit-")
    # anyio.run_process awaits git without blocking the event loop;
    # "--" stops a URL that starts with "-" from being read as a git option
    result = await anyio.run_process(
        ["git", "clone", "--depth", "1", "--", args["url"], tmp],
        check=False,
    )
    if result.returncode != 0:
        # Report the failure as text so Claude can read it and react
        err = result.stderr.decode(errors="replace").strip()
        return {"content": [{"type": "text", "text": f"Error: git clone failed: {err}"}]}
    return {"content": [{"type": "text", "text": tmp}]}

repo_server = create_sdk_mcp_server(
    name="repo-tools", version="1.0.0", tools=[clone_repo],
)

async def log_tools(input_data, tool_use_id, context):
    print(f"[tool] {input_data.get('tool_name')} {input_data.get('tool_input')}")
    return {}

SYSTEM = """You are a security auditor. When given a repo URL:
1. Clone it with clone_repo
2. Read the directory structure with Glob
3. Grep for hardcoded secrets (API keys, tokens, passwords)
4. Grep for dangerous patterns (eval, exec, os.system, SQL string concat)
5. Read the top 5 most concerning files and assess them
6. Write a Markdown report to ./audit-report.md with: summary, findings table, recommendations
Be concise. Rank issues by severity: critical, warning, suggestion."""

async def main(repo_url: str):
    options = ClaudeAgentOptions(
        system_prompt=SYSTEM,
        allowed_tools=[
            "Bash", "Read", "Glob", "Grep", "Write",
            "mcp__repo-tools__clone_repo",
        ],
        mcp_servers={"repo-tools": repo_server},
        hooks={
            "PreToolUse": [HookMatcher(matcher="*", hooks=[log_tools])],
        },
        max_turns=40,
    )
    prompt = f"Audit this repository: {repo_url}"
    async for message in query(prompt=prompt, options=options):
        # Text lives in TextBlock items inside AssistantMessage.content;
        # the message objects themselves have no .text attribute
        if isinstance(message, AssistantMessage):
            for block in message.content:
                if isinstance(block, TextBlock):
                    print(block.text)

if __name__ == "__main__":
    import sys
    anyio.run(main, sys.argv[1])

Save the script as auditor.py and run it:

python auditor.py https://github.com/some/repo

You’ll watch Claude clone the repo, grep for secrets, read suspicious files, then drop a Markdown report in your working directory. The whole loop is driven by the SDK’s event stream, and the script is well under 100 lines.
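The auditor pre-approves Write but only logs it. A tighter setup pins Write to the report file with a second PreToolUse hook. This sketch assumes the Write tool’s input carries the destination under file_path (as Claude Code’s Write tool does) and that the report path is resolved relative to the directory the script runs in; only_write_report is a hypothetical name.

```python
import asyncio
import os

# Assumption: the report lands at ./audit-report.md relative to the script's cwd
REPORT_PATH = os.path.realpath("audit-report.md")


async def only_write_report(input_data, tool_use_id, context):
    """PreToolUse hook: allow Write only when it targets the audit report."""
    target = input_data.get("tool_input", {}).get("file_path", "")
    if not target or os.path.realpath(target) != REPORT_PATH:
        return {
            "hookSpecificOutput": {
                "hookEventName": "PreToolUse",
                "permissionDecision": "deny",
                "permissionDecisionReason": f"Write is limited to {REPORT_PATH}; got {target!r}.",
            }
        }
    return {}


print(asyncio.run(only_write_report({"tool_input": {"file_path": "/etc/hosts"}}, "id", None)))
```

Register it with HookMatcher(matcher="Write", hooks=[only_write_report]) alongside the logging hook. Bash can still write files, so pair this with the Step 3 deny-list, or drop Bash from allowed_tools if the audit doesn’t need it.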


Step 5: Subagents for Parallelism

One Claude instance auditing one repo is fine. Auditing 50 repos sequentially is slow. The SDK’s subagent mechanism lets the main agent spawn parallel workers:

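from claude_agent_sdk import ClaudeAgentOptions, AgentDefinition

AUDITOR_TOOLS = ["Bash", "Read", "Glob", "Grep", "Write", "mcp__repo-tools__clone_repo"]

options = ClaudeAgentOptions(
    system_prompt="You are an orchestrator. For each repo, spawn an auditor subagent to audit it in parallel.",
    # Task is the built-in subagent spawner; the auditor's tools are pre-approved too
    allowed_tools=["Task", *AUDITOR_TOOLS],
    mcp_servers={"repo-tools": repo_server},  # from Step 4
    agents={
        "auditor": AgentDefinition(
            description="Audits a single repository for security issues.",
            # Separate report files, so parallel auditors don't overwrite each other
            prompt=SYSTEM.replace("./audit-report.md", "./audit-<repo-name>.md"),
            tools=AUDITOR_TOOLS,
        ),
    },
)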

The Task tool spawns a fresh auditor subagent with its own context window. The parent gets back only the subagent’s final report, which keeps context usage sane. This is the same pattern Claude Code uses when it delegates research tasks to subagents.
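If you’d rather keep the fan-out deterministic and out of the model’s hands, a different technique is to parallelize in your own harness: one query() per repo with the Step 4 options, capped by a semaphore. The sketch shows only the concurrency part; audit_all is a hypothetical helper, audit_one is a coroutine you would write around query(), and the demo uses a fake auditor.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Union


async def audit_all(
    repo_urls: Iterable[str],
    audit_one: Callable[[str], Awaitable[str]],
    max_parallel: int = 5,
) -> Dict[str, Union[str, Exception]]:
    """Run audit_one for every URL, never more than max_parallel at once."""
    sem = asyncio.Semaphore(max_parallel)

    async def bounded(url: str):
        async with sem:
            try:
                return url, await audit_one(url)
            except Exception as exc:  # one broken repo shouldn't sink the batch
                return url, exc

    pairs = await asyncio.gather(*(bounded(u) for u in repo_urls))
    return dict(pairs)


# Demo with a fake auditor standing in for a per-repo query() call
async def fake_audit(url: str) -> str:
    await asyncio.sleep(0.01)
    if url.endswith("broken"):
        raise RuntimeError("git clone failed")
    return f"report for {url}"


print(asyncio.run(audit_all(["repo-a", "repo-b", "repo-broken"], fake_audit, max_parallel=2)))
```

The trade-off: each repo gets its own session and its own cost figure, and no orchestrator spends tokens deciding what to spawn, but you lose the model’s ability to adapt the plan mid-batch.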


Step 6: Cost Control

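The SDK reports what a run cost on its final ResultMessage (total_cost_usd), and the max_turns option from Step 4 caps how long a single run can loop. A simple budget gate for batch runs:

from claude_agent_sdk import query, ResultMessage

total_cost = 0.0
MAX_USD = 1.00

for repo_url in repo_urls:  # your batch of repository URLs
    if total_cost >= MAX_USD:
        print(f"Cost cap hit: ${total_cost:.2f}")
        break
    async for message in query(prompt=f"Audit this repository: {repo_url}", options=options):
        # Cost arrives once per run, on the closing ResultMessage
        if isinstance(message, ResultMessage) and message.total_cost_usd:
            total_cost += message.total_cost_usd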

For production agents this is non-optional. Any loop where the LLM controls its own continuation must have a hard budget.
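Because the cost figure only arrives when a run finishes, a wall-clock deadline is a useful second hard budget for a run that is still going. A sketch using only asyncio; consume_with_deadline is a hypothetical helper, and passing query(...) to it assumes query returns an async generator that tolerates being cancelled and closed early.

```python
import asyncio
from typing import Any, AsyncIterator, List, Tuple


async def consume_with_deadline(stream: AsyncIterator[Any], seconds: float) -> Tuple[List[Any], bool]:
    """Collect items from an async iterator; stop if the wall-clock budget runs out.

    Returns (items_received, timed_out).
    """
    items: List[Any] = []

    async def drain() -> None:
        async for item in stream:
            items.append(item)

    try:
        await asyncio.wait_for(drain(), timeout=seconds)
        return items, False
    except asyncio.TimeoutError:
        return items, True  # partial results; the run was cut off
    finally:
        # Close the generator so it can run its own cleanup
        aclose = getattr(stream, "aclose", None)
        if aclose is not None:
            await aclose()
```

Usage would look like messages, timed_out = await consume_with_deadline(query(prompt=prompt, options=options), seconds=600). The messages received before the deadline are kept, so you can still log what the agent did.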


When Not To Use the Agent SDK

A few honest limitations:

  • It’s Claude-only. If you need multi-provider routing, stick with a higher-level framework.
  • It’s biased toward filesystem/shell workflows. Pure conversational agents don’t benefit as much.
  • It’s async-only (built on anyio). Sync-only codebases need a wrapper.
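A minimal sync wrapper, assuming you want all messages at the end rather than streamed: collect_sync is a hypothetical helper, and with the SDK you would call it as collect_sync(lambda: query(prompt=..., options=options)).

```python
import asyncio
from typing import AsyncIterator, Callable, List, TypeVar

T = TypeVar("T")


def collect_sync(make_stream: Callable[[], AsyncIterator[T]]) -> List[T]:
    """Run an async iterator to completion from synchronous code and return its items.

    Takes a factory rather than a stream so the iterator is created inside the
    event loop that asyncio.run starts.
    """
    async def gather_all() -> List[T]:
        return [item async for item in make_stream()]

    return asyncio.run(gather_all())


async def demo_stream():
    for word in ("clone", "grep", "report"):
        yield word


print(collect_sync(demo_stream))
```

asyncio.run refuses to start inside an already-running event loop (Jupyter, some web frameworks), so this only suits plain synchronous entry points such as scripts and worker processes.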

For any workload that does involve filesystem operations, shell commands, or multi-step tool use, the Agent SDK is the shortest path from prompt to production in 2026. It’s the reference implementation of how Anthropic thinks agents should work, and it shows.


Next Steps

  • Wire your agent into a cron job or queue worker for batch runs
  • Route simple questions to Sonnet instead of Opus by setting the model option per run
  • Write a custom MCP server exposing your company’s internal APIs
  • Read the full claude-agent-sdk-python source — it’s small and worth understanding
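Model choice is a per-run setting (the model field on ClaudeAgentOptions), so the simplest router decides before calling query(). A sketch with a deliberately crude, hypothetical keyword heuristic; the "sonnet" and "opus" strings assume the model aliases Claude Code accepts, and full model IDs work just as well.

```python
HEAVY_HINTS = ("audit", "refactor", "repository", "debug", "security", "migrate")


def pick_model(prompt: str, max_simple_words: int = 30) -> str:
    """Hypothetical router: short prompts with no heavy-task keywords get the cheaper model."""
    text = prompt.lower()
    is_short = len(text.split()) <= max_simple_words
    if is_short and not any(hint in text for hint in HEAVY_HINTS):
        return "sonnet"
    return "opus"


print(pick_model("What does HTTP status 418 mean?"))
print(pick_model("Audit this repository for hardcoded secrets"))
```

Then build options per request, e.g. ClaudeAgentOptions(model=pick_model(prompt), ...). Log which model each run used so you can check whether the heuristic routes well.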

The abstraction is thin enough that you can learn it in an afternoon and bend it to nearly any shape. After years of agent frameworks that felt like cathedrals, an opinionated library built on top of MCP is a refreshing baseline.


Debugging Checklist

When your agent misbehaves, work through this list before blaming the model:

  1. Is the tool description clear? Vague tool descriptions are the number one cause of agents that “refuse to use the tool.” Rewrite the description so a human reading it cold would know exactly when to call it.
  2. Are tool inputs validated? If your tool accepts {"url": str} but silently fails on a non-URL, the agent will keep calling it with garbage and never understand why.
  3. Is the system prompt scoped? Agents that fail open-endedly usually had open-ended system prompts. Be specific about what done looks like.
  4. Are error messages actionable? Claude reads tool errors. “Error: 500” is useless. “Error: database not reachable, retry in 10s or use cached_data tool” is useful.
  5. Is max_turns too low? Complex workflows need headroom. Start at 40 and raise only if needed.
  6. Are you logging tool calls? Every production agent should log every tool call with inputs, outputs, and timing. Without this you can’t debug anything.

Following this list turns most agent failures from mysterious to trivial within minutes. Write it on a sticky note.
