🌊Advanced

Deep Agents

Why Claude Code qualifies as a Deep Agent — and the four properties that enable long-horizon execution.

Agent Taxonomy

Not all agents are the same. The broadest category contains every production agent, including fully autonomous systems and agentic applications. Inside that category lives the ReAct loop, which initiated the modern agent paradigm. Deep Agents are a subset that handles long-horizon tasks. Coding agents like Claude Code, Cursor, and Devin are a further specialization within Deep Agents.

Layer	Examples	Best For
All agents	Hybrid RAG, classifier agents, decision routers	Any LLM-orchestrated workflow
Shallow / ReAct	Search-augmented chatbots, single-tool wrappers	1-2 iterations, tightly scoped tasks
Deep Agents	Deep research, GPT Researcher, coding agents	Long-horizon tasks, minutes to days
Coding agents	Claude Code, Cursor, Devin, Gemini CLI	Multi-step software engineering work

Why ReAct Breaks at Scale

ReAct (Reason + Act) is the foundational agent loop: LLM decides → tool runs → observation injected → LLM decides again. It works well for one or two iterations. It breaks on long-horizon tasks because every iteration adds the full tool result back into context. Context grows linearly with iterations, and then context rot kicks in: confusion, contradictions, pollution. Cost rises and quality degrades at the same time.
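To make the growth concrete, here is a minimal sketch of a naive ReAct loop's token accounting. The function name and the token counts (a 500-token prompt, 2,000 tokens per tool result) are illustrative assumptions, not measurements from any real agent. Under these assumptions the context the model reads at each step grows linearly, while the total input tokens processed over the whole run grow roughly quadratically, because the full context is re-sent on every iteration.

```python
def react_context_growth(iterations, prompt_tokens=500, observation_tokens=2000):
    """Model a naive ReAct loop where every tool result stays in context.

    Returns (context_size_per_step, cumulative_input_tokens).
    All token counts are hypothetical and chosen only for illustration.
    """
    context_sizes = []
    context = prompt_tokens
    for _ in range(iterations):
        context_sizes.append(context)   # tokens the model reads at this step
        context += observation_tokens   # full tool result appended for the next step
    return context_sizes, sum(context_sizes)


sizes_short, total_short = react_context_growth(2)
sizes_long, total_long = react_context_growth(50)

print(sizes_short[-1], total_short)   # 2500 3000
print(sizes_long[-1], total_long)     # 98500 2475000
```

Going from 2 to 50 iterations multiplies the final context by about 40, but the total tokens processed by more than 800 in this toy model.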

⚠️Warning

ReAct is not wrong — it is foundational. It just is not designed for tasks that need 50+ iterations of reasoning, file reads, and tool calls. That is the gap Deep Agents fill.

What Makes an Agent Deep

There is no formal definition. In practice, an agent is deep if it can execute complex, long-running tasks with quality and reliability. Most modern Deep Agents share four properties — Claude Code implements all of them.

▸Planning tool — explicit to-do list, dynamically updated as work progresses
▸Subagent capabilities — specialized workers in isolated contexts for hierarchical delegation
▸Filesystem for intermediate state — persistent storage of intermediate results, not retained in context
▸Large system prompt — comprehensive instructions, constraints, and operational guidance

Property 1: The Planning Tool

Deep Agents do not rely on implicit chain-of-thought planning inside the model. They use explicit planning tools. In Claude Code, this surfaces as TodoWrite and TodoRead actions. The plan is dynamic: each task is marked pending, in_progress, or completed. Failed tasks are not retried blindly; the planning tool steers execution in a controlled manner.

text

# Visible in Claude Code as the agent works
Update Todos
  ☒ Research existing authentication patterns in codebase
  ☒ Design JWT token structure and refresh flow
  ☐ Implement auth middleware
  ☐ Create login/logout API endpoints
  ☐ Add password hashing with bcrypt
  ☐ Write integration tests for auth flow
  ☐ Update API documentation
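A planning tool of this kind can be sketched as a small data structure. This is a minimal illustration, not Claude Code's actual implementation: the class names, method names, and ASCII status markers are hypothetical. Only the three status values come from the description above.

```python
from dataclasses import dataclass, field

VALID_STATUSES = ("pending", "in_progress", "completed")


@dataclass
class Todo:
    content: str
    status: str = "pending"


@dataclass
class TodoList:
    """Hypothetical explicit plan that an agent rewrites as work progresses."""
    items: list = field(default_factory=list)

    def write(self, contents):
        """Replace the plan with a fresh list of pending tasks."""
        self.items = [Todo(c) for c in contents]

    def set_status(self, index, status):
        """Move one task to a new status, rejecting unknown values."""
        if status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {status}")
        self.items[index].status = status

    def next_pending(self):
        """Return the index of the first pending task, or None if none remain."""
        for i, item in enumerate(self.items):
            if item.status == "pending":
                return i
        return None

    def render(self):
        """Render the plan as text so it can be shown to the model or the user."""
        marks = {"pending": "[ ]", "in_progress": "[~]", "completed": "[x]"}
        return "\n".join(f"{marks[t.status]} {t.content}" for t in self.items)


plan = TodoList()
plan.write([
    "Research existing authentication patterns in codebase",
    "Design JWT token structure and refresh flow",
    "Implement auth middleware",
])
plan.set_status(0, "completed")
plan.set_status(1, "in_progress")
print(plan.render())
```

Because the plan lives outside the model's reasoning, it can be re-rendered into context at any step, which keeps the agent oriented even after many tool calls.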

Property 2: Hierarchical Delegation via Subagents

Subagents enable the main agent to spawn specialized workers. Each subagent runs in its own context with its own tools and system prompt. They execute their own internal ReAct loops in isolation, then return only the final response. Intermediate observations never pollute the main context.

In Claude Code, this manifests as the Explore subagent (and any custom subagents you define). When the main agent encounters work that benefits from isolation — codebase exploration, focused research, parallel review — it delegates instead of executing directly.
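The isolation property can be sketched in a few lines. Everything here is a stand-in: `run_subagent`, the fake tools, the summarizer, and the file path are hypothetical, and a real subagent would call an LLM rather than a fixed function. The point is only the shape of the data flow: bulky observations stay in a private list, and the main context receives a single short result.

```python
def run_subagent(task, tool_calls, summarize):
    """Run tools in a private context and return only the final summary.

    `tool_calls` is a list of zero-argument callables standing in for tool use.
    `summarize` stands in for the subagent's final LLM response.
    """
    private_context = [f"TASK: {task}"]
    for call in tool_calls:
        private_context.append(call())   # observations accumulate here only
    return summarize(private_context)    # the private context is discarded after this


# Hypothetical tools that produce large outputs.
def fake_grep():
    return "src/auth/middleware.py:12: def verify_token(token):\n" * 200


def fake_read():
    return "line of source code\n" * 1000


def fake_summarize(private_context):
    # A real subagent would ask the model to condense private_context.
    return "Auth is handled in src/auth/middleware.py (verify_token)."


main_context = ["USER: Where is auth handled?"]
result = run_subagent("find auth code", [fake_grep, fake_read], fake_summarize)
main_context.append(f"SUBAGENT: {result}")

print(len(main_context), sum(len(m) for m in main_context))
```

The main context gains one line of conclusions, while the tens of thousands of characters of raw tool output never reach it.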

💡Tip

Hierarchical delegation mirrors real engineering teams. A tech lead does not personally inspect every file; they delegate to specialists who report back with conclusions. Claude Code does the same: the main agent stays at the architectural level while subagents handle the deep dives.

Property 3: The Filesystem as a Context Engine

Claude Code exposes Read, Write, Edit, Glob, Grep, and NotebookEdit. These are not just I/O tools — they are the mechanism that prevents context rot on long-horizon tasks. Intermediate results, scratch notes, and structured artifacts get written to disk instead of accumulating in the LLM's context window. When needed later, Glob and Grep retrieve precisely what is required.

Box	Meaning
Box 1 (everything)	All available context: codebase, docs, web, databases
Box 2 (selected)	What the agent pulls into the context window for a step
Box 3 (needed)	What the agent actually requires to complete the task

Failure modes: under-retrieval (missed needed info), over-retrieval (noise dilutes signal), misaligned retrieval (searching wrong place), window overflow (context too large). The filesystem lets the agent narrow Box 2 to match Box 3 — read targeted files, search precisely, page large content with offset/limit.
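Narrowing Box 2 toward Box 3 can be sketched as search first, then read a small window. The `read_file` and `grep` helpers below are simplified stand-ins written for this example, not the real Claude Code tools, and the temporary file and its contents are made up.

```python
import re
import tempfile
from pathlib import Path


def read_file(path, offset=0, limit=100):
    """Return up to `limit` lines starting at zero-based line index `offset`."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return lines[offset:offset + limit]


def grep(path, pattern):
    """Return (one-based line number, line) pairs whose text matches `pattern`."""
    regex = re.compile(pattern)
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [(n, line) for n, line in enumerate(lines, start=1) if regex.search(line)]


# Build a hypothetical 1,000-line file with one interesting line at line 500.
lines = [f"# filler {i}" for i in range(1, 1001)]
lines[499] = "def verify_token(token):"
path = Path(tempfile.mkdtemp()) / "big_module.py"
path.write_text("\n".join(lines), encoding="utf-8")

hits = grep(path, r"verify_token")               # search precisely
line_number = hits[0][0]
window = read_file(path, offset=line_number - 3, limit=5)   # page a small window

print(hits)
print(window)
```

Only 5 of the 1,000 lines enter the context window, which is the over-retrieval and window-overflow failure modes avoided in miniature.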

Property 4: A Large, Carefully Engineered System Prompt

Industry leaders dedicate immense engineering resources to system prompts. These prompts span hundreds of lines, evolve continuously with the model, and define the agent's reasoning, identity, and boundaries. A great system prompt does not hardcode workflows; it teaches the model how to reason.

▸Clear identity & scope — what the agent is and is not ("customer support, not sales")
▸Empowers, not constrains — defines goals, lets the model pick tools
▸Reasoning framework, not flowchart — repeatable approach (Identify → Gather → Resolve → Confirm) instead of branching logic
▸Heuristic boundaries — "always choose the simplest solution" beats listing 1,000 edge cases
▸Language efficiency — no repetition, no contradictory instructions

markdown

# Excerpt from LangChain DeepAgents base prompt (MIT-licensed reference)

You are a Deep Agent, an AI assistant that helps users
accomplish tasks using tools.

## Core Behavior
- Be concise and direct. Don't over-explain unless asked.
- NEVER add unnecessary preamble ("Sure!", "I'll now...").
- Don't say "I'll now do X" — just do it.

## Doing Tasks
1. **Understand first** — read relevant files, check existing
   patterns. Quick but thorough.
2. **Act** — implement the solution. Work quickly but accurately.
3. **Verify** — check your work against what was asked, not against
   your own output. First attempt is rarely correct — iterate.

## Tool Usage
- Use specialized tools over shell equivalents (read_file > cat,
  edit_file > sed)
- When performing multiple independent operations, make all tool
  calls in a single response.

## File Reading Best Practices
- Start with read_file(path, limit=100) to scan structure
- Read targeted sections with offset/limit
- Only read full files when necessary for editing

Why This Matters for Claude Code Users

Understanding the Deep Agent architecture explains many Claude Code behaviors: why it builds a to-do list before complex work, why it spawns Explore subagents instead of inline searching, why it prefers Glob and Grep over reading everything, and why CLAUDE.md (acting as a high-quality system prompt extension) has such a strong effect on output quality. The four properties are not abstract theory — they shape every session.

ℹ️Info

The application layer (the harness around the LLM) is where most current AI engineering innovation happens. Base models improve gradually; harnesses leap forward by composing planning, delegation, filesystem, and prompt engineering into something fundamentally more capable than the underlying LLM alone.
