Deep Agents
Why Claude Code qualifies as a Deep Agent — and the four properties that enable long-horizon execution.
Agent Taxonomy
Not all agents are the same. The broadest category contains every production agent — fully autonomous systems and agentic applications. Inside that lives the ReAct loop, which initiated the modern agent paradigm. Deep Agents are a subset that handle long-horizon tasks. Coding agents like Claude Code, Cursor, and Devin are a further specialization within Deep Agents.
| Layer | Examples | Best For |
|---|---|---|
| All agents | Hybrid RAG, classifier agents, decision routers | Any LLM-orchestrated workflow |
| Shallow / ReAct | Search-augmented chatbots, single-tool wrappers | 1-2 iterations, tightly scoped tasks |
| Deep Agents | Deep research, GPT Researcher, coding agents | Long-horizon tasks, minutes to days |
| Coding agents | Claude Code, Cursor, Devin, Gemini CLI | Multi-step software engineering work |
Why ReAct Breaks at Scale
ReAct (Reason + Act) is the foundational agent loop: LLM decides → tool runs → observation injected → LLM decides again. It works for one or two iterations but breaks on long-horizon tasks, because every iteration adds the full tool result back into context. Context grows linearly with the number of iterations, and context rot follows: confusion, contradictions, pollution. Cost rises while quality degrades.
ReAct is not wrong — it is foundational. It just is not designed for tasks that need 50+ iterations of reasoning, file reads, and tool calls. That is the gap Deep Agents fill.
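To make the growth concrete, here is a minimal sketch of the arithmetic, not a real agent. The function name and the token figures (a 500-token starting prompt, 2,000 tokens per tool result) are illustrative assumptions rather than measurements.

```python
def simulate_react_context(
    iterations: int, tool_result_tokens: int, base_tokens: int = 500
) -> list[int]:
    """Return the context size (in tokens) the model reads at each iteration.

    Assumes a naive ReAct loop in which every full tool result is appended
    to the context and nothing is ever summarized or dropped.
    """
    sizes = []
    context = base_tokens
    for _ in range(iterations):
        sizes.append(context)          # the model reads the whole context to decide
        context += tool_result_tokens  # then the full tool result is appended
    return sizes


sizes = simulate_react_context(iterations=50, tool_result_tokens=2000)
total_tokens_read = sum(sizes)
print(sizes[0], sizes[-1], total_tokens_read)
```

Under these assumptions the context grows from 500 to 98,500 tokens over 50 iterations, and the model reads 2,475,000 tokens in total. The per-step context is linear in the iteration count, but the cumulative tokens read grow roughly quadratically, which is why cost climbs so quickly.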
What Makes an Agent Deep
There is no formal definition. In practice, an agent is deep if it can execute complex, long-running tasks with quality and reliability. Most modern Deep Agents share four properties — Claude Code implements all of them.
- Planning tool: explicit to-do list, dynamically updated as work progresses
- Subagent capabilities: specialized workers in isolated contexts for hierarchical delegation
- Filesystem for intermediate state: persistent storage of intermediate results, not retained in context
- Large system prompt: comprehensive instructions, constraints, and operational guidance
Property 1: The Planning Tool
Deep Agents do not rely on implicit chain-of-thought planning inside the model. They use explicit planning tools. In Claude Code, this surfaces as TodoWrite and TodoRead actions. The plan is dynamic: each task is marked pending, in_progress, or completed, and the list is updated as work progresses. Failed tasks are not retried blindly; the planning tool steers execution in a controlled manner.
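As a rough illustration of what such a tool might maintain, the sketch below models a to-do list with the three statuses named above. The `Todo` and `TodoList` classes, the `write` method, and the `◐` marker for in-progress items are hypothetical; this is not Claude Code's actual implementation or schema.

```python
from dataclasses import dataclass, field

VALID_STATUSES = ("pending", "in_progress", "completed")


@dataclass
class Todo:
    content: str
    status: str = "pending"


@dataclass
class TodoList:
    items: list[Todo] = field(default_factory=list)

    def write(self, todos: list[Todo]) -> None:
        """Replace the whole plan in one call, rejecting unknown statuses."""
        for todo in todos:
            if todo.status not in VALID_STATUSES:
                raise ValueError(f"unknown status: {todo.status}")
        self.items = list(todos)

    def render(self) -> str:
        # The in_progress marker is an arbitrary choice for this sketch.
        marks = {"pending": "☐", "in_progress": "◐", "completed": "☒"}
        return "\n".join(f"{marks[t.status]} {t.content}" for t in self.items)


plan = TodoList()
plan.write([
    Todo("Research existing authentication patterns in codebase", "completed"),
    Todo("Implement auth middleware", "in_progress"),
    Todo("Write integration tests for auth flow"),
])
print(plan.render())
```

Rewriting the whole list on each call keeps the plan a single, explicit artifact that the agent can inspect and revise, rather than state scattered across earlier reasoning.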
# Visible in Claude Code as the agent works
Update Todos
☒ Research existing authentication patterns in codebase
☒ Design JWT token structure and refresh flow
☐ Implement auth middleware
☐ Create login/logout API endpoints
☐ Add password hashing with bcrypt
☐ Write integration tests for auth flow
☐ Update API documentation
Property 2: Hierarchical Delegation via Subagents
Subagents enable the main agent to spawn specialized workers. Each subagent runs in its own context with its own tools and system prompt. They execute their own internal ReAct loops in isolation, then return only the final response. Intermediate observations never pollute the main context.
In Claude Code, this manifests as the Explore subagent (and any custom subagents you define). When the main agent encounters work that benefits from isolation — codebase exploration, focused research, parallel review — it delegates instead of executing directly.
Hierarchical delegation mirrors real engineering teams. A tech lead does not personally inspect every file; they delegate to specialists who report back with conclusions. Claude Code does the same: the main agent stays at the architectural level while subagents handle the deep dives.
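The isolation idea can be sketched in a few lines. Everything here is a stand-in: `run_subagent`, the `FINAL:` convention, the scripted `fake_explore_step`, and the file paths are hypothetical, and the real model and tool calls are replaced by a fixed script so the example runs on its own.

```python
from typing import Callable


def run_subagent(
    task: str, step: Callable[[list[str]], str], max_steps: int = 10
) -> str:
    """Run an isolated loop and return only the final answer."""
    private_context = [f"task: {task}"]  # fresh context, not shared with the caller
    for _ in range(max_steps):
        observation = step(private_context)
        private_context.append(observation)  # intermediate results stay in here
        if observation.startswith("FINAL:"):
            return observation.removeprefix("FINAL:").strip()
    return "subagent stopped without a final answer"


def fake_explore_step(context: list[str]) -> str:
    # Scripted stand-in for "LLM decides, tool runs, observation returned".
    scripted = [
        "read src/auth/middleware.py (412 lines)",
        "grep 'jwt' matched 17 files",
        "FINAL: JWT handling lives in src/auth/; tokens are verified in middleware.py",
    ]
    return scripted[len(context) - 1]


main_context = ["user: where is JWT handled?"]
summary = run_subagent("find JWT handling", fake_explore_step)
main_context.append(f"explore subagent: {summary}")
print(main_context)
```

The main context gains exactly one line, the summary. The file read and the grep output exist only in the subagent's private list, which is discarded when the function returns.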
Property 3: The Filesystem as a Context Engine
Claude Code exposes Read, Write, Edit, Glob, Grep, and NotebookEdit. These are not just I/O tools — they are the mechanism that prevents context rot on long-horizon tasks. Intermediate results, scratch notes, and structured artifacts get written to disk instead of accumulating in the LLM's context window. When needed later, Glob and Grep retrieve precisely what is required.
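A minimal sketch of the pattern, using only the Python standard library rather than Claude Code's own tools: findings are written to scratch files, and later steps pull back only the matching lines. The helper names `save_note` and `grep_notes` and the note contents are hypothetical.

```python
import tempfile
from pathlib import Path


def save_note(workdir: Path, name: str, text: str) -> Path:
    """Persist an intermediate result to disk instead of keeping it in context."""
    path = workdir / name
    path.write_text(text, encoding="utf-8")
    return path


def grep_notes(workdir: Path, pattern: str, needle: str) -> list[str]:
    """Return only the matching lines from files whose names match the glob pattern."""
    hits = []
    for path in sorted(workdir.glob(pattern)):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if needle in line:
                hits.append(f"{path.name}:{lineno}: {line}")
    return hits


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    save_note(workdir, "auth_findings.md",
              "middleware checks JWT expiry\nrefresh flow is missing\n")
    save_note(workdir, "db_findings.md", "users table has no index on email\n")
    hits = grep_notes(workdir, "*_findings.md", "JWT")

print(hits)
```

Only the single matching line would re-enter the context; the rest of both notes stays on disk until a later step asks for it.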
| Box | Meaning |
|---|---|
| Box 1 (everything) | All available context: codebase, docs, web, databases |
| Box 2 (selected) | What the agent pulls into the context window for a step |
| Box 3 (needed) | What the agent actually requires to complete the task |
Retrieval can fail in four ways: under-retrieval (needed information is missed), over-retrieval (noise dilutes signal), misaligned retrieval (searching in the wrong place), and window overflow (the selected context is too large). The filesystem lets the agent narrow Box 2 to match Box 3: read targeted files, search precisely, and page large content with offset/limit.
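One way to reason about the gap between Box 2 and Box 3 is to treat both as sets and compare them. The sketch below borrows precision and recall from information retrieval as a framing of convenience; the function name and the file names are hypothetical, and in practice Box 3 is rarely known in advance.

```python
def diagnose_retrieval(selected: set[str], needed: set[str]) -> dict[str, object]:
    """Compare what was pulled into context (Box 2) with what was required (Box 3)."""
    hit = selected & needed
    return {
        "missed": needed - selected,   # under-retrieval: needed but never loaded
        "noise": selected - needed,    # over-retrieval: loaded but not needed
        "precision": len(hit) / len(selected) if selected else 0.0,
        "recall": len(hit) / len(needed) if needed else 1.0,
    }


needed = {"auth/middleware.py", "auth/tokens.py"}
selected = {"auth/middleware.py", "README.md", "docs/changelog.md"}
report = diagnose_retrieval(selected, needed)
print(report)
```

In this example one needed file is missed and two irrelevant files occupy the window, so both under-retrieval and over-retrieval occur in the same step. Narrowing Box 2 toward Box 3 means raising both numbers toward 1.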
Property 4: A Large, Carefully Engineered System Prompt
Industry leaders dedicate substantial engineering resources to system prompts. These prompts span hundreds of lines, evolve continuously with the model, and define the agent's reasoning, identity, and boundaries. A great system prompt does not hardcode workflows; it teaches the model how to reason.
- Clear identity & scope: what the agent is and is not ("customer support, not sales")
- Empowers, not constrains: defines goals, lets the model pick tools
- Reasoning framework, not flowchart: a repeatable approach (Identify → Gather → Resolve → Confirm) instead of branching logic
- Heuristic boundaries: "always choose the simplest solution" beats listing 1,000 edge cases
- Language efficiency: no repetition, no contradictory instructions
# Excerpt from LangChain DeepAgents base prompt (MIT-licensed reference)
You are a Deep Agent, an AI assistant that helps users
accomplish tasks using tools.
## Core Behavior
- Be concise and direct. Don't over-explain unless asked.
- NEVER add unnecessary preamble ("Sure!", "I'll now...").
- Don't say "I'll now do X" — just do it.
## Doing Tasks
1. **Understand first** — read relevant files, check existing
patterns. Quick but thorough.
2. **Act** — implement the solution. Work quickly but accurately.
3. **Verify** — check your work against what was asked, not against
your own output. First attempt is rarely correct — iterate.
## Tool Usage
- Use specialized tools over shell equivalents (read_file > cat,
edit_file > sed)
- When performing multiple independent operations, make all tool
calls in a single response.
## File Reading Best Practices
- Start with read_file(path, limit=100) to scan structure
- Read targeted sections with offset/limit
- Only read full files when necessary for editing
Why This Matters for Claude Code Users
Understanding the Deep Agent architecture explains many Claude Code behaviors: why it builds a to-do list before complex work, why it spawns Explore subagents instead of inline searching, why it prefers Glob and Grep over reading everything, and why CLAUDE.md (acting as a high-quality system prompt extension) has such a strong effect on output quality. The four properties are not abstract theory — they shape every session.
The application layer (the harness around the LLM) is where most current AI engineering innovation happens. Base models improve gradually; harnesses leap forward by composing planning, delegation, filesystem, and prompt engineering into something fundamentally more capable than the underlying LLM alone.