Agentic Exploration Protocol

Systematic Discovery Methodology

A framework for LLM agents to systematically explore and understand complex systems, codebases, research domains, and problem spaces.

📄 Framework v2.0 🎯 For AI Agents 📐 10 Principles

The Meta-Pattern

Exploration is not optimization. You're not minimizing a metric—you're reducing uncertainty.

UNKNOWN SPACE → [SYSTEMATIC PROBING] → MENTAL MODEL

Your job is to:

  1. Bound — Define what you're exploring and what's out of scope
  2. Map — Build a structural overview before diving deep
  3. Probe — Test your model with targeted investigations
  4. Synthesize — Construct coherent understanding from fragments

Principle 1 Uncertainty Mapping

Before exploring, inventory what you don't know.

The Uncertainty Inventory

## Known Knowns
Things I understand and can explain.

## Known Unknowns
Specific questions I need to answer.

## Suspected Structure
Hypotheses about how things connect.

## Boundary Conditions
What's in scope vs out of scope.

## Unknown Unknowns (meta)
Areas where I don't even know what questions to ask.
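The inventory template above can be kept as structured data rather than prose, which makes open questions countable and prioritizable. A minimal sketch (the field names are illustrative, not prescribed by the protocol):

```python
from dataclasses import dataclass, field

@dataclass
class UncertaintyInventory:
    known_knowns: list[str] = field(default_factory=list)
    known_unknowns: list[str] = field(default_factory=list)       # specific questions
    suspected_structure: list[str] = field(default_factory=list)  # hypotheses
    boundaries: list[str] = field(default_factory=list)           # in/out of scope
    meta_gaps: list[str] = field(default_factory=list)            # unknown unknowns

    def open_questions(self) -> list[str]:
        # Known unknowns are directly answerable; meta-gaps need survey
        # work before they become concrete questions.
        return self.known_unknowns + self.meta_gaps

inv = UncertaintyInventory(
    known_unknowns=["How does the encoder handle negative ints?"],
    meta_gaps=["Anything about the persistence layer"],
)
```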

Why This Matters

Exploration without structure becomes wandering. The inventory gives that structure: it turns vague unfamiliarity into specific, prioritizable questions.

Uncertainty Types

| Type | Description | Exploration Strategy |
|---|---|---|
| Structural | How is it organized? | Map → hierarchy, dependencies |
| Behavioral | What does it do? | Probe → inputs, outputs, edge cases |
| Causal | Why does it work this way? | Trace → history, constraints, decisions |
| Boundary | Where are the edges? | Test → limits, failure modes |
| Conceptual | What are the key abstractions? | Synthesize → patterns, vocabulary |

Principle 2 Exploration Journal

Your memory is unreliable. The journal is ground truth.

Schema

## Session [N]: [Focus Area]

### Starting State
- What I thought I knew: [summary]
- Open questions: [list]
- Hypothesis: [what I expect to find]

### Explorations

#### Probe 1: [Description]
- Action: What I did
- Found: What I observed
- Implies: What this means for my model
- New questions: What this raises

#### Probe 2: [Description]
...

### Session Synthesis
- Model updates: How my understanding changed
- Confirmed: Hypotheses that held
- Refuted: Hypotheses that failed
- Deferred: Questions for later
- Connections: Links to other areas

Journal Discipline

  1. Capture observations raw — Don't interpret while recording
  2. Separate observation from inference — "I saw X" vs "This means Y"
  3. Track model evolution — How did your understanding change?
  4. Note surprises — Surprises reveal model gaps
  5. Cross-reference — Link related discoveries
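Discipline 2 (separate observation from inference) is easy to enforce mechanically when each probe record keeps the two in distinct fields. A sketch, assuming the journal schema above:

```python
from dataclasses import dataclass, field

@dataclass
class ProbeRecord:
    description: str
    action: str           # what I did
    found: str            # raw observation -- no interpretation here
    implies: str = ""     # inference, kept separate from the observation
    new_questions: list[str] = field(default_factory=list)

probe = ProbeRecord(
    description="Check integer encoding",
    action="Read the encoder module",
    found="All integer fields go through a varint path",      # observation
    implies="No fixed-width path exists for small ints",      # inference
    new_questions=["Is varint ever a decode bottleneck?"],
)
```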

Principle 3 Exploration Modes

Mode 1: Survey (Breadth-First)

Goal: Structural overview. Map the territory.

Action: Scan everything at shallow depth
Output: Inventory of components, rough categorization
When: Starting exploration, entering new area

Mode 2: Dive (Depth-First)

Goal: Deep understanding of specific area.

Action: Exhaustive investigation of one component
Output: Complete model of that component
When: Survey identified high-priority area

Mode 3: Trace (Follow the Thread)

Goal: Understand flow across boundaries.

Action: Follow a specific path through the system
Output: End-to-end understanding of one flow
When: Need to understand integration

Mode 4: Probe (Test the Model)

Goal: Validate understanding through prediction.

Action: Make predictions, then verify
Output: Calibrated confidence in mental model
When: Have hypothesis to test

Probe Protocol:

1. State prediction: "I expect X because [reasoning]"
2. Investigate: Look at actual behavior/code/data
3. Compare: Did reality match prediction?
4. If match: Confidence increases
5. If mismatch: WHY? Update model.
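The protocol above is a predict-then-check loop. A minimal sketch (the confidence update rule here is illustrative, not part of the protocol):

```python
def probe(prediction, reasoning, observe, model_confidence):
    """Run one probe: state a prediction, observe reality, update confidence.

    observe: zero-argument callable returning the actual result.
    Returns (matched, new_confidence, note).
    """
    actual = observe()
    matched = actual == prediction
    if matched:
        # Match: nudge confidence up, capped at 1.0.
        new_conf = min(1.0, model_confidence + 0.1)
        note = f"Confirmed: expected {prediction!r} because {reasoning}"
    else:
        # Mismatch: drop confidence sharply and force a model update.
        new_conf = model_confidence * 0.5
        note = f"Refuted: expected {prediction!r}, saw {actual!r}. Why?"
    return matched, new_conf, note

matched, conf, note = probe(
    prediction=404,
    reasoning="unknown routes should fall through to the catch-all handler",
    observe=lambda: 404,  # stand-in for making an actual request
    model_confidence=0.6,
)
```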

Mode 5: Synthesize (Build Understanding)

Goal: Unify fragments into coherent model.

Action: Step back and integrate findings
Output: Articulated mental model
When: Accumulated enough observations


Principle 4 Agent Architecture

Role Decomposition

SURVEYOR
broad, fast, pattern-matching
Input: Raw territory
Output: Structural map, inventory
When: Starting, entering new area
  • High coverage, low depth
  • Good at pattern recognition
  • Identifies "what exists"
  • Flags anomalies for later
DIVER
deep, thorough, detail-oriented
Input: Specific component/question
Output: Complete understanding
When: Priority area identified
  • Low coverage, high depth
  • Follows every branch
  • Misses nothing in scope
  • Answers "how does this work"
TRACER
flow-oriented, cross-cutting
Input: Specific scenario/flow
Output: End-to-end understanding
When: Need integration view
  • Follows threads across boundaries
  • Maintains context through transitions
  • Identifies handoffs
  • Answers "how do pieces connect"
SYNTHESIZER
pattern-finding, model-building
Input: Accumulated observations
Output: Unified mental model
When: Enough material gathered
  • Sees forest, not just trees
  • Identifies recurring patterns
  • Names concepts
  • Produces teachable explanation
CHALLENGER
adversarial, model-testing
Input: Current mental model
Output: Holes, edge cases
When: Model feels "too clean"
  • Actively tries to break the model
  • Asks "what about...?"
  • Finds edge cases
  • Prevents false confidence

Coordination Protocol

while uncertainty > acceptable_threshold:

    # Phase 1: Survey (if entering new area)
    if new_territory:
        map = SURVEYOR.scan(territory)
        questions = extract_questions(map)
        priorities = rank_by_uncertainty(questions)

    # Phase 2: Investigate priority areas
    for question in priorities[:K]:

        if question.type == STRUCTURAL:
            finding = DIVER.investigate(question.target)

        elif question.type == FLOW:
            finding = TRACER.follow(question.scenario)

        elif question.type == BEHAVIORAL:
            finding = DIVER.probe(question.target)

        journal.record(question, finding)

    # Phase 3: Synthesize periodically
    if journal.entries > synthesis_threshold:
        model = SYNTHESIZER.integrate(journal)

        # Phase 4: Challenge the model
        holes = CHALLENGER.attack(model)
        if holes:
            priorities.extend(holes)
        else:
            uncertainty = estimate_remaining(model)

    # Phase 5: Decide next focus
    if stuck_in_one_area:
        switch_to_adjacent_area()
    if model.confidence > threshold:
        mark_area_understood()

Principle 5 Exploration Heuristics

Heuristic 1: Entry Points First

Start where users/callers start: main(), index.html, API endpoints. Public interfaces before internals. High-traffic paths before edge cases.

Why: Entry points reveal intended usage and core flows.

Heuristic 2: Follow the Data

When confused, trace data movement. Where does input come from? What transforms happen? Where does output go? What persists vs. what's ephemeral?

Why: Data flow is often clearer than control flow.

Heuristic 3: Read the Tests

Tests reveal: intended behavior, edge cases the authors worried about, integration boundaries, "happy path" assumptions.

Why: Tests are executable documentation of expectations.

Heuristic 4: Identify the Core

Every system has essential vs. accidental complexity. What's the minimum viable version? What could you remove and still have it work?

Why: Understanding the core accelerates understanding the rest.

Heuristic 5: Name What You Find

If you can't name it, you don't understand it. Create vocabulary as you go. "The X pattern" — name recurring structures. Naming forces clarity.

Why: Vocabulary is crystallized understanding.

Heuristic 6: Explain to Probe

Try to explain what you've found. Where does the explanation break down? What can't you articulate clearly? Those gaps are understanding gaps.

Why: Teaching reveals holes in mental models.

Heuristic 7: Hunt Anomalies

Pay special attention to things that don't fit: unexpected dependencies, naming inconsistencies, code that "shouldn't be there", historical artifacts.

Why: Anomalies often reveal important history or constraints.


Principle 6 The Depth Protocol

When you're going too deep or too shallow, recalibrate.

Too Shallow Detection

Signs you need to go deeper:

  • Can describe WHAT but not HOW
  • Predictions fail when tested
  • Can't explain edge cases
  • Understanding feels "slippery"

Action: Pick one component, DIVER mode, exhaust it.

Too Deep Detection

Signs you need to pull back:

  • Losing sight of overall structure
  • Details not connecting to big picture
  • Diminishing returns on investigation
  • Missing adjacent important areas

Action: SURVEYOR mode, map adjacent territory.

Depth Calibration Questions

1. Could I implement this from my understanding?
   No → probably too shallow

2. Could I explain this to someone in 2 minutes?
   No → might be too deep (or too shallow)

3. Do I know where this fits in the bigger picture?
   No → too deep without context

4. Can I predict what I'll find in adjacent areas?
   No → haven't extracted patterns yet

Principle 7 Exploration Patterns

Pattern: Concentric Rings

Start at center, expand outward in rings.

Ring 0: Entry point / main concept
Ring 1: Direct dependencies / immediate context
Ring 2: Secondary dependencies / broader context
Ring 3: Ecosystem / external integrations

Each ring complete before next.

When: Clear center exists. Good for codebase with obvious main.
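Concentric-ring order is just breadth-first traversal over a dependency graph, grouped by distance from the entry point. A sketch (the module graph here is hypothetical):

```python
from collections import deque

def rings(graph, center):
    """Group nodes by BFS distance (ring number) from `center`."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    out = {}
    for node, d in dist.items():
        out.setdefault(d, []).append(node)
    return out  # {0: [center], 1: [...], 2: [...], ...}

# Hypothetical module dependency graph:
deps = {
    "main": ["api", "config"],
    "api": ["db", "auth"],
    "db": ["config"],
}
by_ring = rings(deps, "main")
```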

Pattern: Key Questions

Start with critical questions, investigate to answer.

1. What is the #1 thing I need to understand?
2. Investigate until answered.
3. What's the next most critical question?
4. Repeat.

Let questions drive exploration path.

When: Specific goals exist. Good for targeted investigation.

Pattern: Comparative

Understand by comparing to known similar things.

1. What does this remind me of?
2. How is it similar?
3. How is it different?
4. What explains the differences?

Build understanding through contrast.

When: Familiar reference points exist. Good for learning new framework.

Pattern: Historical

Understand the present through the past.

1. What was version 0?
2. What changed and why?
3. What constraints shaped decisions?
4. What's vestigial vs. essential?

Git history, release notes, design docs.

When: System seems historically contingent. Good for legacy code.

Pattern: Adversarial

Understand by trying to break.

1. What could go wrong?
2. What are the trust boundaries?
3. Where are the assumptions?
4. What happens at the edges?

Security/reliability mindset.

When: Need to understand robustness. Good for security audit.

Pattern: Constructive

Understand by mentally rebuilding.

1. If I were building this, what would I need?
2. What problems would I face?
3. How would I solve them?
4. How does the actual system compare to my imagined version?

Predict then compare.

When: System is large/complex. Good for architecture understanding.


Principle 8 Synthesis Artifacts

Exploration produces artifacts, not just knowledge.

Artifact: Concept Map
Nodes: Key concepts, components
Edges: Relationships (uses, contains, depends-on, etc.)
Annotations: Brief descriptions

Visual representation of system structure.
Artifact: Glossary
Term: Definition in context of this system.

Build shared vocabulary.
Important: Note where this system's usage
differs from common usage.
Artifact: Decision Record
Decision: What choice was made
Context: What constraints existed
Alternatives: What was considered
Rationale: Why this choice

Captures the "why" that code can't show.
Artifact: Scenario Traces
Trigger: What initiates the flow
Steps: Numbered sequence through system
Data: What transforms at each step
Result: What outcome

Executable understanding of key flows.
Artifact: Unknown Log
Question: What I still don't understand
Attempts: What I tried to find out
Blocker: Why I couldn't answer
Priority: How important to resolve

Explicit acknowledgment of gaps.

Principle 9 Knowing When You're Done

Completion Criteria

"Good Enough" Thresholds

| Context | Completion Standard |
|---|---|
| Quick orientation | Survey complete, key concepts named |
| Working in codebase | Can modify safely, know impact radius |
| Debugging | Can trace issue, know relevant components |
| Architecture review | Can critique decisions, identify risks |
| Full ownership | Could rewrite from scratch |

The "Teach Test"

Try to explain your understanding.
Where does the explanation:
- Feel confident?    → Actually understood
- Get hand-wavy?     → Partially understood
- Require hedging?   → Not understood

The explanation reveals your actual knowledge.

Principle 10 Memory System

Don't use markdown files. Use a queryable database with Zettelkasten-style linking.

Architecture: SQLite + Embeddings + Links

┌────────────────────────────────────────────────────────────────┐
│                      AGENT MEMORY SYSTEM                       │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐    │
│  │   FINDINGS   │◄───►│   SESSIONS   │◄───►│    LINKS     │    │
│  │ (exploration)│     │  (contexts)  │     │(zettelkasten)│    │
│  └──────┬───────┘     └──────┬───────┘     └──────────────┘    │
│         │                    │                                 │
│         ▼                    ▼                                 │
│  ┌─────────────┐      ┌─────────────┐                          │
│  │    TAGS     │      │ EMBEDDINGS  │                          │
│  │(fast filter)│      │  (semantic  │                          │
│  └─────────────┘      │   search)   │                          │
│                       └─────────────┘                          │
│                                                                │
│  Storage: SQLite (portable, queryable, auditable)              │
│  Search:  sqlite-vec for vector similarity                     │
│  Links:   Zettelkasten-style bidirectional relations           │
└────────────────────────────────────────────────────────────────┘

Schema for Exploration

-- Exploration findings/observations
CREATE TABLE findings (
    id INTEGER PRIMARY KEY,
    session_id TEXT NOT NULL,
    area TEXT NOT NULL,         -- module, file, concept
    finding_type TEXT,           -- structural|behavioral|causal|boundary|conceptual
    content TEXT NOT NULL,
    implications TEXT,
    questions TEXT,              -- JSON array of new questions
    confirmed INTEGER,           -- has this been validated?
    created_at TIMESTAMP,
    embedding BLOB               -- for semantic search
);

-- Zettelkasten-style links between findings
CREATE TABLE links (
    from_id INTEGER,
    to_id INTEGER,
    relation TEXT,              -- supports|contradicts|refines|questions|similar
    note TEXT
);

-- Fast tag-based filtering
CREATE TABLE tags (
    finding_id INTEGER,
    tag TEXT
);
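The `links` table stores one row per link, so Zettelkasten-style bidirectionality is recovered at query time: a finding's neighbors are the union of its outgoing and incoming links. A minimal sketch using only the `links` table above (the `findings` table is trimmed to two columns for brevity):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE findings (id INTEGER PRIMARY KEY, content TEXT NOT NULL);
CREATE TABLE links (from_id INTEGER, to_id INTEGER, relation TEXT, note TEXT);
""")
db.executemany("INSERT INTO findings (id, content) VALUES (?, ?)",
               [(1, "varint encoding"), (2, "space optimization"),
                (3, "decode CPU cost")])
db.executemany("INSERT INTO links VALUES (?, ?, ?, ?)",
               [(1, 2, "supports", None), (3, 1, "questions", None)])

# All neighbors of finding 1, regardless of link direction:
neighbors = db.execute("""
    SELECT to_id AS id, relation FROM links WHERE from_id = ?
    UNION ALL
    SELECT from_id AS id, relation FROM links WHERE to_id = ?
""", (1, 1)).fetchall()
```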

Usage

from agent_memory import AgentMemory

mem = AgentMemory("./exploration.db")

# Log a finding
finding_id = mem.log_finding(
    session_id="codebase_explore_001",
    area="src/encoder",
    finding_type="structural",
    content="Encoder uses varint for all integer types",
    implications="Space-efficient but CPU cost on decode",
    questions=["Why not fixed-width for small ints?"],
    tags=["encoder", "varint", "design-decision"]
)

# Link to related finding
mem.add_link(
    from_id=finding_id, from_type="finding",
    to_id=3, to_type="finding",
    relation="supports",
    note="Both relate to space optimization"
)

# Query: all structural findings in encoder
findings = mem.get_findings(
    area="encoder",
    finding_type="structural"
)

# Query: all findings, including unconfirmed ones (still need validation)
uncertain = mem.get_findings(confirmed_only=False)

# Semantic search
related = mem.search_text("memory allocation")
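The `agent_memory` module above is not specified further in this document; a minimal sketch of the same interface over the SQLite schema from this section might look like the following. `search_text` here is a naive `LIKE` match standing in for the embedding-based search; a real version would use sqlite-vec.

```python
import json
import sqlite3

class AgentMemory:
    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.row_factory = sqlite3.Row
        self.db.executescript("""
            CREATE TABLE IF NOT EXISTS findings (
                id INTEGER PRIMARY KEY,
                session_id TEXT NOT NULL,
                area TEXT NOT NULL,
                finding_type TEXT,
                content TEXT NOT NULL,
                implications TEXT,
                questions TEXT,
                confirmed INTEGER DEFAULT 0,
                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                embedding BLOB
            );
            CREATE TABLE IF NOT EXISTS links (
                from_id INTEGER, to_id INTEGER, relation TEXT, note TEXT
            );
            CREATE TABLE IF NOT EXISTS tags (finding_id INTEGER, tag TEXT);
        """)

    def log_finding(self, session_id, area, finding_type, content,
                    implications=None, questions=None, tags=()):
        cur = self.db.execute(
            "INSERT INTO findings (session_id, area, finding_type, content,"
            " implications, questions) VALUES (?, ?, ?, ?, ?, ?)",
            (session_id, area, finding_type, content,
             implications, json.dumps(questions or [])))
        for tag in tags:
            self.db.execute("INSERT INTO tags VALUES (?, ?)",
                            (cur.lastrowid, tag))
        self.db.commit()
        return cur.lastrowid

    def add_link(self, from_id, to_id, relation, note=None, **_types):
        # The minimal schema links findings only; from_type/to_type
        # keywords are accepted for compatibility and ignored here.
        self.db.execute("INSERT INTO links VALUES (?, ?, ?, ?)",
                        (from_id, to_id, relation, note))
        self.db.commit()

    def get_findings(self, area=None, finding_type=None, confirmed_only=False):
        sql, args = "SELECT * FROM findings WHERE 1=1", []
        if area:
            sql += " AND area LIKE ?"
            args.append(f"%{area}%")
        if finding_type:
            sql += " AND finding_type = ?"
            args.append(finding_type)
        if confirmed_only:
            sql += " AND confirmed = 1"
        return [dict(r) for r in self.db.execute(sql, args)]

    def search_text(self, query):
        # Placeholder for semantic search: substring match on content.
        return [dict(r) for r in self.db.execute(
            "SELECT * FROM findings WHERE content LIKE ?", (f"%{query}%",))]
```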

Finding Types Map to Uncertainty Types

| Finding Type | Uncertainty Being Reduced | Typical Tags |
|---|---|---|
| structural | How is it organized? | module, component, dependency |
| behavioral | What does it do? | api, input, output, edge-case |
| causal | Why does it work this way? | design-decision, history, constraint |
| boundary | Where are the edges? | limit, failure-mode, assumption |
| conceptual | What are the key abstractions? | pattern, vocabulary, mental-model |


Quick Reference Card

THE EXPLORATION LOOP

 1. BOUND      What am I exploring? What's out of scope?
 2. SURVEY     What exists? How is it organized?
 3. QUESTION   What don't I understand? Prioritize.
 4. DIVE       Investigate priority unknowns.
 5. TRACE      Follow flows across boundaries.
 6. PROBE      Test mental model with predictions.
 7. RECORD     Journal everything. Trust the log.
 8. SYNTHESIZE Build unified understanding.
 9. CHALLENGE  Attack the model. Find holes.
10. ITERATE    Until "done enough" for context.

Agent Roles

SURVEYOR Broad mapping (fast, shallow)
DIVER Deep investigation (slow, thorough)
TRACER Follow flows (cross-cutting)
SYNTHESIZER Build models (pattern-finding)
CHALLENGER Test models (adversarial)

Exploration Modes

SURVEY What exists? (breadth-first)
DIVE How does this work? (depth-first)
TRACE How does it flow? (follow-thread)
PROBE Is my model right? (predict-check)
SYNTHESIZE What's the theory? (step-back)

Depth Calibration

Too shallow:              Too deep:
• Know WHAT not HOW        • Lost big picture
• Predictions fail         • Details disconnected
• Can't explain edges      • Diminishing returns
→ Go deeper: DIVER         → Pull back: SURVEYOR

Appendix: Domain Instantiation

Codebase Exploration
Survey:     File tree, README, entry points
Dive:       Core module, critical path
Trace:      Main user flow, error handling
Probe:      "If I change X, what breaks?"
Synthesize: Architecture diagram, module responsibilities
Research Domain Exploration
Survey:     Survey papers, key authors, major conferences
Dive:       Seminal papers, foundational techniques
Trace:      Citation chains (forward and backward)
Probe:      "Can I reproduce this result?"
Synthesize: Research map, open problems, key debates
API/System Exploration
Survey:     Documentation, endpoint list, data models
Dive:       Core resources, authentication, rate limits
Trace:      Complete request lifecycle
Probe:      "What happens if I send malformed X?"
Synthesize: Mental model of system behavior, gotchas
Problem Space Exploration
Survey:     Existing solutions, stakeholder needs, constraints
Dive:       Specific failure modes, edge cases
Trace:      User journeys, data flows
Probe:      "Would approach X handle scenario Y?"
Synthesize: Problem decomposition, solution space map
Data Exploration (EDA)
Survey:     Schema, row counts, column types, missingness
Dive:       Distributions, outliers, specific fields
Trace:      Relationships between tables/fields
Probe:      "If X is true, what should Y look like?"
Synthesize: Data quality assessment, feature hypotheses
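For the EDA case, the survey step can start with nothing more than per-column missingness counts. A sketch in plain Python (the rows are hypothetical):

```python
def missingness(rows):
    """Fraction of missing (None) values per column across dict rows."""
    counts, total = {}, len(rows)
    for row in rows:
        for col, val in row.items():
            counts[col] = counts.get(col, 0) + (val is None)
    return {col: n / total for col, n in counts.items()}

rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 48000},
    {"age": 29, "income": None},
    {"age": 41, "income": None},
]
report = missingness(rows)  # {'age': 0.25, 'income': 0.5}
```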