Agentic Exploration Protocol: Systematic Discovery Methodology

Principle 1 Uncertainty Mapping

Before exploring, inventory what you don't know.

The Uncertainty Inventory

## Known Knowns
Things I understand and can explain.

## Known Unknowns
Specific questions I need to answer.

## Suspected Structure
Hypotheses about how things connect.

## Boundary Conditions
What's in scope vs out of scope.

## Unknown Unknowns (meta)
Areas where I don't even know what questions to ask.

Why This Matters

Exploration without structure becomes wandering. The inventory:

Prevents revisiting already-understood areas
Prioritizes high-uncertainty regions
Tracks progress toward understanding
Identifies when you're "done enough"

Uncertainty Types

Type	Description	Exploration Strategy
Structural	How is it organized?	Map → hierarchy, dependencies
Behavioral	What does it do?	Probe → inputs, outputs, edge cases
Causal	Why does it work this way?	Trace → history, constraints, decisions
Boundary	Where are the edges?	Test → limits, failure modes
Conceptual	What are the key abstractions?	Synthesize → patterns, vocabulary

Principle 2 Exploration Journal

Your memory is unreliable. The journal is ground truth.

Schema

## Session [N]: [Focus Area]

### Starting State
- What I thought I knew: [summary]
- Open questions: [list]
- Hypothesis: [what I expect to find]

### Explorations

#### Probe 1: [Description]
- Action: What I did
- Found: What I observed
- Implies: What this means for my model
- New questions: What this raises

#### Probe 2: [Description]
...

### Session Synthesis
- Model updates: How my understanding changed
- Confirmed: Hypotheses that held
- Refuted: Hypotheses that failed
- Deferred: Questions for later
- Connections: Links to other areas

Journal Discipline

Capture observations raw — Don't interpret while recording
Separate observation from inference — "I saw X" vs "This means Y"
Track model evolution — How did your understanding change?
Note surprises — Surprises reveal model gaps
Cross-reference — Link related discoveries

Principle 3 Exploration Modes

Mode 1: Survey (Breadth-First)

Goal: Structural overview. Map the territory.

Action: Scan everything at shallow depth

Output: Inventory of components, rough categorization

When: Starting exploration, entering new area

Tactics:

List all files/modules/sections
Read READMEs, docs, entry points
Identify major subsystems
Note vocabulary and naming conventions
Build initial mental map

Survey Questions:

What are the major components?
How are they organized?
What's the apparent entry point?
What vocabulary does this system use?
What's conspicuously missing or surprising?

Mode 2: Dive (Depth-First)

Goal: Deep understanding of specific area.

Action: Exhaustive investigation of one component

Output: Complete model of that component

When: Survey identified high-priority area

Tactics:

Trace execution paths end-to-end
Read every line in the module
Understand all dependencies
Test edge cases mentally or actually
Build component-level mental model

Dive Questions:

How does this actually work?
What are the invariants?
Where are the failure modes?
What assumptions does it make?
Why was it designed this way?

Mode 3: Trace (Follow the Thread)

Goal: Understand flow across boundaries.

Action: Follow a specific path through the system

Output: End-to-end understanding of one flow

When: Need to understand integration

Tactics:

Pick a concrete scenario (request, event, etc.)
Follow data/control from entry to exit
Note every transformation and decision
Identify handoff points between components
Map the critical path

Trace Questions:

What triggers this flow?
What data moves where?
Where are the decision points?
What could go wrong at each step?
What's on vs off the critical path?

Mode 4: Probe (Test the Model)

Goal: Validate understanding through prediction.

Action: Make predictions, then verify

Output: Calibrated confidence in mental model

When: Have hypothesis to test

Tactics:

State what you expect to find BEFORE looking
Look and compare to expectation
Surprises = model gaps
Update model based on evidence
Repeat until predictions are accurate

Probe Protocol:

1. State prediction: "I expect X because [reasoning]"
2. Investigate: Look at actual behavior/code/data
3. Compare: Did reality match prediction?
4. If match: Confidence increases
5. If mismatch: WHY? Update model.

Mode 5: Synthesize (Build Understanding)

Goal: Unify fragments into coherent model.

Action: Step back and integrate findings

Output: Articulated mental model

When: Accumulated enough observations

Tactics:

Write out your understanding
Identify patterns across areas
Name the key concepts
Draw the architecture
Explain it as if teaching someone

Synthesis Questions:

What are the 3-5 key concepts?
What's the core design philosophy?
What patterns repeat?
What's the "theory of the system"?
Where does my model still feel weak?

Principle 4 Agent Architecture

Role Decomposition

SURVEYOR

broad, fast, pattern-matching

Input: Raw territory

Output: Structural map, inventory

When: Starting, entering new area

High coverage, low depth
Good at pattern recognition
Identifies "what exists"
Flags anomalies for later

DIVER

deep, thorough, detail-oriented

Input: Specific component/question

Output: Complete understanding

When: Priority area identified

Low coverage, high depth
Follows every branch
Misses nothing in scope
Answers "how does this work"

TRACER

flow-oriented, cross-cutting

Input: Specific scenario/flow

Output: End-to-end understanding

When: Need integration view

Follows threads across boundaries
Maintains context through transitions
Identifies handoffs
Answers "how do pieces connect"

SYNTHESIZER

pattern-finding, model-building

Input: Accumulated observations

Output: Unified mental model

When: Enough material gathered

Sees forest, not just trees
Identifies recurring patterns
Names concepts
Produces teachable explanation

CHALLENGER

adversarial, model-testing

Input: Current mental model

Output: Holes, edge cases

When: Model feels "too clean"

Actively tries to break the model
Asks "what about...?"
Finds edge cases
Prevents false confidence

Coordination Protocol

while uncertainty > acceptable_threshold:

    # Phase 1: Survey (if entering new area)
    if new_territory:
        map = SURVEYOR.scan(territory)
        questions = extract_questions(map)
        priorities = rank_by_uncertainty(questions)

    # Phase 2: Investigate priority areas
    for question in priorities[:K]:

        if question.type == STRUCTURAL:
            finding = DIVER.investigate(question.target)

        elif question.type == FLOW:
            finding = TRACER.follow(question.scenario)

        elif question.type == BEHAVIORAL:
            finding = DIVER.probe(question.target)

        journal.record(question, finding)

    # Phase 3: Synthesize periodically
    if journal.entries > synthesis_threshold:
        model = SYNTHESIZER.integrate(journal)

        # Phase 4: Challenge the model
        holes = CHALLENGER.attack(model)
        if holes:
            priorities.extend(holes)
        else:
            uncertainty = estimate_remaining(model)

    # Phase 5: Decide next focus
    if stuck_in_one_area:
        switch_to_adjacent_area()
    if model.confidence > threshold:
        mark_area_understood()

Principle 5 Exploration Heuristics

Heuristic 1: Entry Points First

Start where users/callers start: main(), index.html, API endpoints. Public interfaces before internals. High-traffic paths before edge cases.

Why: Entry points reveal intended usage and core flows.

Heuristic 2: Follow the Data

When confused, trace data movement. Where does input come from? What transforms happen? Where does output go? What persists vs. what's ephemeral?

Why: Data flow is often clearer than control flow.

Heuristic 3: Read the Tests

Tests reveal: intended behavior, edge cases the authors worried about, integration boundaries, "happy path" assumptions.

Why: Tests are executable documentation of expectations.

Heuristic 4: Identify the Core

Every system has essential vs. accidental complexity. What's the minimum viable version? What could you remove and still have it work?

Why: Understanding the core accelerates understanding the rest.

Heuristic 5: Name What You Find

If you can't name it, you don't understand it. Create vocabulary as you go. "The X pattern" — name recurring structures. Naming forces clarity.

Why: Vocabulary is crystallized understanding.

Heuristic 6: Explain to Probe

Try to explain what you've found. Where does the explanation break down? What can't you articulate clearly? Those gaps are understanding gaps.

Why: Teaching reveals holes in mental models.

Heuristic 7: Hunt Anomalies

Pay special attention to things that don't fit: unexpected dependencies, naming inconsistencies, code that "shouldn't be there", historical artifacts.

Why: Anomalies often reveal important history or constraints.

Principle 6 The Depth Protocol

When you're going too deep or too shallow, recalibrate.

Too Shallow Detection

Signs you need to go deeper:

Can describe WHAT but not HOW
Predictions fail when tested
Can't explain edge cases
Understanding feels "slippery"

Action: Pick one component, DIVER mode, exhaust it.

Too Deep Detection

Signs you need to pull back:

Losing sight of overall structure
Details not connecting to big picture
Diminishing returns on investigation
Missing adjacent important areas

Action: SURVEYOR mode, map adjacent territory.

Depth Calibration Questions

1. Could I implement this from my understanding?
   No → probably too shallow

2. Could I explain this to someone in 2 minutes?
   No → might be too deep (or too shallow)

3. Do I know where this fits in the bigger picture?
   No → too deep without context

4. Can I predict what I'll find in adjacent areas?
   No → haven't extracted patterns yet

Principle 7 Exploration Patterns

Pattern: Concentric Rings

Start at center, expand outward in rings.

Ring 0: Entry point / main concept
Ring 1: Direct dependencies / immediate context
Ring 2: Secondary dependencies / broader context
Ring 3: Ecosystem / external integrations

Each ring complete before next.

When: Clear center exists. Good for codebase with obvious main.

Pattern: Key Questions

Start with critical questions, investigate to answer.

1. What is the #1 thing I need to understand?
2. Investigate until answered.
3. What's the next most critical question?
4. Repeat.

Let questions drive exploration path.

When: Specific goals exist. Good for targeted investigation.

Pattern: Comparative

Understand by comparing to known similar things.

1. What does this remind me of?
2. How is it similar?
3. How is it different?
4. What explains the differences?

Build understanding through contrast.

When: Familiar reference points exist. Good for learning new framework.

Pattern: Historical

Understand the present through the past.

1. What was version 0?
2. What changed and why?
3. What constraints shaped decisions?
4. What's vestigial vs. essential?

Git history, release notes, design docs.

When: System seems historically contingent. Good for legacy code.

Pattern: Adversarial

Understand by trying to break.

1. What could go wrong?
2. What are the trust boundaries?
3. Where are the assumptions?
4. What happens at the edges?

Security/reliability mindset.

When: Need to understand robustness. Good for security audit.

Pattern: Constructive

Understand by mentally rebuilding.

1. If I were building this, what would I need?
2. What problems would I face?
3. How would I solve them?
4. How does actual compare to my imagined version?

Predict then compare.

When: System is large/complex. Good for architecture understanding.

Principle 8 Synthesis Artifacts

Exploration produces artifacts, not just knowledge.

Artifact: Concept Map

Nodes: Key concepts, components
Edges: Relationships (uses, contains, depends-on, etc.)
Annotations: Brief descriptions

Visual representation of system structure.

Artifact: Glossary

Term: Definition in context of this system.

Build shared vocabulary.
Important: Note where this system's usage
differs from common usage.

Artifact: Decision Record

Decision: What choice was made
Context: What constraints existed
Alternatives: What was considered
Rationale: Why this choice

Captures the "why" that code can't show.

Artifact: Scenario Traces

Trigger: What initiates the flow
Steps: Numbered sequence through system
Data: What transforms at each step
Result: What outcome

Executable understanding of key flows.

Artifact: Unknown Log

Question: What I still don't understand
Attempts: What I tried to find out
Blocker: Why I couldn't answer
Priority: How important to resolve

Explicit acknowledgment of gaps.

Principle 9 Knowing When You're Done

Completion Criteria

Can explain the system's purpose in one sentence
Can identify the 3-5 key abstractions
Can trace the primary use case end-to-end
Can predict behavior for new scenarios
Predictions match reality when tested
Can explain why it's designed this way
Know where to look for any specific question
Diminishing returns on further exploration

"Good Enough" Thresholds

Context	Completion Standard
Quick orientation	Survey complete, key concepts named
Working in codebase	Can modify safely, know impact radius
Debugging	Can trace issue, know relevant components
Architecture review	Can critique decisions, identify risks
Full ownership	Could rewrite from scratch

The "Teach Test"

Try to explain your understanding.
Where does the explanation:
- Feel confident?    → Actually understood
- Get hand-wavy?     → Partially understood
- Require hedging?   → Not understood

The explanation reveals your actual knowledge.

Principle 10 Memory System

Don't use markdown files. Use a queryable database with Zettelkasten-style linking.

Architecture: SQLite + Embeddings + Links

┌─────────────────────────────────────────────────────────────┐ │ AGENT MEMORY SYSTEM │ ├─────────────────────────────────────────────────────────────┤ │ │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ │ FINDINGS │ │ SESSIONS │ │ LINKS │ │ │ │ (exploration)│◄─►│ (contexts) │◄─►│ (zettelkasten│ │ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │ │ │ │ │ │ └────────┬────────┴──────────┬──────┘ │ │ ▼ ▼ │ │ ┌─────────────┐ ┌─────────────┐ │ │ │ TAGS │ │ EMBEDDINGS │ │ │ │ (fast filter│ │ (semantic │ │ │ └─────────────┘ │ search) │ │ │ └─────────────┘ │ │ │ │ Storage: SQLite (portable, queryable, auditable) │ │ Search: sqlite-vec for vector similarity │ │ Links: Zettelkasten-style bidirectional relations │ └─────────────────────────────────────────────────────────────┘

Schema for Exploration

-- Exploration findings/observations
CREATE TABLE findings (
    id INTEGER PRIMARY KEY,
    session_id TEXT NOT NULL,
    area TEXT NOT NULL,         -- module, file, concept
    finding_type TEXT,           -- structural|behavioral|causal|boundary|conceptual
    content TEXT NOT NULL,
    implications TEXT,
    questions TEXT,              -- JSON array of new questions
    confirmed INTEGER,           -- has this been validated?
    created_at TIMESTAMP,
    embedding BLOB               -- for semantic search
);

-- Zettelkasten-style links between findings
CREATE TABLE links (
    from_id INTEGER,
    to_id INTEGER,
    relation TEXT,              -- supports|contradicts|refines|questions|similar
    note TEXT
);

-- Fast tag-based filtering
CREATE TABLE tags (
    finding_id INTEGER,
    tag TEXT
);

Usage

from agent_memory import AgentMemory

mem = AgentMemory("./exploration.db")

# Log a finding
finding_id = mem.log_finding(
    session_id="codebase_explore_001",
    area="src/encoder",
    finding_type="structural",
    content="Encoder uses varint for all integer types",
    implications="Space-efficient but CPU cost on decode",
    questions=["Why not fixed-width for small ints?"],
    tags=["encoder", "varint", "design-decision"]
)

# Link to related finding
mem.add_link(
    from_id=finding_id, from_type="finding",
    to_id=3, to_type="finding",
    relation="supports",
    note="Both relate to space optimization"
)

# Query: all structural findings in encoder
findings = mem.get_findings(
    area="encoder",
    finding_type="structural"
)

# Query: unconfirmed findings (need validation)
uncertain = mem.get_findings(confirmed_only=False)

# Semantic search
related = mem.search_text("memory allocation")

Finding Types Map to Uncertainty Types

Finding Type	Uncertainty Being Reduced	Typical Tags
structural	How is it organized?	module, component, dependency
behavioral	What does it do?	api, input, output, edge-case
causal	Why does it work this way?	design-decision, history, constraint
boundary	Where are the edges?	limit, failure-mode, assumption
conceptual	What are the key abstractions?	pattern, vocabulary, mental-model

Why This Works

Queryable: SQL queries beat grep through markdown files
Linkable: Zettelkasten connections reveal patterns across areas
Searchable: Embeddings find semantically related findings
Exportable: Generate markdown reports when needed for humans
Portable: Single SQLite file, no server required

Quick Reference Card

THE EXPLORATION LOOP

 1. BOUND      What am I exploring? What's out of scope?
 2. SURVEY     What exists? How is it organized?
 3. QUESTION   What don't I understand? Prioritize.
 4. DIVE       Investigate priority unknowns.
 5. TRACE      Follow flows across boundaries.
 6. PROBE      Test mental model with predictions.
 7. RECORD     Journal everything. Trust the log.
 8. SYNTHESIZE Build unified understanding.
 9. CHALLENGE  Attack the model. Find holes.
10. ITERATE    Until "done enough" for context.

Agent Roles

SURVEYOR Broad mapping (fast, shallow)

DIVER Deep investigation (slow, thorough)

TRACER Follow flows (cross-cutting)

SYNTHESIZER Build models (pattern-finding)

CHALLENGER Test models (adversarial)

Exploration Modes

SURVEY What exists? (breadth-first)

DIVE How does this work? (depth-first)

TRACE How does it flow? (follow-thread)

PROBE Is my model right? (predict-check)

SYNTHESIZE What's the theory? (step-back)

Key Heuristics

Entry points first
Follow the data
Read the tests
Find the core
Name what you find
Explain to probe
Hunt anomalies

Depth Calibration

Too shallow:              Too deep:
• Know WHAT not HOW        • Lost big picture
• Predictions fail         • Details disconnected
• Can't explain edges      • Diminishing returns
→ Go deeper: DIVER         → Pull back: SURVEYOR

Appendix: Domain Instantiation

Codebase Exploration

Survey:     File tree, README, entry points
Dive:       Core module, critical path
Trace:      Main user flow, error handling
Probe:      "If I change X, what breaks?"
Synthesize: Architecture diagram, module responsibilities

Research Domain Exploration

Survey:     Survey papers, key authors, major conferences
Dive:       Seminal papers, foundational techniques
Trace:      Citation chains (forward and backward)
Probe:      "Can I reproduce this result?"
Synthesize: Research map, open problems, key debates

API/System Exploration

Survey:     Documentation, endpoint list, data models
Dive:       Core resources, authentication, rate limits
Trace:      Complete request lifecycle
Probe:      "What happens if I send malformed X?"
Synthesize: Mental model of system behavior, gotchas

Problem Space Exploration

Survey:     Existing solutions, stakeholder needs, constraints
Dive:       Specific failure modes, edge cases
Trace:      User journeys, data flows
Probe:      "Would approach X handle scenario Y?"
Synthesize: Problem decomposition, solution space map

Data Exploration (EDA)

Survey:     Schema, row counts, column types, missingness
Dive:       Distributions, outliers, specific fields
Trace:      Relationships between tables/fields
Probe:      "If X is true, what should Y look like?"
Synthesize: Data quality assessment, feature hypotheses

The Meta-Pattern

Principle 1 Uncertainty Mapping

The Uncertainty Inventory

Why This Matters

Uncertainty Types

Principle 2 Exploration Journal

Schema

Journal Discipline

Principle 3 Exploration Modes

Mode 1: Survey (Breadth-First)

Mode 2: Dive (Depth-First)

Mode 3: Trace (Follow the Thread)

Mode 4: Probe (Test the Model)

Mode 5: Synthesize (Build Understanding)

Principle 4 Agent Architecture

Role Decomposition

Coordination Protocol

Principle 5 Exploration Heuristics

Heuristic 1: Entry Points First

Heuristic 2: Follow the Data

Heuristic 3: Read the Tests

Heuristic 4: Identify the Core

Heuristic 5: Name What You Find

Heuristic 6: Explain to Probe

Heuristic 7: Hunt Anomalies

Principle 6 The Depth Protocol

Depth Calibration Questions

Principle 7 Exploration Patterns

Principle 8 Synthesis Artifacts

Principle 9 Knowing When You're Done

Completion Criteria

"Good Enough" Thresholds

The "Teach Test"

Principle 10 Memory System

Architecture: SQLite + Embeddings + Links

Schema for Exploration

Usage

Finding Types Map to Uncertainty Types

Why This Works

Quick Reference Card

THE EXPLORATION LOOP

Agent Roles

Exploration Modes

Key Heuristics

Depth Calibration

Appendix: Domain Instantiation