Markov Fingerprinting - Pollack AI Lab

The Insight

Agent tool-call sequences behave approximately like first-order Markov chains: the probability of the next tool call depends primarily on the current tool call, not on the full history. This lets us apply well-studied stochastic-process analysis to agent behavior.

The 9-State Taxonomy

Every tool call maps to one of nine behavioral states:

State	Description	Example Tools
ORIENT	Understanding the task	Read prompt, list files
READ	Reading existing code	Read file, search code
EDIT	Modifying code	Write file, edit file
BUILD	Compiling	Run maven/gradle build
TEST	Running tests	Execute test suite
JAR_INSPECT	Examining dependencies	Inspect JAR contents
SEARCH	Broad exploration	Grep, glob, web search
META	Agent self-management	Task creation, planning
TERMINAL	Done	Success or failure
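The mapping from tool calls to states can be sketched as a lookup table. This is a minimal illustration, not the library's implementation: the tool names below are hypothetical placeholders loosely based on the "Example Tools" column, and appending TERMINAL at the end of every trace is an assumption.

```python
from enum import Enum


class State(Enum):
    ORIENT = "ORIENT"
    READ = "READ"
    EDIT = "EDIT"
    BUILD = "BUILD"
    TEST = "TEST"
    JAR_INSPECT = "JAR_INSPECT"
    SEARCH = "SEARCH"
    META = "META"
    TERMINAL = "TERMINAL"


# Hypothetical tool names; a real harness will use its own identifiers.
TOOL_TO_STATE = {
    "list_files": State.ORIENT,
    "read_file": State.READ,
    "search_code": State.READ,
    "write_file": State.EDIT,
    "edit_file": State.EDIT,
    "run_build": State.BUILD,
    "run_tests": State.TEST,
    "inspect_jar": State.JAR_INSPECT,
    "grep": State.SEARCH,
    "glob": State.SEARCH,
    "web_search": State.SEARCH,
    "create_task": State.META,
}


def classify(tool_calls):
    """Map a list of tool names to states and close the trace with TERMINAL.

    Unknown tool names raise KeyError so that unmapped tools are noticed
    instead of being silently folded into another state.
    """
    states = [TOOL_TO_STATE[name] for name in tool_calls]
    states.append(State.TERMINAL)  # assumption: every trace ends in TERMINAL
    return states


example = classify(["list_files", "read_file", "edit_file", "run_build"])
```

Failing loudly on unmapped tools is a deliberate choice in this sketch: a silent default would distort the transition counts downstream.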

What Markov Analysis Reveals

Transition Probability Engineering (TPE)

The transition matrix P(next_state | current_state) is the agent’s “behavioral fingerprint.” Different variants produce measurably different fingerprints.
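As a rough illustration of how such a fingerprint could be estimated, the sketch below counts consecutive state pairs across traces and normalizes each row. The function name `transition_matrix` and the sample traces are made up for this example; this is the standard maximum-likelihood estimate for a first-order chain, not necessarily what the library does.

```python
from collections import Counter, defaultdict


def transition_matrix(traces):
    """Estimate P(next_state | current_state) from state sequences.

    Returns a nested dict: matrix[current][next] = probability.
    States that never have a successor (e.g. TERMINAL) get no row.
    """
    counts = defaultdict(Counter)
    for trace in traces:
        # Pair each state with its successor: (s0, s1), (s1, s2), ...
        for current, nxt in zip(trace, trace[1:]):
            counts[current][nxt] += 1

    matrix = {}
    for current, row in counts.items():
        total = sum(row.values())
        matrix[current] = {nxt: n / total for nxt, n in row.items()}
    return matrix


# Illustrative traces only; not real data.
traces = [
    ["ORIENT", "READ", "EDIT", "BUILD", "TEST", "TERMINAL"],
    ["ORIENT", "READ", "EDIT", "BUILD", "EDIT", "BUILD", "TEST", "TERMINAL"],
]
P = transition_matrix(traces)
```

In these two sample traces BUILD is followed by TEST twice and by EDIT once, so `P["BUILD"]["TEST"]` comes out as 2/3.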

Key Metrics

Expected steps to completion: row sums of the fundamental matrix N = (I - Q)^{-1}, where Q is the transient-to-transient block of the transition matrix
P(success): absorbing-chain probability of reaching success vs. failure
Thrash score: loop amplification in BUILD→TEST→EDIT cycles
JAR cluster %: time spent in dependency inspection (correlates with knowledge availability)
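The first two metrics follow from standard absorbing-chain algebra. The sketch below is a self-contained illustration under stated assumptions: the three transient states, the split of TERMINAL into separate SUCCESS and FAILURE absorbing states, and all probabilities are invented for the example. It computes N = (I - Q)^{-1}, the expected steps (row sums of N), and the absorption probabilities B = N R.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment a with b so each row operation applies to both sides.
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("I - Q is singular: some state can never be absorbed")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


transient = ["EDIT", "BUILD", "TEST"]
absorbing = ["SUCCESS", "FAILURE"]  # assumption: TERMINAL split in two

# Q: transient -> transient probabilities (illustrative numbers).
Q = [
    [0.0, 1.0, 0.0],  # EDIT  -> BUILD
    [0.3, 0.0, 0.7],  # BUILD -> EDIT or TEST
    [0.2, 0.0, 0.0],  # TEST  -> EDIT (remaining 0.8 is absorbed)
]
# R: transient -> absorbing probabilities.
R = [
    [0.0, 0.0],
    [0.0, 0.0],
    [0.7, 0.1],  # TEST -> SUCCESS or FAILURE
]

n = len(transient)
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
i_minus_q = [[identity[i][j] - Q[i][j] for j in range(n)] for i in range(n)]

# Fundamental matrix: N[i][j] = expected visits to j when starting in i.
N = solve(i_minus_q, identity)

# Expected steps before absorption = row sums of N.
expected_steps = {s: sum(N[i]) for i, s in enumerate(transient)}

# Absorption probabilities B = N @ R.
B = [
    [sum(N[i][k] * R[k][j] for k in range(n)) for j in range(len(absorbing))]
    for i in range(n)
]
p_success = {s: B[i][0] for i, s in enumerate(transient)}
```

Because every absorbing transition in this toy chain leaves from TEST with a 0.7 to 0.1 split, P(success) is 0.875 from every transient state, which makes a handy sanity check.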

What We’ve Found

From Code Coverage v1:

Knowledge reduces JAR inspection — Variants with domain knowledge spend less time inspecting dependencies
Thrashing predicts failure — High BUILD→TEST→EDIT loop counts correlate with lower T3 scores
SAE changes the fingerprint — Structured Agent Execution produces measurably different transition matrices
Two independent axes — Knowledge injection and prompt hardening affect different parts of the transition matrix
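One way to put a number on "measurably different" is a per-state distance between two transition matrices. The source does not say which measure was used, so the total variation distance below is an assumption chosen for simplicity, and the two variant matrices are invented for illustration.

```python
def row_tv_distance(p, q):
    """Total variation distance between matching rows of two matrices.

    Both matrices are nested dicts: matrix[current][next] = probability.
    Only states with a row in both matrices are compared. The result for
    each state lies in [0, 1]; 0 means identical outgoing behavior.
    """
    distances = {}
    for state in set(p) & set(q):
        targets = set(p[state]) | set(q[state])
        distances[state] = 0.5 * sum(
            abs(p[state].get(t, 0.0) - q[state].get(t, 0.0)) for t in targets
        )
    return distances


# Hypothetical fingerprints for two variants (not measured data).
baseline = {"BUILD": {"TEST": 0.5, "JAR_INSPECT": 0.5}}
with_knowledge = {"BUILD": {"TEST": 0.9, "JAR_INSPECT": 0.1}}

shift = row_tv_distance(baseline, with_knowledge)
```

Comparing row by row, instead of collapsing the whole matrix into one score, shows which part of the matrix a given intervention moved.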

Tools

The analysis pipeline is implemented in the markov-agent-analysis Python library.

uv pip install -e ~/tuvium/projects/markov-agent-analysis/[all]

Key function: build_absorbing_chain_from_traces() — transforms raw tool-call logs into absorbing Markov chains.

Role in the Growth Cycle

Markov analysis is the primary diagnostic lens in the DIAGNOSE step of the Improvement Flywheel. It converts raw tool-call traces into actionable signals about where the agent gets stuck and why.

Loop Amplification Signals

Signal	Diagnosis	Lever
Amplification > 2.0 on BUILD→FIX→BUILD	Agent is in a fix loop — build fails, fix attempt fails, rebuild fails	Knowledge (Lever 2) or deterministic tool (Lever 3)
Amplification > 2.0 on SEARCH states	Agent is searching for something it can’t find	Add the target information to `knowledge/` (Lever 2)
Amplification > 2.0 on EXPLORE	Agent is reading many files without making progress	Clarify task decomposition in prompt (Lever 1) or pre-analysis script (Lever 3)
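The table does not define "amplification", so the sketch below uses one possible reading, stated here as an assumption: the mean number of times a cycle pattern occurs per trace, counted only over traces that enter the cycle at least once. The function names and sample traces are illustrative.

```python
def count_pattern(trace, pattern):
    """Count (possibly overlapping) occurrences of pattern in trace."""
    k = len(pattern)
    return sum(
        1 for i in range(len(trace) - k + 1) if trace[i:i + k] == pattern
    )


def amplification(traces, pattern):
    """Mean occurrences of pattern among traces that contain it.

    Assumed definition for illustration; returns 0.0 if no trace
    contains the pattern.
    """
    counts = [count_pattern(trace, pattern) for trace in traces]
    entered = [c for c in counts if c > 0]
    return sum(entered) / len(entered) if entered else 0.0


fix_loop = ["BUILD", "FIX", "BUILD"]
traces = [
    ["BUILD", "FIX", "BUILD", "FIX", "BUILD", "FIX", "BUILD", "VERIFY"],  # 3 loops
    ["BUILD", "FIX", "BUILD", "VERIFY"],                                  # 1 loop
    ["BUILD", "VERIFY"],                                                  # never enters
]
score = amplification(traces, fix_loop)
```

Overlapping matches are counted on purpose, so BUILD→FIX→BUILD→FIX→BUILD registers as two passes through the loop.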

Transition Gap Signals

Signal	Diagnosis	Lever
VERIFY never reached	Agent produces output but doesn’t confirm correctness	Add stopping condition to prompt (Lever 1)
READ_KB never reached	Agent ignores knowledge files	Reference knowledge files in prompt, improve routing table (Lever 1)
WRITE reached late (after many EXPLORE/SEARCH cycles)	Agent spends too long understanding before acting	Templates, scaffolding, explicit first steps (Lever 3)
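Gap signals like these can be checked directly on a state sequence. The sketch below is a minimal illustration that uses the state names from the table above; the helper name `transition_gaps` and the sample trace are hypothetical.

```python
def transition_gaps(trace, expected_states):
    """Report expected states that never occur and when each state first occurs.

    Returns (never_reached, first_seen), where first_seen maps each state
    present in the trace to the index of its first occurrence.
    """
    first_seen = {}
    for index, state in enumerate(trace):
        first_seen.setdefault(state, index)  # keep only the first occurrence
    never_reached = [s for s in expected_states if s not in first_seen]
    return never_reached, first_seen


# Illustrative trace: lots of exploration, one late write, no verification.
trace = ["EXPLORE", "SEARCH", "EXPLORE", "SEARCH", "EXPLORE", "WRITE"]
never_reached, first_seen = transition_gaps(trace, ["READ_KB", "WRITE", "VERIFY"])

# Fraction of the trace that elapsed before the first WRITE.
write_delay = first_seen["WRITE"] / len(trace)
```

Here READ_KB and VERIFY are flagged as never reached, and the first WRITE arrives only after five of six steps.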

Failure Pattern Signals

Signal	Diagnosis	Lever
Error state reachable from multiple paths	Structural/tooling issue	Fix the tooling, not the agent (Lever 3)
Single dominant failure path (BUILD → ERROR 80%)	Invalid strategy	Change strategy, not retry count (Lever 1 or 2)
Agent stops after N fix attempts	Capability ceiling	Add fix pattern to knowledge (Lever 2) or restructure task (Lever 3)
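To tell a single dominant failure path from an error state reachable along many paths, one simple check is to tally which state immediately precedes ERROR. The sketch below assumes traces are lists of state names containing a literal "ERROR" state; the helper name and the data are illustrative.

```python
from collections import Counter


def error_predecessors(traces, error_state="ERROR"):
    """Share of error entries attributed to each immediately preceding state."""
    preds = Counter()
    for trace in traces:
        for current, nxt in zip(trace, trace[1:]):
            if nxt == error_state:
                preds[current] += 1
    total = sum(preds.values())
    if total == 0:
        return {}
    return {state: n / total for state, n in preds.items()}


# Illustrative data: four runs fail straight out of BUILD, one out of TEST.
traces = [["EDIT", "BUILD", "ERROR"]] * 4 + [["EDIT", "BUILD", "TEST", "ERROR"]]
shares = error_predecessors(traces)
```

A share concentrated on one predecessor, such as BUILD at 0.8 here, suggests a dominant path; a flat distribution across several predecessors points toward the structural case.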

Loop Type Classification

Not all loops are problems. Classify before intervening:

Loop Type	Pattern	Action
Productive	WRITE → VERIFY → FIX → VERIFY	Leave it alone
Friction	SEARCH → READ → SEARCH → READ	Add knowledge or routing
Failure	BUILD → FIX → BUILD → FIX (same error)	Change strategy
Diagnostic	BUILD → ERROR → READ_LOG → FIX	Leave it alone
Degenerate	EXPLORE → EXPLORE → EXPLORE	Intervene — agent is stuck
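The degenerate case is the easiest to detect mechanically, because it is just a long run of one repeated state. The sketch below is a minimal illustration; the threshold of three repeats is an assumption taken from the example pattern in the table, not a documented setting.

```python
from itertools import groupby

DEGENERATE_RUN = 3  # assumed threshold, mirroring EXPLORE → EXPLORE → EXPLORE


def longest_self_loop(trace):
    """Return (state, length) for the longest run of one repeated state.

    On ties the earliest run wins; an empty trace yields (None, 0).
    """
    runs = [(state, sum(1 for _ in group)) for state, group in groupby(trace)]
    return max(runs, key=lambda run: run[1]) if runs else (None, 0)


def is_degenerate(trace, threshold=DEGENERATE_RUN):
    """True if any single state repeats at least `threshold` times in a row."""
    return longest_self_loop(trace)[1] >= threshold


stuck = ["READ", "EXPLORE", "EXPLORE", "EXPLORE", "WRITE"]
healthy = ["WRITE", "VERIFY", "FIX", "VERIFY"]
```

State sequences alone cannot separate a Failure loop from a Productive one when the pattern is the same, since the "same error" condition depends on the error content; that comparison needs the underlying logs.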

Related

Improvement Flywheel

The feedback loop that this analysis drives

Forge Methodology

Define → Forge → Run → Grow pipeline

Blog: I Read My Agent's Diary

Narrative walkthrough of the Markov analysis

Agent Journal

The trace capture layer that feeds this analysis
