The Insight
Agent tool-call sequences are approximately first-order Markov chains: the probability of the next tool call depends primarily on the current tool call, not on the full history. This means we can apply well-studied stochastic-process analysis to understand agent behavior.
The 9-State Taxonomy
Every tool call maps to one of nine behavioral states:
| State | Description | Example Tools |
|---|---|---|
| ORIENT | Understanding the task | Read prompt, list files |
| READ | Reading existing code | Read file, search code |
| EDIT | Modifying code | Write file, edit file |
| BUILD | Compiling | Run maven/gradle build |
| TEST | Running tests | Execute test suite |
| JAR_INSPECT | Examining dependencies | Inspect JAR contents |
| SEARCH | Broad exploration | Grep, glob, web search |
| META | Agent self-management | Task creation, planning |
| TERMINAL | Done | Success or failure |
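Under the first-order assumption, the transition matrix can be estimated by mapping each tool call to its state and counting consecutive state pairs. The following is a minimal sketch, not the markov-agent-analysis implementation. The tool names in TOOL_TO_STATE are hypothetical stand-ins for a real trace format, and each trace is assumed to end in a single TERMINAL state.

```python
from collections import Counter, defaultdict

# The nine behavioral states from the taxonomy table.
STATES = ["ORIENT", "READ", "EDIT", "BUILD", "TEST",
          "JAR_INSPECT", "SEARCH", "META", "TERMINAL"]

# Hypothetical tool names: a real trace format will use its own identifiers.
TOOL_TO_STATE = {
    "list_files": "ORIENT",
    "read_file": "READ",
    "search_code": "READ",
    "write_file": "EDIT",
    "edit_file": "EDIT",
    "run_build": "BUILD",
    "run_tests": "TEST",
    "inspect_jar": "JAR_INSPECT",
    "grep": "SEARCH",
    "glob": "SEARCH",
    "create_task": "META",
}


def to_states(tool_calls):
    """Map tool names to states; every trace is closed with TERMINAL."""
    return [TOOL_TO_STATE[name] for name in tool_calls] + ["TERMINAL"]


def estimate_transition_matrix(traces):
    """Maximum-likelihood estimate of P(next_state | current_state).

    Counts consecutive state pairs across all traces (the first-order
    Markov assumption) and normalizes each row to sum to 1.
    """
    counts = defaultdict(Counter)
    for tool_calls in traces:
        states = to_states(tool_calls)
        for current, nxt in zip(states, states[1:]):
            counts[current][nxt] += 1

    P = {}
    for current in STATES:
        total = sum(counts[current].values())
        if current == "TERMINAL" or total == 0:
            # Absorbing or never-observed state: self-loop with probability 1.
            P[current] = {s: float(s == current) for s in STATES}
        else:
            P[current] = {s: counts[current][s] / total for s in STATES}
    return P


traces = [
    ["list_files", "read_file", "edit_file", "run_tests"],
    ["read_file", "edit_file", "run_build", "run_tests", "edit_file", "run_tests"],
]
P = estimate_transition_matrix(traces)
print(P["EDIT"]["TEST"], P["EDIT"]["BUILD"])
```

In these two traces, EDIT is followed by TEST twice and by BUILD once. The estimate is therefore P(TEST | EDIT) = 2/3 and P(BUILD | EDIT) = 1/3. States that never appear get a self-loop so that every row remains a valid probability distribution.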
What Markov Analysis Reveals
Transition Probability Engineering (TPE)
The transition matrix P(next_state | current_state) is the agent’s “behavioral fingerprint.” Different agent variants produce measurably different fingerprints.
Key Metrics
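- Expected steps to completion — From the fundamental matrix $N = (I - Q)^{-1}$, where $Q$ is the transient-to-transient block of the transition matrix; the expected number of steps from each transient state is $t = N\mathbf{1}$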
- P(success) — Absorbing chain probability of reaching success vs. failure
- Thrash score — Loop amplification in BUILD→TEST→EDIT cycles
- JAR cluster % — Time spent in dependency inspection (correlates with knowledge availability)
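The first two metrics come from the standard canonical form of an absorbing chain, [[Q, R], [0, I]]. Here Q holds transient-to-transient probabilities and R holds transient-to-absorbing probabilities. The fundamental matrix is N = (I - Q)^-1, the expected steps are N·1, and the absorption probabilities are B = N·R. The sketch below is a minimal illustration under stated assumptions, not the library's code. P(success) needs two distinct outcomes, so it splits TERMINAL into hypothetical SUCCESS and FAILURE states. The transition probabilities are illustrative values, not measurements.

```python
import numpy as np


def absorbing_chain_metrics(P, transient, absorbing):
    """Fundamental matrix, expected steps, and absorption probabilities.

    P maps each transient state to {next_state: probability}. The lists
    `transient` and `absorbing` fix the row/column order of the canonical
    form [[Q, R], [0, I]].
    """
    Q = np.array([[P[i].get(j, 0.0) for j in transient] for i in transient])
    R = np.array([[P[i].get(j, 0.0) for j in absorbing] for i in transient])
    N = np.linalg.inv(np.eye(len(transient)) - Q)  # N = (I - Q)^-1
    steps = N @ np.ones(len(transient))            # expected steps to absorption
    B = N @ R                                      # B[i, k] = P(absorbed in k | start in i)
    return N, steps, B


# Illustrative probabilities only; SUCCESS/FAILURE are a hypothetical split of TERMINAL.
P = {
    "READ": {"EDIT": 1.0},
    "EDIT": {"TEST": 1.0},
    "TEST": {"EDIT": 0.5, "SUCCESS": 0.4, "FAILURE": 0.1},
}
N, steps, B = absorbing_chain_metrics(P, ["READ", "EDIT", "TEST"], ["SUCCESS", "FAILURE"])
print(steps)    # expected steps from READ, EDIT, TEST
print(B[:, 0])  # P(success) from each starting state
```

For this toy chain, the expected step counts are 5, 4, and 3 from READ, EDIT, and TEST. Every state reaches SUCCESS with probability 0.8, which equals 0.4 / (0.4 + 0.1). If some transient state cannot reach any absorbing state, I - Q is singular and the inversion fails.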
What We’ve Found
From Code Coverage v1:
- Knowledge reduces JAR inspection — Variants with domain knowledge spend less time inspecting dependencies
- Thrashing predicts failure — High BUILD→TEST→EDIT loop counts correlate with lower T3 scores
- SAE changes the fingerprint — Structured Agent Execution produces measurably different transition matrices
- Two independent axes — Knowledge injection and prompt hardening affect different parts of the transition matrix
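The thrashing and JAR-inspection findings rest on per-trace metrics whose exact formulas are not given here. The sketch below therefore uses simplified, hypothetical proxies. The thrash proxy counts BUILD→TEST→EDIT windows in a state sequence. The JAR cluster proxy is the percentage of non-terminal steps spent in JAR_INSPECT. The library's thrash score ("loop amplification") may instead be derived from the chain itself, so treat these only as a rough way to compare variants.

```python
# Hypothetical proxies: not the library's definitions of thrash score or JAR cluster %.
THRASH_CYCLE = ("BUILD", "TEST", "EDIT")


def thrash_count(states, cycle=THRASH_CYCLE):
    """Number of positions where the BUILD→TEST→EDIT pattern starts."""
    k = len(cycle)
    return sum(
        1 for i in range(len(states) - k + 1) if tuple(states[i:i + k]) == cycle
    )


def jar_cluster_pct(states):
    """Percentage of non-terminal steps spent in JAR_INSPECT."""
    steps = [s for s in states if s != "TERMINAL"]
    if not steps:
        return 0.0
    return 100.0 * sum(s == "JAR_INSPECT" for s in steps) / len(steps)


thrashing = ["READ", "EDIT", "BUILD", "TEST", "EDIT", "BUILD", "TEST",
             "EDIT", "BUILD", "TEST", "TERMINAL"]
print(thrash_count(thrashing))  # 2
```

The last BUILD→TEST pair ends in TERMINAL rather than EDIT, so it is not counted as a thrash cycle.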
Tools
The analysis pipeline is implemented in the markov-agent-analysis Python library. Its build_absorbing_chain_from_traces() function transforms raw tool-call logs into absorbing Markov chains.
Related
Blog: I Read My Agent's Diary
Narrative walkthrough of the Markov analysis
Agent Journal
The trace capture layer that feeds this analysis