The Insight

Agent tool-call sequences are approximately first-order Markov chains. The probability of the next tool call depends primarily on the current tool call, not the full history. This means we can apply well-studied stochastic process analysis to understand agent behavior.
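As a minimal sketch of what the first-order assumption buys us, the transition probabilities can be estimated by counting adjacent pairs of states in each trace. The function name `estimate_transitions` and the two example traces below are hypothetical illustrations, not part of the analysis library.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Maximum-likelihood estimate of P(next_state | current_state).

    `sequences` is an iterable of state-name lists, one list per agent run.
    Returns a nested dict: {current_state: {next_state: probability}}.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Under a first-order model only adjacent pairs matter.
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1

    probs = {}
    for cur, row in counts.items():
        total = sum(row.values())
        probs[cur] = {nxt: n / total for nxt, n in row.items()}
    return probs


# Hypothetical traces, made up for illustration only.
traces = [
    ["ORIENT", "READ", "EDIT", "BUILD", "TEST", "TERMINAL"],
    ["ORIENT", "READ", "READ", "EDIT", "BUILD", "EDIT", "BUILD", "TEST", "TERMINAL"],
]
P = estimate_transitions(traces)
```

Each row of `P` is a probability distribution over next states. With only a handful of traces the estimates are noisy, so in practice rows backed by few observations deserve caution.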

The 9-State Taxonomy

Every tool call maps to one of nine behavioral states:
| State | Description | Example Tools |
|---|---|---|
| ORIENT | Understanding the task | Read prompt, list files |
| READ | Reading existing code | Read file, search code |
| EDIT | Modifying code | Write file, edit file |
| BUILD | Compiling | Run maven/gradle build |
| TEST | Running tests | Execute test suite |
| JAR_INSPECT | Examining dependencies | Inspect JAR contents |
| SEARCH | Broad exploration | Grep, glob, web search |
| META | Agent self-management | Task creation, planning |
| TERMINAL | Done | Success or failure |
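One way to encode the taxonomy is an enum plus a lookup from tool name to state. This is only a sketch: the tool names (`read_file`, `shell`, and so on) and the shell-command heuristics are assumptions for illustration, and the real mapping depends on the agent's actual tool set.

```python
from enum import Enum


class State(Enum):
    ORIENT = "ORIENT"
    READ = "READ"
    EDIT = "EDIT"
    BUILD = "BUILD"
    TEST = "TEST"
    JAR_INSPECT = "JAR_INSPECT"
    SEARCH = "SEARCH"
    META = "META"
    TERMINAL = "TERMINAL"


# Hypothetical tool names; replace with the names your agent actually emits.
TOOL_TO_STATE = {
    "list_files": State.ORIENT,
    "read_file": State.READ,
    "write_file": State.EDIT,
    "edit_file": State.EDIT,
    "grep": State.SEARCH,
    "glob": State.SEARCH,
    "web_search": State.SEARCH,
    "create_task": State.META,
}

BUILD_TOOLS = {"mvn", "./mvnw", "gradle", "./gradlew"}


def classify(tool_name: str, command: str = "") -> State:
    """Map one tool call to a behavioral state (illustrative heuristics only)."""
    if tool_name != "shell":
        return TOOL_TO_STATE[tool_name]  # KeyError surfaces unmapped tools

    tokens = command.lower().split()
    if tokens and tokens[0] in BUILD_TOOLS:
        # A build-tool invocation that includes the test goal counts as TEST.
        return State.TEST if "test" in tokens else State.BUILD
    if tokens and tokens[0] in {"jar", "unzip"} and any(t.endswith(".jar") for t in tokens):
        return State.JAR_INSPECT
    raise ValueError(f"unmapped shell command: {command!r}")
```

Raising on unmapped calls, rather than defaulting to a catch-all state, keeps gaps in the mapping visible instead of silently distorting the transition counts.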

What Markov Analysis Reveals

Transition Probability Engineering (TPE)

The transition matrix P(next_state | current_state) is the agent’s “behavioral fingerprint.” Different variants produce measurably different fingerprints.
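To make "measurably different" concrete, one simple option is to compare two fingerprints row by row using total variation distance. This is an assumed choice of metric for illustration, not necessarily the one the analysis uses. The function `row_tv_distance` and both example matrices are hypothetical.

```python
def row_tv_distance(p, q):
    """Per-state total variation distance between two transition matrices.

    Both matrices are nested dicts: {current_state: {next_state: probability}}.
    Only states observed in both matrices are compared; a state seen in just
    one variant has no row to compare against and is skipped.
    """
    distances = {}
    for cur in sorted(set(p) & set(q)):
        row_p, row_q = p[cur], q[cur]
        next_states = set(row_p) | set(row_q)
        distances[cur] = 0.5 * sum(
            abs(row_p.get(n, 0.0) - row_q.get(n, 0.0)) for n in next_states
        )
    return distances


# Made-up fingerprints for two variants.
baseline = {"BUILD": {"TEST": 0.5, "EDIT": 0.5}, "READ": {"EDIT": 1.0}}
variant = {"BUILD": {"TEST": 0.75, "EDIT": 0.25}, "READ": {"EDIT": 1.0}}

diff = row_tv_distance(baseline, variant)
```

A distance of 0 means the two variants behave identically after that state, and 1 means they never choose the same next state. Reporting the distance per row shows which part of the matrix a variant changed.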

Key Metrics

  • Expected steps to completion: row sums of the fundamental matrix $N = (I - Q)^{-1}$, where $Q$ is the transient-to-transient block of the transition matrix
  • P(success): absorbing-chain probability of reaching success vs. failure
  • Thrash score: loop amplification in BUILD→TEST→EDIT cycles
  • JAR cluster %: time spent in dependency inspection (correlates with knowledge availability)
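The first two metrics follow from standard absorbing-chain results: with Q the transient-to-transient block and R the transient-to-absorbing block, N = (I - Q)^-1 gives expected visit counts, the row sums of N give expected steps to absorption, and B = N R gives absorption probabilities. The sketch below uses only the standard library (in practice `numpy.linalg.solve` would be the usual choice) and a made-up three-state chain. It does not show the library's own implementation.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def absorbing_metrics(Q, R):
    """Return (N, expected_steps, B) for an absorbing Markov chain."""
    n = len(Q)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    i_minus_q = [[identity[i][j] - Q[i][j] for j in range(n)] for i in range(n)]
    N = solve(i_minus_q, identity)  # fundamental matrix (I - Q)^-1
    steps = [sum(row) for row in N]  # expected steps before absorption
    B = [
        [sum(N[i][k] * R[k][j] for k in range(n)) for j in range(len(R[0]))]
        for i in range(n)
    ]
    return N, steps, B


# Made-up chain. Transient states: EDIT, BUILD, TEST. Absorbing: SUCCESS, FAILURE.
Q = [
    [0.00, 1.00, 0.00],  # EDIT  -> BUILD
    [0.25, 0.00, 0.75],  # BUILD -> EDIT or TEST
    [0.50, 0.00, 0.00],  # TEST  -> EDIT (the rest is absorbed)
]
R = [
    [0.0, 0.0],
    [0.0, 0.0],
    [0.4, 0.1],  # TEST -> SUCCESS or FAILURE
]
N, steps, B = absorbing_metrics(Q, R)
```

Here `steps[i]` is the expected number of tool calls before termination when starting in transient state `i`, and `B[i][0]` is the corresponding P(success). Each row of `B` sums to 1 because every run is eventually absorbed.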

What We’ve Found

From Code Coverage v1:
  1. Knowledge reduces JAR inspection — Variants with domain knowledge spend less time inspecting dependencies
  2. Thrashing predicts failure — High BUILD→TEST→EDIT loop counts correlate with lower T3 scores
  3. SAE changes the fingerprint — Structured Agent Execution produces measurably different transition matrices
  4. Two independent axes — Knowledge injection and prompt hardening affect different parts of the transition matrix

Tools

The analysis pipeline is implemented in the markov-agent-analysis Python library.
uv pip install -e ~/tuvium/projects/markov-agent-analysis/[all]
Key function: build_absorbing_chain_from_traces() — transforms raw tool-call logs into absorbing Markov chains.

Blog: I Read My Agent's Diary

Narrative walkthrough of the Markov analysis

Agent Journal

The trace capture layer that feeds this analysis