This project has moved from the spring-ai-community GitHub organization to markpollack. New releases are published under the Maven groupId io.github.markpollack, and Java packages now use the io.github.markpollack namespace. If you previously used org.springaicommunity, update your dependency coordinates and imports to the new groupId and package namespace.
Overview
Agent Journal captures the structured behavioral traces that make agent research possible. Every LLM call, tool invocation, state transition, and decision point is logged as a typed event in an append-only journal. The EvalSubject extraction layer converts heterogeneous event sources into a uniform stream of behavioral units ready for evaluation by Agent Judge. A human feedback API records reviewer judgments with typed scores for judge calibration and golden dataset creation.
Architecture
Event System
Sealed event hierarchy: LLM calls, tool calls, state changes, git events, metrics, custom events
EvalSubject Extraction
Source-neutral behavioral units for evaluation — 9 subject kinds from journal events or SDK captures
Human Feedback
Typed feedback events with binary, numerical, and categorical scores for judge calibration
Modules
| Module | Description | Dependencies |
|---|---|---|
| journal-core | Events, storage, EvalSubject, Feedback | None (zero external dependencies) |
| claude-code-capture | Claude Code SDK → journal bridge, PhaseCaptureSources | claude-code-sdk |
Installation
EvalSubject
EvalSubject is the source-neutral unit of recorded agent behavior that can be judged. Each subject carries an ID, kind, source reference, and metadata extracted from the original event.
EvalSubjectKind
Nine kinds classify the type of behavior:

| Kind | Source Event |
|---|---|
| LLM_CALL | LLM invocation with tokens, cost, duration |
| TOOL_CALL | Individual tool invocation |
| WORKFLOW_STEP | High-level workflow phase |
| ROUTER_DECISION | Routing or dispatch decision |
| RETRIEVAL_RESULT | RAG or search retrieval |
| FINAL_OUTPUT | Terminal agent output |
| FEEDBACK | Human feedback event |
| STATE_CHANGE | State transition |
| CUSTOM | Application-defined behavior |
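A minimal sketch of how the kind taxonomy and the subject record might fit together. Only the nine enum constants come from the table; the record's component names (id, kind, sourceRef, metadata), the sourceRef string format, and the immutability choice are assumptions based on the EvalSubject description, not journal-core's actual definitions.

```java
import java.util.Map;
import java.util.Objects;

// The nine constants mirror the table above; everything else here is an assumption
enum EvalSubjectKind {
    LLM_CALL, TOOL_CALL, WORKFLOW_STEP, ROUTER_DECISION, RETRIEVAL_RESULT,
    FINAL_OUTPUT, FEEDBACK, STATE_CHANGE, CUSTOM
}

// Hypothetical shape: an ID, a kind, a reference back to the source event, and extracted metadata
record EvalSubject(String id, EvalSubjectKind kind, String sourceRef, Map<String, Object> metadata) {
    EvalSubject {
        Objects.requireNonNull(id, "id");
        Objects.requireNonNull(kind, "kind");
        Objects.requireNonNull(sourceRef, "sourceRef");
        metadata = Map.copyOf(metadata); // immutable snapshot taken at extraction time
    }
}

class EvalSubjectDemo {
    public static void main(String[] args) {
        // "run-1/events.jsonl#12" is an illustrative, hypothetical source reference format
        var subject = new EvalSubject("subj-1", EvalSubjectKind.TOOL_CALL,
                "run-1/events.jsonl#12", Map.of("tool", "bash", "durationMs", 840));
        System.out.println(subject.kind() + " " + subject.metadata().get("tool"));
    }
}
```

Copying the metadata in the compact constructor keeps a subject stable once extracted, which matters if the same subject is later scored by several judges.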
EvalSubjectSource
Functional interface that adapts a specific data source into a stream of EvalSubject records.
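Because the source is a functional interface, any data can be adapted with a lambda. The sketch below assumes a single abstract method named subjects() returning a Stream; the real method name and signature may differ, and the EvalSubject and kind types are trimmed stand-ins.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Minimal stand-ins; the real EvalSubject and EvalSubjectSource types may differ
enum EvalSubjectKind { LLM_CALL, TOOL_CALL, FINAL_OUTPUT } // subset of the nine kinds

record EvalSubject(String id, EvalSubjectKind kind, String sourceRef, Map<String, Object> metadata) {}

// Hypothetical single abstract method; each call returns a fresh stream so the source can be re-read
@FunctionalInterface
interface EvalSubjectSource {
    Stream<EvalSubject> subjects();
}

class EvalSubjectSourceDemo {
    public static void main(String[] args) {
        List<String> toolNames = List.of("Read", "Edit");
        // Adapt an in-memory list: each tool name becomes one TOOL_CALL subject
        EvalSubjectSource source = () -> toolNames.stream()
                .map(name -> new EvalSubject("tool-" + name, EvalSubjectKind.TOOL_CALL,
                        "in-memory", Map.of("tool", name)));
        source.subjects().forEach(s -> System.out.println(s.kind() + ": " + s.metadata().get("tool")));
    }
}
```

Returning a new stream per call, rather than holding one stream, avoids the "stream has already been operated upon" error when a source is consumed more than once.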
EvalSubjectSources
Factory for creating sources from journal data. The claude-code-capture module additionally provides PhaseCaptureSources.fromPhaseCaptures(captures) for extracting subjects from Claude Code SDK PhaseCapture records: each phase becomes an LLM_CALL subject, and each individual tool use becomes a separate TOOL_CALL subject.
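The phase-to-subject mapping can be sketched with flatMap: each capture yields one LLM_CALL subject followed by one TOOL_CALL subject per tool use. PhaseCapture and ToolUse here are hypothetical stand-ins for the SDK records, and PhaseSubjects is an illustrative helper, not the real PhaseCaptureSources implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

enum EvalSubjectKind { LLM_CALL, TOOL_CALL } // trimmed to the two kinds used here

record EvalSubject(String id, EvalSubjectKind kind, String sourceRef, Map<String, Object> metadata) {}

// Hypothetical stand-ins for the Claude Code SDK capture data; real field names will differ
record ToolUse(String id, String toolName) {}
record PhaseCapture(String phaseName, List<ToolUse> toolUses) {}

final class PhaseSubjects {
    private PhaseSubjects() {}

    // One LLM_CALL subject per phase, then one TOOL_CALL subject per tool use in that phase
    static Stream<EvalSubject> fromPhaseCaptures(List<PhaseCapture> captures) {
        return captures.stream().flatMap(phase -> Stream.concat(
                Stream.of(new EvalSubject("phase:" + phase.phaseName(), EvalSubjectKind.LLM_CALL,
                        phase.phaseName(), Map.of("toolUseCount", phase.toolUses().size()))),
                phase.toolUses().stream().map(tool -> new EvalSubject("tool:" + tool.id(),
                        EvalSubjectKind.TOOL_CALL, phase.phaseName(), Map.of("tool", tool.toolName())))));
    }
}

class PhaseSubjectsDemo {
    public static void main(String[] args) {
        var captures = List.of(new PhaseCapture("implement",
                List.of(new ToolUse("t1", "Read"), new ToolUse("t2", "Edit"))));
        PhaseSubjects.fromPhaseCaptures(captures).forEach(System.out::println);
    }
}
```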
EvalSubjectQuery
Fluent selection and grouping over subjects.
Human Feedback
Record human reviewer judgments for judge-agreement analysis and golden dataset creation.
FeedbackTarget
Identifies what the feedback applies to.
FeedbackEvent
Records a single piece of human feedback. It implements JournalEvent and is stored in the journal’s feedback.jsonl sidecar file.
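A minimal sketch of a feedback target and event, assuming a target names a run and, optionally, one item within it (the run/item split is a guess based on ReviewedItem carrying both runId and itemId). The JournalEvent stand-in, the record components, and the run/item factory methods are all hypothetical.

```java
import java.time.Instant;
import java.util.Objects;

// Hypothetical stand-ins: field names and factory methods are assumptions, not journal-core's API
interface JournalEvent {
    Instant timestamp();
}

// Assumed shape: feedback targets a whole run, or one item (such as an EvalSubject id) within it
record FeedbackTarget(String runId, String itemId) {
    FeedbackTarget {
        Objects.requireNonNull(runId, "runId");
    }

    static FeedbackTarget run(String runId) {
        return new FeedbackTarget(runId, null);
    }

    static FeedbackTarget item(String runId, String itemId) {
        return new FeedbackTarget(runId, Objects.requireNonNull(itemId, "itemId"));
    }

    boolean isRunLevel() {
        return itemId == null;
    }
}

// One reviewer judgment; a String label stands in for the typed FeedbackScore
record FeedbackEvent(Instant timestamp, FeedbackTarget target, String reviewer, String label)
        implements JournalEvent {}

class FeedbackEventDemo {
    public static void main(String[] args) {
        var event = new FeedbackEvent(Instant.now(), FeedbackTarget.item("run-42", "subj-7"), "alice", "correct");
        System.out.println(event.target().isRunLevel() + " " + event.label());
    }
}
```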
FeedbackScore
Typed score with three kinds:

| Factory Method | ScoreKind | Description |
|---|---|---|
| binary(true) | BINARY | Thumbs up / thumbs down |
| numerical(0.8, 1.0) | NUMERICAL | Value out of a maximum; normalized() returns an OptionalDouble |
| categorical("correct") | CATEGORICAL | Discrete category label |
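A sketch of a typed score that matches the table's factory names, ScoreKind values, and the OptionalDouble return type of normalized(). The record components, the max > 0 check, and the choice to return an empty OptionalDouble for binary and categorical scores are assumptions, not documented behavior.

```java
import java.util.Objects;
import java.util.OptionalDouble;

enum ScoreKind { BINARY, NUMERICAL, CATEGORICAL }

// Hypothetical layout: only the fields relevant to the score's kind are non-null
record FeedbackScore(ScoreKind kind, Boolean passed, Double value, Double max, String category) {

    static FeedbackScore binary(boolean passed) {
        return new FeedbackScore(ScoreKind.BINARY, passed, null, null, null);
    }

    static FeedbackScore numerical(double value, double max) {
        if (max <= 0) {
            throw new IllegalArgumentException("max must be positive");
        }
        return new FeedbackScore(ScoreKind.NUMERICAL, null, value, max, null);
    }

    static FeedbackScore categorical(String label) {
        return new FeedbackScore(ScoreKind.CATEGORICAL, null, null, null, Objects.requireNonNull(label));
    }

    // Assumed semantics: value / max for numerical scores, empty for the other kinds
    OptionalDouble normalized() {
        return kind == ScoreKind.NUMERICAL ? OptionalDouble.of(value / max) : OptionalDouble.empty();
    }
}

class FeedbackScoreDemo {
    public static void main(String[] args) {
        System.out.println(FeedbackScore.numerical(3, 4).normalized());
        System.out.println(FeedbackScore.binary(true).normalized());
    }
}
```

Returning OptionalDouble rather than a sentinel such as -1 makes callers handle the "no numeric meaning" case explicitly when computing judge-agreement statistics.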
FeedbackService
Records and queries feedback, and exports reviewed items for golden dataset creation. ReviewedItem is a projection record containing itemId, runId, feedback, and itemMetadata, suitable for building labeled datasets from human judgments.
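A minimal in-memory sketch of the record, query, and export flow. InMemoryFeedbackService, its submit/forItem/registerItem methods, and the simplified Feedback record are hypothetical. Only ReviewedItem's four components come from the text, and their types here are guesses. The real service persists to feedback.jsonl rather than a list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical feedback entry; a String label stands in for FeedbackScore
record Feedback(String runId, String itemId, String reviewer, String label) {}

// Component names from the text; component types are assumptions
record ReviewedItem(String itemId, String runId, List<Feedback> feedback, Map<String, Object> itemMetadata) {}

final class InMemoryFeedbackService {
    private final List<Feedback> log = new ArrayList<>(); // append-only, like the sidecar file
    private final Map<String, Map<String, Object>> metadata = new LinkedHashMap<>();

    void registerItem(String itemId, Map<String, Object> itemMetadata) {
        metadata.put(itemId, Map.copyOf(itemMetadata));
    }

    void submit(Feedback feedback) {
        log.add(feedback);
    }

    List<Feedback> forItem(String itemId) {
        return log.stream().filter(f -> f.itemId().equals(itemId)).toList();
    }

    // One ReviewedItem per item with at least one judgment, in first-reviewed order
    List<ReviewedItem> exportReviewedItems() {
        Map<String, List<Feedback>> byItem = new LinkedHashMap<>();
        for (Feedback f : log) {
            byItem.computeIfAbsent(f.itemId(), k -> new ArrayList<>()).add(f);
        }
        List<ReviewedItem> out = new ArrayList<>();
        byItem.forEach((itemId, fs) -> out.add(new ReviewedItem(
                itemId, fs.get(0).runId(), List.copyOf(fs), metadata.getOrDefault(itemId, Map.of()))));
        return out;
    }
}

class FeedbackServiceDemo {
    public static void main(String[] args) {
        var svc = new InMemoryFeedbackService();
        svc.submit(new Feedback("run-1", "subj-1", "alice", "correct"));
        svc.exportReviewedItems().forEach(System.out::println);
    }
}
```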
What It Captures
Events are stored in an append-only events.jsonl log, one per run.
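The append-only JSON Lines idea can be sketched with java.nio: each event is one serialized JSON object on its own line, new events are only ever appended, and replaying the file in order reconstructs the run. JsonlJournal is a hypothetical class; journal-core's real storage layer and its JSON serialization are not shown here and may differ. The sketch takes already-serialized JSON strings to stay dependency-free.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical sketch of a per-run append-only JSONL log
final class JsonlJournal {
    private final Path file;

    JsonlJournal(Path runDir) {
        try {
            Files.createDirectories(runDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.file = runDir.resolve("events.jsonl");
    }

    // Appends one serialized JSON object as a single line; earlier lines are never rewritten
    synchronized void append(String jsonObject) {
        if (jsonObject.indexOf('\n') >= 0 || jsonObject.indexOf('\r') >= 0) {
            throw new IllegalArgumentException("a JSONL record must fit on one line");
        }
        try {
            Files.writeString(file, jsonObject + "\n", StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Replays the run in order: line N is the Nth event appended
    List<String> readAll() {
        try {
            return Files.exists(file) ? Files.readAllLines(file, StandardCharsets.UTF_8) : List.of();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class JsonlJournalDemo {
    public static void main(String[] args) {
        var journal = new JsonlJournal(Path.of(System.getProperty("java.io.tmpdir"), "run-demo"));
        journal.append("{\"type\":\"ToolCallEvent\",\"tool\":\"bash\"}");
        System.out.println(journal.readAll().size() + " event(s) recorded");
    }
}
```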
The sealed JournalEvent hierarchy includes:
- LLMCallEvent — tokens, cost, duration, provider, model
- ToolCallEvent — tool name, arguments, result, duration
- StateChangeEvent — from/to states in the 9-state taxonomy
- MetricEvent — counters, timers, gauges with dimensional tags
- GitEvent — commit, diff, branch operations
- CustomEvent — application-defined events
- FeedbackEvent — human reviewer feedback
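A sealed hierarchy lets consumers switch over every event type without a default branch, so adding a new event type becomes a compile error at each unhandled switch. The sketch below mirrors the seven event types listed above, but every record component is a hypothetical simplification of the listed fields, not journal-core's definitions. It needs Java 21 for pattern matching in switch.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;

// Hypothetical field layouts; only the type names follow the list above
sealed interface JournalEvent {
    Instant timestamp();

    record LLMCallEvent(Instant timestamp, String provider, String model, long inputTokens,
                        long outputTokens, double costUsd, Duration duration) implements JournalEvent {}
    record ToolCallEvent(Instant timestamp, String toolName, String arguments,
                         String result, Duration duration) implements JournalEvent {}
    record StateChangeEvent(Instant timestamp, String from, String to) implements JournalEvent {}
    record MetricEvent(Instant timestamp, String name, String metricType,
                       double value, Map<String, String> tags) implements JournalEvent {}
    record GitEvent(Instant timestamp, String operation, String ref) implements JournalEvent {}
    record CustomEvent(Instant timestamp, String type, Map<String, Object> payload) implements JournalEvent {}
    record FeedbackEvent(Instant timestamp, String targetId, String score) implements JournalEvent {}
}

final class EventSummaries {
    private EventSummaries() {}

    // No default branch: the compiler rejects this switch if a permitted subtype is left unhandled
    static String summarize(JournalEvent event) {
        return switch (event) {
            case JournalEvent.LLMCallEvent e -> e.model() + " used " + (e.inputTokens() + e.outputTokens()) + " tokens";
            case JournalEvent.ToolCallEvent e -> "tool " + e.toolName() + " took " + e.duration().toMillis() + " ms";
            case JournalEvent.StateChangeEvent e -> e.from() + " -> " + e.to();
            case JournalEvent.MetricEvent e -> e.name() + "=" + e.value();
            case JournalEvent.GitEvent e -> "git " + e.operation() + " " + e.ref();
            case JournalEvent.CustomEvent e -> "custom " + e.type();
            case JournalEvent.FeedbackEvent e -> "feedback on " + e.targetId() + ": " + e.score();
        };
    }
}

class EventSummariesDemo {
    public static void main(String[] args) {
        System.out.println(EventSummaries.summarize(
                new JournalEvent.StateChangeEvent(Instant.now(), "PLANNING", "EDITING")));
    }
}
```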
Why It Matters
Without structured traces, agent behavior is a black box. Agent Journal transforms agent runs into analyzable data, enabling Markov fingerprinting, loop detection, and cross-variant behavioral comparison.
Feeds Into
- Code Coverage v1 — First Markov analysis from captured traces
- Code Coverage v2 — Refined 9-state taxonomy
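One common way to build a Markov fingerprint from captured traces is to estimate first-order transition probabilities between consecutive states: count each observed from-to pair, then divide by the number of transitions leaving each state. The sketch below does that for one run's state sequence. MarkovFingerprint and the state names in the demo are hypothetical, since the 9-state taxonomy itself is not listed here, and this is not claimed to be the exact analysis used in the Code Coverage studies.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative first-order Markov estimate over a run's state sequence (states are plain strings)
final class MarkovFingerprint {
    private MarkovFingerprint() {}

    // P(next | current) = count(current -> next) / count(current -> any state)
    static Map<String, Map<String, Double>> transitionProbabilities(List<String> states) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (int i = 0; i + 1 < states.size(); i++) {
            counts.computeIfAbsent(states.get(i), k -> new TreeMap<>())
                  .merge(states.get(i + 1), 1, Integer::sum);
        }
        Map<String, Map<String, Double>> probs = new TreeMap<>();
        counts.forEach((from, row) -> {
            int total = row.values().stream().mapToInt(Integer::intValue).sum();
            Map<String, Double> probRow = new TreeMap<>();
            row.forEach((to, n) -> probRow.put(to, n.doubleValue() / total));
            probs.put(from, probRow);
        });
        return probs;
    }

    // A self-transition probability close to 1.0 is one simple signal of a loop
    static double selfLoopProbability(Map<String, Map<String, Double>> p, String state) {
        return p.getOrDefault(state, Map.of()).getOrDefault(state, 0.0);
    }
}

class MarkovFingerprintDemo {
    public static void main(String[] args) {
        var p = MarkovFingerprint.transitionProbabilities(
                List.of("READ", "EDIT", "TEST", "EDIT", "TEST", "DONE"));
        System.out.println(p);
    }
}
```

Comparing these matrices across prompt or model variants is one way to quantify behavioral differences between runs.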
Source
GitHub
Source code (BSL 1.1) — two modules, 401 tests