What is Agent Judge?
Agent Judge is an evaluation framework for AI agent output. It provides deterministic rules, command execution checks, file comparison, RAG evaluation, and LLM-powered assessment, all of which compose into juries with configurable voting strategies. Think of judges as unit tests for your agent: executable checks that decide whether an agent’s output satisfies a goal. You wouldn’t ship application code without tests or assertions, and agents need the same discipline. The core module has zero external dependencies. Framework bridge modules let you evaluate output from Spring AI, LangChain4j, Koog, and CLI-delegated agents (via AgentClient); the same judges and juries work across all of them.
License
Agent Judge is licensed under BSL 1.1. Internal enterprise use is welcome. Commercial redistribution requires a separate agreement; see the LICENSE file for details.
Prerequisites
- Java 21+
- Maven 3.9+ (or Gradle 8+)
- For LLM judges: Spring AI and an API key (Anthropic, OpenAI, etc.)
Add the Dependency
Start with the core module, then add only the modules you need:
| Module | Artifact | What it adds |
|---|---|---|
| Exec | agent-judge-exec | Build, shell, and coverage judges |
| File | agent-judge-file | AST, POM, XML, and text comparison |
| LLM | agent-judge-llm | LLM-powered judges (requires Spring AI) |
| RAG | agent-judge-rag | Faithfulness, hallucination, relevance |
| Spring AI bridge | agent-judge-spring-ai | Evaluates ChatResponse output |
| LangChain4j bridge | agent-judge-langchain4j | Evaluates Result<T> output |
| Koog bridge | agent-judge-koog | Evaluates AIAgent output |
| AgentClient bridge | agent-judge-agent-client | Evaluates CLI-agent output |
All modules share the same group ID (io.github.markpollack) and version.
Your First Judge
Check whether a file exists in an agent’s workspace.
A judge takes a JudgmentContext (what the agent was asked to do and where it worked) and returns a Judgment (score, status, reasoning, and granular checks).
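The original code sample for this step is not available here, so the following is only a minimal sketch of the idea under stated assumptions. `SketchContext`, `SketchJudgment`, `SketchJudge`, and `FileExistsSketch` are hypothetical stand-ins defined inline; they are not the library's `JudgmentContext` or `Judgment` types, and the real API may look different.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for the context a judge receives:
// what the agent was asked to do and where it worked.
record SketchContext(String goal, Path workspace) {}

// Hypothetical stand-in for a judge's result.
record SketchJudgment(boolean pass, double score, String reasoning) {}

// Hypothetical judge contract: context in, judgment out.
interface SketchJudge {
    SketchJudgment judge(SketchContext context);
}

final class FileExistsSketch {
    // Builds a judge that passes when relativePath exists inside the workspace.
    static SketchJudge fileExists(String relativePath) {
        return context -> {
            Path target = context.workspace().resolve(relativePath);
            boolean exists = Files.exists(target);
            String reasoning = (exists ? "Found " : "Missing ") + relativePath;
            return new SketchJudgment(exists, exists ? 1.0 : 0.0, reasoning);
        };
    }
}
```

A caller would build a context for the agent's workspace and invoke `FileExistsSketch.fileExists("README.md").judge(context)`, then inspect the pass flag and reasoning.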
Add a Build Judge
Verify the project still compiles after the agent modified it.
BuildSuccessJudge.maven() auto-detects the ./mvnw wrapper.
Use BuildSuccessJudge.gradle() for Gradle projects.
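To make the idea of a command-based check concrete, here is a rough sketch of running a process in a workspace directory and passing on exit code 0. It uses only the JDK's `ProcessBuilder`. `CommandJudgeSketch` and `CommandResult` are hypothetical names for illustration, and this is not how `BuildSuccessJudge` is necessarily implemented.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class CommandJudgeSketch {
    // Hypothetical result type, not the library's Judgment.
    record CommandResult(boolean pass, int exitCode) {}

    // Runs the command inside the workspace and passes only on exit code 0.
    // A timeout is reported as a failure with exit code -1.
    static CommandResult run(Path workspace, List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .directory(workspace.toFile())
                .redirectErrorStream(true)
                // Discard output so a chatty build cannot block on a full pipe.
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            return new CommandResult(false, -1);
        }
        int exitCode = process.exitValue();
        return new CommandResult(exitCode == 0, exitCode);
    }
}
```

For a Maven project, the command could be something like `List.of("./mvnw", "-q", "compile")`. Because the process really executes, a check of this kind is best pointed at a workspace you are willing to run builds in.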
Command judges require the agent-judge-exec module. They run real processes in the workspace directory.
Combine into a Jury
Run multiple judges together and aggregate results with a voting strategy.
Evaluate Framework Output
Framework bridge modules convert agent output into JudgmentContext automatically.
The same judges and juries work regardless of which framework produced the output.
| Runtime | Input type | Bridge |
|---|---|---|
| Spring AI | ChatResponse | SpringAiEvaluator |
| LangChain4j | Result<T> | LangChain4jEvaluator |
| Koog | AIAgent | KoogEvaluator |
| AgentClient | AgentClientResponse | AgentClientEvaluator |
Bridge modules do not bring framework runtimes transitively. Add the Spring AI, LangChain4j, Koog, or AgentClient dependency your application already uses.
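The bridge idea can be pictured as a thin adapter: extract what the judge needs from a framework-specific output, wrap it in a common context, and run the same check. The sketch below is purely illustrative. `BridgeSketch`, `Context`, `FrameworkAReply`, and `FrameworkBResult` are hypothetical types invented for this example and do not correspond to the real evaluator classes or to any framework's API.

```java
import java.util.function.Function;
import java.util.function.Predicate;

final class BridgeSketch {
    // Hypothetical common input; stands in for the role JudgmentContext plays.
    record Context(String goal, String output) {}

    // Hypothetical outputs from two different agent frameworks.
    record FrameworkAReply(String text) {}
    record FrameworkBResult(String content, int tokens) {}

    // Adapts a framework-specific output into the common context,
    // then applies the same judge regardless of where the output came from.
    static <T> boolean evaluate(String goal, T output,
                                Function<T, String> extractText,
                                Predicate<Context> judge) {
        return judge.test(new Context(goal, extractText.apply(output)));
    }
}
```

The point of the design is that the judge (here a plain `Predicate<Context>`) never sees the framework type, so one check can be reused across runtimes by swapping only the extraction function.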
Use a Jury instead of a Judge to get a Verdict.
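As a rough illustration of how a jury might turn several individual judgments into one verdict, the sketch below implements a strict-majority vote. `JurySketch`, `Vote`, `Status`, and `VerdictSketch` are hypothetical names for this example only. The library's own jury classes and voting strategies may behave differently, for instance in how ties are handled.

```java
import java.util.List;

final class JurySketch {
    enum Status { PASS, FAIL }

    // Hypothetical record of one judge's outcome.
    record Vote(String judgeName, Status status) {}

    // Hypothetical aggregated result; not the library's Verdict type.
    record VerdictSketch(Status status, long passCount, int total) {}

    // Strict majority: PASS only when more than half of the votes pass.
    // Ties and empty juries therefore resolve to FAIL.
    static VerdictSketch majority(List<Vote> votes) {
        long passes = votes.stream()
                .filter(vote -> vote.status() == Status.PASS)
                .count();
        Status status = passes * 2 > votes.size() ? Status.PASS : Status.FAIL;
        return new VerdictSketch(status, passes, votes.size());
    }
}
```

Other strategies, such as requiring every judge to pass, would change only the comparison inside `majority`, which is why the voting rule is a natural thing to make configurable.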
Evaluate RAG Pipelines
The RAG module provides LLM-powered judges for retrieval-augmented generation: FaithfulnessJudge, ContextualRelevanceJudge, and HallucinationJudge.
See Built-in Judges for details.
What’s Next
- Tutorial: Build an Evaluation Pipeline. Step-by-step guide from single judge to multi-judge jury.
- Built-in Judges. Catalog of built-in judges across all modules.
- Jury System. SimpleJury, CascadedJury, voting strategies, and composition.
- Writing Custom Judges. Lambda judges, DeterministicJudge, and the LLMJudge template method.