This project now publishes new releases under the Maven groupId io.github.markpollack. If you previously used org.springaicommunity, update your dependency coordinates to the current values shown below.
Judges are like unit tests for your agent: executable checks that decide whether an agent output satisfies a goal. You wouldn’t ship application code without tests or assertions; agents need the same discipline.

Agent Judge provides the evaluation framework: deterministic rules, command execution, file comparison, RAG evaluation, and LLM-powered assessment. Judges compose into juries with configurable voting strategies, so cheap deterministic checks can run before expensive LLM judges.

Agent runtimes are vertical stacks: Spring AI, LangChain4j, Koog, and CLI-delegated agents each provide their own execution model. Evaluation cuts across all of them. Each bridge converts framework output into a JudgmentContext; the same judges and juries then evaluate the result.
        Spring AI     LangChain4j     Koog     AgentClient
            |              |           |             |
            v              v           v             v
-----------------------------------------------------------------
                         Agent Judge
                 horizontal evaluation layer
-----------------------------------------------------------------
   Build checks, file checks, AST comparison, RAG faithfulness,
   hallucination checks, LLM-as-judge, juries, cascaded juries
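The horizontal-layer idea can be pictured with a small, self-contained sketch. It does not use the Agent Judge API: EvalContext, ChatStyleReply, CliStyleReply, and both bridge functions are hypothetical stand-ins, and the real JudgmentContext carries more fields. The sketch only shows that once each runtime's output is mapped into one shared shape, a single check can be written once and reused.

```java
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical sketch; none of these types belong to Agent Judge. */
class BridgeSketch {

    /** Stand-in for the shared evaluation input (far smaller than a real context). */
    static final class EvalContext {
        final String goal;
        final String output;
        EvalContext(String goal, String output) {
            this.goal = goal;
            this.output = output;
        }
    }

    /** Assumed output shape of a chat-style runtime. */
    static final class ChatStyleReply {
        final String text;
        ChatStyleReply(String text) { this.text = text; }
    }

    /** Assumed output shape of a CLI-delegated runtime. */
    static final class CliStyleReply {
        final int exitCode;
        final String stdout;
        CliStyleReply(int exitCode, String stdout) {
            this.exitCode = exitCode;
            this.stdout = stdout;
        }
    }

    /** A bridge is a function from a framework-specific result to the shared context. */
    static Function<ChatStyleReply, EvalContext> chatBridge(String goal) {
        return reply -> new EvalContext(goal, reply.text);
    }

    /** A failed CLI run (non-zero exit code) contributes an empty output. */
    static Function<CliStyleReply, EvalContext> cliBridge(String goal) {
        return reply -> new EvalContext(goal, reply.exitCode == 0 ? reply.stdout : "");
    }

    /** One check, written once against the shared context. */
    static final Predicate<EvalContext> MENTIONS_CONTROLLER =
            ctx -> ctx.output.contains("@RestController");

    public static void main(String[] args) {
        String goal = "Add a REST controller";
        EvalContext fromChat = chatBridge(goal).apply(new ChatStyleReply("@RestController class A {}"));
        EvalContext fromCli = cliBridge(goal).apply(new CliStyleReply(1, "@RestController class B {}"));
        System.out.println(MENTIONS_CONTROLLER.test(fromChat));
        System.out.println(MENTIONS_CONTROLLER.test(fromCli));
    }
}
```

Keeping the framework-specific code inside the bridge functions means that adding another runtime would need one more mapping, not another set of checks.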
The core module (agent-judge-core) has zero external dependencies and contains the evaluation abstractions plus basic deterministic judges. Four additional judge-family modules add command execution, file comparison, LLM evaluation, and RAG assessment. Four framework bridge modules connect to Spring AI, LangChain4j, Koog, and AgentClient.

The agent-judge-ai-core module provides framework-neutral AI-backed judge infrastructure. ModelBackedJudge composes a prompt template, a model backend, and a response classifier into a judge, with no subclassing needed. JudgeModel implementations in agent-judge-llm (SpringAiJudgeModel) and agent-judge-agent-client (AgentClientJudgeModel) connect to specific AI backends.
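Composing three parts into a judge without subclassing can be sketched with plain functions. This is not the ModelBackedJudge API: the Outcome type, the compose method, and the stub model below are all hypothetical. The stub replaces a real model call so that the example runs offline.

```java
import java.util.function.Function;

/** Hypothetical sketch of composition over inheritance; not the Agent Judge API. */
class ModelBackedSketch {

    /** Assumed result type: pass/fail plus the raw model reply kept as reasoning. */
    static final class Outcome {
        final boolean pass;
        final String reasoning;
        Outcome(boolean pass, String reasoning) {
            this.pass = pass;
            this.reasoning = reasoning;
        }
    }

    /** Wires the three parts together by function composition; nothing is subclassed. */
    static Function<String, Outcome> compose(
            Function<String, String> promptTemplate,  // agent output -> prompt
            Function<String, String> modelBackend,    // prompt -> raw model reply
            Function<String, Boolean> classifier) {   // raw reply -> pass/fail
        return agentOutput -> {
            String prompt = promptTemplate.apply(agentOutput);
            String reply = modelBackend.apply(prompt);
            return new Outcome(classifier.apply(reply), reply);
        };
    }

    public static void main(String[] args) {
        // Stub backend so the sketch runs offline; a real backend would call a model.
        Function<String, String> stubModel =
                prompt -> prompt.contains("TODO") ? "NO: unfinished work" : "YES: looks complete";

        Function<String, Outcome> judge = compose(
                output -> "Does this output satisfy the goal? Answer YES or NO.\n" + output,
                stubModel,
                reply -> reply.startsWith("YES"));

        Outcome outcome = judge.apply("public class Greeter {}");
        System.out.println(outcome.pass + " <- " + outcome.reasoning);
    }
}
```

Because each part is an independent function, swapping the backend or the classifier would leave the other two untouched, which is presumably the motivation for the composed design described above.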

Quick Start

Every runtime is adapted into a JudgmentContext, then evaluated by judges and juries:
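import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
// Agent Judge imports are omitted here; see the API Reference for package names.

// Describe what the agent was asked to do and how the run went.
JudgmentContext context = JudgmentContext.builder()
    .goal("Add a REST controller")
    .workspace(Path.of("/my/project"))
    .status(ExecutionStatus.SUCCESS)
    .startedAt(Instant.now())
    .executionTime(Duration.ofMinutes(2))
    .build();

// Two checks: a file-existence rule and a Maven compile.
Judge fileCheck = new FileExistsJudge("README.md");
Judge buildCheck = BuildSuccessJudge.maven("compile");

// Combine the judges into a jury that decides by majority vote.
SimpleJury jury = SimpleJury.builder()
    .judge(fileCheck)
    .judge(buildCheck, 2.0)
    .votingStrategy(new MajorityVotingStrategy())
    .build();

// Run every judge against the context and aggregate the result.
Verdict verdict = jury.vote(context);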

Core Abstractions

Judge

Functional interface — takes JudgmentContext, returns Judgment with score, status, reasoning, and granular checks

Jury

Multi-judge aggregation with voting strategies — majority, consensus, weighted average, median

ModelBackedJudge

Composable AI-backed judge: prompt template, model backend, and classifier composed into a judge
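The sketch below illustrates two of the ideas above: a single-method judge that a lambda can implement, and three of the named voting strategies computed over raw scores. ScoreJudge and the three static methods are hypothetical, and the library's own strategies may define ties, thresholds, and weights differently. Consensus is left out because its exact rule is not stated here.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of score aggregation; not the Agent Judge implementation. */
class VotingSketch {

    /** Assumed single-method judge: text in, score in [0, 1] out. */
    @FunctionalInterface
    interface ScoreJudge {
        double score(String agentOutput);
    }

    /** Majority: passes when more than half of the scores reach the threshold. */
    static boolean majority(double[] scores, double threshold) {
        long passing = Arrays.stream(scores).filter(s -> s >= threshold).count();
        return passing * 2 > scores.length;
    }

    /** Weighted average: each score counts in proportion to its weight (weights must not sum to zero). */
    static double weightedAverage(double[] scores, double[] weights) {
        double total = 0.0;
        double weightSum = 0.0;
        for (int i = 0; i < scores.length; i++) {
            total += scores[i] * weights[i];
            weightSum += weights[i];
        }
        return total / weightSum;
    }

    /** Median: the middle score, or the mean of the two middle scores. */
    static double median(double[] scores) {
        double[] sorted = scores.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        // The interface has one method, so lambdas are enough to define judges.
        List<ScoreJudge> judges = Arrays.asList(
                out -> out.isEmpty() ? 0.0 : 1.0,
                out -> out.contains("class") ? 1.0 : 0.0,
                out -> out.length() > 100 ? 1.0 : 0.5);

        String output = "public class Greeter {}";
        double[] scores = judges.stream().mapToDouble(j -> j.score(output)).toArray();

        System.out.println(majority(scores, 0.75));
        System.out.println(weightedAverage(scores, new double[] {1.0, 2.0, 1.0}));
        System.out.println(median(scores));
    }
}
```

The strategies can disagree on the same scores, which is why the choice of strategy is a configuration decision and not an implementation detail.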

Judge Families

Type             Module            Cost          Example
---------------  ----------------  ------------  ---------------------------------------------------------------
Deterministic    agent-judge-core  Free          FileExistsJudge, FileContentJudge, custom rules
Command          agent-judge-exec  Compute only  BuildSuccessJudge (Maven/Gradle), CommandJudge
File Comparison  agent-judge-file  Free          JavaSemanticJudge, MavenSemanticJudge, AST-based diffs
LLM              agent-judge-llm   Token cost    CorrectnessJudge, custom LLM evaluation
RAG              agent-judge-rag   Token cost    FaithfulnessJudge, HallucinationJudge, ContextualRelevanceJudge
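The cost differences between the families suggest ordering checks from cheapest to most expensive and stopping early. The sketch below shows that idea with hypothetical names (Tier, run). It assumes the cascade stops at the first failing tier, which is one plausible policy and not necessarily how CascadedJury behaves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of cost-ordered evaluation; not the Agent Judge API. */
class CascadeSketch {

    /** Assumed pairing of a label with a check; the caller orders tiers cheapest first. */
    static final class Tier {
        final String name;
        final Predicate<String> check;
        Tier(String name, Predicate<String> check) {
            this.name = name;
            this.check = check;
        }
    }

    /**
     * Runs tiers in order and stops at the first failure.
     * Returns the names of the tiers that actually ran.
     */
    static List<String> run(List<Tier> tiers, String agentOutput) {
        List<String> executed = new ArrayList<>();
        for (Tier tier : tiers) {
            executed.add(tier.name);
            if (!tier.check.test(agentOutput)) {
                break; // later, more expensive tiers are skipped
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        List<Tier> tiers = Arrays.asList(
                new Tier("deterministic", out -> !out.isEmpty()),
                new Tier("command", out -> out.contains("class")),
                // Stub standing in for a token-billed model call.
                new Tier("llm", out -> out.contains("Greeter")));

        System.out.println(run(tiers, ""));
        System.out.println(run(tiers, "public class Greeter {}"));
    }
}
```

With this ordering, an output that fails a free check never reaches the token-billed tier, so the expensive judge is only paid for when the cheap ones have already passed.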

Framework Bridges

Framework    Module                    Evaluator
-----------  ------------------------  -----------------------------------------------
Spring AI    agent-judge-spring-ai     SpringAiEvaluator adapts ChatResponse
LangChain4j  agent-judge-langchain4j   LangChain4jEvaluator adapts Result<T>
Koog         agent-judge-koog          KoogEvaluator adapts AIAgent output
AgentClient  agent-judge-agent-client  AgentClientEvaluator adapts CLI-agent responses

See API Reference for exact method signatures.

Installation

Each module is published separately under io.github.markpollack:
<dependency>
  <groupId>io.github.markpollack</groupId>
  <artifactId>agent-judge-core</artifactId>
  <version>0.11.0</version>
</dependency>
Start with agent-judge-core, then add only the judge-family or bridge modules you need. See Getting Started for the full dependency list.

Documentation

Getting Started

Add evaluation to your agent pipeline

Tutorial: Evaluation Pipeline

Build a multi-judge jury step by step

Writing Custom Judges

Lambda judges, DeterministicJudge, LLMJudge template method

Built-in Judges

Complete catalog of every judge across all modules

Jury System

SimpleJury, CascadedJury, voting strategies, composition

API Reference

All public types, interfaces, and records

License

Agent Judge is source-available under BSL 1.1. Internal enterprise use is welcome; commercial redistribution or competing hosted/managed offerings require permission.

Resources

Source Code

10 modules — 5 judge families + AI core + 4 framework bridges

Tutorial Code

8 runnable modules — from single judge to ModelBackedJudge

Design Philosophy

Why zero deps, functional interface, sealed scores, cascaded cost