Agent Judge

What’s new in 0.13.0: Upgraded to Spring AI 2.0.0 GA on Spring Boot 4.0.7 and added an OWASP dependency-check CVE gate to the build. This release follows 0.12.0’s Jackson 2.21.2 alignment. The 10-module evaluation architecture (Judge / Jury / ModelBackedJudge) is unchanged.

This project now publishes new releases under the Maven groupId io.github.markpollack. If you previously used org.springaicommunity, update your dependency coordinates to the current values shown below.

Judges are like unit tests for your agent: executable checks that decide whether an agent output satisfies a goal. You wouldn’t ship application code without tests or assertions; agents need the same discipline. Agent Judge provides the evaluation framework: deterministic rules, command execution, file comparison, RAG evaluation, and LLM-powered assessment. Judges compose into juries with configurable voting strategies, and cascaded juries let cheap deterministic checks run before expensive LLM judges.

Agent runtimes are vertical stacks: Spring AI, LangChain4j, Koog, and CLI-delegated agents each provide their own execution model. Evaluation cuts across all of them. Each bridge converts framework output into a JudgmentContext; the same judges and juries then evaluate the result.

        Spring AI     LangChain4j     Koog     AgentClient
            |              |           |             |
            v              v           v             v
-----------------------------------------------------------------
                         Agent Judge
                 horizontal evaluation layer
-----------------------------------------------------------------
   Build checks, file checks, AST comparison, RAG faithfulness,
   hallucination checks, LLM-as-judge, juries, cascaded juries
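
The idea of running cheap deterministic checks before expensive LLM judges can be sketched as a short-circuiting cascade. This is a minimal illustration, not the library's CascadedJury: the types Outcome, Check, and CascadeResult, and the stop-on-first-failure policy, are assumptions made for the example.

```java
import java.util.List;
import java.util.function.Function;

// Illustrative sketch only: these types are hypothetical stand-ins,
// not the Agent Judge API.
final class CascadeSketch {

    /** Minimal stand-in for a judge result: pass/fail plus a reason. */
    record Outcome(boolean pass, String reasoning) {}

    /** A named check evaluated against the agent's output. */
    record Check(String name, Function<String, Outcome> rule) {}

    /** The final outcome and how many checks actually ran. */
    record CascadeResult(Outcome outcome, int checksRun) {}

    /**
     * Runs checks in list order and stops at the first failure, so checks
     * placed later in the list (the expensive ones) are skipped when an
     * earlier check fails.
     */
    static CascadeResult run(List<Check> cheapestFirst, String agentOutput) {
        int ran = 0;
        for (Check check : cheapestFirst) {
            ran++;
            Outcome outcome = check.rule().apply(agentOutput);
            if (!outcome.pass()) {
                Outcome failure =
                        new Outcome(false, check.name() + ": " + outcome.reasoning());
                return new CascadeResult(failure, ran);
            }
        }
        return new CascadeResult(new Outcome(true, "all checks passed"), ran);
    }
}
```

Ordering the list from cheapest to most expensive means a blank or malformed output never reaches the token-billed stage. A real cascade may use a different escalation rule, for example escalating only when an earlier tier is inconclusive.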

The core module (agent-judge-core) has zero external dependencies and contains the evaluation abstractions plus basic deterministic judges. Four additional judge-family modules add command execution, file comparison, LLM evaluation, and RAG assessment. Four framework bridge modules connect to Spring AI, LangChain4j, Koog, and AgentClient.

The agent-judge-ai-core module provides framework-neutral AI-backed judge infrastructure. ModelBackedJudge composes a prompt template, a model backend, and a response classifier into a judge, with no subclassing needed. JudgeModel implementations in agent-judge-llm (SpringAiJudgeModel) and agent-judge-agent-client (AgentClientJudgeModel) connect to specific AI backends.
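
The composition described for ModelBackedJudge (prompt template, model backend, response classifier) can be pictured as three functions chained together. The sketch below only illustrates that shape under assumed types; ModelBackedSketch, Outcome, and compose are hypothetical names, and the real ModelBackedJudge signatures are in the API Reference.

```java
import java.util.function.Function;

// Illustrative sketch only: hypothetical types, not the Agent Judge API.
final class ModelBackedSketch {

    /** Minimal stand-in for a judge result. */
    record Outcome(boolean pass, String reasoning) {}

    /**
     * Chains three independent pieces into one judge-like function:
     * agent output -> prompt text -> raw model reply -> outcome.
     * Swapping any piece requires no subclassing.
     */
    static Function<String, Outcome> compose(
            Function<String, String> promptTemplate,
            Function<String, String> modelBackend,
            Function<String, Outcome> classifier) {
        return promptTemplate.andThen(modelBackend).andThen(classifier);
    }
}
```

Because the backend is just a function here, a test can pass a stub that returns a canned reply while production code passes a call to a real model.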

Quick Start

Every runtime is adapted into a JudgmentContext, then evaluated by judges and juries:

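import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
// Agent Judge imports are omitted here; see API Reference for package names.

// Describe the agent run: its goal, workspace, status, and timing.
JudgmentContext context = JudgmentContext.builder()
    .goal("Add a REST controller")
    .workspace(Path.of("/my/project"))
    .status(ExecutionStatus.SUCCESS)
    .startedAt(Instant.now())
    .executionTime(Duration.ofMinutes(2))
    .build();

// A deterministic file check and a Maven build check.
Judge fileCheck = new FileExistsJudge("README.md");
Judge buildCheck = BuildSuccessJudge.maven("compile");

// Combine both judges into a jury that uses majority voting.
SimpleJury jury = SimpleJury.builder()
    .judge(fileCheck)
    .judge(buildCheck, 2.0)
    .votingStrategy(new MajorityVotingStrategy())
    .build();

// Ask the jury to evaluate the context and return its verdict.
Verdict verdict = jury.vote(context);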

Core Abstractions

Judge

Functional interface — takes JudgmentContext, returns Judgment with score, status, reasoning, and granular checks

Jury

Multi-judge aggregation with voting strategies — majority, consensus, weighted average, median

ModelBackedJudge

Composable AI-backed judge: prompt template, model backend, and classifier composed into a judge
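
To make the four voting strategies named under Jury concrete, the sketch below aggregates a list of votes with majority, consensus, weighted average, and median. It is a minimal illustration under assumed semantics: the Vote record, the score range of 0 to 1, and the tie and empty-list handling are assumptions, not the behavior of the library's voting strategy classes.

```java
import java.util.List;

// Illustrative sketch only: hypothetical types, not the Agent Judge API.
final class VotingSketch {

    /** One judge's vote: pass/fail, a score in [0, 1], and a weight. */
    record Vote(boolean pass, double score, double weight) {}

    /** Majority: passes when strictly more than half of the votes pass. */
    static boolean majority(List<Vote> votes) {
        long passing = votes.stream().filter(Vote::pass).count();
        return passing * 2 > votes.size();
    }

    /** Consensus: passes only when there is at least one vote and all pass. */
    static boolean consensus(List<Vote> votes) {
        return !votes.isEmpty() && votes.stream().allMatch(Vote::pass);
    }

    /** Weighted average of scores; returns 0 when the total weight is 0. */
    static double weightedAverage(List<Vote> votes) {
        double totalWeight = votes.stream().mapToDouble(Vote::weight).sum();
        if (totalWeight == 0.0) {
            return 0.0;
        }
        double weighted = votes.stream()
                .mapToDouble(v -> v.score() * v.weight())
                .sum();
        return weighted / totalWeight;
    }

    /** Median score; averages the two middle values for an even count. */
    static double median(List<Vote> votes) {
        if (votes.isEmpty()) {
            throw new IllegalArgumentException("no votes");
        }
        double[] scores = votes.stream().mapToDouble(Vote::score).sorted().toArray();
        int mid = scores.length / 2;
        return scores.length % 2 == 1
                ? scores[mid]
                : (scores[mid - 1] + scores[mid]) / 2.0;
    }
}
```

The same set of votes can pass under majority and fail under consensus, which is why the strategy is worth choosing deliberately for each jury.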

Judge Families

Type	Module	Cost	Example
Deterministic	`agent-judge-core`	Free	`FileExistsJudge`, `FileContentJudge`, custom rules
Command	`agent-judge-exec`	Compute only	`BuildSuccessJudge` (Maven/Gradle), `CommandJudge`
File Comparison	`agent-judge-file`	Free	`JavaSemanticJudge`, `MavenSemanticJudge`, AST-based diffs
LLM	`agent-judge-llm`	Token cost	`CorrectnessJudge`, custom LLM evaluation
RAG	`agent-judge-rag`	Token cost	`FaithfulnessJudge`, `HallucinationJudge`, `ContextualRelevanceJudge`

Framework Bridges

Framework	Module	Evaluator
Spring AI	`agent-judge-spring-ai`	`SpringAiEvaluator` adapts `ChatResponse`
LangChain4j	`agent-judge-langchain4j`	`LangChain4jEvaluator` adapts `Result<T>`
Koog	`agent-judge-koog`	`KoogEvaluator` adapts `AIAgent` output
AgentClient	`agent-judge-agent-client`	`AgentClientEvaluator` adapts CLI-agent responses

See API Reference for exact method signatures.
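
The bridge pattern in the table above, where each framework's result is adapted into one shared context, might look like the following sketch. Everything in it is hypothetical: Context, ChatStyleReply, CliStyleReply, and Bridge are stand-ins invented for illustration and do not mirror the actual evaluator classes or JudgmentContext fields.

```java
import java.time.Duration;
import java.util.Locale;

// Illustrative sketch only: hypothetical types, not the Agent Judge API.
final class BridgeSketch {

    /** Framework-neutral input that judges consume. */
    record Context(String goal, String output, Duration executionTime) {}

    /** Hypothetical result shapes from two different runtimes. */
    record ChatStyleReply(String text, long latencyMillis) {}
    record CliStyleReply(int exitCode, String stdout, Duration elapsed) {}

    /** A bridge converts one framework's result into the shared context. */
    interface Bridge<T> {
        Context toContext(String goal, T frameworkResult);
    }

    static final Bridge<ChatStyleReply> CHAT =
            (goal, r) -> new Context(goal, r.text(), Duration.ofMillis(r.latencyMillis()));

    static final Bridge<CliStyleReply> CLI =
            (goal, r) -> new Context(goal, r.stdout(), r.elapsed());

    /** A check written once against Context works for every bridged runtime. */
    static boolean mentionsKeyword(Context context, String keyword) {
        return context.output().toLowerCase(Locale.ROOT)
                .contains(keyword.toLowerCase(Locale.ROOT));
    }
}
```

Keeping judges dependent only on the shared context is what lets one jury evaluate output from several runtimes without modification.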

Installation

Each module is published separately under io.github.markpollack:

<dependency>
  <groupId>io.github.markpollack</groupId>
  <artifactId>agent-judge-core</artifactId>
  <version>0.13.0</version>
</dependency>

Start with agent-judge-core, then add only the judge-family or bridge modules you need. See Getting Started for the full dependency list.

Documentation

Getting Started

Add evaluation to your agent pipeline

Tutorial: Evaluation Pipeline

Build a multi-judge jury step by step

Writing Custom Judges

Lambda judges, DeterministicJudge, LLMJudge template method

Built-in Judges

Complete catalog of every judge across all modules

Jury System

SimpleJury, CascadedJury, voting strategies, composition

API Reference

All public types, interfaces, and records

License

Agent Judge is source-available under BSL 1.1. Internal enterprise use is welcome; commercial redistribution or competing hosted/managed offerings require permission.

Resources

Source Code

10 modules — 5 judge families + AI core + 4 framework bridges

Tutorial Code

8 runnable modules — from single judge to ModelBackedJudge

Design Philosophy

Why zero deps, functional interface, sealed scores, cascaded cost

