Why a Jury?

A single judge gives you one judgment. A jury aggregates multiple judgments into a verdict with diagnostic information — when something fails, you can see which checks failed, which tier failed, and why. Agent Judge provides two jury types:
| Jury | Use when |
| --- | --- |
| SimpleJury | All judges run as peers; aggregate with voting |
| CascadedJury | Judges are organized in cost tiers; fail fast on cheap checks, escalate to expensive ones |
The snippets below omit imports for brevity. See API Reference for package names.

SimpleJury

Run multiple judges and aggregate results with a voting strategy:
SimpleJury jury = SimpleJury.builder()
    .judge(new FileExistsJudge("output.txt"))
    .judge(BuildSuccessJudge.maven("compile"), 2.0)  // Weight of 2.0
    .judge(new FileContentJudge("output.txt", "expected", FileContentJudge.MatchMode.CONTAINS))
    .votingStrategy(new MajorityVotingStrategy())
    .parallel(true)   // Execute judges concurrently (default)
    .build();

Verdict verdict = jury.vote(context);

Builder API

| Method | Description | Default |
| --- | --- | --- |
| .judge(Judge) | Add judge with weight 1.0 | |
| .judge(Judge, double) | Add judge with custom weight | |
| .votingStrategy(VotingStrategy) | Aggregation method | (required) |
| .parallel(boolean) | Concurrent execution | true |
| .executor(Executor) | Custom thread pool for parallel execution | Common pool |
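To make the .executor(Executor) option concrete, here is a minimal plain-JDK sketch of running independent checks on a caller-supplied pool instead of the common pool. It does not use Agent Judge classes: `ExecutorSketch`, `Check`, and `runAll` are hypothetical names for illustration, and the sketch assumes the caller owns the pool and shuts it down, which this page does not confirm for SimpleJury.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

public class ExecutorSketch {

    // Hypothetical stand-in for a judge: a named check that returns pass/fail.
    record Check(String name, Supplier<Boolean> body) {}

    // Submits every check to the supplied executor, then collects results in input order.
    static List<Boolean> runAll(List<Check> checks, ExecutorService executor) {
        List<CompletableFuture<Boolean>> futures = checks.stream()
            .map(c -> CompletableFuture.supplyAsync(c.body(), executor))
            .toList();
        // join() blocks until each check has finished.
        return futures.stream().map(CompletableFuture::join).toList();
    }

    public static void main(String[] args) {
        // A fixed pool caps how many checks run at the same time.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Check> checks = List.of(
                new Check("file-exists", () -> true),
                new Check("build", () -> true),
                new Check("content", () -> false));
            System.out.println(runAll(checks, pool)); // [true, true, false]
        } finally {
            pool.shutdown(); // Assumption: the caller owns the pool's lifecycle.
        }
    }
}
```

A bounded pool like this is one way to keep many slow judges from competing with unrelated work that also uses the common pool.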

Reading the Verdict

Verdict verdict = jury.vote(context);

// Aggregated result
Judgment overall = verdict.aggregated();
System.out.println(overall.status());    // e.g. PASS or FAIL
System.out.println(overall.reasoning()); // "Majority passed (2/3)"

// Individual results
verdict.individualByName().forEach((name, judgment) ->
    System.out.println(name + " -> " + judgment.status()));

// Weights used
Map<String, Double> weights = verdict.weights();
SimpleJury aggregates peers. It does not provide fail-fast cost control. Use CascadedJury when you want cheap checks to prevent expensive judges from running.

Voting Strategies

| Strategy | Pass condition | Best for |
| --- | --- | --- |
| MajorityVotingStrategy | passCount > failCount | General purpose |
| ConsensusStrategy | All judges agree | High-stakes evaluation |
| AverageVotingStrategy | average(scores) >= 0.5 | Continuous scores |
| WeightedAverageStrategy | weightedAvg(scores) >= 0.5 | Judges with different importance |
| MedianVotingStrategy | median(scores) >= 0.5 | Outlier-resistant scoring |
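The score-based pass conditions in the table above can disagree on the same input. The sketch below reimplements the three formulas in plain Java to show where they diverge. It is not the library's code: `ScoreStrategySketch` and its methods are hypothetical, and treating an empty score list as a failure is an assumption.

```java
import java.util.Arrays;

public class ScoreStrategySketch {

    static final double THRESHOLD = 0.5; // pass threshold taken from the table

    // average(scores) >= 0.5
    static boolean averagePasses(double[] scores) {
        return Arrays.stream(scores).average().orElse(0.0) >= THRESHOLD;
    }

    // weightedAvg(scores) >= 0.5, where weights[i] belongs to scores[i]
    static boolean weightedAveragePasses(double[] scores, double[] weights) {
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (int i = 0; i < scores.length; i++) {
            weightedSum += scores[i] * weights[i];
            totalWeight += weights[i];
        }
        return totalWeight > 0 && weightedSum / totalWeight >= THRESHOLD;
    }

    // median(scores) >= 0.5
    static boolean medianPasses(double[] scores) {
        if (scores.length == 0) return false; // assumption: no scores means no pass
        double[] sorted = scores.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        double median = sorted.length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
        return median >= THRESHOLD;
    }

    public static void main(String[] args) {
        // One outlier drags the average down but leaves the median alone.
        double[] withOutlier = {0.6, 0.6, 0.0};
        System.out.println(averagePasses(withOutlier)); // false
        System.out.println(medianPasses(withOutlier));  // true

        // A heavily weighted passing judge outvotes two failing ones.
        double[] scores = {1.0, 0.0, 0.0};
        double[] weights = {3.0, 1.0, 1.0};
        System.out.println(averagePasses(scores));                  // false
        System.out.println(weightedAveragePasses(scores, weights)); // true
    }
}
```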

Configuring MajorityVotingStrategy

MajorityVotingStrategy strategy = new MajorityVotingStrategy(
    TiePolicy.FAIL,              // What to do on a tie
    ErrorPolicy.TREAT_AS_FAIL    // How to handle ERROR judgments
);
TiePolicy — when pass count equals fail count:
| Policy | Behavior |
| --- | --- |
| TiePolicy.PASS | Optimistic: resolve ties as PASS |
| TiePolicy.FAIL | Pessimistic: resolve ties as FAIL (default) |
| TiePolicy.ABSTAIN | Neutral: no verdict |
ErrorPolicy — when a judge returns JudgmentStatus.ERROR:
| Policy | Behavior |
| --- | --- |
| ErrorPolicy.TREAT_AS_FAIL | Count errors as failures (default) |
| ErrorPolicy.TREAT_AS_ABSTAIN | Count the judge as having abstained |
| ErrorPolicy.IGNORE | Exclude the errored judge from the vote count and diagnostics |
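How the two policies interact is easiest to see on a small vote. The following is a standalone sketch of majority counting under the rules described above, not the library's implementation: the enums are local stand-ins that reuse the documented names, and the sketch assumes TREAT_AS_ABSTAIN and IGNORE affect the pass/fail counts identically (differing only in diagnostics, which are not modeled here).

```java
import java.util.List;

public class MajorityPolicySketch {

    // Local stand-ins mirroring the names used on this page.
    enum Status { PASS, FAIL, ERROR, ABSTAIN }
    enum TiePolicy { PASS, FAIL, ABSTAIN }
    enum ErrorPolicy { TREAT_AS_FAIL, TREAT_AS_ABSTAIN, IGNORE }

    static Status vote(List<Status> judgments, TiePolicy tie, ErrorPolicy onError) {
        int pass = 0;
        int fail = 0;
        for (Status s : judgments) {
            switch (s) {
                case PASS -> pass++;
                case FAIL -> fail++;
                case ERROR -> {
                    // Only TREAT_AS_FAIL changes the counts; the other two
                    // policies leave the errored judge out of the tally.
                    if (onError == ErrorPolicy.TREAT_AS_FAIL) fail++;
                }
                case ABSTAIN -> { }
            }
        }
        if (pass > fail) return Status.PASS;
        if (fail > pass) return Status.FAIL;
        // Equal counts: the tie policy decides.
        return switch (tie) {
            case PASS -> Status.PASS;
            case FAIL -> Status.FAIL;
            case ABSTAIN -> Status.ABSTAIN;
        };
    }

    public static void main(String[] args) {
        List<Status> judgments = List.of(Status.PASS, Status.FAIL, Status.ERROR);

        // The error counts as a failure: 1 pass vs 2 fails.
        System.out.println(vote(judgments, TiePolicy.FAIL, ErrorPolicy.TREAT_AS_FAIL));      // FAIL

        // The error abstains: 1 pass vs 1 fail, so the tie policy decides.
        System.out.println(vote(judgments, TiePolicy.PASS, ErrorPolicy.TREAT_AS_ABSTAIN));   // PASS
        System.out.println(vote(judgments, TiePolicy.ABSTAIN, ErrorPolicy.TREAT_AS_ABSTAIN)); // ABSTAIN
    }
}
```

Note that with an even number of effective votes, the tie policy can decide the outcome, so it is worth setting explicitly when judges may error or abstain.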

CascadedJury

A cascaded jury organizes judges into tiers. Each tier is itself a jury (typically a SimpleJury). Tiers execute sequentially, cheapest first: once a tier's policy produces a verdict, the remaining, more expensive tiers never run.
// Tier 1: Deterministic guardrails
Jury deterministic = SimpleJury.builder()
    .judge(new FileExistsJudge("src/main/java/App.java"))
    .judge(BuildSuccessJudge.maven("compile"))
    .votingStrategy(new MajorityVotingStrategy())
    .build();

// Tier 2: Structural (cheap, compares against reference)
// Requires context.metadata().get("expectedDir") to point at the reference directory
Jury structural = SimpleJury.builder()
    .judge(new FileComparisonJudge())
    .votingStrategy(new ConsensusStrategy())
    .build();

// Tier 3: Semantic (LLM cost)
Jury semantic = SimpleJury.builder()
    .judge(new CorrectnessJudge(chatClientBuilder))
    .votingStrategy(new MajorityVotingStrategy())
    .build();

CascadedJury jury = CascadedJury.builder()
    .tier("deterministic", deterministic, TierPolicy.REJECT_ON_ANY_FAIL)
    .tier("structural", structural, TierPolicy.ACCEPT_ON_ALL_PASS)
    .tier("semantic", semantic, TierPolicy.FINAL_TIER)
    .build();

Verdict verdict = jury.vote(context);

Tier Policies

| Policy | Behavior | Typical use |
| --- | --- | --- |
| REJECT_ON_ANY_FAIL | Stop immediately if any judge in this tier fails | Guardrails: must compile, files must exist |
| ACCEPT_ON_ALL_PASS | Stop if all judges pass; accept without escalating to later tiers | Consensus gate when this tier is strong enough on its own |
| FINAL_TIER | Runs when reached and produces the final verdict | Last tier (required) |
The last tier in a CascadedJury must use TierPolicy.FINAL_TIER. The builder validates this at build time.
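The control flow implied by the three policies can be sketched in a few lines of plain Java. This is an illustration under stated assumptions, not CascadedJury's implementation: `CascadeSketch`, `Tier`, and `Result` are hypothetical, each tier is reduced to a list of pass/fail results, and the final tier passes only if every judge passes (in the library, the tier's own voting strategy would decide).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class CascadeSketch {

    enum TierPolicy { REJECT_ON_ANY_FAIL, ACCEPT_ON_ALL_PASS, FINAL_TIER }

    // Hypothetical tier: judges are behind a Supplier so they run only when the tier is reached.
    record Tier(String name, TierPolicy policy, Supplier<List<Boolean>> judges) {}

    // Final outcome plus the names of the tiers that actually executed.
    record Result(boolean passed, List<String> tiersRun) {}

    static Result evaluate(List<Tier> tiers) {
        // Mirrors the documented rule: the last tier must be FINAL_TIER.
        if (tiers.isEmpty() || tiers.get(tiers.size() - 1).policy() != TierPolicy.FINAL_TIER) {
            throw new IllegalArgumentException("Last tier must use FINAL_TIER");
        }
        List<String> ran = new ArrayList<>();
        for (Tier tier : tiers) {
            List<Boolean> results = tier.judges().get();
            ran.add(tier.name());
            boolean allPass = results.stream().allMatch(Boolean::booleanValue);
            switch (tier.policy()) {
                // Any failure ends the cascade with a rejection.
                case REJECT_ON_ANY_FAIL -> { if (!allPass) return new Result(false, ran); }
                // A clean pass ends the cascade with an acceptance.
                case ACCEPT_ON_ALL_PASS -> { if (allPass) return new Result(true, ran); }
                // Simplification: the final tier passes only if all judges pass.
                case FINAL_TIER -> { return new Result(allPass, ran); }
            }
        }
        throw new IllegalStateException("A FINAL_TIER always returns before this point");
    }

    public static void main(String[] args) {
        List<Tier> failsEarly = List.of(
            new Tier("deterministic", TierPolicy.REJECT_ON_ANY_FAIL, () -> List.of(true, false)),
            new Tier("structural", TierPolicy.ACCEPT_ON_ALL_PASS, () -> List.of(true)),
            new Tier("semantic", TierPolicy.FINAL_TIER, () -> List.of(true)));
        Result r1 = evaluate(failsEarly);
        System.out.println(r1.passed() + " " + r1.tiersRun()); // false [deterministic]

        List<Tier> escalates = List.of(
            new Tier("deterministic", TierPolicy.REJECT_ON_ANY_FAIL, () -> List.of(true, true)),
            new Tier("structural", TierPolicy.ACCEPT_ON_ALL_PASS, () -> List.of(true, false)),
            new Tier("semantic", TierPolicy.FINAL_TIER, () -> List.of(true)));
        Result r2 = evaluate(escalates);
        System.out.println(r2.passed() + " " + r2.tiersRun()); // true [deterministic, structural, semantic]
    }
}
```

In the first run the semantic tier's supplier is never invoked, which is the cost saving the cascade is designed for.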

Inspecting Tier Results

Verdict verdict = jury.vote(context);

// Overall result
System.out.println(verdict.aggregated().status());

// Per-tier sub-verdicts
for (Verdict tierVerdict : verdict.subVerdicts()) {
    System.out.println("Tier: " + tierVerdict.aggregated().reasoning());
    tierVerdict.individualByName().forEach((name, j) ->
        System.out.println("  " + name + " -> " + j.status()));
}

Jury Composition

Named Judges

Wrap any judge with a name for readable verdict output:
Judge named = Judges.named(myJudge, "build-check", "Verifies compilation");
Without names, judges get auto-generated identifiers in the verdict.

Combining Juries

The Juries utility class provides shortcuts:
import io.github.markpollack.judge.jury.Juries;

// Quick jury from judges
Jury quick = Juries.fromJudges(new MajorityVotingStrategy(), judge1, judge2, judge3);

// Meta-jury: combine two juries
Jury meta = Juries.combine(jury1, jury2, new ConsensusStrategy());

// Multiple juries with a shared strategy
Jury combined = Juries.allOf(new AverageVotingStrategy(), jury1, jury2, jury3);

Choosing a Pattern

| Scenario | Use |
| --- | --- |
| Same-tier judges, single vote | SimpleJury with majority or consensus |
| Weighted importance among judges | SimpleJury with WeightedAverageStrategy |
| Cheap-then-expensive evaluation | CascadedJury with 2-3 tiers |
| Multiple evaluation dimensions | Juries.combine() to merge sub-juries |
| Quick one-off check | Judges.and() or Judges.allOf() (no jury overhead) |

Built-in Judges

Catalog of judges to wire into juries

Writing Custom Judges

Build domain-specific judges for your evaluation criteria