
Why a Jury?

A single judge gives you one judgment. A jury aggregates multiple judgments into a verdict with diagnostic information: when something fails, you can see which checks failed, which tier failed, and why. Agent Judge provides two jury types:

Jury | Use when
SimpleJury | All judges run as peers and their results are aggregated by voting
CascadedJury | Judges are organized in cost tiers: fail fast on cheap checks, escalate to expensive ones

The snippets below omit imports for brevity. See API Reference for package names.

SimpleJury

Run multiple judges and aggregate results with a voting strategy:
SimpleJury jury = SimpleJury.builder()
    .judge(new FileExistsJudge("output.txt"))
    .judge(BuildSuccessJudge.maven("compile"), 2.0)  // Weight of 2.0
    .judge(new FileContentJudge("output.txt", "expected", FileContentJudge.MatchMode.CONTAINS))
    .votingStrategy(new MajorityVotingStrategy())
    .parallel(true)   // Execute judges concurrently (default)
    .build();

Verdict verdict = jury.vote(context);

Builder API

Method | Description | Default
.judge(Judge) | Add judge with weight 1.0 | -
.judge(Judge, double) | Add judge with custom weight | -
.votingStrategy(VotingStrategy) | Aggregation method (required) | -
.parallel(boolean) | Concurrent execution | true
.executor(Executor) | Custom thread pool for parallel execution | Common pool
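
To illustrate what running checks concurrently on a caller-supplied Executor can look like, here is a minimal standalone sketch. It does not use Agent Judge classes and is not a description of the library's internals: `ParallelSketch`, `runAll`, and the use of `Supplier<Boolean>` as a stand-in for a judge are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical sketch: each Supplier<Boolean> stands in for one judge.
final class ParallelSketch {

    // Submits every check to the given executor, then waits for all of them.
    // Results are returned in submission order, not completion order.
    static List<Boolean> runAll(List<Supplier<Boolean>> checks, Executor executor) {
        List<CompletableFuture<Boolean>> futures = new ArrayList<>();
        for (Supplier<Boolean> check : checks) {
            futures.add(CompletableFuture.supplyAsync(check, executor));
        }
        List<Boolean> results = new ArrayList<>();
        for (CompletableFuture<Boolean> future : futures) {
            results.add(future.join());
        }
        return results;
    }

    public static void main(String[] args) {
        // A bounded pool limits how many checks run at the same time.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Supplier<Boolean>> checks = new ArrayList<>();
            checks.add(() -> true);
            checks.add(() -> false);
            checks.add(() -> true);
            System.out.println(runAll(checks, pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

A bounded pool like the one in `main` is one reason to pass a custom executor: it caps concurrency when judges are expensive, for example when they invoke a build or a remote model.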

Reading the Verdict

Verdict verdict = jury.vote(context);

// Aggregated result
Judgment overall = verdict.aggregated();
System.out.println(overall.status());    // PASS or FAIL
System.out.println(overall.reasoning()); // "Majority passed (2/3)"

// Individual results
verdict.individualByName().forEach((name, judgment) ->
    System.out.println(name + " -> " + judgment.status()));

// Weights used
Map<String, Double> weights = verdict.weights();
SimpleJury aggregates peers. It does not provide fail-fast cost control. Use CascadedJury when you want cheap checks to prevent expensive judges from running.

Voting Strategies

Strategy | Pass condition | Best for
MajorityVotingStrategy | passCount > failCount | General purpose
ConsensusStrategy | All judges agree | High-stakes evaluation
AverageVotingStrategy | average(scores) >= 0.5 | Continuous scores
WeightedAverageStrategy | weightedAvg(scores) >= 0.5 | Judges with different importance
MedianVotingStrategy | median(scores) >= 0.5 | Outlier-resistant scoring
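
To make the pass conditions in the table concrete, the standalone sketch below applies them to plain arrays. It does not call Agent Judge: `VotingSketch` and its methods are hypothetical, and the choices to treat scores as values in [0, 1] and to fail on empty input are assumptions made for illustration only.

```java
import java.util.Arrays;

// Hypothetical sketch of the pass conditions listed in the table above.
final class VotingSketch {
    static final double THRESHOLD = 0.5;

    // Majority: strictly more passes than fails (a tie does not pass here).
    static boolean majority(boolean[] passed) {
        int pass = 0;
        for (boolean p : passed) {
            if (p) pass++;
        }
        int fail = passed.length - pass;
        return pass > fail;
    }

    // Average: arithmetic mean of the scores meets the threshold.
    static boolean average(double[] scores) {
        return Arrays.stream(scores).average().orElse(0.0) >= THRESHOLD;
    }

    // Weighted average: each score counts in proportion to its weight.
    static boolean weightedAverage(double[] scores, double[] weights) {
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (int i = 0; i < scores.length; i++) {
            weightedSum += scores[i] * weights[i];
            totalWeight += weights[i];
        }
        return totalWeight > 0 && weightedSum / totalWeight >= THRESHOLD;
    }

    // Median: middle score (or mean of the two middle scores) meets the threshold.
    static boolean median(double[] scores) {
        if (scores.length == 0) return false;
        double[] sorted = scores.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        double m = sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
        return m >= THRESHOLD;
    }

    public static void main(String[] args) {
        // One high score pulls the mean to about 0.6, but the median stays at 0.4.
        double[] scores = {1.0, 0.4, 0.4};
        System.out.println("average passes: " + average(scores));
        System.out.println("median passes:  " + median(scores));
    }
}
```

With scores of 1.0, 0.4, and 0.4 the average rule passes while the median rule fails, which is the sense in which median scoring resists a single outlier.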

Configuring MajorityVotingStrategy

MajorityVotingStrategy strategy = new MajorityVotingStrategy(
    TiePolicy.FAIL,              // What to do on a tie
    ErrorPolicy.TREAT_AS_FAIL    // How to handle ERROR judgments
);
TiePolicy applies when the pass count equals the fail count:

Policy | Behavior
TiePolicy.PASS | Optimistic: resolve ties as PASS
TiePolicy.FAIL | Pessimistic: resolve ties as FAIL (default)
TiePolicy.ABSTAIN | Neutral: no verdict

ErrorPolicy applies when a judge returns JudgmentStatus.ERROR:

Policy | Behavior
ErrorPolicy.TREAT_AS_FAIL | Count errors as failures (default)
ErrorPolicy.TREAT_AS_ABSTAIN | Count the judge as having abstained
ErrorPolicy.IGNORE | Exclude the errored judge from the vote count and diagnostics
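
The interaction between the two policies is easiest to see in a small example. The sketch below is a standalone illustration, not the library's implementation: `MajoritySketch`, `Tie`, and `OnError` are hypothetical names, only two error-handling modes are modeled, and it assumes that an abstention adds to neither the pass count nor the fail count.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of majority voting with tie and error handling.
final class MajoritySketch {
    enum Status { PASS, FAIL, ERROR, ABSTAIN }
    enum Tie { PASS, FAIL, ABSTAIN }
    enum OnError { AS_FAIL, AS_ABSTAIN }

    static Status decide(List<Status> votes, Tie tie, OnError onError) {
        int pass = 0;
        int fail = 0;
        for (Status vote : votes) {
            switch (vote) {
                case PASS:
                    pass++;
                    break;
                case FAIL:
                    fail++;
                    break;
                case ERROR:
                    // An error either counts against the result or is left out.
                    if (onError == OnError.AS_FAIL) fail++;
                    break;
                default:
                    break; // ABSTAIN: assumed to count for neither side
            }
        }
        if (pass > fail) return Status.PASS;
        if (fail > pass) return Status.FAIL;
        // Equal counts: the tie policy decides.
        switch (tie) {
            case PASS:
                return Status.PASS;
            case FAIL:
                return Status.FAIL;
            default:
                return Status.ABSTAIN;
        }
    }

    public static void main(String[] args) {
        List<Status> votes = Arrays.asList(Status.PASS, Status.FAIL, Status.ERROR);
        // Error counted as a failure: 1 pass against 2 fails.
        System.out.println(decide(votes, Tie.FAIL, OnError.AS_FAIL));
        // Error treated as an abstention: 1 pass against 1 fail, so the tie policy applies.
        System.out.println(decide(votes, Tie.PASS, OnError.AS_ABSTAIN));
    }
}
```

Note that the error policy can turn a clear result into a tie, so the two settings are best chosen together.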

CascadedJury

A cascaded jury organizes judges into tiers. Each tier is itself a jury (typically a SimpleJury). Tiers execute sequentially: once a cheap tier produces a verdict, the more expensive tiers never run.
// Tier 1: Deterministic guardrails
Jury deterministic = SimpleJury.builder()
    .judge(new FileExistsJudge("src/main/java/App.java"))
    .judge(BuildSuccessJudge.maven("compile"))
    .votingStrategy(new MajorityVotingStrategy())
    .build();

// Tier 2: Structural (cheap, compares against reference)
// Requires context.metadata().get("expectedDir") to point at the reference directory
Jury structural = SimpleJury.builder()
    .judge(new FileComparisonJudge())
    .votingStrategy(new ConsensusStrategy())
    .build();

// Tier 3: Semantic (LLM cost)
Jury semantic = SimpleJury.builder()
    .judge(new CorrectnessJudge(chatClientBuilder))
    .votingStrategy(new MajorityVotingStrategy())
    .build();

CascadedJury jury = CascadedJury.builder()
    .tier("deterministic", deterministic, TierPolicy.REJECT_ON_ANY_FAIL)
    .tier("structural", structural, TierPolicy.ACCEPT_ON_ALL_PASS)
    .tier("semantic", semantic, TierPolicy.FINAL_TIER)
    .build();

Verdict verdict = jury.vote(context);

Tier Policies

Policy | Behavior | Typical use
REJECT_ON_ANY_FAIL | Stop immediately if any judge in this tier fails | Guardrails: must compile, files must exist
ACCEPT_ON_ALL_PASS | Stop if all judges pass, accepting without escalating to later tiers | Consensus gate when this tier is strong enough on its own
FINAL_TIER | Runs when reached and produces the final verdict | Last tier (required)
The last tier in a CascadedJury must use TierPolicy.FINAL_TIER. The builder validates this at build time.
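
The control flow these policies describe can be sketched in a few lines. The code below is a standalone illustration under simplifying assumptions, not the CascadedJury implementation: `CascadeSketch`, `Tier`, and `Result` are hypothetical, each tier is reduced to a list of boolean judge outcomes, and the final tier is assumed to pass only when all of its judges pass (in the library, the tier's own voting strategy decides).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of tiered, fail-fast evaluation.
final class CascadeSketch {
    enum Policy { REJECT_ON_ANY_FAIL, ACCEPT_ON_ALL_PASS, FINAL_TIER }

    static final class Tier {
        final String name;
        final Policy policy;
        final Supplier<List<Boolean>> judges; // evaluated lazily, only if the tier is reached

        Tier(String name, Policy policy, Supplier<List<Boolean>> judges) {
            this.name = name;
            this.policy = policy;
            this.judges = judges;
        }
    }

    static final class Result {
        final boolean passed;
        final List<String> tiersRun;

        Result(boolean passed, List<String> tiersRun) {
            this.passed = passed;
            this.tiersRun = tiersRun;
        }
    }

    static Result run(List<Tier> tiers) {
        if (tiers.isEmpty() || tiers.get(tiers.size() - 1).policy != Policy.FINAL_TIER) {
            throw new IllegalArgumentException("last tier must use FINAL_TIER");
        }
        List<String> tiersRun = new ArrayList<>();
        for (Tier tier : tiers) {
            List<Boolean> outcomes = tier.judges.get();
            tiersRun.add(tier.name);
            boolean allPass = !outcomes.contains(false);
            switch (tier.policy) {
                case REJECT_ON_ANY_FAIL:
                    if (!allPass) return new Result(false, tiersRun); // stop early: rejected
                    break;
                case ACCEPT_ON_ALL_PASS:
                    if (allPass) return new Result(true, tiersRun); // stop early: accepted
                    break;
                case FINAL_TIER:
                    return new Result(allPass, tiersRun); // simplified final verdict
            }
        }
        throw new IllegalStateException("unreachable: a FINAL_TIER always returns");
    }
}
```

Because each tier's judges sit behind a `Supplier`, a tier that is never reached does no work, which is where the cost saving comes from. A guardrail tier that passes simply falls through to the next tier, and an accept tier that does not fully pass escalates rather than rejecting.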

Inspecting Tier Results

Verdict verdict = jury.vote(context);

// Overall result
System.out.println(verdict.aggregated().status());

// Per-tier sub-verdicts
for (Verdict tierVerdict : verdict.subVerdicts()) {
    System.out.println("Tier: " + tierVerdict.aggregated().reasoning());
    tierVerdict.individualByName().forEach((name, j) ->
        System.out.println("  " + name + " -> " + j.status()));
}

Jury Composition

Named Judges

Wrap any judge with a name for readable verdict output:
Judge named = Judges.named(myJudge, "build-check", "Verifies compilation");
Without names, judges get auto-generated identifiers in the verdict.

Combining Juries

The Juries utility class provides shortcuts:
import io.github.markpollack.judge.jury.Juries;

// Quick jury from judges
Jury quick = Juries.fromJudges(new MajorityVotingStrategy(), judge1, judge2, judge3);

// Meta-jury: combine two juries
Jury meta = Juries.combine(jury1, jury2, new ConsensusStrategy());

// Multiple juries with a shared strategy
Jury combined = Juries.allOf(new AverageVotingStrategy(), jury1, jury2, jury3);

Choosing a Pattern

Scenario | Use
Same-tier judges, single vote | SimpleJury with majority or consensus
Weighted importance among judges | SimpleJury with WeightedAverageStrategy
Cheap-then-expensive evaluation | CascadedJury with 2-3 tiers
Multiple evaluation dimensions | Juries.combine() to merge sub-juries
Quick one-off check | Judges.and() or Judges.allOf() (no jury overhead)

Built-in Judges

Catalog of judges to wire into juries

Writing Custom Judges

Build domain-specific judges for your evaluation criteria