A single experiment run produces one ExperimentResult. When you run multiple variants — a baseline, a prompt tweak, a knowledge-base variant — you need a way to group those results and track which variants have completed. That’s what sessions and sweeps provide.
Sessions group variant results from a single run. Sweeps coordinate across sessions, tracking which of the expected variants have been resolved and by which session.
To use sessions, create an ActiveSession and pass it to AgentExperiment.run():
ActiveSession active = new ActiveSession(
    "full-suite-2026-03-03",  // session name
    "rename-field-v1",        // experiment name
    "variant-a");             // variant being executed

ExperimentResult result = experiment.run(agentInvoker, active);
When an ActiveSession is provided:
Traces and workspaces are written under the session directory
The result is saved to both ResultStore (as before) and SessionStore
Without an ActiveSession, the experiment behaves exactly as before, so existing callers of AgentExperiment.run() are unaffected.
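The optional-session behavior described above can be pictured with a small stand-in. This is only a sketch under stated assumptions: SessionAwareRunnerSketch, its fields, and its string "results" are hypothetical and are not part of the library. It shows one way a session-aware overload can share the original execution path so that callers without a session see no change.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in, NOT AgentExperiment: plain strings stand in for
// ExperimentResult, and two lists stand in for ResultStore and SessionStore.
final class SessionAwareRunnerSketch {
    final List<String> resultStore = new ArrayList<>();
    final List<String> sessionStore = new ArrayList<>();

    // Original entry point: no session, so only the result store is written.
    String run(String variant) {
        return execute(variant, null);
    }

    // Session-aware entry point: same execution path, plus the session write.
    String run(String variant, String sessionName) {
        Objects.requireNonNull(sessionName, "sessionName");
        return execute(variant, sessionName);
    }

    private String execute(String variant, String sessionNameOrNull) {
        String result = "result-for-" + variant;
        resultStore.add(result); // always saved, as before
        if (sessionNameOrNull != null) {
            // additionally saved under the session when one is provided
            sessionStore.add(sessionNameOrNull + "/" + result);
        }
        return result;
    }
}
```

Calling run("control") in this sketch writes one entry to the result list and none to the session list; calling run("variant-a", "full-suite-2026-03-03") writes to both.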
A sweep declares which variants must run and tracks progress across sessions. This is useful when variants run at different times — overnight jobs, CI retries, or manual re-runs of failed variants.
// Create a sweep with expected variants
Sweep sweep = sweepStore.createSweep(
    "stage5-full",
    "rename-field-v1",
    List.of("control", "variant-a", "variant-b"),
    Map.of("stage", "5"));

// Add sessions as they complete
sweepStore.addSession("stage5-full", "rename-field-v1", "run-monday", "abc123");
sweepStore.addSession("stage5-full", "rename-field-v1", "run-tuesday", "abc123");

// Check progress
Sweep updated = sweepStore.loadSweep("rename-field-v1", "stage5-full").get();
updated.missingVariants();     // variants not yet resolved
updated.isComplete();          // true if all expected variants resolved
updated.hasVersionMismatch();  // true if resolved variants used different git commits
Sweep.hasVersionMismatch() returns true if resolved variants were run against different git commits. This catches a subtle problem: when you re-run a failed variant after a code change, the sweep now contains results from two different code versions. The mismatch flag lets you detect this and decide whether to accept the mixed results or re-run the full sweep.
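To make the three progress checks concrete, here is a minimal model of the bookkeeping they imply. It is a sketch, not the library's Sweep class: SweepSketch, its resolve method, and the choice to key resolved variants by git commit are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of sweep bookkeeping; NOT the library's Sweep class.
final class SweepSketch {
    private final List<String> expectedVariants;
    // variant name -> git commit of the session that resolved it
    private final Map<String, String> resolvedCommits = new LinkedHashMap<>();

    SweepSketch(List<String> expectedVariants) {
        this.expectedVariants = new ArrayList<>(expectedVariants);
    }

    // Record that a variant was resolved by a session run at the given commit.
    // Re-resolving a variant replaces the commit recorded for it.
    void resolve(String variant, String gitCommit) {
        if (!expectedVariants.contains(variant)) {
            throw new IllegalArgumentException("Unexpected variant: " + variant);
        }
        resolvedCommits.put(variant, gitCommit);
    }

    // Expected variants with no recorded result, in declaration order.
    List<String> missingVariants() {
        List<String> missing = new ArrayList<>();
        for (String variant : expectedVariants) {
            if (!resolvedCommits.containsKey(variant)) {
                missing.add(variant);
            }
        }
        return missing;
    }

    boolean isComplete() {
        return missingVariants().isEmpty();
    }

    // True when the resolved variants span more than one distinct commit.
    boolean hasVersionMismatch() {
        Set<String> commits = new HashSet<>(resolvedCommits.values());
        return commits.size() > 1;
    }
}
```

In this model, resolving "control" and "variant-a" at commit abc123 leaves "variant-b" missing with no mismatch. Resolving "variant-b" later at a different commit completes the sweep and sets the mismatch flag, which is the re-run-after-a-code-change situation described above.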