# Pollack AI Lab

## Docs

- [Agent Configuration](https://lab.pollack.ai/docs/agent-bench/agent-config.md): How to configure any CLI tool as an Agent Bench agent
- [CLI Reference](https://lab.pollack.ai/docs/agent-bench/cli-reference.md): All Agent Bench commands, flags, and usage patterns
- [Getting Started with Agent Bench](https://lab.pollack.ai/docs/agent-bench/getting-started.md): Test your AI coding agent against benchmarks in 5 minutes
- [Jury System](https://lab.pollack.ai/docs/agent-bench/jury-system.md): How Agent Bench grades agent output using cascaded tiers of judges
- [AgentLoop](https://lab.pollack.ai/docs/agent-workflow/agent-loop.md): Ready-to-use SWE agent — tools, session memory, observability, and a simple run/chat API backed by Spring AI's agent loop
- [Annotation Model](https://lab.pollack.ai/docs/agent-workflow/annotation-model.md): Declare agents with @Agent, compose workflows inside AgentHandler, and wire exception handling with @ExceptionHandler and @AgentAdvice
- [Agent Workflow API Reference](https://lab.pollack.ai/docs/agent-workflow/api-reference.md): Step interface, Workflow builder, WorkflowGraph IR, gates, context, StepRunner, and TraceRecorder
- [DSL Primitives](https://lab.pollack.ai/docs/agent-workflow/choosing-a-pattern.md): 10+ composable primitives for building agentic pipelines — sequential, branch, loop, parallel, decision, gate, supervisor, and more
- [Durability](https://lab.pollack.ai/docs/agent-workflow/durability.md): Crash recovery, checkpointing, and distributed execution for agent workflows
- [Workflow DSL Examples](https://lab.pollack.ai/docs/agent-workflow/examples.md): Complete, runnable examples — validated with real LLM calls against GPT-4.1
- [Getting Started with Agent Workflow](https://lab.pollack.ai/docs/agent-workflow/getting-started.md): Compose steps into workflows with typed context, portable runtimes, and quality gates
- [Step Parameterization](https://lab.pollack.ai/docs/agent-workflow/parameterization.md): How to get data into and out of steps — constructor injection, input chaining, context keys, metadata publishing, and mixed patterns
- [Your First Research Agent](https://lab.pollack.ai/docs/agento-studio/getting-started.md): Build a file-based research KB and teach an AI agent to navigate it — in 20 minutes
- [Architecture](https://lab.pollack.ai/docs/bud/architecture.md): Three modules, two protocols, zero API keys
- [Getting Started](https://lab.pollack.ai/docs/bud/getting-started.md): Install Bud and build your first Spring Boot project
- [Tools & Reference Projects](https://lab.pollack.ai/docs/bud/tools-reference.md): All 23 MCP tools and 9 reference projects
- [Agent Experiment API Reference](https://lab.pollack.ai/docs/experiment-driver/api-reference.md): Configuration, dataset format, invoker contract, and result model
- [Creating Experiments](https://lab.pollack.ai/docs/experiment-driver/creating-experiments.md): Design datasets, define variant ladders, filter items, and analyze results
- [Getting Started with Agent Experiment](https://lab.pollack.ai/docs/experiment-driver/getting-started.md): Run your first AI agent evaluation: dataset, agent, jury, and variant comparison
- [Building a Jury](https://lab.pollack.ai/docs/experiment-driver/jury-system.md): Three-tier cascaded evaluation: deterministic, structural, and semantic judges
- [Loopy CLI Reference](https://lab.pollack.ai/docs/loopy/cli-reference.md): All flags, slash commands, execution modes, and configuration options
- [Extending Loopy](https://lab.pollack.ai/docs/loopy/extending.md): Custom skills, subagents, tool profiles, listeners, and the programmatic API
- [Getting Started with Loopy](https://lab.pollack.ai/docs/loopy/getting-started.md): Install, configure, and run your first agent session in under 5 minutes
- [Code Coverage v1 — Knowledge Injection Baseline](https://lab.pollack.ai/experiments/code-coverage-v1.md): 9 variants testing progressive knowledge injection on Spring Boot test generation
- [Code Coverage v2 — Skills vs Knowledge Bases](https://lab.pollack.ai/experiments/code-coverage-v2.md): 7 variants on Spring PetClinic testing whether structured skills outperform flat knowledge injection
- [Code Coverage v3 — The Exemplar Effect](https://lab.pollack.ai/experiments/code-coverage-v3.md): When existing tests use older patterns, skills can't override them. The codebase is the agent's primary teacher.
- [Experiments](https://lab.pollack.ai/experiments/index.md): Controlled studies measuring what moves the needle for AI agent reliability
- [Issue Classification — Infrastructure vs Prompts](https://lab.pollack.ai/experiments/issue-classification.md): SWE-bench Lite: does infrastructure-level optimization beat prompt engineering?
- [Pollack AI Lab](https://lab.pollack.ai/index.md): Tools and experiments for building agents that work — and understanding why they work
- [ACP Java SDK](https://lab.pollack.ai/projects/acp-java-sdk.md): Agent Communication Protocol — build agents, consume agents, integrate with IDEs using a standard protocol
- [Agent Bench](https://lab.pollack.ai/projects/agent-bench.md): Benchmarking suite for Java-centric AI agents on real-world software engineering tasks
- [Agent Client](https://lab.pollack.ai/projects/agent-client.md): Autonomous CLI agent integrations for Spring AI — Claude Code, Gemini CLI, and SWE-bench agents
- [Agent Experiment](https://lab.pollack.ai/projects/agent-experiment.md): End-to-end experiment driver for evaluating AI coding agents against fixture datasets
- [Agent Hooks](https://lab.pollack.ai/projects/agent-hooks.md): Portable hook API for steering agent behavior at the tool-call boundary — write once, run on any runtime
- [Agent Journal](https://lab.pollack.ai/projects/agent-journal.md): Behavioral trace capture for agent research and observability
- [Agent Judge](https://lab.pollack.ai/projects/agent-judge.md): Agent-agnostic evaluation framework with deterministic, command, and LLM judges
- [Agent Memory](https://lab.pollack.ai/projects/agent-memory.md): Progressive memory management for Spring AI — from context compaction to autonomous memory control
- [Agent Sandbox](https://lab.pollack.ai/projects/agent-sandbox.md): Isolated command execution — Local, Docker, and E2B cloud backends behind a unified API
- [Agent Skills](https://lab.pollack.ai/projects/agent-skills.md): Curated domain knowledge modules — SkillsJars that make agents smarter without prompt engineering
- [Agent Tools](https://lab.pollack.ai/projects/agent-tools.md): Claude Code-inspired tools for Spring AI agents — file I/O, shell, search, web, and multi-agent orchestration
- [Agent Workflow](https://lab.pollack.ai/projects/agent-workflow.md): Build agents that work — and measure why they work. Multi-step pipelines with typed context, quality gates, and portable runtimes.
- [Agento Studio](https://lab.pollack.ai/projects/agento-studio.md): A systematic approach to growing AI agents — judges tell you if the agent worked, journals tell you why
- [Agento University](https://lab.pollack.ai/projects/agento-university.md): Hierarchical multi-agent platform for continuous, autonomous management of software development projects
- [AgentWorks BOM](https://lab.pollack.ai/projects/agentworks-bom.md): Bill of Materials for coordinated version management across all AgentWorks projects
- [Bud](https://lab.pollack.ai/projects/bud.md): ACP agent for Spring Boot development — proven patterns as a starting point, AI to adapt them
- [Claude Agent SDK (Java)](https://lab.pollack.ai/projects/claude-agent-sdk.md): Java SDK for Claude Code CLI integration — three-API architecture, sessions, MCP, multi-agent orchestration
- [Projects](https://lab.pollack.ai/projects/index.md): Tools for building, evaluating, and observing AI agents on the JVM
- [Loopy](https://lab.pollack.ai/projects/loopy.md): Loop-driven interactive coding agent CLI for Java developers
- [Spring AI A2A](https://lab.pollack.ai/projects/spring-ai-a2a.md): Agent-to-Agent protocol implementation for Spring AI

## Optional

- [Blog](https://blog.pollack.ai)
- [GitHub](https://github.com/markpollack)