# Pollack AI Lab

## Docs

- [Agent Configuration](https://lab.pollack.ai/docs/agent-bench/agent-config.md): How to configure any CLI tool as an Agent Bench agent
- [CLI Reference](https://lab.pollack.ai/docs/agent-bench/cli-reference.md): All Agent Bench commands, flags, and usage patterns
- [Getting Started with Agent Bench](https://lab.pollack.ai/docs/agent-bench/getting-started.md): Test your AI coding agent against benchmarks in 5 minutes
- [Jury System](https://lab.pollack.ai/docs/agent-bench/jury-system.md): How Agent Bench grades agent output using cascaded tiers of judges
- [AgentLoop](https://lab.pollack.ai/docs/agent-workflow/agent-loop.md): Ready-to-use SWE agent — tools, session memory, observability, and a simple run/chat API backed by Spring AI's agent loop
- [Annotation Model](https://lab.pollack.ai/docs/agent-workflow/annotation-model.md): Declare agents with @Agent, compose workflows inside AgentHandler, and wire exception handling with @ExceptionHandler and @AgentAdvice
- [Agent Workflow API Reference](https://lab.pollack.ai/docs/agent-workflow/api-reference.md): Step interface, Workflow builder, WorkflowGraph IR, gates, context, StepRunner, and TraceRecorder
- [DSL Primitives](https://lab.pollack.ai/docs/agent-workflow/choosing-a-pattern.md): 10+ composable primitives for building agentic pipelines — sequential, branch, loop, parallel, decision, gate, supervisor, and more
- [Durability](https://lab.pollack.ai/docs/agent-workflow/durability.md): Crash recovery, checkpointing, and distributed execution for agent workflows
- [Workflow DSL Examples](https://lab.pollack.ai/docs/agent-workflow/examples.md): Complete, runnable examples — validated with real LLM calls against GPT-4.1
- [Getting Started with Agent Workflow](https://lab.pollack.ai/docs/agent-workflow/getting-started.md): Compose steps into workflows with typed context, portable runtimes, and quality gates
- [Step Parameterization](https://lab.pollack.ai/docs/agent-workflow/parameterization.md): How to get data into and out of steps — constructor injection, input chaining, context keys, metadata publishing, and mixed patterns
- [Your First Research Agent](https://lab.pollack.ai/docs/agento-studio/getting-started.md): Build a file-based research KB and teach an AI agent to navigate it — in 20 minutes
- [Architecture](https://lab.pollack.ai/docs/bud/architecture.md): Three modules, two protocols, zero API keys
- [Getting Started](https://lab.pollack.ai/docs/bud/getting-started.md): Install Bud and build your first Spring Boot project
- [Tools & Reference Projects](https://lab.pollack.ai/docs/bud/tools-reference.md): All 23 MCP tools and 9 reference projects
- [Agent Experiment API Reference](https://lab.pollack.ai/docs/experiment-driver/api-reference.md): Configuration, dataset format, invoker contract, and result model
- [Creating Experiments](https://lab.pollack.ai/docs/experiment-driver/creating-experiments.md): Design datasets, define variant ladders, filter items, and analyze results
- [Getting Started with Agent Experiment](https://lab.pollack.ai/docs/experiment-driver/getting-started.md): Run your first AI agent evaluation: dataset, agent, jury, and variant comparison
- [Building a Jury](https://lab.pollack.ai/docs/experiment-driver/jury-system.md): Three-tier cascaded evaluation: deterministic, structural, and semantic judges
- [Loopy CLI Reference](https://lab.pollack.ai/docs/loopy/cli-reference.md): All flags, slash commands, execution modes, and configuration options
- [Extending Loopy](https://lab.pollack.ai/docs/loopy/extending.md): Custom skills, subagents, tool profiles, listeners, and the programmatic API
- [Getting Started with Loopy](https://lab.pollack.ai/docs/loopy/getting-started.md): Install, configure, and run your first agent session in under 5 minutes
- [Code Coverage v1 — Knowledge Injection Baseline](https://lab.pollack.ai/experiments/code-coverage-v1.md): 9 variants testing progressive knowledge injection on Spring Boot test generation
- [Code Coverage v2 — Skills vs Knowledge Bases](https://lab.pollack.ai/experiments/code-coverage-v2.md): 7 variants on Spring PetClinic testing whether structured skills outperform flat knowledge injection
- [Code Coverage v3 — The Exemplar Effect](https://lab.pollack.ai/experiments/code-coverage-v3.md): When existing tests use older patterns, skills can't override them. The codebase is the agent's primary teacher.
- [Experiments](https://lab.pollack.ai/experiments/index.md): Controlled studies measuring what moves the needle for AI agent reliability
- [Issue Classification — Infrastructure vs Prompts](https://lab.pollack.ai/experiments/issue-classification.md): SWE-bench Lite: does infrastructure-level optimization beat prompt engineering?
- [Pollack AI Lab](https://lab.pollack.ai/index.md): Tools and experiments for building agents that work — and understanding why they work
- [ACP Java SDK](https://lab.pollack.ai/projects/acp-java-sdk.md): Agent Communication Protocol — build agents, consume agents, integrate with IDEs using a standard protocol
- [Agent Bench](https://lab.pollack.ai/projects/agent-bench.md): Benchmarking suite for Java-centric AI agents on real-world software engineering tasks
- [Agent Client](https://lab.pollack.ai/projects/agent-client.md): Autonomous CLI agent integrations for Spring AI — Claude Code, Gemini CLI, and SWE-bench agents
- [Agent Experiment](https://lab.pollack.ai/projects/agent-experiment.md): End-to-end experiment driver for evaluating AI coding agents against fixture datasets
- [Agent Hooks](https://lab.pollack.ai/projects/agent-hooks.md): Portable hook API for steering agent behavior at the tool-call boundary — write once, run on any runtime
- [Agent Journal](https://lab.pollack.ai/projects/agent-journal.md): Behavioral trace capture for agent research and observability
- [Agent Judge](https://lab.pollack.ai/projects/agent-judge.md): Agent-agnostic evaluation framework with deterministic, command, and LLM judges
- [Agent Memory](https://lab.pollack.ai/projects/agent-memory.md): Progressive memory management for Spring AI — from context compaction to autonomous memory control
- [Agent Sandbox](https://lab.pollack.ai/projects/agent-sandbox.md): Isolated command execution — Local, Docker, and E2B cloud backends behind a unified API
- [Agent Skills](https://lab.pollack.ai/projects/agent-skills.md): Curated domain knowledge modules — SkillsJars that make agents smarter without prompt engineering
- [Agent Tools](https://lab.pollack.ai/projects/agent-tools.md): Claude Code-inspired tools for Spring AI agents — file I/O, shell, search, web, and multi-agent orchestration
- [Agent Workflow](https://lab.pollack.ai/projects/agent-workflow.md): Build agents that work — and measure why they work. Multi-step pipelines with typed context, quality gates, and portable runtimes.
- [Agento Studio](https://lab.pollack.ai/projects/agento-studio.md): A systematic approach to growing AI agents — judges tell you if the agent worked, journals tell you why
- [Agento University](https://lab.pollack.ai/projects/agento-university.md): Hierarchical multi-agent platform for continuous, autonomous management of software development projects
- [AgentWorks BOM](https://lab.pollack.ai/projects/agentworks-bom.md): Bill of Materials for coordinated version management across all AgentWorks projects
- [Bud](https://lab.pollack.ai/projects/bud.md): ACP agent for Spring Boot development — proven patterns as a starting point, AI to adapt them
- [Claude Agent SDK (Java)](https://lab.pollack.ai/projects/claude-agent-sdk.md): Java SDK for Claude Code CLI integration — three-API architecture, sessions, MCP, multi-agent orchestration
- [Projects](https://lab.pollack.ai/projects/index.md): Tools for building, evaluating, and observing AI agents on the JVM
- [Loopy](https://lab.pollack.ai/projects/loopy.md): Loop-driven interactive coding agent CLI for Java developers
- [Spring AI A2A](https://lab.pollack.ai/projects/spring-ai-a2a.md): Agent-to-Agent protocol implementation for Spring AI

## Optional

- [Blog](https://blog.pollack.ai)
- [GitHub](https://github.com/markpollack)