Overview
Agent Workflow helps you build agents that actually work — understand why they work, then improve them in a controlled, experimental manner. You compose steps into workflows, each step doing one thing: call an LLM, run a function, invoke an external agent. Quality gates evaluate output at each stage. Every step transition is traced, feeding Agent Journal for behavioral analysis — so you can answer: does the agent need better real-time steering? What knowledge is it missing to achieve its goal? Which steps should be deterministic instead of LLM-driven? What new tools should be built?

The philosophy follows what Stripe learned building Minions at scale: “The model does not run the system. The system runs the model.”

A fluent DSL makes workflows easy to define — branching, loops, parallel execution, LLM-driven routing, error recovery. Steps exchange data through typed context. The workflow compiles to a graph intermediate representation that separates definition from execution, enabling portable runtimes without changing workflow code.

Core Concepts
Steps are the building blocks. Each step takes input, does work, and produces output. Steps can be:

- Deterministic — a Java function (GitHub API call, string formatting, file parsing)
- Single LLM call — `ChatClientStep` wraps a Spring AI `ChatClient` call
- Agentic CLI tools — `ClaudeStep` uses the Claude Agent SDK for full multi-turn agent sessions with deep tracing. `AgentClientStep` wraps other agentic CLI tools — Google Gemini, OpenAI Codex, Amazon Q — via Agent Client, giving you a unified interface
A `ClaudeStep` or `AgentClientStep` isn’t a single API call — it runs a complete agentic loop internally (dozens of tool calls, minutes of execution) and returns a typed result. The workflow sees it as one step.
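A minimal sketch of that idea, using hypothetical names (`Step`, `FakeAgentStep`) rather than the library's actual API: however many internal turns an agentic step runs, the surrounding workflow only ever sees one typed input-to-output transformation.

```java
// Illustrative sketch only — the real ClaudeStep/AgentClientStep APIs may differ.
// An agentic step can run many internal iterations (tool calls, retries),
// yet still looks like a single input -> output transformation to the workflow.
interface Step<I, O> {
    O run(I input);
}

class FakeAgentStep implements Step<String, String> {
    @Override
    public String run(String task) {
        String state = task;
        // Stand-in for the multi-turn agent loop hidden inside the step.
        for (int turn = 0; turn < 3; turn++) {
            state = state + " -> turn" + turn;
        }
        return state; // one typed result, however long the loop ran
    }
}
```

From the workflow's point of view there is no difference between this and a pure Java function step — both are just a `Step<I, O>`, which is what makes deterministic and agentic steps freely composable.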
Context threads through every step. Steps read input parameters by key, do their work, and write output parameters back. Downstream steps pick up what upstream steps produced — all type-safe via `ContextKey<T>`.
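A minimal sketch of how such a typed context can work — the names `ContextKey` and `AgentContext` appear in the API reference below, but this shape is an assumption, not the library's actual implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal version of a typed context; the real
// ContextKey/AgentContext classes in Agent Workflow may differ.
record ContextKey<T>(String name, Class<T> type) {}

class AgentContext {
    private final Map<String, Object> values = new HashMap<>();

    <T> void put(ContextKey<T> key, T value) {
        values.put(key.name(), value);
    }

    <T> T get(ContextKey<T> key) {
        // Cast is safe: put() only accepts values matching the key's type.
        return key.type().cast(values.get(key.name()));
    }
}
```

A downstream step reading `ctx.get(PR_TITLE)` then gets back a `String`, with the key's type checked at compile time rather than by convention.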
The graph means the workflow definition is pure data — nodes and edges, not opaque lambdas. This enables:
- Portable runtimes — the graph decouples definition from execution. Ships with `LocalStepRunner` (in-process, zero overhead). `CheckpointingStepRunner` (JDBC crash recovery via `workflow-batch`) and `TemporalStepRunner` (distributed durable execution) are planned — same workflow code, swap a single `@Bean`
- Tracing — every step transition is recorded for observability and behavioral analysis via Agent Journal
- Steering (planned) — runtime hooks that intercept before/after steps to enforce constraints, redirect behavior, or inject guidance. Deterministic or LLM-powered. Integrates with Spring AI advisors and the Claude Agent SDK hook system
- Inspection — because the graph is plain data (nodes + edges), workflows can be examined and validated without running them
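The nodes-and-edges idea can be sketched in a few lines. This is an assumed shape, not the library's actual `WorkflowGraph`; the point is only that a definition stored as data can be walked, inspected, or serialized by any runner:

```java
import java.util.List;

// Hypothetical sketch of a graph intermediate representation; the real
// WorkflowGraph in Agent Workflow may differ. The workflow definition is
// plain data that any StepRunner implementation could walk.
record Node(String id, String stepType) {}
record Edge(String from, String to) {}
record WorkflowGraph(List<Node> nodes, List<Edge> edges) {

    // Because the graph is data, a runner (or a visualizer, or a validator)
    // can inspect it without executing any step.
    List<String> successorsOf(String nodeId) {
        return edges.stream()
                .filter(e -> e.from().equals(nodeId))
                .map(Edge::to)
                .toList();
    }
}
```

An in-process runner walks such a graph directly; a durable runner could persist the same data and resume it elsewhere — which is what makes runtimes swappable without touching workflow code.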
Documentation
- AgentLoop — ready-to-use SWE agent: tools, memory, run/chat API
- Getting Started — steps, context, portable runtimes, first workflow
- Parameterization — 4 patterns for getting data into steps
- DSL Primitives — 10+ composable patterns with code
- Examples — 8 tests validated against GPT-4.1
- Spring Batch Mapping — Batch concepts → Agent Workflow
- API Reference — `Step`, `AgentContext`, `Gate`, `WorkflowGraph`, `StepRunner`
Why Deterministic Steps Matter
The biggest insight from running real agent experiments: the AI shouldn’t do everything. This is the pattern Stripe describes in their Minions system, now shipping 1,300 PRs a week: the model does not run the system — the system runs the model. A real PR merge workflow illustrates the point: the pipeline has 8 steps.

Quick Links
- GitHub — source code (0.2.0 on Maven Central)
Used In
- Code Coverage v1 — Agent execution engine for all 9 variants
- Code Coverage v2 — Agent execution with skills injection
- Issue Classification — SWE-bench agent runner