
Overview

Agent Workflow helps you build agents that work, understand why they work, and improve them in a controlled, experimental manner. You compose steps into workflows, each step doing one thing: call an LLM, run a function, invoke an external agent. Quality gates evaluate output at each stage. Every step transition is traced, feeding Agent Journal for behavioral analysis, so you can answer questions like: does the agent need better real-time steering? What knowledge is it missing to achieve its goal? Which steps should be deterministic instead of LLM-driven? What new tools should be built?

The philosophy follows what Stripe learned building Minions at scale: “The model does not run the system. The system runs the model.”

A fluent DSL makes workflows easy to define: branching, loops, parallel execution, LLM-driven routing, error recovery. Steps exchange data through a typed context. The workflow compiles to a graph intermediate representation that separates definition from execution, enabling portable runtimes without changing workflow code.
Workflow.define("pr-review")
    .step(fetchDiff)
    .then(analyzeDiff)
    .gate(new JudgeGate(jury, 0.8))
        .onPass(postComment)
        .onFail(revise)
    .end()
    .run(event);
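The gate in the snippet above routes on a judged score. The pattern can be sketched in a few self-contained lines; the Jury and JudgeGate shapes here are illustrative assumptions, not the real Agent Workflow API:

```java
// Hypothetical sketch of a score-threshold gate. In the real library,
// JudgeGate takes a jury and a threshold; everything else here is invented
// for illustration.
interface Jury {
    double score(String output); // 0.0 .. 1.0 quality score
}

final class JudgeGate {
    private final Jury jury;
    private final double threshold;

    JudgeGate(Jury jury, double threshold) {
        this.jury = jury;
        this.threshold = threshold;
    }

    // Pass when the judged score meets the threshold; onPass/onFail
    // branches of the DSL would route on this decision.
    boolean pass(String output) {
        return jury.score(output) >= threshold;
    }
}

public class GateDemo {
    public static void main(String[] args) {
        // Toy jury: scores by length, capped at 1.0.
        Jury lengthJury = out -> Math.min(1.0, out.length() / 100.0);
        JudgeGate gate = new JudgeGate(lengthJury, 0.8);

        System.out.println(gate.pass("short"));         // false: score 0.05
        System.out.println(gate.pass("x".repeat(120))); // true: score capped at 1.0
    }
}
```

In the real DSL, the jury would typically be an LLM-backed judge rather than a deterministic scorer, but the routing decision is the same boolean.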

Core Concepts

Steps are the building blocks. Each step takes input, does work, and produces output. Steps can be:
  • Deterministic — a Java function (GitHub API call, string formatting, file parsing)
  • Single LLM call — ChatClientStep wraps a Spring AI ChatClient call
  • Agentic CLI tools — ClaudeStep uses the Claude Agent SDK for full multi-turn agent sessions with deep tracing. AgentClientStep wraps other agentic CLI tools (Google Gemini, OpenAI Codex, Amazon Q) via Agent Client, giving you a unified interface
A ClaudeStep or AgentClientStep isn’t a single API call: it runs a complete agentic loop internally (dozens of tool calls, minutes of execution) and returns a typed result. The workflow sees it as one step.

Context threads through every step. Steps read input parameters by key, do their work, and write output parameters back. Downstream steps pick up what upstream steps produced, all type-safe via ContextKey<T>.

Compiling to a graph makes the workflow definition pure data: nodes and edges, not opaque lambdas. This enables:
  • Portable runtimes — the graph decouples definition from execution. Ships with LocalStepRunner (in-process, zero overhead). CheckpointingStepRunner (JDBC crash recovery via workflow-batch) and TemporalStepRunner (distributed durable execution) are planned — same workflow code, swap a single @Bean
  • Tracing — every step transition is recorded for observability and behavioral analysis via Agent Journal
  • Steering (planned) — runtime hooks that intercept before/after steps to enforce constraints, redirect behavior, or inject guidance. Deterministic or LLM-powered. Integrates with Spring AI advisors and the Claude Agent SDK hook system
  • Inspection — the graph is pure data (nodes + edges), not opaque lambdas
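The typed-context pattern can be sketched in a few lines. The ContextKey and AgentContext shapes below are illustrative assumptions standing in for the real classes:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a typed context. The real ContextKey<T> and
// AgentContext API may differ; this only illustrates the pattern of
// keyed, type-safe data exchange between steps.
final class ContextKey<T> {
    final String name;
    final Class<T> type;

    private ContextKey(String name, Class<T> type) {
        this.name = name;
        this.type = type;
    }

    static <T> ContextKey<T> of(String name, Class<T> type) {
        return new ContextKey<>(name, type);
    }
}

final class AgentContext {
    private final Map<ContextKey<?>, Object> values = new HashMap<>();

    <T> void put(ContextKey<T> key, T value) { values.put(key, value); }

    // The Class token makes the cast safe: a key can only yield its own type.
    <T> T get(ContextKey<T> key) { return key.type.cast(values.get(key)); }
}

public class ContextDemo {
    static final ContextKey<String> DIFF = ContextKey.of("diff", String.class);
    static final ContextKey<Integer> FILES_CHANGED = ContextKey.of("filesChanged", Integer.class);

    public static void main(String[] args) {
        AgentContext ctx = new AgentContext();
        // An upstream step writes typed outputs...
        ctx.put(DIFF, "--- a/App.java\n+++ b/App.java");
        ctx.put(FILES_CHANGED, 1);
        // ...and a downstream step reads them back, type-checked at compile time.
        String diff = ctx.get(DIFF);
        int filesChanged = ctx.get(FILES_CHANGED);
        System.out.println(filesChanged + " file(s) changed, diff present: " + !diff.isEmpty());
    }
}
```

This is the classic typesafe heterogeneous container idiom: the key carries the type, so mismatched reads fail at compile time rather than at runtime.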

Documentation

AgentLoop

Ready-to-use SWE agent — tools, memory, run/chat API

Getting Started

Steps, context, portable runtimes, first workflow

Parameterization

4 patterns for getting data into steps

DSL Primitives

10+ composable patterns with code

Examples

8 tests validated against GPT-4.1

Spring Batch Mapping

Batch concepts → Agent Workflow

API Reference

Step, AgentContext, Gate, WorkflowGraph, StepRunner

Why Deterministic Steps Matter

The biggest insight from running real agent experiments: the AI shouldn’t do everything. This is the pattern Stripe describes in their Minions system, now shipping 1,300 PRs a week: the model does not run the system — the system runs the model. A real PR merge workflow illustrates the point. The pipeline has 8 steps:
checkoutPR → formatCode → collectContext → compile → squash → rebase → resolveConflicts → review
Six steps are deterministic — git operations, Java formatting, GitHub API calls, Maven compile. Only two need LLM reasoning: resolving merge conflicts and reviewing the diff. The deterministic steps are free, fast, and perfectly reliable. The LLM steps are expensive and variable. By minimizing what the LLM needs to do, you reduce cost, increase reliability, and make the whole pipeline easier to debug.

This isn’t obvious until you measure it. In our code coverage experiments, adding a deterministic pre-analysis step cut agent steps by 27%. The agent still explored the codebase, but it read source files instead of decompiling JARs. Same attention budget, better allocation.
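Because the graph is pure data, the deterministic/LLM split is something you can inspect programmatically. A minimal sketch, modeling the pipeline above as nodes tagged by kind (the Node record and pipeline() helper are invented for illustration, not the real WorkflowGraph API):

```java
import java.util.List;

// Hypothetical sketch: the merge pipeline as pure data, with each node
// tagged by whether it needs LLM reasoning. Step names match the
// pipeline described above.
public class MergePipelineSketch {
    record Node(String name, boolean llm) {}

    static List<Node> pipeline() {
        return List.of(
            new Node("checkoutPR", false),
            new Node("formatCode", false),
            new Node("collectContext", false),
            new Node("compile", false),
            new Node("squash", false),
            new Node("rebase", false),
            new Node("resolveConflicts", true), // merge conflicts need reasoning
            new Node("review", true));          // diff review needs reasoning
    }

    public static void main(String[] args) {
        List<Node> steps = pipeline();
        long llmSteps = steps.stream().filter(Node::llm).count();
        System.out.println(llmSteps + " of " + steps.size() + " steps call an LLM");
        // prints: 2 of 8 steps call an LLM
    }
}
```

A graph-as-data representation is what makes this kind of audit trivial: counting, visualizing, or cost-estimating the LLM surface of a workflow is a query over nodes, not a code review.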

GitHub

Source code (0.2.0 on Maven Central)

Used In