Overview

Agent Memory gives AI agents the ability to manage conversational context intelligently. Without memory management, every prior tool result is re-sent every turn; on long tasks, context fills with stale noise, costs climb, and the model loses focus. In one production run, a single code-coverage task consumed 18M input tokens. With compaction, the same task took 854K: a 21x reduction at the same quality.

The library starts with a proven context compaction strategy and progressively adds structured retrieval, reflection, and autonomous memory management, eventually reaching MemGPT-level capabilities. Each tier is independently useful.

Agent Memory ships as a Spring AI BaseAdvisor; plug it into any ChatClient pipeline with a single defaultAdvisors call:
var memoryStore = new FileSystemMemoryStore(Path.of(".memory"));

var advisor = CompactionMemoryAdvisor.builder(memoryStore)
    .compactionChatClient(ChatClient.create(haikuModel))
    .memoryTokenBudget(8192)
    .compactionRatio(0.75)
    .build();

ChatClient agent = ChatClient.builder(chatModel)
    .defaultAdvisors(advisor)
    .build();
On each request, the advisor retrieves accumulated learnings (within the token budget) and injects them into the system message. After each response, it appends the assistant’s output to the store. When uncompacted entries exceed budget × ratio, compaction summarizes them via a cheap model and replaces them with dense summaries.
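The "retrieve within the token budget" step above can be sketched in plain Java. The Entry record and the greedy newest-first selection here are illustrative assumptions, not the library's actual retrieval code:

```java
import java.util.ArrayList;
import java.util.List;

public class BudgetedRetrieval {
    record Entry(String text, int tokens) {}

    // Select entries newest-first until the budget is exhausted,
    // so recent learnings win when the budget is tight.
    static List<Entry> retrieve(List<Entry> store, int budget) {
        List<Entry> selected = new ArrayList<>();
        int used = 0;
        for (int i = store.size() - 1; i >= 0; i--) {
            Entry e = store.get(i);
            if (used + e.tokens() > budget) break;
            selected.add(0, e);  // keep chronological order in the prompt
            used += e.tokens();
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Entry> store = List.of(
            new Entry("old summary", 3000),
            new Entry("recent tool result", 4000),
            new Entry("latest learning", 2000));
        // Budget 6,144 fits only the two newest entries (2,000 + 4,000 tokens).
        System.out.println(retrieve(store, 6144).size()); // 2
    }
}
```

The chronological re-insertion matters: the prompt should read oldest-to-newest even though selection walks the store in reverse.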

How Compaction Works

When accumulated context exceeds a token budget, older entries are summarized by a cheap model (e.g., Haiku) and replaced with a compact summary. The agent continues with dense, relevant context instead of an ever-growing prompt. Two parameters control it:
| Parameter | Default | Description |
| --- | --- | --- |
| memoryTokenBudget | 8,192 | Max tokens of memory included in each prompt |
| compactionRatio | 0.75 | Fraction of budget that triggers compaction |
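The trigger rule is simple arithmetic. A minimal sketch (method name is illustrative, not the library's API): compaction fires once uncompacted memory exceeds memoryTokenBudget × compactionRatio.

```java
public class CompactionTrigger {
    static boolean shouldCompact(int uncompactedTokens, int budget, double ratio) {
        return uncompactedTokens > budget * ratio;
    }

    public static void main(String[] args) {
        // With the defaults (budget 8,192, ratio 0.75) the threshold is 6,144 tokens.
        System.out.println(shouldCompact(6000, 8192, 0.75)); // false
        System.out.println(shouldCompact(6500, 8192, 0.75)); // true
    }
}
```

Lowering the ratio compacts earlier and more often; lowering the budget shrinks how much memory reaches the prompt at all.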

Benchmark Data

Real LLM benchmarks against Anthropic Haiku 4.5 on a 12-story e-commerce platform PRD:
| Metric | Without Compaction | With Compaction |
| --- | --- | --- |
| Stories passed | 9/12 | 11/12 |
| Total tokens | 56,876 | 40,152 |
| Total cost | $0.34 | $0.24 |
Token growth without compaction is linear and unbounded (~800 tokens/story, reaching 9,000+ by story 12). With compaction it plateaus around 4,600 tokens after the first compaction cycle.
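The two growth curves can be modeled with the numbers quoted above. The per-story rate and the plateau are taken from the benchmark text; this is an illustration of the shapes, not the measurement itself:

```java
public class TokenGrowth {
    // Linear, unbounded: roughly 800 tokens accumulate per story.
    static int withoutCompaction(int story) {
        return story * 800;
    }

    // Same slope until the first compaction cycle, then a plateau.
    static int withCompaction(int story, int plateau) {
        return Math.min(story * 800, plateau);
    }

    public static void main(String[] args) {
        System.out.println(withoutCompaction(12));    // 9600, past 9,000 by story 12
        System.out.println(withCompaction(12, 4600)); // 4600, flat after compaction
    }
}
```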

Budget Sensitivity

The memory budget is the critical tuning parameter:
| Budget | Result | Notes |
| --- | --- | --- |
| 2,048 | 7/12 stories | Too aggressive; destroys critical details |
| 4,096 | 11/12 stories | Sweet spot for structured tasks |
| 8,192 | Good | Best for unstructured conversations |
At 2,048 tokens, compaction destroys critical details — table names, endpoint signatures, auth token formats — that later tasks depend on. At 4,096, it preserves what matters and discards what doesn’t.

Production Validation

In a code-coverage experiment with Loopy:
| Configuration | Compaction | Input Tokens | Outcome |
| --- | --- | --- | --- |
| No compaction | none | 18,336,594 | Failed |
| Threshold 0.5 | late | | Cost cap hit |
| Threshold 0.3 | early | 854,353 | Passed |
Compaction does not change answer quality. It determines whether the run finishes within budget.
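The headline reduction follows directly from the table above; the values below are copied from it:

```java
public class Reduction {
    static double factor(long without, long with) {
        return (double) without / with;
    }

    public static void main(String[] args) {
        // 18,336,594 input tokens without compaction vs 854,353 with it.
        System.out.printf("%.1fx%n", factor(18_336_594L, 854_353L)); // prints 21.5x
    }
}
```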

Roadmap

| Tier | Name | Status | Description |
| --- | --- | --- | --- |
| 1 | Compaction | Shipping | Token-budgeted retrieval + LLM summarization |
| 2 | Structured | Planned | Categorized memory with selective retrieval and per-category retention policies |
| 3 | Reflective | Planned | Importance scoring + periodic reflection synthesis (Generative Agents pattern) |
| 4 | Autonomous | Planned | Agent-controlled memory via tools: virtual context management (MemGPT pattern) |

Modules

| Module | Description |
| --- | --- |
| memory-core | MemoryStore interface, FileSystemMemoryStore, MemoryCompactor, TokenEstimator |
| memory-advisor | CompactionMemoryAdvisor, a Spring AI BaseAdvisor for ChatClient integration |
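The module list names a TokenEstimator. A common heuristic for English text, offered here only as a sketch (the library's actual estimator may differ), is roughly four characters per token:

```java
public class SimpleTokenEstimator {
    // Rough chars/4 heuristic; real tokenizers vary by model and language.
    static int estimate(String text) {
        return (int) Math.ceil(text.length() / 4.0);
    }

    public static void main(String[] args) {
        System.out.println(estimate("compaction keeps prompts dense"));
    }
}
```

Cheap estimation is enough here: the budget and ratio checks only need token counts that are approximately right, not tokenizer-exact.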

GitHub

Source code (0.1.0 on Maven Central)

Origin

Extracted from wiggum-memory — a research project that explored the Ralph Wiggum pattern for context management in AI agent loops. The memory subsystem proved to be the most broadly useful component, so it was promoted to a standalone library as part of the AgentWorks stack.