Agent Memory gives AI agents the ability to manage conversational context intelligently. Without memory management, every prior tool result is re-sent on every turn. On long tasks, context fills with stale noise, costs climb, and the model loses focus. In production, a single code-coverage task consumed 18M input tokens; with compaction it consumed 854K, a 21x reduction at the same quality.

The library starts with a proven context compaction strategy and progressively adds structured retrieval, reflection, and autonomous memory management, eventually reaching MemGPT-level capabilities. Each tier is independently useful.

Agent Memory ships as a Spring AI BaseAdvisor. Plug it into any ChatClient pipeline with one line:
```java
var memoryStore = new FileSystemMemoryStore(Path.of(".memory"));

var advisor = CompactionMemoryAdvisor.builder(memoryStore)
    .compactionChatClient(ChatClient.create(haikuModel))
    .memoryTokenBudget(8192)
    .compactionRatio(0.75)
    .build();

ChatClient agent = ChatClient.builder(chatModel)
    .defaultAdvisors(advisor)
    .build();
```
On each request, the advisor retrieves accumulated learnings (within the token budget) and injects them into the system message. After each response, it appends the assistant’s output to the store. When uncompacted entries exceed budget × ratio, compaction summarizes them via a cheap model and replaces them with dense summaries.
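The trigger condition comes down to a single comparison. The sketch below is an illustration only, not the library's implementation: the class `CompactionTrigger`, its method names, and the rough four-characters-per-token estimate are all assumptions made for this example. Only the budget × ratio rule comes from the description above.

```java
import java.util.List;

// Hypothetical helper for illustration; not part of the Agent Memory API.
final class CompactionTrigger {

    // Token count at which compaction fires: budget x ratio.
    static long threshold(int memoryTokenBudget, double compactionRatio) {
        return Math.round(memoryTokenBudget * compactionRatio);
    }

    // Crude token estimate (assumption: about 4 characters per token).
    static int estimateTokens(String text) {
        return (text.length() + 3) / 4;
    }

    // True when the uncompacted entries together exceed the threshold.
    static boolean shouldCompact(List<String> uncompactedEntries,
                                 int memoryTokenBudget,
                                 double compactionRatio) {
        long total = 0;
        for (String entry : uncompactedEntries) {
            total += estimateTokens(entry);
        }
        return total > threshold(memoryTokenBudget, compactionRatio);
    }

    public static void main(String[] args) {
        // With the builder values shown earlier (8192 and 0.75).
        System.out.println(threshold(8192, 0.75));
    }
}
```

With a budget of 8192 and a ratio of 0.75, the threshold works out to 6,144 tokens. A real implementation would presumably count tokens with the model's tokenizer rather than a character heuristic.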
When accumulated context exceeds a token budget, older entries are summarized by a cheap model (e.g., Haiku) and replaced with a compact summary. The agent continues with dense, relevant context instead of an ever-growing prompt.

Two parameters control it, memoryTokenBudget and compactionRatio, both set on the builder.
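The summarize-and-replace step can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: `CompactionSketch`, the `keepRecent` parameter, and the `summarizer` function (a stand-in for the call to the cheap model) are hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch; not the Agent Memory implementation.
final class CompactionSketch {

    // Replaces all but the newest `keepRecent` entries with one summary entry.
    // `summarizer` stands in for a call to a cheap summarization model.
    // Assumes keepRecent >= 0.
    static List<String> compact(List<String> entries,
                                int keepRecent,
                                Function<List<String>, String> summarizer) {
        if (entries.size() <= keepRecent) {
            return new ArrayList<>(entries); // nothing old enough to summarize
        }
        int split = entries.size() - keepRecent;
        List<String> result = new ArrayList<>();
        result.add(summarizer.apply(entries.subList(0, split))); // dense summary
        result.addAll(entries.subList(split, entries.size()));   // recent entries kept verbatim
        return result;
    }

    public static void main(String[] args) {
        List<String> entries = List.of("created users table", "added /login endpoint",
                "fixed auth bug", "wrote checkout tests");
        // Fake summarizer for the demo; a real one would call a model.
        List<String> compacted = compact(entries, 1,
                older -> "summary of " + older.size() + " entries");
        System.out.println(compacted);
    }
}
```

Keeping the most recent entries verbatim is a design choice made for this sketch; the text above only states that older entries are summarized and replaced.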
Real LLM benchmarks against Anthropic Haiku 4.5 on a 12-story e-commerce platform PRD:
| Metric | Without Compaction | With Compaction |
|---|---|---|
| Stories passed | 9/12 | 11/12 |
| Total tokens | 56,876 | 40,152 |
| Total cost | $0.34 | $0.24 |
Token growth without compaction is linear and unbounded (~800 tokens/story, reaching 9,000+ by story 12). With compaction it plateaus around 4,600 tokens after the first compaction cycle.
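To read the benchmark figures in relative terms, a short calculation is enough. The class `SavingsCalc` is a hypothetical name for this example; the input numbers are the totals reported above.

```java
// Hypothetical helper for illustration only.
final class SavingsCalc {

    // Fractional reduction when going from `before` to `after`.
    static double reduction(double before, double after) {
        return (before - after) / before;
    }

    public static void main(String[] args) {
        // Totals from the benchmark: tokens and cost, without vs. with compaction.
        System.out.printf("Token reduction: %.1f%%%n", 100 * reduction(56_876, 40_152));
        System.out.printf("Cost reduction:  %.1f%%%n", 100 * reduction(0.34, 0.24));
    }
}
```

Both reductions come out to roughly 29%, which is consistent with cost tracking token count on this run.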
The memory budget is the critical tuning parameter:
| Budget | Result | Notes |
|---|---|---|
| 2,048 | 7/12 stories | Too aggressive; destroys critical details |
| 4,096 | 11/12 stories | Sweet spot for structured tasks |
| 8,192 | Good | Best for unstructured conversations |
At 2,048 tokens, compaction destroys critical details — table names, endpoint signatures, auth token formats — that later tasks depend on. At 4,096, it preserves what matters and discards what doesn’t.
Extracted from wiggum-memory — a research project that explored the Ralph Wiggum pattern for context management in AI agent loops. The memory subsystem proved to be the most broadly useful component, so it was promoted to a standalone library as part of the AgentWorks stack.