Overview
Agent Memory gives AI agents the ability to manage conversational context intelligently. Without memory management, every prior tool result is re-sent on every turn; on long tasks, the context fills with stale noise, costs climb, and the model loses focus. In one production run, a single code-coverage task consumed 18M input tokens without compaction and 854K with it: a 21x reduction at the same quality.

The library starts with a proven context compaction strategy and progressively adds structured retrieval, reflection, and autonomous memory management, eventually reaching MemGPT-level capabilities. Each tier is independently useful. Agent Memory ships as a Spring AI BaseAdvisor, so it plugs into any ChatClient pipeline with one line.
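Concretely, that one-line registration might look like the sketch below. The ChatClient.Builder wiring is standard Spring AI; the CompactionMemoryAdvisor constructor arguments shown are assumptions for illustration, not the library's confirmed API.

```java
// Sketch only: the CompactionMemoryAdvisor constructor shape is an assumption.
import org.springframework.ai.chat.client.ChatClient;

ChatClient client = ChatClient.builder(chatModel)
        .defaultAdvisors(new CompactionMemoryAdvisor(memoryStore)) // the one line
        .build();
```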
How Compaction Works
When accumulated memory exceeds memoryTokenBudget × compactionRatio, older entries are summarized by a cheap model (e.g., Haiku) and replaced with a compact summary. The agent continues with dense, relevant context instead of an ever-growing prompt. Two parameters control this:

| Parameter | Default | Description |
|---|---|---|
| memoryTokenBudget | 8,192 | Max tokens of memory included in each prompt |
| compactionRatio | 0.75 | Fraction of the budget that triggers compaction |
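With the defaults above, compaction fires once memory exceeds 8,192 × 0.75 = 6,144 tokens. A minimal sketch of that trigger check follows; the shouldCompact helper and the chars/4 estimate are illustrative assumptions, not the library's TokenEstimator.

```java
public class CompactionTrigger {
    // Illustrative trigger: compaction fires when estimated memory tokens
    // exceed budget * ratio (6,144 tokens with the defaults above).
    static boolean shouldCompact(int memoryTokens, int budget, double ratio) {
        return memoryTokens > (int) (budget * ratio);
    }

    // Crude stand-in for a token estimator: roughly 4 characters per token.
    static int estimateTokens(String text) {
        return text.length() / 4;
    }

    public static void main(String[] args) {
        System.out.println(shouldCompact(6000, 8192, 0.75)); // false: under 6,144
        System.out.println(shouldCompact(7000, 8192, 0.75)); // true: over 6,144
    }
}
```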
Benchmark Data
Real LLM benchmarks against Anthropic Haiku 4.5 on a 12-story e-commerce platform PRD:

| Metric | Without Compaction | With Compaction |
|---|---|---|
| Stories passed | 9/12 | 11/12 |
| Total tokens | 56,876 | 40,152 |
| Total cost | $0.34 | $0.24 |
Budget Sensitivity
The memory budget is the critical tuning parameter:

| Budget | Result | Notes |
|---|---|---|
| 2,048 | 7/12 stories | Too aggressive — destroys critical details |
| 4,096 | 11/12 stories | Sweet spot for structured tasks |
| 8,192 | Good | Best for unstructured conversations |
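If the advisor exposes these parameters through a builder (the builder method names below are hypothetical; only the parameter names and defaults come from the tables above), tightening the budget for a structured task might look like:

```java
// Hypothetical configuration sketch, not the library's confirmed API.
CompactionMemoryAdvisor advisor = CompactionMemoryAdvisor.builder()
        .memoryTokenBudget(4096)   // sweet spot for structured tasks
        .compactionRatio(0.75)     // default trigger fraction
        .build();
```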
Production Validation
In a code-coverage experiment with Loopy:

| Configuration | Compaction | Input Tokens | Outcome |
|---|---|---|---|
| No compaction | none | 18,336,594 | Failed |
| Threshold 0.5 | late | — | Cost cap hit |
| Threshold 0.3 | early | 854,353 | Passed |
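The roughly 21x figure cited in the overview follows directly from these numbers; a quick check with plain arithmetic, no library code:

```java
public class ReductionCheck {
    public static void main(String[] args) {
        long without = 18_336_594L; // input tokens, no compaction
        long with = 854_353L;       // input tokens, threshold 0.3
        double reduction = (double) without / with;
        System.out.printf("%.1fx reduction%n", reduction); // roughly 21x
    }
}
```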
Roadmap
| Tier | Name | Status | Description |
|---|---|---|---|
| 1 | Compaction | Shipping | Token-budgeted retrieval + LLM summarization |
| 2 | Structured | Planned | Categorized memory with selective retrieval and per-category retention policies |
| 3 | Reflective | Planned | Importance scoring + periodic reflection synthesis (Generative Agents pattern) |
| 4 | Autonomous | Planned | Agent-controlled memory via tools — virtual context management (MemGPT pattern) |
Modules
| Module | Description |
|---|---|
| memory-core | MemoryStore interface, FileSystemMemoryStore, MemoryCompactor, TokenEstimator |
| memory-advisor | CompactionMemoryAdvisor, a Spring AI BaseAdvisor for ChatClient integration |
Quick Links
GitHub: source code (0.1.0 on Maven Central)