You’re going to build a file-based research knowledge base and teach an AI agent to navigate it. By the end, you’ll ask a question about coding agents and get a grounded answer, sourced from real papers rather than only the model’s parametric memory.
Time: ~20 minutes.
Result: A working research KB with 5 papers, routing tables, and a research partner you can query.
What You’re Building
This is not a chatbot. It’s not a vector database. It’s a structured file system that an agent reads directly: markdown files with routing tables that guide the agent to the right context. Three ideas make it work:
- Knowledge lives in files: summaries, routing tables, and metadata are plain markdown in git
- The agent reads those files directly: no embeddings, no vector search, no retrieval pipeline
- Routing tables guide the agent to the right context, which replaces vector search for this class of problems
Prerequisites
- Python 3 (any recent version — no pip packages needed)
- Git
- Claude Code
- Internet access (for arXiv downloads)
Step 1: Clone and Scaffold
Clone Agento Studio and scaffold your research KB.
Step 2: Seed Your Paper Tracker
Tell Claude Code to populate the tracker with seed papers.
Step 3: Ingest Papers (The Deterministic Layer)
Run the arXiv ingest script to download PDFs, metadata, and LaTeX source.
The Anthropic blog post doesn’t have an arXiv ID, so fetch it separately using Claude Code’s WebFetch or save it manually.
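
The ingest script itself is not reproduced here. As a rough sketch of what a deterministic download layer works from, the snippet below builds the public arXiv URLs for one paper. The function name `arxiv_urls` is hypothetical and not part of Agento Studio; the ID used is the arXiv identifier of the ReAct paper, whose summary appears later in this tutorial.

```python
from urllib.parse import urlencode


def arxiv_urls(arxiv_id: str) -> dict[str, str]:
    """Return the public arXiv endpoints for one paper ID (illustrative helper)."""
    return {
        # Rendered PDF of the paper.
        "pdf": f"https://arxiv.org/pdf/{arxiv_id}",
        # Author-submitted source; a LaTeX archive when the authors uploaded one.
        "source": f"https://arxiv.org/e-print/{arxiv_id}",
        # Atom feed with title, authors, and abstract from the arXiv API.
        "metadata": "http://export.arxiv.org/api/query?" + urlencode({"id_list": arxiv_id}),
    }


urls = arxiv_urls("2210.03629")  # ReAct
for kind, url in urls.items():
    print(f"{kind}: {url}")
```

Because the URLs depend only on the paper ID, the same input always yields the same downloads, which is what keeps this layer deterministic. Fetching can then be done with the standard library, for example `urllib.request.urlopen`.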
Step 4: Generate Your First Summary
Ask Claude Code to read a paper and write a structured summary.
Claude Code reads the .tex files directly and produces a grounded summary. Update the paper tracker to mark it as Summarized.
Quick Test — Your First Query
Now ask a question.
The agent should read papers/summaries/react-reasoning-acting.md and cite specific findings.
Repeat Step 4 for the remaining papers, then continue.
Step 5: Build Routing Tables
With summaries written, create the routing layer. Create papers/summaries/index.md.
The "Read when..." column is the core mechanism. When the agent gets a question, it reads this table and follows the link whose description matches. If the agent answers poorly, this table is usually the problem.
The Not Covered section prevents the agent from searching for content that doesn’t exist.
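
For orientation, here is a minimal sketch that renders a routing table plus a Not Covered section as markdown. The two-column layout, the headings, the second row, and the Not Covered entry are assumptions for illustration; only `react-reasoning-acting.md` comes from this tutorial, and the `index.md` you write may look different.

```python
# Each route: (file name, link text, "Read when..." description).
# Only the first file name appears in this tutorial; the second row is a
# made-up placeholder.
ROUTES = [
    ("react-reasoning-acting.md", "ReAct",
     "The question is about interleaving reasoning and acting"),
    ("example-paper.md", "Example paper",
     "The question is about a placeholder topic"),
]

# Topics the KB deliberately does not cover (placeholder entry).
NOT_COVERED = ["Placeholder topic with no paper in this KB"]


def render_index(routes, not_covered) -> str:
    """Render a routing table and a Not Covered list as markdown text."""
    lines = ["# Summaries", "", "| Summary | Read when... |", "|---|---|"]
    for filename, title, read_when in routes:
        lines.append(f"| [{title}]({filename}) | {read_when} |")
    lines += ["", "## Not Covered", ""]
    lines += [f"- {topic}" for topic in not_covered]
    return "\n".join(lines) + "\n"


index_text = render_index(ROUTES, NOT_COVERED)
print(index_text)
```

Writing the descriptions as the conditions under which a file should be opened, rather than as titles, is what lets the agent match a question to a row.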
Step 6: Ask a Real Question
Ask something that requires cross-summary reasoning.
If It Doesn’t Work
When the agent gives a wrong or weak answer, the fix is always in the knowledge, not in prompts.

| Symptom | Fix |
|---|---|
| Wrong answer | Routing table "Read when..." descriptions don’t match the question |
| Missing detail | Summaries are too shallow — add more content |
| Hallucination | Not Covered section is missing a topic |
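
Some of these symptoms trace back to a routing table that has drifted from the files on disk. The sketch below shows one way to detect that drift. It assumes a flat summaries folder where `index.md` links to sibling files by name, and the function name `check_index` is hypothetical. It is not the Agento Studio health check, only an illustration of the kind of structural issue such a check can catch.

```python
import re
import tempfile
from pathlib import Path

# Markdown links that point at a .md file, e.g. [ReAct](react-reasoning-acting.md)
LINK = re.compile(r"\[[^\]]*\]\(([^)#]+\.md)\)")


def check_index(index_path: Path) -> dict[str, list[str]]:
    """Compare the links in a routing table with the files next to it."""
    folder = index_path.parent
    linked = set(LINK.findall(index_path.read_text(encoding="utf-8")))
    present = {p.name for p in folder.glob("*.md")} - {index_path.name}
    return {
        "broken_links": sorted(linked - present),    # routed to, but the file is missing
        "unrouted_files": sorted(present - linked),  # file exists, but no route leads to it
    }


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    summaries = Path(tmp) / "papers" / "summaries"
    summaries.mkdir(parents=True)
    (summaries / "react-reasoning-acting.md").write_text("# ReAct\n", encoding="utf-8")
    (summaries / "orphan.md").write_text("# Orphan\n", encoding="utf-8")
    (summaries / "index.md").write_text(
        "| Summary | Read when... |\n|---|---|\n"
        "| [ReAct](react-reasoning-acting.md) | reasoning and acting |\n"
        "| [Gone](missing.md) | a summary that was never written |\n",
        encoding="utf-8",
    )
    report = check_index(summaries / "index.md")

print(report)
```

A broken link sends the agent to a file that does not exist, and an unrouted file is knowledge the agent will never be guided to; both are fixed by editing `index.md` or the summaries, not the prompt.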
Step 7: Validate Your KB
Run the health check to catch structural issues.
What You Learned
- Routing tables replace vector search for this class of problems
- Knowledge improves by editing files, not tuning prompts
- Structure enables agent navigation — the agent reads, decides, synthesizes
Why This Works
Instead of probabilistic retrieval:
- You control exactly what the agent reads
- Context selection is explicit, not fuzzy
- Improvements are local — edit a file, not a system
What Just Happened
You built a research partner that answers questions grounded in real papers. This same pattern scales to codebases, issue trackers, and multi-agent systems. This is an example of the Forge methodology, a way to build agent-native knowledge systems incrementally.
Next Steps
- Add more papers: Expand the tracker, run the batch pipeline, write summaries
- Synthesize themes: Write cross-cutting analysis in findings/
- Federate: Connect this KB to other projects via KB-FEDERATION.md
- Explore the full methodology: See the Agento Studio project page