
Agent Memory Architecture

A research implementation exploring how AI agents can maintain useful long-term memory. Implements three memory types: episodic (conversation history with importance scoring), semantic (extracted facts and relationships), and procedural (learned patterns and preferences). Includes forgetting curves to prevent memory bloat.

3 memory types · Ebbinghaus forgetting curves · Graph-based retrieval

Category

Agent

Status

Research

Tech Stack

Python, pgvector, NetworkX, LangGraph, Neo4j

Models

Claude Sonnet, all-MiniLM-L6
Overview

Current AI agents forget everything between conversations. This research experiment implements a biologically inspired memory architecture with three distinct memory types: episodic (what happened), semantic (what's true), and procedural (how to do things). An Ebbinghaus forgetting curve governs memory decay, ensuring the agent remembers important information while naturally forgetting irrelevant details.

Methodology

I implemented three memory stores and tested retention over 100 simulated conversation sessions spanning 30 days. Episodic memory stores full conversation turns with importance scoring (1-10) derived from user engagement signals. Semantic memory extracts facts and entity relationships into a Neo4j knowledge graph. Procedural memory records action patterns and user preferences. Memory decay follows Ebbinghaus's forgetting curve: R = e^(-t/S), where t is the time since the memory was last retrieved and S (stability) increases with each successful retrieval.
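As a minimal sketch, the retention formula and the retrieval-driven stability boost look like this (the multiplicative boost factor here is an illustrative assumption, not the tuned value):

```python
import math

def retention(t_days: float, stability: float) -> float:
    """Ebbinghaus retention R = e^(-t/S): the probability a memory
    survives after t_days without retrieval, given stability S (in days)."""
    return math.exp(-t_days / stability)

def on_retrieval(stability: float, boost: float = 1.5) -> float:
    """Each successful retrieval increases stability, slowing future decay.
    The 1.5x boost is an illustrative assumption."""
    return stability * boost

# A memory with stability 5 checked after 10 days retains e^(-2) ≈ 0.135;
# retrieving it raises its stability from 5 to 7.5.
```

Higher stability flattens the curve, so frequently used memories decay more slowly with each access.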

Tech Stack

pgvector stores episodic memories with embedding-based retrieval. Neo4j hosts the semantic knowledge graph with entity-relationship triples. LangGraph manages the memory lifecycle (encode, consolidate, retrieve, decay). Python implements the Ebbinghaus forgetting curve with configurable stability parameters.

Key Findings

The most important insights from this experiment.

1

Forgetting curves prevent unbounded memory growth

Without decay, the memory store grew to 12,000 entries after 100 sessions, with retrieval latency degrading 4x. With Ebbinghaus curves, the active memory stabilized at ~800 high-value entries with consistent sub-100ms retrieval.

2

Semantic memory enables cross-conversation reasoning

By extracting entities and relationships into a knowledge graph, the agent could answer questions requiring synthesis across conversations — e.g., "What tools does this user prefer for Python projects?" by traversing preference nodes.
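The synthesis query amounts to filtering the graph's triples by relation and context. A toy in-memory triple store standing in for the Neo4j graph (all node and relation names are illustrative):

```python
# Stand-in for the knowledge graph: (subject, relation, object, properties).
TRIPLES = [
    ("user", "PREFERS", "pytest", {"context": "python"}),
    ("user", "PREFERS", "ruff", {"context": "python"}),
    ("user", "PREFERS", "jest", {"context": "typescript"}),
]

def preferred_tools(triples, subject, context):
    """Synthesis query: which tools does `subject` prefer in `context`?
    Traverses PREFERS edges filtered by the project-context property."""
    return sorted(
        obj
        for s, rel, obj, props in triples
        if s == subject and rel == "PREFERS" and props.get("context") == context
    )
```

In the real system the same traversal runs as a Cypher query over preference nodes, but the shape of the computation is identical.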

3

Procedural memory reduces repeated instructions by 60%

After learning that a user prefers TypeScript over JavaScript and pytest over unittest, the agent stopped asking clarifying questions about these preferences — reducing conversational friction measurably.
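The mechanism can be sketched as a counter over observed (preference, value) pairs: once a value has been confirmed often enough, the agent stops asking. The confirmation threshold here is an illustrative assumption:

```python
class ProceduralMemory:
    """Records user preferences; a preference becomes 'known' (and the
    clarifying question is skipped) after enough confirmations."""

    def __init__(self, confirmations_needed: int = 2):
        self.counts: dict[tuple[str, str], int] = {}
        self.confirmations_needed = confirmations_needed

    def observe(self, key: str, value: str) -> None:
        self.counts[(key, value)] = self.counts.get((key, value), 0) + 1

    def known(self, key: str):
        """Return the confirmed value for `key`, or None if still unsure."""
        for (k, v), n in self.counts.items():
            if k == key and n >= self.confirmations_needed:
                return v
        return None
```

Requiring more than one observation guards against promoting a one-off choice into a standing preference.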

4

Importance scoring is the hardest problem

Automatically determining which memories matter is fundamentally difficult. The best heuristic combined explicit signals (user corrections, repeated mentions) with implicit signals (conversation length, follow-up questions).
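A sketch of that combined heuristic, with illustrative weights (the tuned values differed):

```python
def importance(signals: dict) -> int:
    """Combine explicit and implicit engagement signals into a 1-10 score.
    Explicit signals (corrections, repeated mentions) dominate; implicit
    signals (conversation length, follow-ups) nudge the score upward."""
    score = 1.0
    score += 3.0 * signals.get("user_corrections", 0)
    score += 2.0 * signals.get("repeated_mentions", 0)
    score += 0.5 * signals.get("follow_up_questions", 0)
    score += min(signals.get("turns", 0) / 10, 1.0)  # length, capped at +1
    return min(round(score), 10)
```

Weighting corrections highest reflects the observation that a user bothering to correct the agent is the strongest available signal of importance.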

Architecture

Each conversation flows through three phases: (1) Retrieval — relevant episodic, semantic, and procedural memories are fetched based on the current query embedding and recency. (2) Augmented generation — retrieved memories are injected into the system prompt with recency and importance weighting. (3) Consolidation — after the conversation, new episodic entries are created, facts are extracted to the knowledge graph, and procedural patterns are updated. A nightly decay job reduces stability scores for unretrieved memories.
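The nightly decay step from phase (3) reduces to a single pass over the store. Field names and the decay factor are illustrative assumptions:

```python
def nightly_decay(memories, last_job_run, decay_factor=0.9):
    """Reduce stability for every memory not retrieved since the last
    decay run, so unretrieved memories fade faster on future pruning."""
    for m in memories:
        if m["last_retrieved"] < last_job_run:
            m["stability"] *= decay_factor
    return memories
```

Because retrieval raises stability and the nightly job lowers it, the active set settles into an equilibrium of memories the agent actually uses.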

Results

Over 100 simulated sessions: memory store stabilized at ~800 active entries (vs 12,000 without decay). Retrieval latency stayed under 100ms. User preference accuracy reached 89% by session 20. Cross-conversation reasoning succeeded on 73% of synthesis questions. Memory-augmented responses were rated 4.2/5 vs 3.1/5 for memoryless responses (n=15 evaluators).

Challenges

Key technical challenges encountered during this experiment.

Challenge 1

Memory collision and contradiction

When a user changes preferences, old memories contradict new ones. Implemented a temporal weighting system where newer memories override older ones, with explicit contradiction detection that marks outdated entries for accelerated decay.
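A minimal sketch of the resolution rule, assuming memories carry a `key`, `value`, and `timestamp` (field names are illustrative): the newest entry wins, and contradicting older entries are flagged so the decay job can retire them faster.

```python
def resolve(memories, key):
    """Temporal weighting: return the newest value for `key` and mark
    older, contradicting entries as outdated for accelerated decay."""
    entries = sorted(
        (m for m in memories if m["key"] == key),
        key=lambda m: m["timestamp"],
    )
    if not entries:
        return None
    current = entries[-1]
    for stale in entries[:-1]:
        if stale["value"] != current["value"]:
            stale["outdated"] = True  # picked up by the decay job
    return current["value"]
```

Flagging rather than deleting keeps the history available in case the newer memory itself turns out to be wrong.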

Challenge 2

Privacy and selective forgetting

Users need the ability to delete specific memories ("forget my address"). Built a targeted deletion API that removes entries from all three memory stores and propagates deletions through the knowledge graph.
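The deletion has to hit all three stores at once. A sketch over simplified in-memory stands-ins (lists and a dict, not the real pgvector/Neo4j APIs):

```python
def forget(term, episodic, semantic_triples, procedural):
    """Selective forgetting: purge every entry mentioning `term` from the
    episodic log, the (subject, relation, object, props) triple store,
    and the procedural preference map, in place."""
    episodic[:] = [e for e in episodic if term not in e["text"]]
    semantic_triples[:] = [
        t for t in semantic_triples if term not in (t[0], t[2])
    ]
    for k in [k for k, v in procedural.items() if term in (k, v)]:
        del procedural[k]
```

In the real system the graph deletion additionally cascades to edges whose endpoints were removed, so no dangling relationships survive.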

