Semantic Memory System for AI Agents
Production-ready long-term memory with hybrid search, temporal intelligence, and smart context assembly.
The Problem
AI assistants forget everything between sessions. Every conversation starts from zero. Users repeat themselves. Context is lost. Relationships don't deepen.
Standard RAG solutions retrieve relevant documents but don't understand temporal context ("yesterday", "last week"), don't extract structured information from unstructured memories, and don't build connections between related memories over time.
The Solution
Production-ready long-term memory infrastructure that gives AI agents persistent, searchable memory with human-like recall patterns.
Hybrid Search Engine
- Semantic search via BGE-M3 embeddings — multilingual, 1024 dimensions, understands meaning, not just keywords.
- Keyword fallback via PostgreSQL full-text search — catches exact matches that embeddings might miss.
- Cascaded strategy — fast embedding lookup first, keyword expansion when semantic search returns low confidence.
- Cross-encoder reranking — a precision boost on the top results that ensures the best matches surface first.
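The cascade above can be sketched as follows. The in-memory stub backends, the `min_confidence` threshold, and the score-based rerank placeholder are illustrative assumptions standing in for pgvector, PostgreSQL full-text search, and the cross-encoder:

```python
from dataclasses import dataclass

@dataclass
class Hit:
    text: str
    score: float  # retrieval confidence in [0, 1]

# Stub standing in for pgvector similarity search: score by crude word overlap.
def semantic_search(query: str) -> list[Hit]:
    corpus = ["postgres upgrade notes", "weekend trip plans"]
    words = set(query.lower().split())
    hits = [Hit(t, 0.9 if words & set(t.split()) else 0.3) for t in corpus]
    return sorted(hits, key=lambda h: h.score, reverse=True)

# Stub standing in for PostgreSQL full-text search: exact substring matches.
def keyword_search(query: str) -> list[Hit]:
    corpus = ["postgres upgrade notes", "grocery list"]
    return [Hit(t, 0.5) for t in corpus if query.lower() in t]

def rerank(query: str, hits: list[Hit]) -> list[Hit]:
    # Placeholder for the cross-encoder: reorder by the retrieval score.
    return sorted(hits, key=lambda h: h.score, reverse=True)

def cascaded_search(query: str, min_confidence: float = 0.6,
                    top_k: int = 10) -> list[Hit]:
    """Fast semantic pass first; expand with keyword hits on low confidence."""
    hits = semantic_search(query)
    if not hits or hits[0].score < min_confidence:
        seen = {h.text for h in hits}
        hits += [h for h in keyword_search(query) if h.text not in seen]
    return rerank(query, hits[:top_k])
```

The key design point is that the keyword pass only runs when the semantic pass is unsure, so the common case stays a single fast vector lookup.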
Temporal Intelligence
Natural language time parsing that understands context:
- "yesterday", "last week", "October"
- "21 августа" (Russian dates)
- Relative references: "two days ago", "last month"
Automatically filters search results by date range when temporal context is detected.
SASH-F Structure Extraction
Every memory is automatically parsed into structured components:
- Subject — who or what
- Act — action or verb
- State — current condition
- Habit — recurring patterns
- Frame — context or situation
This structure enables more precise retrieval and pattern detection across memory clusters.
Smart Context Assembly
- Token-aware packing via tiktoken — knows exactly how much context fits.
- Configurable budget — default 50k tokens, adjustable per use case.
- Multi-factor relevance scoring:
  - Semantic similarity (40%)
  - Keyword overlap (30%)
  - Recency (25%)
  - Type bonuses (5%)
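Putting the weights and the budget together, assembly reduces to scoring and greedy packing. This sketch uses a rough characters-per-token estimate as a stand-in for tiktoken's `cl100k_base` encoding; the weights are the documented ones, the rest is assumed structure:

```python
def estimate_tokens(text: str) -> int:
    # Rough stand-in for tiktoken cl100k_base: ~4 characters per token.
    return max(1, len(text) // 4)

def relevance(semantic: float, keyword: float,
              recency: float, type_bonus: float) -> float:
    """Combine the four factors with the documented weights."""
    return 0.40 * semantic + 0.30 * keyword + 0.25 * recency + 0.05 * type_bonus

def pack_context(memories: list[dict], budget_tokens: int = 50_000) -> list[dict]:
    """Greedily fill the token budget with the highest-scoring memories."""
    ranked = sorted(memories, key=lambda m: relevance(*m["factors"]), reverse=True)
    packed, used = [], 0
    for m in ranked:
        cost = estimate_tokens(m["text"])
        if used + cost <= budget_tokens:
            packed.append(m)
            used += cost
    return packed
```

Greedy packing in score order means a lower-relevance memory never displaces a higher-relevance one, and the budget is a hard ceiling rather than a soft target.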
Technical Stack
| Component | Technology |
|---|---|
| Database | PostgreSQL + pgvector |
| Embeddings | BGE-M3 (FlagEmbedding) |
| Reranking | Cross-encoder (sentence-transformers) |
| Tokenization | tiktoken (cl100k_base) |
| Interface | MCP Protocol (stdio) |
| Language | Python 3.11+ / asyncio |
Performance
- Cold start — ~3s (model loading)
- Search latency — <100ms (semantic) / <50ms (keyword)
- Reranking — +50-100ms for top-10
- Embedding generation — ~20ms per memory
- Memory footprint — ~2GB (models loaded)
What Makes It Different
| Feature | Memory Nexus | RAG Systems | Vector DBs |
|---|---|---|---|
| Hybrid search | ✓ | Sometimes | ✗ |
| Temporal parsing | ✓ | ✗ | ✗ |
| Structure extraction | ✓ | ✗ | ✗ |
| Graph relations | ✓ | ✗ | Some |
| Token budgeting | ✓ | ✗ | ✗ |
| MCP native | ✓ | ✗ | ✗ |
Use Cases
- Personal AI Assistant — remembers preferences, history, relationships across sessions.
- Customer Support Bot — tracks issues, learns patterns, maintains context across tickets.
- Research Assistant — accumulates knowledge, finds connections between papers and notes.
- Enterprise Knowledge Base — searchable institutional memory that grows with the organization.
Production system actively used in multi-agent deployments. Available as part of our RAG & Knowledge Systems consulting.
