Semantic Memory System for AI Agents
Production-ready long-term memory with hybrid search, temporal intelligence, and smart context assembly.
The Problem
AI assistants forget everything between sessions. Every conversation starts from zero. Users repeat themselves. Context is lost. Relationships don't deepen.
Standard RAG solutions retrieve relevant documents but don't understand temporal context ("yesterday", "last week"), don't extract structured information from unstructured memories, and don't build connections between related memories over time.
The Solution
Production-ready long-term memory infrastructure that gives AI agents persistent, searchable memory with human-like recall patterns.
Hybrid Search Engine
- Semantic search via BGE-M3 embeddings — multilingual, 1024 dimensions, understands meaning, not just keywords.
- Keyword fallback via PostgreSQL full-text search — catches exact matches that embeddings might miss.
- Cascaded strategy — fast embedding lookup first, keyword expansion when semantic search returns low confidence.
- Cross-encoder reranking — a precision boost on the top results that ensures the best matches surface first.
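The cascade above can be sketched as follows. The in-memory stub backends, the `min_confidence` threshold, and the score-based rerank placeholder are illustrative assumptions standing in for pgvector, PostgreSQL full-text search, and the cross-encoder:

```python
from dataclasses import dataclass

@dataclass
class Hit:
    text: str
    score: float  # retrieval confidence in [0, 1]

# Stub standing in for pgvector similarity search: score by crude word overlap.
def semantic_search(query: str) -> list[Hit]:
    corpus = ["postgres upgrade notes", "weekend trip plans"]
    words = set(query.lower().split())
    hits = [Hit(t, 0.9 if words & set(t.split()) else 0.3) for t in corpus]
    return sorted(hits, key=lambda h: h.score, reverse=True)

# Stub standing in for PostgreSQL full-text search: exact substring matches.
def keyword_search(query: str) -> list[Hit]:
    corpus = ["postgres upgrade notes", "grocery list"]
    return [Hit(t, 0.5) for t in corpus if query.lower() in t]

def rerank(query: str, hits: list[Hit]) -> list[Hit]:
    # Placeholder for the cross-encoder: reorder by the retrieval score.
    return sorted(hits, key=lambda h: h.score, reverse=True)

def cascaded_search(query: str, min_confidence: float = 0.6,
                    top_k: int = 10) -> list[Hit]:
    """Fast semantic pass first; expand with keyword hits on low confidence."""
    hits = semantic_search(query)
    if not hits or hits[0].score < min_confidence:
        seen = {h.text for h in hits}
        hits += [h for h in keyword_search(query) if h.text not in seen]
    return rerank(query, hits[:top_k])
```

The key design point is that the keyword pass only runs when the semantic pass is unsure, so the common case stays a single fast vector lookup.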
Temporal Intelligence
Natural language time parsing that understands context:
- "yesterday", "last week", "October"
- "21 августа" (Russian dates)
- Relative references: "two days ago", "last month"
Automatically filters search results by date range when temporal context is detected.
SASH-F Structure Extraction
Every memory is automatically parsed into structured components:
- Subject — who or what
- Act — action or verb
- State — current condition
- Habit — recurring patterns
- Frame — context or situation
This structure enables more precise retrieval and pattern detection across memory clusters.
Smart Context Assembly
- Token-aware packing via tiktoken — knows exactly how much context fits.
- Configurable budget — default 50k tokens, adjustable per use case.
- Multi-factor relevance scoring:
  - Semantic similarity (40%)
  - Keyword overlap (30%)
  - Recency (25%)
  - Type bonuses (5%)
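Putting the weights and the budget together, assembly reduces to scoring and greedy packing. This sketch uses a rough characters-per-token estimate as a stand-in for tiktoken's `cl100k_base` encoding; the weights are the documented ones, the rest is assumed structure:

```python
def estimate_tokens(text: str) -> int:
    # Rough stand-in for tiktoken cl100k_base: ~4 characters per token.
    return max(1, len(text) // 4)

def relevance(semantic: float, keyword: float,
              recency: float, type_bonus: float) -> float:
    """Combine the four factors with the documented weights."""
    return 0.40 * semantic + 0.30 * keyword + 0.25 * recency + 0.05 * type_bonus

def pack_context(memories: list[dict], budget_tokens: int = 50_000) -> list[dict]:
    """Greedily fill the token budget with the highest-scoring memories."""
    ranked = sorted(memories, key=lambda m: relevance(*m["factors"]), reverse=True)
    packed, used = [], 0
    for m in ranked:
        cost = estimate_tokens(m["text"])
        if used + cost <= budget_tokens:
            packed.append(m)
            used += cost
    return packed
```

Greedy packing in score order means a lower-relevance memory never displaces a higher-relevance one, and the budget is a hard ceiling rather than a soft target.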
Technical Stack
| Component | Technology |
|---|---|
| Database | PostgreSQL + pgvector |
| Embeddings | BGE-M3 (FlagEmbedding) |
| Reranking | Cross-encoder (sentence-transformers) |
| Tokenization | tiktoken (cl100k_base) |
| Interface | MCP Protocol (stdio) |
| Language | Python 3.11+ / asyncio |
Performance
- Cold start — ~3s (model loading)
- Search latency — <100ms (semantic) / <50ms (keyword)
- Reranking — +50-100ms for top-10
- Embedding generation — ~20ms per memory
- Memory footprint — ~2GB (models loaded)
What Makes It Different
| Feature | Memory Nexus | RAG Systems | Vector DBs |
|---|---|---|---|
| Hybrid search | ✓ | Sometimes | ✗ |
| Temporal parsing | ✓ | ✗ | ✗ |
| Structure extraction | ✓ | ✗ | ✗ |
| Graph relations | ✓ | ✗ | Some |
| Token budgeting | ✓ | ✗ | ✗ |
| MCP native | ✓ | ✗ | ✗ |
Use Cases
- Personal AI Assistant — remembers preferences, history, relationships across sessions.
- Customer Support Bot — tracks issues, learns patterns, maintains context across tickets.
- Research Assistant — accumulates knowledge, finds connections between papers and notes.
- Enterprise Knowledge Base — searchable institutional memory that grows with the organization.
Production system actively used in multi-agent deployments. Available as part of our RAG & Knowledge Systems consulting.
