Memory · RAG · Embeddings · MCP · PostgreSQL

Semantic Memory System for AI Agents

Production-ready long-term memory with hybrid search, temporal intelligence, and smart context assembly.

AI Infrastructure · Ongoing · Internal Infrastructure

Key Results

  • <100ms search latency
  • Temporal parsing (EN/RU)
  • SASH-F structure extraction
  • Token-aware context assembly

The Problem

AI assistants forget everything between sessions. Every conversation starts from zero. Users repeat themselves. Context is lost. Relationships don't deepen.

Standard RAG solutions retrieve relevant documents but don't understand temporal context ("yesterday", "last week"), don't extract structured information from unstructured memories, and don't build connections between related memories over time.


The Solution

Production-ready long-term memory infrastructure that gives AI persistent, searchable memory with human-like recall patterns.


Hybrid Search Engine

Semantic search via BGE-M3 embeddings — multilingual, 1024 dimensions, understands meaning, not just keywords.

Keyword fallback via PostgreSQL full-text search — catches exact matches that embeddings might miss.

Cascaded strategy — fast embedding lookup first, keyword expansion when semantic search returns low confidence.

Cross-encoder reranking — precision boost on top results, ensures best matches surface first.
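The cascaded strategy above can be sketched in a few lines. This is an illustrative toy, not the production code: the cosine and keyword-overlap scorers, the toy two-dimensional vectors, and the 0.6 confidence threshold are all assumptions for the demo (production uses BGE-M3 embeddings and PostgreSQL full-text search, and the cross-encoder reranking stage is omitted for brevity).

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_overlap(query, text):
    # Fraction of query words that appear in the memory text
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def cascaded_search(query, query_vec, memories, threshold=0.6, top_k=3):
    # Stage 1: fast embedding lookup (BGE-M3 + pgvector in production)
    scored = [(cosine(query_vec, m["vec"]), m) for m in memories]
    scored.sort(key=lambda p: p[0], reverse=True)
    if scored and scored[0][0] >= threshold:
        return [m["text"] for _, m in scored[:top_k]]
    # Stage 2: keyword expansion when semantic confidence is low
    # (PostgreSQL full-text search in production)
    kw = [(keyword_overlap(query, m["text"]), m) for m in memories]
    kw.sort(key=lambda p: p[0], reverse=True)
    return [m["text"] for s, m in kw[:top_k] if s > 0]

memories = [
    {"text": "Anna prefers morning meetings", "vec": [0.9, 0.1]},
    {"text": "Server migration finished in May", "vec": [0.1, 0.9]},
]
# High semantic confidence: stage 1 answers, Anna's memory ranks first
print(cascaded_search("meetings with Anna", [0.85, 0.2], memories))
```

A near-orthogonal query vector drops the top semantic score below the threshold, so stage 2 takes over and matches on the shared words instead.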


Temporal Intelligence

Natural language time parsing that understands context:

  • "yesterday", "last week", "October"
  • "21 августа" (Russian dates)
  • Relative references: "two days ago", "last month"

Automatically filters search results by date range when temporal context is detected.


SASH-F Structure Extraction

Every memory is automatically parsed into structured components:

  • Subject — who or what
  • Act — action or verb
  • State — current condition
  • Habit — recurring patterns
  • Frame — context or situation

This structure enables more precise retrieval and pattern detection across memory clusters.


Smart Context Assembly

Token-aware packing via tiktoken — knows exactly how much context fits.

Configurable budget — default 50k tokens, adjustable per use case.

Multi-factor relevance scoring:

  • Semantic similarity (40%)
  • Keyword overlap (30%)
  • Recency (25%)
  • Type bonuses (5%)

Technical Stack

Component       Technology
Database        PostgreSQL + pgvector
Embeddings      BGE-M3 (FlagEmbedding)
Reranking       Cross-encoder (sentence-transformers)
Tokenization    tiktoken (cl100k_base)
Interface       MCP Protocol (stdio)
Language        Python 3.11+ / asyncio

Performance

  • Cold start — ~3s (model loading)
  • Search latency — <100ms (semantic) / <50ms (keyword)
  • Reranking — +50-100ms for top-10
  • Embedding generation — ~20ms per memory
  • Memory footprint — ~2GB (models loaded)

What Makes It Different

Standard RAG systems only sometimes offer hybrid search, and only some vector databases support graph relations. Memory Nexus combines all of its differentiators natively: hybrid search, temporal parsing, structure extraction, graph relations, token budgeting, and a native MCP interface.

Use Cases

Personal AI Assistant — remembers preferences, history, relationships across sessions.

Customer Support Bot — tracks issues, learns patterns, maintains context across tickets.

Research Assistant — accumulates knowledge, finds connections between papers and notes.

Enterprise Knowledge Base — searchable institutional memory that grows with the organization.


Production system actively used in multi-agent deployments. Available as part of our RAG & Knowledge Systems consulting.

Have a similar challenge?

Let's discuss how we can help. Free consultation, no obligations.

Book a Call