Building Cortex: How We Built an Open-Source Memory Layer for Claude Code
The technical story behind Cortex -- from the failed approaches that taught us what not to build, to the architecture decisions that made persistent AI memory actually work. 9 packages, 7 quality gate rules, and 6 memory types.
Koundinya Lanka
Enterprise AI
Cortex started as a personal frustration. I was building TheProductionLine -- a monorepo with Next.js 16, Turborepo, Clerk, Drizzle, Stripe, and a custom AI pipeline -- and every Claude Code session began the same way. I would explain the monorepo structure. Re-state the naming conventions. Re-describe the BYOA architecture. Re-identify the same bugs. It was like having Groundhog Day with my most important development tool.
So I built Cortex. What shipped as v1.0.0 is a 9-package TypeScript monorepo that gives Claude Code persistent memory via the Model Context Protocol. But the interesting story is not the final architecture -- it is the four approaches I tried first and why they failed. Every failure taught me something about what persistent AI memory actually requires.
What Failed Before Cortex Worked
Attempt 1: The Giant Markdown File
My first approach was a manually curated context document -- a massive CLAUDE.md that described everything about the project. Architecture, conventions, known bugs, deployment notes. It worked for about two weeks. Then it was 4,000 lines long, perpetually outdated, and took 15 minutes to update after every significant session. The maintenance cost defeated the purpose.
Attempt 2: Conversation Log Replay
Next I tried saving full conversation transcripts and replaying the important parts at session start. This hit the context window wall immediately. A typical session generates 100K+ tokens. Even extracting 'key moments' manually produced 10-20K tokens of injected context -- most of it conversational noise rather than actual knowledge. Claude's responses degraded because its attention was split across stale conversation fragments.
Attempt 3: Save Everything, Filter Later
The third attempt was an automated save-everything approach. Hook into Claude's tool calls, capture every piece of context, dump it into SQLite, retrieve by relevance score. Within three days I had 400+ 'memories,' of which maybe 30 were useful. The rest were variations of 'I have updated the file' and 'the user wants to fix the bug.' The system was technically working and practically useless.
Key Insight
The core insight from three failed attempts: the hard problem is not storage or retrieval. It is curation. A memory system that stores everything is a memory system that remembers nothing, because the signal gets buried in noise.
The Architecture That Worked
Cortex v1.0.0 shipped as a Turborepo monorepo with 9 packages. Here is the architecture and why each piece exists.
The MCP Server (@cortex.memory/server)
The server exposes 7 MCP tools to Claude Code: save_memory, get_memories, search_memories, list_projects, delete_memory, supersede_memory, and update_memory. It runs as a Fastify daemon binding to localhost:7434. The MCP protocol was the right choice over a proxy approach because it gives Claude native tool access -- saves and retrievals happen as natural parts of the conversation rather than through a separate integration.
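The tool surface can be sketched as a small registry. The seven tool names below come from the article; the registry shape, handler signature, and dispatch logic are illustrative assumptions (the real server registers these through the MCP SDK on the Fastify daemon):

```typescript
// Hypothetical sketch of the server's tool registry. Tool names are from the
// article; everything else (handler shape, dispatch) is illustrative.
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

const tools = new Map<string, ToolHandler>();

function registerTool(name: string, handler: ToolHandler): void {
  tools.set(name, handler);
}

// The seven tools exposed to Claude Code (handlers stubbed for illustration)
for (const name of [
  "save_memory", "get_memories", "search_memories", "list_projects",
  "delete_memory", "supersede_memory", "update_memory",
]) {
  registerTool(name, async (args) => ({ tool: name, args }));
}

async function dispatch(name: string, args: Record<string, unknown>): Promise<unknown> {
  const handler = tools.get(name);
  if (!handler) throw new Error(`Unknown tool: ${name}`);
  return handler(args);
}
```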
The Quality Gate: 7 Rules, Zero Negotiation
This is the component that makes Cortex work and the component that took the most iteration. Every memory passes through 7 sequential validation rules. If any rule fails, the memory is rejected with a specific error code and the AI can retry with improved content.
// Quality Gate Rules (sequential)
1. Length Check → 50-2000 chars, reason ≥ 10 chars
2. Banned Phrases → 50 generic phrases rejected
3. Sensitive Data → 8 regex patterns (AWS keys, tokens, etc.)
4. Quality Score → Minimum specificity threshold
5. Duplicate Check → TF-IDF cosine similarity < 0.85
6. Rate Limit → 50/session, 200/day/project
7. Reason Validation → Non-empty, meaningful justification

The banned phrase list was the most impactful single addition. It blocks 50 common meta-commentary patterns that LLMs generate reflexively: 'I will now proceed to,' 'as the user requested,' 'the code has been updated.' These phrases account for roughly 60% of what an unfiltered system would save, and none of them are useful as memories.
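In code, the sequential gate is just an ordered list of rule functions where the first failure rejects with an error code. This is a minimal sketch: the thresholds and rule order follow the list above, but the rule implementations, error codes, and the abridged banned-phrase list are illustrative, not Cortex's actual source:

```typescript
// Sequential quality gate sketch: each rule passes or rejects with a code,
// and the first failure stops the chain so the AI can retry with better content.
interface Candidate {
  content: string;
  reason: string;
}

type GateResult = { ok: true } | { ok: false; code: string };
type Rule = (c: Candidate) => GateResult;

// Abridged stand-in for the real 50-phrase list
const BANNED_PHRASES = ["i will now proceed to", "as the user requested"];

const rules: Rule[] = [
  // 1. Length check: 50-2000 chars for content, >= 10 chars for the reason
  (c) =>
    c.content.length >= 50 && c.content.length <= 2000 && c.reason.length >= 10
      ? { ok: true }
      : { ok: false, code: "LENGTH" },
  // 2. Banned phrases: reject reflexive meta-commentary
  (c) =>
    BANNED_PHRASES.some((p) => c.content.toLowerCase().includes(p))
      ? { ok: false, code: "BANNED_PHRASE" }
      : { ok: true },
  // 3. Sensitive data: one of the regex patterns (AWS access key IDs)
  (c) =>
    /AKIA[0-9A-Z]{16}/.test(c.content)
      ? { ok: false, code: "SENSITIVE_DATA" }
      : { ok: true },
  // ...rules 4-7 (quality score, duplicate check, rate limit, reason validation)
];

function runGate(c: Candidate): GateResult {
  for (const rule of rules) {
    const result = rule(c);
    if (!result.ok) return result; // first failure rejects
  }
  return { ok: true };
}
```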
Six Memory Types with Distinct Lifecycles
Early prototypes used a flat list of memories. This caused two problems: permanent decisions got crowded out by temporary context, and expired information lingered indefinitely. The six-type system (Decision, Context, Preference, Thread, Error, Learning) assigns each memory a lifecycle that matches its real-world behavior. Decisions are permanent unless explicitly superseded. Threads expire when resolved. Errors persist until the bug is fixed.
Project Detection: The 4-Layer Strategy
Memory isolation between projects is critical -- your Next.js app's conventions should never leak into your Python data pipeline. Cortex uses a 4-layer fallback: first it looks for a .cortex/project.json file (explicit initialization), then checks the git remote URL (works across clones), then uses the directory path, and finally auto-creates a new project entry. The git remote layer is particularly useful for teams where multiple developers clone the same repo -- memories sync across machines because the project identity is tied to the remote, not the local path.
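The fallback chain reads naturally as a sequence of early returns. The layer order is from the article; the function signatures and return shape below are illustrative, and a real implementation would read .cortex/project.json and shell out to `git remote`:

```typescript
// Sketch of the 4-layer project detection fallback. Layer order from the
// article; the injected reader functions are illustrative stand-ins.
interface ProjectId {
  id: string;
  source: "config" | "git-remote" | "path" | "auto-created";
}

function detectProject(
  readConfig: () => string | null,   // layer 1: .cortex/project.json, if initialized
  gitRemoteUrl: () => string | null, // layer 2: stable across clones and machines
  cwd: string,                       // layer 3: directory path
): ProjectId {
  const configured = readConfig();
  if (configured) return { id: configured, source: "config" };

  const remote = gitRemoteUrl();
  if (remote) return { id: remote, source: "git-remote" };

  if (cwd) return { id: cwd, source: "path" };

  // Layer 4: last resort, mint a new project entry
  return { id: `project-${Date.now()}`, source: "auto-created" };
}
```

Injecting the readers as functions keeps the fallback order testable without touching the filesystem or git.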
Token-Budget-Aware Context Injection
Each project has a configurable context budget (default 4,000 tokens, range 1,000-12,000). When Claude starts a session, Cortex fills this budget with the highest-ranked memories using a weighted scoring algorithm: importance at 50%, confidence at 30%, recency at 20%. Memories unreviewed for 90+ days get their score halved. This means the most important, most confident, most recent memories get injected first, and stale content naturally drops out.
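The weights, the stale penalty, and the greedy budget fill can be sketched directly. The 50/30/20 split, the 90-day halving, and the 4,000-token default come from the article; the field names and the rough 4-chars-per-token estimator are assumptions:

```typescript
// Weighted ranking and budget fill sketch:
// score = 0.5*importance + 0.3*confidence + 0.2*recency, halved after
// 90 days without review; memories are taken greedily until the budget is spent.
interface Memory {
  text: string;
  importance: number;      // 0..1
  confidence: number;      // 0..1
  recency: number;         // 0..1, higher = more recent
  daysSinceReview: number;
}

function score(m: Memory): number {
  const base = 0.5 * m.importance + 0.3 * m.confidence + 0.2 * m.recency;
  return m.daysSinceReview >= 90 ? base / 2 : base; // stale penalty
}

// Rough token estimate (~4 chars per token); a hypothetical stand-in
const estimateTokens = (text: string) => Math.ceil(text.length / 4);

function fillBudget(memories: Memory[], budgetTokens = 4000): Memory[] {
  const ranked = [...memories].sort((a, b) => score(b) - score(a));
  const selected: Memory[] = [];
  let used = 0;
  for (const m of ranked) {
    const cost = estimateTokens(m.text);
    if (used + cost > budgetTokens) continue; // skip what doesn't fit
    selected.push(m);
    used += cost;
  }
  return selected;
}
```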
The Monorepo: 9 Packages
Types & constants: Zod schemas, memory types, quality gate constants. Shared across all packages.
Server (MCP + API + DB): Fastify daemon, MCP tools, SQLite with FTS5, Turso sync engine, AI summarizer.
CLI (33 commands): Terminal interface with Commander.js. Memory CRUD, daemon control, sync management, diagnostics.
Web UI: Next.js 14 dashboard at localhost:7433. Search, filter, bulk operations, project switching.
Plus five more packages: an Electron desktop app with system tray, a SwiftUI native macOS app, a VS Code extension (beta), an installer package for Homebrew/curl/npm distribution, and a marketing web package for the product page at theproductionline.ai/tools/cortex.
Design Decisions Worth Calling Out
SQLite Over Postgres
Cortex is a local-first tool. Requiring a Postgres instance or a cloud database for basic functionality would kill adoption. SQLite via better-sqlite3 gives us single-file storage, zero configuration, and excellent read performance. FTS5 virtual tables handle full-text search. The only trade-off is concurrent write performance, which does not matter for a single-user tool.
TF-IDF Over Embeddings for Duplicate Detection
We chose TF-IDF cosine similarity over vector embeddings for duplicate detection. Embeddings would require either a local model (adding 500MB+ to install size) or an API call (requiring an API key for basic functionality). TF-IDF runs locally, requires no model, and is fast enough for comparing against a few hundred memories. At the scale Cortex operates (hundreds of memories per project, not millions), the accuracy difference is negligible.
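At this scale, the whole duplicate check fits in a few dozen lines. This is a simplified sketch, not Cortex's implementation: the 0.85 threshold matches the quality gate rule above, but the tokenizer and the smoothed IDF weighting are illustrative choices:

```typescript
// Minimal TF-IDF + cosine similarity duplicate check.
const tokenize = (s: string) => s.toLowerCase().match(/[a-z0-9]+/g) ?? [];

// Build one sparse TF-IDF vector per document over a shared vocabulary.
function tfidfVectors(docs: string[]): Map<string, number>[] {
  const tokenized = docs.map(tokenize);
  const df = new Map<string, number>(); // document frequency per term
  for (const tokens of tokenized) {
    for (const t of new Set(tokens)) df.set(t, (df.get(t) ?? 0) + 1);
  }
  const n = docs.length;
  return tokenized.map((tokens) => {
    const vec = new Map<string, number>();
    for (const t of tokens) vec.set(t, (vec.get(t) ?? 0) + 1); // raw term frequency
    for (const [t, tf] of vec) {
      const dfT = df.get(t) ?? 0;
      vec.set(t, tf * (Math.log((1 + n) / (1 + dfT)) + 1)); // smoothed IDF
    }
    return vec;
  });
}

function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0, na = 0, nb = 0;
  for (const [t, w] of a) { na += w * w; dot += w * (b.get(t) ?? 0); }
  for (const w of b.values()) nb += w * w;
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

function isDuplicate(candidate: string, existing: string[], threshold = 0.85): boolean {
  const [candVec, ...rest] = tfidfVectors([candidate, ...existing]);
  return rest.some((v) => cosine(candVec, v) >= threshold);
}
```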
Zero Telemetry as a Hard Requirement
We made zero telemetry a non-negotiable design constraint from day one. Developers are trusting Cortex with their most sensitive context -- architectural decisions, security workarounds, infrastructure details. Any telemetry, even anonymous crash reporting, would be a legitimate reason not to adopt the tool. The trade-off is that we have no usage analytics. We learn about bugs from GitHub issues, not dashboards. This is the right trade-off for a developer tool that handles sensitive context.
What Shipped in v1.0.0
Cortex is open source at github.com/ProductionLineHQ/cortex and available via brew tap ProductionLineHQ/cortex && brew install cortex-memory or npx @cortex.memory/cli init. If you are building with Claude Code and want it to remember your project across sessions, give it a try.
The best developer tools disappear into your workflow. Cortex works because you install it once and then forget it exists -- while Claude Code never forgets again.
-- Koundinya Lanka
Koundinya Lanka
Founder & CEO of TheProductionLine. Former Brillio engineering leader and Berkeley Haas alum. Builder of Cortex.