Building Cortex: How We Built an Open-Source Memory Layer for Claude Code
The technical story behind Cortex -- from the failed approaches that taught us what not to build, to the architecture decisions that made persistent AI memory actually work. 9 packages, 7 quality gate rules, and 6 memory types.
Koundinya Lanka
Enterprise AI
Cortex started as a personal frustration. I was building TheProductionLine -- a monorepo with Next.js 16, Turborepo, Clerk, Drizzle, Stripe, and a custom AI pipeline -- and every Claude Code session began the same way. I would explain the monorepo structure. Re-state the naming conventions. Re-describe the BYOA architecture. Re-identify the same bugs. It was like having Groundhog Day with my most important development tool.
So I built Cortex. What shipped as v1.0.0 is a 9-package TypeScript monorepo that gives Claude Code persistent memory via the Model Context Protocol. But the interesting story is not the final architecture -- it is the four approaches I tried first and why they failed. Every failure taught me something about what persistent AI memory actually requires.
What Failed Before Cortex Worked
Attempt 1: The Giant Markdown File
My first approach was a manually curated context document -- a massive CLAUDE.md that described everything about the project. Architecture, conventions, known bugs, deployment notes. It worked for about two weeks. Then it was 4,000 lines long, perpetually outdated, and took 15 minutes to update after every significant session. The maintenance cost defeated the purpose.
Attempt 2: Conversation Log Replay
Next I tried saving full conversation transcripts and replaying the important parts at session start. This hit the context window wall immediately. A typical session generates 100K+ tokens. Even extracting 'key moments' manually produced 10-20K tokens of injected context -- most of it conversational noise rather than actual knowledge. Claude's responses degraded because its attention was split across stale conversation fragments.
Attempt 3: Save Everything, Filter Later
The third attempt was an automated save-everything approach. Hook into Claude's tool calls, capture every piece of context, dump it into SQLite, retrieve by relevance score. Within three days I had 400+ 'memories,' of which maybe 30 were useful. The rest were variations of 'I have updated the file' and 'the user wants to fix the bug.' The system was technically working and practically useless.
Key Insight
The core insight from three failed attempts: the hard problem is not storage or retrieval. It is curation. A memory system that stores everything is a memory system that remembers nothing, because the signal gets buried in noise.
The Architecture That Worked
Cortex v1.0.0 shipped as a Turborepo monorepo with 9 packages. Here is the architecture and why each piece exists.
The MCP Server (@cortex.memory/server)
The server exposes 7 MCP tools to Claude Code: save_memory, get_memories, search_memories, list_projects, delete_memory, supersede_memory, and update_memory. It runs as a Fastify daemon binding to localhost:7434. The MCP protocol was the right choice over a proxy approach because it gives Claude native tool access -- saves and retrievals happen as natural parts of the conversation rather than through a separate integration.
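The tool surface can be sketched as a small registry. The seven tool names below come from the article; the registry shape, handler signature, and dispatch logic are illustrative assumptions (the real server registers these through the MCP SDK on the Fastify daemon):

```typescript
// Hypothetical sketch of the server's tool registry. Tool names are from the
// article; everything else (handler shape, dispatch) is illustrative.
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

const tools = new Map<string, ToolHandler>();

function registerTool(name: string, handler: ToolHandler): void {
  tools.set(name, handler);
}

// The seven tools exposed to Claude Code (handlers stubbed for illustration)
for (const name of [
  "save_memory", "get_memories", "search_memories", "list_projects",
  "delete_memory", "supersede_memory", "update_memory",
]) {
  registerTool(name, async (args) => ({ tool: name, args }));
}

async function dispatch(name: string, args: Record<string, unknown>): Promise<unknown> {
  const handler = tools.get(name);
  if (!handler) throw new Error(`Unknown tool: ${name}`);
  return handler(args);
}
```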
The Quality Gate: 7 Rules, Zero Negotiation
This is the component that makes Cortex work and the component that took the most iteration. Every memory passes through 7 sequential validation rules. If any rule fails, the memory is rejected with a specific error code and the AI can retry with improved content.
// Quality Gate Rules (sequential)
1. Length Check → 50-2000 chars, reason ≥ 10 chars
2. Banned Phrases → 50 generic phrases rejected
3. Sensitive Data → 8 regex patterns (AWS keys, tokens, etc.)
4. Quality Score → Minimum specificity threshold
5. Duplicate Check → TF-IDF cosine similarity < 0.85
6. Rate Limit → 50/session, 200/day/project
7. Reason Validation → Non-empty, meaningful justification

The banned phrase list was the most impactful single addition. It blocks 50 common meta-commentary patterns that LLMs generate reflexively: 'I will now proceed to,' 'as the user requested,' 'the code has been updated.' These phrases account for roughly 60% of what an unfiltered system would save, and none of them are useful as memories.
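In code, the sequential gate is just an ordered list of rule functions where the first failure rejects with an error code. This is a minimal sketch: the thresholds and rule order follow the list above, but the rule implementations, error codes, and the abridged banned-phrase list are illustrative, not Cortex's actual source:

```typescript
// Sequential quality gate sketch: each rule passes or rejects with a code,
// and the first failure stops the chain so the AI can retry with better content.
interface Candidate {
  content: string;
  reason: string;
}

type GateResult = { ok: true } | { ok: false; code: string };
type Rule = (c: Candidate) => GateResult;

// Abridged stand-in for the real 50-phrase list
const BANNED_PHRASES = ["i will now proceed to", "as the user requested"];

const rules: Rule[] = [
  // 1. Length check: 50-2000 chars for content, >= 10 chars for the reason
  (c) =>
    c.content.length >= 50 && c.content.length <= 2000 && c.reason.length >= 10
      ? { ok: true }
      : { ok: false, code: "LENGTH" },
  // 2. Banned phrases: reject reflexive meta-commentary
  (c) =>
    BANNED_PHRASES.some((p) => c.content.toLowerCase().includes(p))
      ? { ok: false, code: "BANNED_PHRASE" }
      : { ok: true },
  // 3. Sensitive data: one of the regex patterns (AWS access key IDs)
  (c) =>
    /AKIA[0-9A-Z]{16}/.test(c.content)
      ? { ok: false, code: "SENSITIVE_DATA" }
      : { ok: true },
  // ...rules 4-7 (quality score, duplicate check, rate limit, reason validation)
];

function runGate(c: Candidate): GateResult {
  for (const rule of rules) {
    const result = rule(c);
    if (!result.ok) return result; // first failure rejects
  }
  return { ok: true };
}
```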
Six Memory Types with Distinct Lifecycles
Early prototypes used a flat list of memories. This caused two problems: permanent decisions got crowded out by temporary context, and expired information lingered indefinitely. The six-type system (Decision, Context, Preference, Thread, Error, Learning) assigns each memory a lifecycle that matches its real-world behavior. Decisions are permanent unless explicitly superseded. Threads expire when resolved. Errors persist until the bug is fixed.
Project Detection: The 4-Layer Strategy
Memory isolation between projects is critical -- your Next.js app's conventions should never leak into your Python data pipeline. Cortex uses a 4-layer fallback: first it looks for a .cortex/project.json file (explicit initialization), then checks the git remote URL (works across clones), then uses the directory path, and finally auto-creates a new project entry. The git remote layer is particularly useful for teams where multiple developers clone the same repo -- memories sync across machines because the project identity is tied to the remote, not the local path.
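The fallback chain reads naturally as a sequence of early returns. The layer order is from the article; the function signatures and return shape below are illustrative, and a real implementation would read .cortex/project.json and shell out to `git remote`:

```typescript
// Sketch of the 4-layer project detection fallback. Layer order from the
// article; the injected reader functions are illustrative stand-ins.
interface ProjectId {
  id: string;
  source: "config" | "git-remote" | "path" | "auto-created";
}

function detectProject(
  readConfig: () => string | null,   // layer 1: .cortex/project.json, if initialized
  gitRemoteUrl: () => string | null, // layer 2: stable across clones and machines
  cwd: string,                       // layer 3: directory path
): ProjectId {
  const configured = readConfig();
  if (configured) return { id: configured, source: "config" };

  const remote = gitRemoteUrl();
  if (remote) return { id: remote, source: "git-remote" };

  if (cwd) return { id: cwd, source: "path" };

  // Layer 4: last resort, mint a new project entry
  return { id: `project-${Date.now()}`, source: "auto-created" };
}
```

Injecting the readers as functions keeps the fallback order testable without touching the filesystem or git.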
Token-Budget-Aware Context Injection
Each project has a configurable context budget (default 4,000 tokens, range 1,000-12,000). When Claude starts a session, Cortex fills this budget with the highest-ranked memories using a weighted scoring algorithm: importance at 50%, confidence at 30%, recency at 20%. Memories unreviewed for 90+ days get their score halved. This means the most important, most confident, most recent memories get injected first, and stale content naturally drops out.
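The weights, the stale penalty, and the greedy budget fill can be sketched directly. The 50/30/20 split, the 90-day halving, and the 4,000-token default come from the article; the field names and the rough 4-chars-per-token estimator are assumptions:

```typescript
// Weighted ranking and budget fill sketch:
// score = 0.5*importance + 0.3*confidence + 0.2*recency, halved after
// 90 days without review; memories are taken greedily until the budget is spent.
interface Memory {
  text: string;
  importance: number;      // 0..1
  confidence: number;      // 0..1
  recency: number;         // 0..1, higher = more recent
  daysSinceReview: number;
}

function score(m: Memory): number {
  const base = 0.5 * m.importance + 0.3 * m.confidence + 0.2 * m.recency;
  return m.daysSinceReview >= 90 ? base / 2 : base; // stale penalty
}

// Rough token estimate (~4 chars per token); a hypothetical stand-in
const estimateTokens = (text: string) => Math.ceil(text.length / 4);

function fillBudget(memories: Memory[], budgetTokens = 4000): Memory[] {
  const ranked = [...memories].sort((a, b) => score(b) - score(a));
  const selected: Memory[] = [];
  let used = 0;
  for (const m of ranked) {
    const cost = estimateTokens(m.text);
    if (used + cost > budgetTokens) continue; // skip what doesn't fit
    selected.push(m);
    used += cost;
  }
  return selected;
}
```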
The Monorepo: 9 Packages
Types & constants: Zod schemas, memory types, quality gate constants. Shared across all packages.
Server (MCP + API + DB): Fastify daemon, MCP tools, SQLite with FTS5, Turso sync engine, AI summarizer.
CLI (33 commands): Terminal interface with Commander.js. Memory CRUD, daemon control, sync management, diagnostics.
Web UI: Next.js 14 dashboard at localhost:7433. Search, filter, bulk operations, project switching.
Plus five more packages: an Electron desktop app with system tray, a SwiftUI native macOS app, a VS Code extension (beta), an installer package for Homebrew/curl/npm distribution, and a marketing web package for the product page at theproductionline.ai/tools/cortex.
Design Decisions Worth Calling Out
SQLite Over Postgres
Cortex is a local-first tool. Requiring a Postgres instance or a cloud database for basic functionality would kill adoption. SQLite via better-sqlite3 gives us single-file storage, zero configuration, and excellent read performance. FTS5 virtual tables handle full-text search. The only trade-off is concurrent write performance, which does not matter for a single-user tool.
TF-IDF Over Embeddings for Duplicate Detection
We chose TF-IDF cosine similarity over vector embeddings for duplicate detection. Embeddings would require either a local model (adding 500MB+ to install size) or an API call (requiring an API key for basic functionality). TF-IDF runs locally, requires no model, and is fast enough for comparing against a few hundred memories. At the scale Cortex operates (hundreds of memories per project, not millions), the accuracy difference is negligible.
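At this scale, the whole duplicate check fits in a few dozen lines. This is a simplified sketch, not Cortex's implementation: the 0.85 threshold matches the quality gate rule above, but the tokenizer and the smoothed IDF weighting are illustrative choices:

```typescript
// Minimal TF-IDF + cosine similarity duplicate check.
const tokenize = (s: string) => s.toLowerCase().match(/[a-z0-9]+/g) ?? [];

// Build one sparse TF-IDF vector per document over a shared vocabulary.
function tfidfVectors(docs: string[]): Map<string, number>[] {
  const tokenized = docs.map(tokenize);
  const df = new Map<string, number>(); // document frequency per term
  for (const tokens of tokenized) {
    for (const t of new Set(tokens)) df.set(t, (df.get(t) ?? 0) + 1);
  }
  const n = docs.length;
  return tokenized.map((tokens) => {
    const vec = new Map<string, number>();
    for (const t of tokens) vec.set(t, (vec.get(t) ?? 0) + 1); // raw term frequency
    for (const [t, tf] of vec) {
      const dfT = df.get(t) ?? 0;
      vec.set(t, tf * (Math.log((1 + n) / (1 + dfT)) + 1)); // smoothed IDF
    }
    return vec;
  });
}

function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0, na = 0, nb = 0;
  for (const [t, w] of a) { na += w * w; dot += w * (b.get(t) ?? 0); }
  for (const w of b.values()) nb += w * w;
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

function isDuplicate(candidate: string, existing: string[], threshold = 0.85): boolean {
  const [candVec, ...rest] = tfidfVectors([candidate, ...existing]);
  return rest.some((v) => cosine(candVec, v) >= threshold);
}
```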
Zero Telemetry as a Hard Requirement
We made zero telemetry a non-negotiable design constraint from day one. Developers are trusting Cortex with their most sensitive context -- architectural decisions, security workarounds, infrastructure details. Any telemetry, even anonymous crash reporting, would be a legitimate reason not to adopt the tool. The trade-off is that we have no usage analytics. We learn about bugs from GitHub issues, not dashboards. This is the right trade-off for a developer tool that handles sensitive context.
What Shipped in v1.0.0
Cortex is open source at github.com/ProductionLineHQ/cortex and available via brew tap ProductionLineHQ/cortex && brew install cortex-memory or npx @cortex.memory/cli init. If you are building with Claude Code and want it to remember your project across sessions, give it a try.
The best developer tools disappear into your workflow. Cortex works because you install it once and then forget it exists -- while Claude Code never forgets again.
-- Koundinya Lanka
Koundinya Lanka
Founder & CEO of TheProductionLine. Former Brillio engineering leader and Berkeley Haas alum. Builder of Cortex.