Key Takeaway
By the end of this blueprint you will have a prompt management system with a versioned registry, an approval workflow, A/B testing via consistent hashing, rollback capabilities, and a runtime SDK that applications use to fetch prompts without code deployments.
Prerequisites
- PostgreSQL 15+ for the prompt registry
- Redis for the low-latency prompt cache
- Python 3.11+ or TypeScript 5+ for the SDK
- Basic understanding of content versioning and deployment pipelines
The Problem with Hardcoded Prompts
Hardcoded prompts create three problems as your team scales. First, every prompt change requires a code deployment, meaning your prompt engineer needs to go through the full CI/CD pipeline to test a wording tweak. Second, there is no audit trail — you cannot answer the question 'who changed the customer support prompt last Tuesday and what did it say before?' Third, A/B testing prompt variants requires feature flag infrastructure that most teams bolt on as an afterthought. A centralized prompt management system addresses all three by treating prompts as first-class managed artifacts.
Architecture Overview
The system consists of a prompt registry backed by a PostgreSQL database, a management UI for authoring and reviewing prompt versions, a deployment pipeline that syncs approved versions to a low-latency Redis cache, and an SDK that applications use to fetch prompts at runtime. Traffic splitting for A/B tests is handled at the SDK layer using consistent hashing on user or session identifiers.
Prompt Registry Schema
The registry stores prompts as named entities with multiple versions. Each version has a template (the prompt text with variable placeholders), metadata (author, change description, model compatibility), and a lifecycle status (draft, review, approved, deployed, archived). Only one version per prompt can be in the 'deployed' status at any time, which is the version that production applications receive.
-- Prompt registry schema
CREATE TYPE prompt_status AS ENUM (
    'draft', 'review', 'approved', 'deployed', 'archived'
);

CREATE TABLE prompts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    slug TEXT UNIQUE NOT NULL,            -- e.g., 'customer-support-v2'
    name TEXT NOT NULL,
    description TEXT,
    created_by TEXT NOT NULL,
    created_at TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE prompt_versions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    prompt_id UUID REFERENCES prompts(id) ON DELETE CASCADE,
    version INTEGER NOT NULL,
    template TEXT NOT NULL,               -- Prompt text with {{variable}} placeholders
    variables JSONB DEFAULT '[]',         -- Expected variables and their types
    model_hint TEXT,                      -- Recommended model
    status prompt_status DEFAULT 'draft',
    change_note TEXT,
    created_by TEXT NOT NULL,
    reviewed_by TEXT,
    deployed_at TIMESTAMPTZ,
    created_at TIMESTAMPTZ DEFAULT now(),
    UNIQUE (prompt_id, version)
);

-- Only one deployed version per prompt
CREATE UNIQUE INDEX idx_one_deployed
    ON prompt_versions (prompt_id)
    WHERE status = 'deployed';

CREATE TABLE prompt_ab_tests (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    prompt_id UUID REFERENCES prompts(id),
    control_id UUID REFERENCES prompt_versions(id),
    variant_id UUID REFERENCES prompt_versions(id),
    traffic_pct INTEGER DEFAULT 10,       -- Percentage of traffic routed to the variant
    started_at TIMESTAMPTZ DEFAULT now(),
    ended_at TIMESTAMPTZ,
    winner_id UUID REFERENCES prompt_versions(id)
);
Runtime SDK
Applications never hardcode prompts. Instead, they use an SDK that fetches the current deployed version of a prompt by slug, renders the template with the provided variables, and returns the final prompt string. The SDK reads from the Redis cache with a fallback to PostgreSQL, so prompt fetches add sub-millisecond latency in the hot path. For A/B tests, the SDK uses consistent hashing on a provided identifier (user ID or session ID) to deterministically assign users to control or variant groups.
"""Prompt management SDK for runtime prompt fetching."""
from __future__ import annotations

import hashlib
import json
import re
from typing import Any

import redis.asyncio as redis


class PromptClient:
    """SDK for fetching and rendering managed prompts."""

    def __init__(self, redis_url: str, db_fallback=None):
        self._redis = redis.from_url(redis_url)
        self._db = db_fallback
        self._local_cache: dict[str, dict] = {}

    async def get_prompt(
        self,
        slug: str,
        variables: dict[str, Any] | None = None,
        ab_identifier: str | None = None,
    ) -> str:
        """Fetch and render a prompt by slug.

        Args:
            slug: The prompt slug (e.g., 'customer-support').
            variables: Template variables to render.
            ab_identifier: User/session ID for A/B test assignment.

        Returns:
            The rendered prompt string.
        """
        prompt_data = await self._fetch(slug)

        # Check for an active A/B test
        if ab_identifier and prompt_data.get("ab_test"):
            ab = prompt_data["ab_test"]
            bucket = self._hash_to_bucket(ab_identifier)
            if bucket < ab["traffic_pct"]:
                template = ab["variant_template"]
            else:
                template = prompt_data["template"]
        else:
            template = prompt_data["template"]

        return self._render(template, variables or {})

    async def _fetch(self, slug: str) -> dict:
        """Fetch prompt data from cache with DB fallback."""
        # Level 1: in-process cache
        if slug in self._local_cache:
            return self._local_cache[slug]

        # Level 2: Redis cache
        cached = await self._redis.get(f"prompt:{slug}")
        if cached:
            data = json.loads(cached)
            self._local_cache[slug] = data
            return data

        # Level 3: database fallback
        if self._db:
            data = await self._db.fetch_deployed_prompt(slug)
            await self._redis.setex(
                f"prompt:{slug}", 300, json.dumps(data)
            )
            self._local_cache[slug] = data
            return data

        raise ValueError(f"Prompt '{slug}' not found")

    @staticmethod
    def _hash_to_bucket(identifier: str) -> int:
        """Deterministic hash to 0-99 for A/B assignment."""
        h = hashlib.sha256(identifier.encode()).hexdigest()
        return int(h[:8], 16) % 100

    @staticmethod
    def _render(template: str, variables: dict) -> str:
        """Render template variables like {{name}}."""
        def replacer(match):
            key = match.group(1).strip()
            return str(variables.get(key, match.group(0)))

        return re.sub(r"\{\{(.+?)\}\}", replacer, template)
A/B Testing Workflow
1. Create a prompt variant. Author a new version with the proposed changes. Set its status to 'approved' after review.
2. Configure the A/B test. Specify the control (current deployed) and variant versions, and set the traffic percentage (start at 5-10%).
3. Deploy the test. The deployment pipeline pushes both versions to cache with the A/B test configuration.
4. Monitor quality metrics. Track evaluation scores, user feedback, and task completion rates for both groups.
5. Promote or roll back. If the variant wins, promote it to deployed status. If it loses, archive it and end the test.
Use consistent hashing for A/B assignment rather than random sampling. This ensures the same user always sees the same prompt variant within a test, which is essential for measuring longitudinal quality metrics and avoids confusing users with inconsistent behavior.
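The assignment logic can be sketched standalone, mirroring the hashing scheme the SDK uses (`assign_group` is an illustrative helper name, not part of the SDK):

```python
"""Sketch: deterministic A/B group assignment via consistent hashing."""
import hashlib


def assign_group(identifier: str, traffic_pct: int) -> str:
    """Map an identifier to 'variant' or 'control', stably across calls."""
    # Hash the identifier to a stable bucket in 0-99.
    digest = hashlib.sha256(identifier.encode()).hexdigest()
    bucket = int(digest[:8], 16) % 100
    # Identifiers whose bucket falls below the traffic threshold see the variant.
    return "variant" if bucket < traffic_pct else "control"
```

Because the bucket depends only on the identifier, a given user's group is stable for the life of the test, and raising `traffic_pct` only moves users from control to variant, never the reverse.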
CI/CD Integration
Prompt changes can be managed through a Git-based workflow. Store prompt templates as YAML files in a dedicated repository. A CI pipeline validates the template syntax, runs the prompt against an evaluation dataset, and creates a prompt_version record in the registry with status 'review'. Reviewers approve or reject through the management UI or via a PR approval. Approved prompts are deployed to cache by a CD pipeline that also handles cache invalidation across all application instances.
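One way the CI validation step might check a definition file like the one below: confirm that every `{{placeholder}}` used in the template is declared in the `variables` list, and that every declared variable is actually used. A minimal sketch, assuming that convention (`validate_definition` is a hypothetical helper):

```python
"""Sketch: cross-check a prompt template against its declared variables."""
import re


def validate_definition(template: str, variables: list[dict]) -> list[str]:
    """Return a list of validation errors; an empty list means the file is valid."""
    declared = {v["name"] for v in variables}
    # Collect every {{placeholder}} referenced in the template text.
    used = {m.group(1).strip() for m in re.finditer(r"\{\{(.+?)\}\}", template)}
    errors = []
    for name in sorted(used - declared):
        errors.append(f"template uses undeclared variable '{name}'")
    for name in sorted(declared - used):
        errors.append(f"declared variable '{name}' is never used in the template")
    return errors
```

A real pipeline would also check types and `enum` constraints; this only covers the name cross-check.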
# Prompt definition file for CI/CD pipeline
slug: customer-support
name: Customer Support Agent
description: System prompt for the customer-facing support assistant
model_hint: claude-sonnet-4-20250514
template: |
  You are a helpful customer support agent for {{company_name}}.

  Guidelines:
  - Be empathetic and professional
  - Answer questions using only the provided context
  - If you do not know the answer, say so honestly
  - Never make up product features or policies
  - Escalate billing and account deletion requests to a human agent

  Customer tier: {{customer_tier}}

  Context from knowledge base:
  {{retrieved_context}}
variables:
  - name: company_name
    type: string
    required: true
  - name: customer_tier
    type: string
    required: true
    enum: [free, starter, pro, enterprise]
  - name: retrieved_context
    type: string
    required: true
Rollback Strategy
Every deployed prompt version is immutable — you never edit a deployed version. To roll back, you re-deploy the previous version by changing its status back to 'deployed' and archiving the problematic version. The Redis cache is invalidated, and all applications pick up the previous version on their next prompt fetch. This gives you a clear audit trail and the ability to roll back in seconds rather than waiting for a code deployment.
Cache invalidation is the hard part. When you deploy a new prompt version, every application instance holding a local cache copy must refresh. Use a Redis pub/sub channel to broadcast invalidation events, and set a TTL on local caches as a safety net (5 minutes is a reasonable default).
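A sketch of the subscriber side, assuming `redis.asyncio` and an illustrative channel name (`prompt-invalidations`); the pure eviction logic is separated from the subscription loop so it can be tested without a Redis server:

```python
"""Sketch: local-cache invalidation driven by a Redis pub/sub channel."""
import json
from typing import Any

INVALIDATION_CHANNEL = "prompt-invalidations"  # Illustrative channel name


def apply_invalidation(local_cache: dict[str, Any], message: dict) -> None:
    """Evict the slug named in an invalidation message from the local cache."""
    payload = json.loads(message["data"])
    local_cache.pop(payload["slug"], None)


async def listen_for_invalidations(redis_client, local_cache: dict[str, Any]) -> None:
    """Subscribe to deployment events and evict affected slugs as they arrive."""
    pubsub = redis_client.pubsub()
    await pubsub.subscribe(INVALIDATION_CHANNEL)
    async for message in pubsub.listen():
        if message["type"] == "message":
            apply_invalidation(local_cache, message)
```

The CD pipeline would publish `{"slug": "customer-support"}` to the channel after each deploy; instances that miss the message still converge within the local-cache TTL.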
Version History
1.0.0 · 2026-03-01
- Initial publication with prompt registry schema and runtime SDK
- A/B testing workflow with consistent hashing
- CI/CD integration with YAML prompt definitions
- Rollback strategy and cache invalidation patterns