Key Takeaway
By the end of this blueprint you will have an AI-aware feature flag system with percentage rollouts, quality-gated promotion that automatically advances rollout when evaluation scores are stable, and kill switches that instantly disable degraded AI features without a code deployment.
Prerequisites
- An existing feature flag service (LaunchDarkly, Unleash, or custom) or willingness to build one
- An AI observability stack producing quality scores (see AI Observability Stack blueprint)
- Python 3.11+ or TypeScript 5+ for the SDK
- Redis for flag state caching and quality signal aggregation
Why Standard Feature Flags Are Not Enough
Standard feature flags gate features on error rates and user targeting. AI features fail differently — a model upgrade can return valid HTTP responses with structurally correct output that is subtly wrong, off-brand, or unsafe. Error rate stays at zero while quality drops. You need flags that understand quality metrics, can gate on evaluation scores, and automatically respond to quality degradation without human intervention. This is the gap between a regular feature flag and an AI feature flag.
Architecture Overview
The system extends a standard feature flag service with AI-specific evaluation hooks. When a flag is evaluated, the SDK checks both the targeting rules and a quality signal aggregator that pulls recent evaluation scores from the observability stack. A promotion controller gradually increases rollout percentages when quality metrics remain stable and triggers automatic rollback when they degrade beyond configurable thresholds.
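The percentage-rollout half of flag evaluation is commonly implemented with deterministic hashing, so that a given user stays in the same bucket as the rollout grows. The sketch below is one way to do that and is not taken from any particular flag service; `bucket_for` and `is_enabled` are hypothetical names, and the stage values are borrowed from the schema in the next section.

```python
import hashlib


def bucket_for(flag_key: str, user_id: str) -> float:
    """Map a (flag, user) pair to a stable bucket in [0, 100) with 0.01 granularity."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer and fold it into 10,000 slots.
    return (int.from_bytes(digest[:8], "big") % 10_000) / 100


def is_enabled(flag_key: str, user_id: str, rollout_percentage: float, stage: str) -> bool:
    """Return True if this user should receive the new variant."""
    # Assumption: "disabled" and "killed" always serve the control variant.
    if stage in ("disabled", "killed"):
        return False
    return bucket_for(flag_key, user_id) < rollout_percentage


if __name__ == "__main__":
    for pct in (1, 25, 100):
        print(pct, is_enabled("chat-v2-model-upgrade", "user-42", pct, "gradual"))
```

Because the bucket depends only on the flag key and the user ID, a user who receives the new variant at 10% continues to receive it at 25%, which keeps quality scores comparable across promotion steps. Including the flag key in the hash keeps the same users from always landing in the early cohort of every rollout.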
AI Flag Configuration Schema
/** AI-aware feature flag configuration. */
export interface AIFeatureFlag {
  /** Unique flag key, e.g., "chat-v2-model-upgrade" */
  key: string;
  /** Human-readable description */
  description: string;
  /** Current rollout stage */
  stage: "disabled" | "canary" | "gradual" | "full" | "killed";
  /** Percentage of traffic receiving the new variant (0-100) */
  rolloutPercentage: number;
  /** Quality gate configuration */
  qualityGate: {
    /** Minimum average quality score (1-5) to maintain rollout */
    minScore: number;
    /** Rolling window in minutes for quality evaluation */
    windowMinutes: number;
    /** Minimum number of evaluations before gate is active */
    minSamples: number;
  };
  /** Automatic promotion rules */
  promotion: {
    /** Hours at current percentage before auto-promoting */
    stabilityHours: number;
    /** Percentage increment per promotion step */
    stepSize: number;
    /** Target percentage for full rollout */
    targetPercentage: number;
  };
  /** Kill switch: instantly disable the flag if quality drops below the threshold */
  killSwitch: {
    enabled: boolean;
    /** Quality score threshold that triggers kill */
    threshold: number;
    /** Number of consecutive low scores before killing */
    consecutiveFailures: number;
  };
  /** Model/prompt version this flag controls */
  variant: {
    modelName?: string;
    promptVersion?: string;
    configOverrides?: Record<string, unknown>;
  };
}

Quality-Gated Promotion Controller
The promotion controller runs as a background process that checks flag quality metrics on a schedule (every 15 minutes). For each flag in the 'canary' or 'gradual' stage, it queries the quality signal aggregator for the average evaluation score over the configured window, and it skips the flag until the minimum sample count has been reached. If a 'gradual' flag scores at or above the minimum threshold and has been stable for the configured number of hours, the controller advances the rollout percentage by the step size, capped at the target percentage. If the score falls below the kill switch threshold and the configured number of consecutive low scores has been reached, it sets the flag to 'killed' and the rollout to 0% immediately.
"""Quality-gated promotion controller for AI feature flags."""
from __future__ import annotations

import logging
from datetime import datetime, timezone

from flags.store import FlagStore
from flags.quality import QualitySignalAggregator

logger = logging.getLogger(__name__)


class PromotionController:
    """Manages automatic promotion and kill switch for AI flags."""

    def __init__(self, store: FlagStore, quality: QualitySignalAggregator):
        self.store = store
        self.quality = quality

    async def evaluate_all_flags(self) -> None:
        """Check all active flags and promote or kill as needed."""
        flags = await self.store.get_active_flags()
        for flag in flags:
            if flag.stage in ("disabled", "full", "killed"):
                continue

            score = await self.quality.get_average_score(
                flag_key=flag.key,
                window_minutes=flag.quality_gate.window_minutes,
            )
            sample_count = await self.quality.get_sample_count(
                flag_key=flag.key,
                window_minutes=flag.quality_gate.window_minutes,
            )
            logger.info(
                "Flag %s: score=%.2f samples=%d rollout=%d%%",
                flag.key, score or 0, sample_count, flag.rollout_percentage,
            )

            # Not enough data yet: skip this flag until the next cycle
            if sample_count < flag.quality_gate.min_samples:
                continue

            # Kill switch check (applies to canary and gradual flags alike)
            if (
                flag.kill_switch.enabled
                and score is not None
                and score < flag.kill_switch.threshold
            ):
                consecutive = await self.quality.get_consecutive_failures(
                    flag.key, flag.kill_switch.threshold
                )
                if consecutive >= flag.kill_switch.consecutive_failures:
                    logger.warning("KILL SWITCH: flag %s killed", flag.key)
                    await self.store.update_flag(
                        flag.key, stage="killed", rollout_percentage=0
                    )
                    continue

            # Quality gate check for promotion. Canary flags are never
            # auto-promoted; they are advanced to "gradual" manually.
            if (
                flag.stage == "gradual"
                and score is not None
                and score >= flag.quality_gate.min_score
            ):
                # get_last_change_time is assumed to return a timezone-aware UTC datetime
                last_change = await self.store.get_last_change_time(flag.key)
                hours_stable = (
                    datetime.now(timezone.utc) - last_change
                ).total_seconds() / 3600
                if hours_stable >= flag.promotion.stability_hours:
                    new_pct = min(
                        flag.rollout_percentage + flag.promotion.step_size,
                        flag.promotion.target_percentage,
                    )
                    new_stage = (
                        "full" if new_pct >= flag.promotion.target_percentage
                        else flag.stage
                    )
                    logger.info(
                        "Promoting flag %s: %d%% -> %d%%",
                        flag.key, flag.rollout_percentage, new_pct,
                    )
                    await self.store.update_flag(
                        flag.key, stage=new_stage, rollout_percentage=new_pct
                    )

Rollout Stages
1. Canary (1-5%): Deploy to a small slice of traffic. Monitor quality scores closely. No automatic promotion; advancing to the next stage is a manual step.
2. Gradual (5-50%): Automatic promotion enabled. The controller advances the rollout by stepSize once the flag has been stable for stabilityHours and the quality gate passes.
3. Broad (50-95%): Automatic promotion continues. Kill switch sensitivity increases as the blast radius grows. The schema's stage field has no separate value for this range, so these flags presumably remain in the 'gradual' stage.
4. Full (100%): The flag reaches its target percentage. Keep the flag active for a cooldown period before cleaning up.
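To get a feel for how long the automatic part of a rollout takes, the promotion rule can be worked through as simple arithmetic. The helper below is a hypothetical planning aid, not part of the controller, and the numbers in the example (5% start, 15-point steps, 6 stability hours) are illustrative assumptions. It assumes every quality gate passes on the first eligible check.

```python
def promotion_schedule(
    start_pct: int, step_size: int, target_pct: int, stability_hours: float
) -> list[tuple[float, int]]:
    """Return [(hours_elapsed, rollout_pct), ...] for a rollout where every gate passes."""
    if step_size <= 0:
        raise ValueError("step_size must be positive")
    schedule = [(0.0, start_pct)]
    pct, hours = start_pct, 0.0
    while pct < target_pct:
        # Each step waits out the stability period, then advances by step_size,
        # capped at the target, mirroring the controller's min(...) logic.
        hours += stability_hours
        pct = min(pct + step_size, target_pct)
        schedule.append((hours, pct))
    return schedule


if __name__ == "__main__":
    for hours, pct in promotion_schedule(5, 15, 100, 6):
        print(f"t+{hours:>4.0f}h  {pct:>3d}%")
    # Reaches 100% after 7 steps, i.e. 42 hours at the earliest.
```

This is a lower bound. Because the controller only runs on its 15-minute schedule, each step can lag by up to one cycle, and any window in which the quality gate fails or has too few samples delays promotion further.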
Start every AI feature rollout at 1% canary with no automatic promotion. Watch the quality scores for at least 24 hours with real production traffic before enabling the promotion controller. The first day of canary data is your most valuable signal.
Kill switches must be faster than promotion. The promotion controller runs every 15 minutes, but the kill switch check should also run within the SDK on every flag evaluation. If the locally cached quality score is below threshold, the SDK should return the control variant immediately without waiting for the next controller cycle.
Version History
1.0.0 · 2026-03-01
- Initial publication with AI-aware feature flag schema
- Quality-gated promotion controller with automatic advancement
- Kill switch pattern for immediate rollback on quality degradation
- Rollout stage progression from canary to full deployment