Key Takeaway

By the end of this blueprint you will have an AI-aware feature flag system with percentage rollouts, quality-gated promotion that automatically advances rollout when evaluation scores are stable, and kill switches that instantly disable degraded AI features without a code deployment.

Prerequisites

An existing feature flag service (LaunchDarkly, Unleash, or custom) or willingness to build one
An AI observability stack producing quality scores (see AI Observability Stack blueprint)
Python 3.11+ or TypeScript 5+ for the SDK
Redis for flag state caching and quality signal aggregation

Why Standard Feature Flags Are Not Enough

Standard feature flags gate features on error rates and user targeting. AI features fail differently — a model upgrade can return valid HTTP responses with structurally correct output that is subtly wrong, off-brand, or unsafe. Error rate stays at zero while quality drops. You need flags that understand quality metrics, can gate on evaluation scores, and automatically respond to quality degradation without human intervention. This is the gap between a regular feature flag and an AI feature flag.

Architecture Overview

The system extends a standard feature flag service with AI-specific evaluation hooks. When a flag is evaluated, the SDK checks both the targeting rules and a quality signal aggregator that pulls recent evaluation scores from the observability stack. A promotion controller gradually increases rollout percentages when quality metrics remain stable and triggers automatic rollback when they degrade beyond configurable thresholds.
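As a rough illustration of how a percentage rollout and a quality signal might combine at evaluation time, the sketch below hashes the flag key and user ID into a stable bucket. The function names (`rollout_bucket`, `is_in_rollout`, `choose_variant`) and the variant labels "control" and "treatment" are hypothetical and not part of any particular flag SDK; the `quality_ok` argument stands in for whatever the quality signal aggregator reports.

```python
import hashlib


def rollout_bucket(flag_key: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce to 0-99.
    return int.from_bytes(digest[:8], "big") % 100


def is_in_rollout(flag_key: str, user_id: str, rollout_percentage: int) -> bool:
    """True when the user's bucket falls inside the current rollout slice."""
    return rollout_bucket(flag_key, user_id) < rollout_percentage


def choose_variant(
    flag_key: str, user_id: str, rollout_percentage: int, quality_ok: bool
) -> str:
    """Combine targeting with the quality signal (assumed to be precomputed)."""
    if not quality_ok:
        # Quality has degraded: everyone gets the existing behaviour.
        return "control"
    if is_in_rollout(flag_key, user_id, rollout_percentage):
        return "treatment"
    return "control"
```

Because the bucket depends only on the flag key and user ID, a user who receives the new variant at 10% keeps receiving it at 20%, which keeps per-user behaviour consistent as the rollout advances.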

AI Flag Configuration Schema

lib/flags/types.ts

/** AI-aware feature flag configuration. */
export interface AIFeatureFlag {
  /** Unique flag key, e.g., "chat-v2-model-upgrade" */
  key: string;

  /** Human-readable description */
  description: string;

  /** Current rollout stage */
  stage: "disabled" | "canary" | "gradual" | "full" | "killed";

  /** Percentage of traffic receiving the new variant (0-100) */
  rolloutPercentage: number;

  /** Quality gate configuration */
  qualityGate: {
    /** Minimum average quality score (1-5) to maintain rollout */
    minScore: number;
    /** Rolling window in minutes for quality evaluation */
    windowMinutes: number;
    /** Minimum number of evaluations before gate is active */
    minSamples: number;
  };

  /** Automatic promotion rules */
  promotion: {
    /** Hours at current percentage before auto-promoting */
    stabilityHours: number;
    /** Percentage increment per promotion step */
    stepSize: number;
    /** Target percentage for full rollout */
    targetPercentage: number;
  };

  /** Kill switch: instantly disable if quality drops below */
  killSwitch: {
    enabled: boolean;
    /** Quality score threshold that triggers kill */
    threshold: number;
    /** Number of consecutive low scores before killing */
    consecutiveFailures: number;
  };

  /** Model/prompt version this flag controls */
  variant: {
    modelName?: string;
    promptVersion?: string;
    configOverrides?: Record<string, unknown>;
  };
}
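The Python controller in the next section reads snake_case attributes such as `flag.quality_gate.window_minutes`, so the Python side needs its own mirror of this schema. One possible shape, using dataclasses, is sketched here. The class layout, the `validate_flag` helper, and the example values are assumptions made for illustration; in particular, the check that the kill switch threshold does not exceed the quality gate minimum is a suggested sanity rule, not something the schema itself requires.

```python
from dataclasses import dataclass, field
from typing import Any, Literal, Optional

Stage = Literal["disabled", "canary", "gradual", "full", "killed"]


@dataclass
class QualityGate:
    min_score: float        # minimum average quality score (1-5)
    window_minutes: int     # rolling evaluation window
    min_samples: int        # evaluations required before the gate is active


@dataclass
class Promotion:
    stability_hours: float  # hours at the current percentage before promoting
    step_size: int          # percentage points added per promotion
    target_percentage: int  # percentage that counts as full rollout


@dataclass
class KillSwitch:
    enabled: bool
    threshold: float           # score below which a result counts as a failure
    consecutive_failures: int  # failures in a row before killing


@dataclass
class Variant:
    model_name: Optional[str] = None
    prompt_version: Optional[str] = None
    config_overrides: dict[str, Any] = field(default_factory=dict)


@dataclass
class AIFeatureFlag:
    key: str
    description: str
    stage: Stage
    rollout_percentage: int
    quality_gate: QualityGate
    promotion: Promotion
    kill_switch: KillSwitch
    variant: Variant = field(default_factory=Variant)


def validate_flag(flag: AIFeatureFlag) -> list[str]:
    """Return human-readable problems; an empty list means the flag looks sane."""
    problems: list[str] = []
    if not 0 <= flag.rollout_percentage <= 100:
        problems.append("rollout_percentage must be between 0 and 100")
    if not 1 <= flag.quality_gate.min_score <= 5:
        problems.append("quality_gate.min_score must be between 1 and 5")
    if flag.promotion.step_size <= 0:
        problems.append("promotion.step_size must be positive")
    # Assumed rule: killing should be a stricter condition than pausing.
    if flag.kill_switch.threshold > flag.quality_gate.min_score:
        problems.append("kill_switch.threshold should not exceed min_score")
    return problems


# Illustrative values only.
example_flag = AIFeatureFlag(
    key="chat-v2-model-upgrade",
    description="Route chat traffic to the upgraded model",
    stage="canary",
    rollout_percentage=1,
    quality_gate=QualityGate(min_score=4.0, window_minutes=60, min_samples=50),
    promotion=Promotion(stability_hours=12, step_size=10, target_percentage=100),
    kill_switch=KillSwitch(enabled=True, threshold=3.0, consecutive_failures=5),
)
```

With these values, a score between 3.0 and 4.0 neither promotes nor kills the flag: the rollout simply holds at its current percentage until quality recovers or degrades further.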

Quality-Gated Promotion Controller

The promotion controller runs as a background process that checks flag quality metrics on a schedule (every 15 minutes). For each flag that is still rolling out, it queries the quality signal aggregator for the average evaluation score over the configured window and skips the flag until at least minSamples evaluations have been collected. If the score is at or above the minimum threshold and the flag has gone unchanged for the configured number of stability hours, it advances the rollout percentage by the step size, capped at the target percentage. If the score is below the kill switch threshold and the number of consecutive low scores reaches consecutiveFailures, it sets the flag to 'killed' immediately and drops the rollout to 0%.

flags/promotion_controller.py

"""Quality-gated promotion controller for AI feature flags."""

from __future__ import annotations

import logging
from datetime import datetime, timezone

from flags.store import FlagStore
from flags.quality import QualitySignalAggregator

logger = logging.getLogger(__name__)


class PromotionController:
    """Manages automatic promotion and kill switch for AI flags."""

    def __init__(self, store: FlagStore, quality: QualitySignalAggregator):
        self.store = store
        self.quality = quality

    async def evaluate_all_flags(self):
        """Check all active flags and promote or kill as needed."""
        flags = await self.store.get_active_flags()

        for flag in flags:
            if flag.stage in ("disabled", "full", "killed"):
                continue

            score = await self.quality.get_average_score(
                flag_key=flag.key,
                window_minutes=flag.quality_gate.window_minutes,
            )
            sample_count = await self.quality.get_sample_count(
                flag_key=flag.key,
                window_minutes=flag.quality_gate.window_minutes,
            )

            logger.info(
                "Flag %s: score=%.2f samples=%d rollout=%d%%",
                flag.key, score or 0, sample_count, flag.rollout_percentage,
            )

            # Not enough data yet — skip evaluation
            if sample_count < flag.quality_gate.min_samples:
                continue

            # Kill switch check
            if (
                flag.kill_switch.enabled
                and score is not None
                and score < flag.kill_switch.threshold
            ):
                consecutive = await self.quality.get_consecutive_failures(
                    flag.key, flag.kill_switch.threshold
                )
                if consecutive >= flag.kill_switch.consecutive_failures:
                    logger.warning("KILL SWITCH: flag %s killed", flag.key)
                    await self.store.update_flag(
                        flag.key, stage="killed", rollout_percentage=0
                    )
                    continue

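            # Quality gate check for promotion. Canary flags are advanced
            # manually, so they are subject to the kill switch above but are
            # never auto-promoted here.
            if (
                flag.stage != "canary"
                and score is not None
                and score >= flag.quality_gate.min_score
            ):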
                last_change = await self.store.get_last_change_time(flag.key)
                hours_stable = (
                    datetime.now(timezone.utc) - last_change
                ).total_seconds() / 3600

                if hours_stable >= flag.promotion.stability_hours:
                    new_pct = min(
                        flag.rollout_percentage + flag.promotion.step_size,
                        flag.promotion.target_percentage,
                    )
                    new_stage = (
                        "full" if new_pct >= flag.promotion.target_percentage
                        else flag.stage
                    )
                    logger.info(
                        "Promoting flag %s: %d%% -> %d%%",
                        flag.key, flag.rollout_percentage, new_pct,
                    )
                    await self.store.update_flag(
                        flag.key, stage=new_stage, rollout_percentage=new_pct
                    )
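The controller above calls three methods on the aggregator: `get_average_score`, `get_sample_count`, and `get_consecutive_failures`. The blueprint stores quality signals in Redis; the following in-memory stand-in is only a sketch for local testing, and the class name `InMemoryQualityAggregator`, the `record` method, and the injectable `clock` are all assumptions. It treats "consecutive failures" as the number of most recent scores in a row that fall below the threshold.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Optional


class InMemoryQualityAggregator:
    """Test double exposing the three methods the controller calls."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self._clock = clock
        # flag_key -> deque of (timestamp_seconds, score), oldest first
        self._scores: dict[str, deque] = defaultdict(deque)

    def record(self, flag_key: str, score: float) -> None:
        """Store one evaluation score with the current timestamp."""
        self._scores[flag_key].append((self._clock(), score))

    def _window(self, flag_key: str, window_minutes: int) -> list[float]:
        cutoff = self._clock() - window_minutes * 60
        return [s for ts, s in self._scores[flag_key] if ts >= cutoff]

    async def get_average_score(
        self, flag_key: str, window_minutes: int
    ) -> Optional[float]:
        scores = self._window(flag_key, window_minutes)
        # None signals "no data", which the controller checks for explicitly.
        return sum(scores) / len(scores) if scores else None

    async def get_sample_count(self, flag_key: str, window_minutes: int) -> int:
        return len(self._window(flag_key, window_minutes))

    async def get_consecutive_failures(
        self, flag_key: str, threshold: float
    ) -> int:
        count = 0
        # Walk backwards from the newest score until one passes.
        for _, score in reversed(self._scores[flag_key]):
            if score >= threshold:
                break
            count += 1
        return count
```

A production version would need to bound memory (for example by evicting entries older than the longest window) and share state across processes, which is what the Redis-backed aggregator is for.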

Rollout Stages

1. Canary (1-5%)
Deploy to a small slice of traffic. Monitor quality scores closely. No automatic promotion; advancement is manual.

2. Gradual (5-50%)
Automatic promotion enabled. Controller advances rollout every stabilityHours if quality gate passes.

3. Broad (50-95%)
Continues automatic promotion. Kill switch sensitivity increases as blast radius grows.

4. Full (100%)
Flag reaches target percentage. Keep the flag active for a cooldown period before cleaning up.
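To get a feel for how long a rollout takes once automatic promotion is on, the best-case timeline can be computed from stepSize, stabilityHours, and targetPercentage. The helper below, `promotion_schedule`, is a hypothetical name used for illustration, and it assumes every quality gate check passes on the first attempt.

```python
def promotion_schedule(
    start_pct: int, step_size: int, target_pct: int, stability_hours: float
) -> list[tuple[float, int]]:
    """Return (hours_elapsed, rollout_percentage) pairs for the best case."""
    if step_size <= 0:
        raise ValueError("step_size must be positive")
    hours, pct = 0.0, start_pct
    schedule = [(hours, pct)]
    while pct < target_pct:
        # Each promotion requires a full stability period at the current level.
        hours += stability_hours
        pct = min(pct + step_size, target_pct)
        schedule.append((hours, pct))
    return schedule


if __name__ == "__main__":
    for hours, pct in promotion_schedule(5, 15, 100, 12):
        print(f"t+{hours:>5.1f}h  {pct:>3d}%")
```

Starting at 5% with a 15-point step and 12 stability hours, the rollout passes through 20, 35, 50, 65, 80, and 95% and reaches 100% after seven promotions, or 84 hours. Real rollouts take longer, since the controller only checks every 15 minutes and any failed gate check holds the percentage in place.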

Start every AI feature rollout at 1% canary with no automatic promotion. Watch the quality scores for at least 24 hours with real production traffic before enabling the promotion controller. The first day of canary data is your most valuable signal.

Kill switches must be faster than promotion. The promotion controller runs every 15 minutes, but the kill switch check should also run within the SDK on every flag evaluation. If the locally cached quality score is below threshold, the SDK should return the control variant immediately without waiting for the next controller cycle.
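A minimal sketch of that SDK-side check follows. The names `LocalKillSwitch`, `CachedScore`, and `resolve_variant`, the `max_age_seconds` setting, and the "control"/"treatment" labels are assumptions for illustration, not part of an existing SDK. The sketch also assumes the cached score is refreshed by some background task that is not shown.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CachedScore:
    value: float       # most recent aggregated quality score
    fetched_at: float  # clock reading when the score was cached


class LocalKillSwitch:
    """Checks a locally cached quality score on every flag evaluation."""

    def __init__(
        self,
        threshold: float,
        max_age_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.threshold = threshold
        self.max_age_seconds = max_age_seconds
        self._clock = clock
        self._cached: Optional[CachedScore] = None

    def update(self, score: float) -> None:
        """Called by whatever refreshes the cache (not shown here)."""
        self._cached = CachedScore(score, self._clock())

    def is_tripped(self) -> bool:
        if self._cached is None:
            return False  # no data yet: defer to the controller
        age = self._clock() - self._cached.fetched_at
        if age > self.max_age_seconds:
            return False  # stale data: assumed policy is to defer as well
        return self._cached.value < self.threshold


def resolve_variant(in_rollout: bool, kill_switch: LocalKillSwitch) -> str:
    """Return the control variant immediately when the local check trips."""
    if kill_switch.is_tripped():
        return "control"
    return "treatment" if in_rollout else "control"
```

Treating a missing or stale score as "not tripped" is a design choice: it avoids disabling a healthy feature when the cache refresh fails, at the cost of relying on the controller in that case. Teams with stricter safety requirements may prefer the opposite default.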


Version History

1.0.0 · 2026-03-01

• Initial publication with AI-aware feature flag schema
• Quality-gated promotion controller with automatic advancement
• Kill switch pattern for immediate rollback on quality degradation
• Rollout stage progression from canary to full deployment

