Key Takeaway
Comprehensive model documentation accelerates onboarding, simplifies incident investigation, and satisfies the documentation requirements of most AI regulations, including the EU AI Act. This standard defines six required documentation sections for every production model, with templates, enforcement mechanisms, and CI/CD integration patterns.
Prerequisites
- At least one ML model in or nearing production deployment
- Access to model training artifacts (training scripts, data references, hyperparameters)
- Evaluation results from model testing (accuracy, fairness metrics, performance benchmarks)
- Understanding of the model's intended use cases and target user population
- A documentation platform or repository for storing and versioning model documentation
Why Model Documentation Matters
Undocumented models are ungovernable models. Without documentation, no one can answer the questions that inevitably arise: What data was this model trained on? What are its known failure modes? What happens if it degrades? Who approved it for production? What is the rollback plan? These questions come from incident responders, auditors, regulators, new team members, product managers, and executives. When documentation does not exist, the answers depend on the memory and availability of whoever built the model, which is a single point of failure for organizational knowledge.
The EU AI Act's Article 11 requires technical documentation for high-risk AI systems that must be prepared before the system is placed on the market and kept up to date. The documentation must include a general description of the system, detailed design documentation, data governance practices, testing and validation procedures, and a description of the monitoring system. This standard satisfies those requirements while also serving engineering needs that regulatory documentation alone does not address: deployment topology, cost projections, and operational runbooks.
The Six Required Sections
Every production model must have documentation covering six sections. The sections are ordered from most strategic (Model Card) to most operational (Lifecycle Plan). Sections 1-3 are primarily for governance, compliance, and product stakeholders. Sections 4-6 are primarily for engineering and operations teams. All six are required before a model can be approved for production deployment.
Section 1: Model Card
The model card is the executive summary of the model. It describes what the model does, who it is for, what its limitations are, and what ethical considerations were evaluated. It should be understandable by a non-technical stakeholder. The model card concept was introduced by Mitchell et al. at Google and has become the de facto standard for model transparency documentation.
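As a concrete illustration, here is a hypothetical model card instance matching the `ModelCard` schema defined later in this standard. The model name, team, contact address, and all field values are invented for the example, not taken from a real deployment.

```python
# Hypothetical model card for an illustrative support-ticket triage model.
# All names and values are examples only.
model_card = {
    "name": "ticket-triage-classifier",
    "version": "2.1.0",
    "owner": "ml-platform-team",
    "teamContact": "ml-platform@example.com",
    "purpose": "Route incoming support tickets to the correct queue.",
    "intendedUse": ["internal support-ticket routing"],
    "outOfScopeUse": ["customer-facing decisions", "employee evaluation"],
    "targetUsers": "Support operations staff",
    "limitations": ["English-language tickets only"],
    "ethicalConsiderations": ["Routing errors delay, but never deny, support"],
    "riskTier": "limited",
}

# riskTier must be one of the four tiers defined by the schema.
assert model_card["riskTier"] in {"minimal", "limited", "high", "critical"}
```

Note that every field is phrased for a non-technical reader; the technical detail belongs in the later sections.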
Section 2: Datasheet
The datasheet documents everything about the training data: where it came from, how it was collected, what preprocessing was applied, what biases are known or suspected, and what consent exists for its use in AI training. The datasheet concept was introduced by Gebru et al. and directly supports GDPR compliance (records of processing activities) and EU AI Act compliance (Article 10 data governance requirements).
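A single entry in the datasheet's `sources` array might look like the following sketch; the dataset name, size, and date range are invented for illustration.

```python
# Hypothetical datasheet source entry; all figures are illustrative.
source = {
    "name": "support-tickets-2024",
    "description": "Resolved internal support tickets, anonymized before export",
    "size": "1.2M records",
    "collectionMethod": "Export from internal ticketing system",
    "dateRange": "2024-01-01 to 2024-12-31",
    "consentStatus": "contractual",
    "knownBiases": ["Over-represents English-speaking business units"],
}

# consentStatus is constrained to the four values the schema allows.
assert source["consentStatus"] in {
    "explicit", "implied", "public-domain", "contractual"
}
```

Recording `knownBiases` per source, rather than once per model, keeps the bias record accurate when sources are added or dropped between versions.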
Section 3: Evaluation Report
The evaluation report presents the results of model testing in a structured format. It includes overall performance metrics, slice-based analysis across demographic groups and use case segments, fairness metric results, a catalog of identified failure modes, and comparison against the previous model version or baseline. The evaluation report is the primary artifact reviewed during governance approval.
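The report's overall-metrics entries can be sketched as follows. The metric names and thresholds are invented for the example; the point is that `passed` should be derived from the value and threshold, never hand-entered.

```python
# Hypothetical overall-metrics entries for an evaluation report.
def metric_entry(metric: str, value: float, threshold: float) -> dict:
    """Build one metric record; "passed" is computed, not hand-entered."""
    return {
        "metric": metric,
        "value": value,
        "threshold": threshold,
        "passed": value >= threshold,
    }

overall_metrics = [
    metric_entry("accuracy", 0.91, 0.88),
    metric_entry("recall_high_severity", 0.82, 0.85),
]
# The second entry falls below its threshold and would block deployment
# at the CI/CD gate described later in this standard.
```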
/**
* Model Documentation Standard - Schema Definition
*
* Every production model must have a documentation record
* conforming to this schema. CI/CD checks enforce completeness
* before deployment approval.
*/
interface ModelCard {
name: string;
version: string;
owner: string;
teamContact: string;
purpose: string;
intendedUse: string[];
outOfScopeUse: string[];
targetUsers: string;
limitations: string[];
ethicalConsiderations: string[];
riskTier: "minimal" | "limited" | "high" | "critical";
}
interface Datasheet {
sources: Array<{
name: string;
description: string;
size: string;
collectionMethod: string;
dateRange: string;
consentStatus: "explicit" | "implied" | "public-domain" | "contractual";
knownBiases: string[];
}>;
preprocessing: string[];
labelingProcess: string;
representativeness: string;
privacyReview: {
piiPresent: boolean;
piiHandling: string;
dpiaCompleted: boolean;
dpiaReference?: string;
};
}
interface EvaluationReport {
evaluationDate: string;
evaluator: string;
datasets: Array<{
name: string;
size: number;
purpose: "validation" | "test" | "fairness";
}>;
overallMetrics: Array<{
metric: string;
value: number;
threshold: number;
passed: boolean;
}>;
sliceAnalysis: Array<{
sliceName: string;
sliceCriteria: string;
metrics: Record<string, number>;
}>;
fairnessResults: {
testedGroups: string[];
demographicParity: Record<string, number>;
equalizedOdds: Record<string, number>;
overallAssessment: "pass" | "conditional-pass" | "fail";
};
failureModes: Array<{
description: string;
likelihood: "rare" | "occasional" | "frequent";
severity: "low" | "medium" | "high";
mitigation: string;
}>;
}
interface DeploymentSpec {
infrastructure: string;
gpuRequirements: string;
memoryRequirements: string;
scalingPolicy: string;
latencyTargets: {
p50Ms: number;
p95Ms: number;
p99Ms: number;
};
throughputTarget: string;
estimatedMonthlyCost: string;
rollbackProcedure: string;
featureFlags: string[];
}
interface MonitoringPlan {
metrics: Array<{
name: string;
type: "accuracy" | "drift" | "latency" | "cost" | "fairness";
alertThreshold: string;
checkFrequency: string;
}>;
driftDetection: {
method: string;
referenceDataset: string;
threshold: string;
action: string;
};
reviewCadence: string;
dashboardUrl: string;
}
interface LifecyclePlan {
retrainingTriggers: string[];
retrainingCadence: string;
versioningStrategy: string;
deprecationCriteria: string[];
dataRetentionPolicy: string;
sunsetProcedure: string;
}
interface ModelDocumentation {
modelCard: ModelCard;
datasheet: Datasheet;
evaluationReport: EvaluationReport;
deploymentSpec: DeploymentSpec;
monitoringPlan: MonitoringPlan;
lifecyclePlan: LifecyclePlan;
lastUpdated: string;
approvedBy: string;
approvalDate: string;
}

Enforcement via CI/CD
Documentation standards are only effective if they are enforced. The most reliable enforcement mechanism is a CI/CD check that validates documentation completeness before allowing a model deployment to proceed. The check verifies that a documentation record exists for the model being deployed, that all required sections are populated, that evaluation metrics meet minimum thresholds, and that the documentation version matches the model version being deployed.
"""CI/CD documentation validation check.
Run this as a pre-deployment gate to ensure model
documentation meets the organization's standard
before a model can be deployed to production.
"""
import json
import sys
from typing import List
REQUIRED_SECTIONS = [
"modelCard",
"datasheet",
"evaluationReport",
"deploymentSpec",
"monitoringPlan",
"lifecyclePlan",
]
REQUIRED_MODEL_CARD_FIELDS = [
"name", "version", "owner", "purpose",
"intendedUse", "limitations", "riskTier",
]
def validate_documentation(doc_path: str) -> List[str]:
    """Validate model documentation completeness.

    Returns a list of validation errors. An empty list means
    the documentation passes all checks.
    """
    errors: List[str] = []
    try:
        with open(doc_path) as f:
            doc = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError) as e:
        return [f"Cannot read documentation file: {e}"]

    # Check that all required sections exist and are non-empty
    for section in REQUIRED_SECTIONS:
        if section not in doc or not doc[section]:
            errors.append(f"Missing required section: {section}")

    # Validate required model card fields
    model_card = doc.get("modelCard", {})
    for field in REQUIRED_MODEL_CARD_FIELDS:
        if not model_card.get(field):
            errors.append(
                f"Model card missing required field: {field}"
            )

    # Validate that evaluation metrics meet thresholds.
    # Fail closed: a metric entry without an explicit "passed" flag
    # is treated as failing rather than silently passing the gate.
    eval_report = doc.get("evaluationReport", {})
    for metric in eval_report.get("overallMetrics", []):
        if not metric.get("passed", False):
            errors.append(
                f"Evaluation metric '{metric.get('metric', '?')}' "
                f"below threshold: {metric.get('value')} "
                f"< {metric.get('threshold')}"
            )

    # Validate fairness results
    fairness = eval_report.get("fairnessResults", {})
    if fairness.get("overallAssessment") == "fail":
        errors.append(
            "Fairness evaluation failed. Model cannot "
            "be deployed until fairness issues are resolved."
        )

    # Validate approval
    if not doc.get("approvedBy"):
        errors.append("Documentation not yet approved")

    return errors
if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python doc_validation_check.py <doc.json>")
        sys.exit(1)
    errors = validate_documentation(sys.argv[1])
    if errors:
        print(f"FAILED: {len(errors)} validation errors:")
        for error in errors:
            print(f"  - {error}")
        sys.exit(1)
    print("PASSED: Documentation meets all requirements")
    sys.exit(0)

Store model documentation alongside the model artifact in your model registry. Version the documentation with the model so that any deployed model version can be traced to the exact documentation that was approved for that version. This creates an unbreakable link between the model and its governance record.
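One way to make that link concrete is to key the documentation by the same model name and version as the artifact, and to record a content hash of the approved documentation. This is a minimal sketch; the registry path layout and helper names are assumptions, not the API of any particular registry product.

```python
import hashlib
import json

def doc_fingerprint(doc: dict) -> str:
    """Stable content hash of a documentation record.

    Canonical JSON (sorted keys) ensures the same record always
    hashes to the same value regardless of key order.
    """
    canonical = json.dumps(doc, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:12]

def registry_paths(model_name: str, version: str) -> dict:
    """Place the documentation next to the model artifact, keyed by
    the same version, so neither can drift from the other."""
    base = f"models/{model_name}/{version}"
    return {
        "artifact": f"{base}/model.bin",
        "documentation": f"{base}/docs.json",
    }
```

Recording the fingerprint in the deployment manifest lets an auditor verify, byte for byte, that the documentation on file is the version that was approved.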
Documentation Review Process
Model documentation should be reviewed at three points in the model lifecycle:
- Initial review, before the first production deployment: a full review of all six sections.
- Update review, before any model retraining or version update: a focused review of the changed sections plus the evaluation report.
- Periodic review, on a scheduled cadence: an annual review of all sections to confirm accuracy and currency.

Each review produces an approval record that is stored alongside the documentation.
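An approval record can be as simple as the following sketch; the field names and reviewer identity are illustrative assumptions, since this standard does not prescribe a specific record format.

```python
# Hypothetical approval record produced by a documentation review.
# Field names and values are illustrative, not prescribed by the standard.
approval_record = {
    "reviewType": "initial",  # one of: "initial", "update", "periodic"
    "sectionsReviewed": [
        "modelCard", "datasheet", "evaluationReport",
        "deploymentSpec", "monitoringPlan", "lifecyclePlan",
    ],
    "approvedBy": "governance-review-board",
    "approvalDate": "2026-03-01",
    "nextReviewDue": "2027-03-01",  # drives the periodic-review cadence
}

# An initial review must cover all six required sections.
assert len(approval_record["sectionsReviewed"]) == 6
```

Storing `nextReviewDue` in the record itself means a scheduled job can flag overdue reviews without consulting a separate calendar.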
Version History
1.0.0 · 2026-03-01
- Initial release with six-section documentation standard
- Complete TypeScript schema definition for machine-readable model documentation
- CI/CD validation check implementation in Python
- EU AI Act Article 11 alignment throughout
- Review process and enforcement guidance