Key Takeaway
Comprehensive model documentation accelerates onboarding, simplifies incident investigation, and satisfies the documentation requirements of most AI regulations, including the EU AI Act. This standard defines six required documentation sections for every production model, with templates, enforcement mechanisms, and CI/CD integration patterns.
Prerequisites
- At least one ML model in or nearing production deployment
- Access to model training artifacts (training scripts, data references, hyperparameters)
- Evaluation results from model testing (accuracy, fairness metrics, performance benchmarks)
- Understanding of the model's intended use cases and target user population
- A documentation platform or repository for storing and versioning model documentation
Why Model Documentation Matters
Undocumented models are ungovernable models. Without documentation, no one can answer the questions that inevitably arise: What data was this model trained on? What are its known failure modes? What happens if it degrades? Who approved it for production? What is the rollback plan? These questions come from incident responders, auditors, regulators, new team members, product managers, and executives. When documentation does not exist, the answers depend on the memory and availability of whoever built the model, which is a single point of failure for organizational knowledge.
The EU AI Act's Article 11 requires technical documentation for high-risk AI systems that must be prepared before the system is placed on the market and kept up to date. The documentation must include a general description of the system, detailed design documentation, data governance practices, testing and validation procedures, and a description of the monitoring system. This standard satisfies those requirements while also serving engineering needs that regulatory documentation alone does not address: deployment topology, cost projections, and operational runbooks.
The Six Required Sections
Every production model must have documentation covering six sections. The sections are ordered from most strategic (Model Card) to most operational (Lifecycle Plan). Sections 1-3 are primarily for governance, compliance, and product stakeholders. Sections 4-6 are primarily for engineering and operations teams. All six are required before a model can be approved for production deployment.
Section 1: Model Card
The model card is the executive summary of the model. It describes what the model does, who it is for, what its limitations are, and what ethical considerations were evaluated. It should be understandable by a non-technical stakeholder. The model card concept was introduced by Mitchell et al. at Google and has become the de facto standard for model transparency documentation.
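As a concrete illustration, here is a hypothetical model card instance matching the `ModelCard` schema defined later in this standard. The model name, team, contact address, and all field values are invented for the example, not taken from a real deployment.

```python
# Hypothetical model card for an illustrative support-ticket triage model.
# All names and values are examples only.
model_card = {
    "name": "ticket-triage-classifier",
    "version": "2.1.0",
    "owner": "ml-platform-team",
    "teamContact": "ml-platform@example.com",
    "purpose": "Route incoming support tickets to the correct queue.",
    "intendedUse": ["internal support-ticket routing"],
    "outOfScopeUse": ["customer-facing decisions", "employee evaluation"],
    "targetUsers": "Support operations staff",
    "limitations": ["English-language tickets only"],
    "ethicalConsiderations": ["Routing errors delay, but never deny, support"],
    "riskTier": "limited",
}

# riskTier must be one of the four tiers defined by the schema.
assert model_card["riskTier"] in {"minimal", "limited", "high", "critical"}
```

Note that every field is phrased for a non-technical reader; the technical detail belongs in the later sections.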
Section 2: Datasheet
The datasheet documents everything about the training data: where it came from, how it was collected, what preprocessing was applied, what biases are known or suspected, and what consent exists for its use in AI training. The datasheet concept was introduced by Gebru et al. and directly supports GDPR compliance (records of processing activities) and EU AI Act compliance (Article 10 data governance requirements).
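A single entry in the datasheet's `sources` array might look like the following sketch; the dataset name, size, and date range are invented for illustration.

```python
# Hypothetical datasheet source entry; all figures are illustrative.
source = {
    "name": "support-tickets-2024",
    "description": "Resolved internal support tickets, anonymized before export",
    "size": "1.2M records",
    "collectionMethod": "Export from internal ticketing system",
    "dateRange": "2024-01-01 to 2024-12-31",
    "consentStatus": "contractual",
    "knownBiases": ["Over-represents English-speaking business units"],
}

# consentStatus is constrained to the four values the schema allows.
assert source["consentStatus"] in {
    "explicit", "implied", "public-domain", "contractual"
}
```

Recording `knownBiases` per source, rather than once per model, keeps the bias record accurate when sources are added or dropped between versions.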
Section 3: Evaluation Report
The evaluation report presents the results of model testing in a structured format. It includes overall performance metrics, slice-based analysis across demographic groups and use case segments, fairness metric results, a catalog of identified failure modes, and comparison against the previous model version or baseline. The evaluation report is the primary artifact reviewed during governance approval.
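The report's overall-metrics entries can be sketched as follows. The metric names and thresholds are invented for the example; the point is that `passed` should be derived from the value and threshold, never hand-entered.

```python
# Hypothetical overall-metrics entries for an evaluation report.
def metric_entry(metric: str, value: float, threshold: float) -> dict:
    """Build one metric record; "passed" is computed, not hand-entered."""
    return {
        "metric": metric,
        "value": value,
        "threshold": threshold,
        "passed": value >= threshold,
    }

overall_metrics = [
    metric_entry("accuracy", 0.91, 0.88),
    metric_entry("recall_high_severity", 0.82, 0.85),
]
# The second entry falls below its threshold and would block deployment
# at the CI/CD gate described later in this standard.
```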
/**
* Model Documentation Standard - Schema Definition
*
* Every production model must have a documentation record
* conforming to this schema. CI/CD checks enforce completeness
* before deployment approval.
*/
interface ModelCard {
name: string;
version: string;
owner: string;
teamContact: string;
purpose: string;
intendedUse: string[];
outOfScopeUse: string[];
targetUsers: string;
limitations: string[];
ethicalConsiderations: string[];
riskTier: "minimal" | "limited" | "high" | "critical";
}
interface Datasheet {
sources: Array<{
name: string;
description: string;
size: string;
collectionMethod: string;
dateRange: string;
consentStatus: "explicit" | "implied" | "public-domain" | "contractual";
knownBiases: string[];
}>;
preprocessing: string[];
labelingProcess: string;
representativeness: string;
privacyReview: {
piiPresent: boolean;
piiHandling: string;
dpiaCompleted: boolean;
dpiaReference?: string;
};
}
interface EvaluationReport {
evaluationDate: string;
evaluator: string;
datasets: Array<{
name: string;
size: number;
purpose: "validation" | "test" | "fairness";
}>;
overallMetrics: Array<{
metric: string;
value: number;
threshold: number;
passed: boolean;
}>;
sliceAnalysis: Array<{
sliceName: string;
sliceCriteria: string;
metrics: Record<string, number>;
}>;
fairnessResults: {
testedGroups: string[];
demographicParity: Record<string, number>;
equalizedOdds: Record<string, number>;
overallAssessment: "pass" | "conditional-pass" | "fail";
};
failureModes: Array<{
description: string;
likelihood: "rare" | "occasional" | "frequent";
severity: "low" | "medium" | "high";
mitigation: string;
}>;
}
interface DeploymentSpec {
infrastructure: string;
gpuRequirements: string;
memoryRequirements: string;
scalingPolicy: string;
latencyTargets: {
p50Ms: number;
p95Ms: number;
p99Ms: number;
};
throughputTarget: string;
estimatedMonthlyCost: string;
rollbackProcedure: string;
featureFlags: string[];
}
interface MonitoringPlan {
metrics: Array<{
name: string;
type: "accuracy" | "drift" | "latency" | "cost" | "fairness";
alertThreshold: string;
checkFrequency: string;
}>;
driftDetection: {
method: string;
referenceDataset: string;
threshold: string;
action: string;
};
reviewCadence: string;
dashboardUrl: string;
}
interface LifecyclePlan {
retrainingTriggers: string[];
retrainingCadence: string;
versioningStrategy: string;
deprecationCriteria: string[];
dataRetentionPolicy: string;
sunsetProcedure: string;
}
interface ModelDocumentation {
modelCard: ModelCard;
datasheet: Datasheet;
evaluationReport: EvaluationReport;
deploymentSpec: DeploymentSpec;
monitoringPlan: MonitoringPlan;
lifecyclePlan: LifecyclePlan;
lastUpdated: string;
approvedBy: string;
approvalDate: string;
}

Enforcement via CI/CD
Documentation standards are only effective if they are enforced. The most reliable enforcement mechanism is a CI/CD check that validates documentation completeness before allowing a model deployment to proceed. The check verifies that a documentation record exists for the model being deployed, that all required sections are populated, that evaluation metrics meet minimum thresholds, and that the documentation version matches the model version being deployed.
"""CI/CD documentation validation check.
Run this as a pre-deployment gate to ensure model
documentation meets the organization's standard
before a model can be deployed to production.
"""
import json
import sys
from typing import List
REQUIRED_SECTIONS = [
"modelCard",
"datasheet",
"evaluationReport",
"deploymentSpec",
"monitoringPlan",
"lifecyclePlan",
]
REQUIRED_MODEL_CARD_FIELDS = [
"name", "version", "owner", "purpose",
"intendedUse", "limitations", "riskTier",
]
def validate_documentation(doc_path: str) -> List[str]:
    """Validate model documentation completeness.

    Returns a list of validation errors. An empty list means
    the documentation passes all checks.
    """
    errors: List[str] = []
    try:
        with open(doc_path) as f:
            doc = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError) as e:
        return [f"Cannot read documentation file: {e}"]

    # Check that all required sections exist and are non-empty
    for section in REQUIRED_SECTIONS:
        if section not in doc or not doc[section]:
            errors.append(f"Missing required section: {section}")

    # Validate required model card fields
    model_card = doc.get("modelCard", {})
    for field in REQUIRED_MODEL_CARD_FIELDS:
        if not model_card.get(field):
            errors.append(
                f"Model card missing required field: {field}"
            )

    # Validate that evaluation metrics meet thresholds.
    # Fail closed: a metric entry without an explicit "passed" flag
    # is treated as failing rather than silently passing the gate.
    eval_report = doc.get("evaluationReport", {})
    for metric in eval_report.get("overallMetrics", []):
        if not metric.get("passed", False):
            errors.append(
                f"Evaluation metric '{metric.get('metric', '?')}' "
                f"below threshold: {metric.get('value')} "
                f"< {metric.get('threshold')}"
            )

    # Validate fairness results
    fairness = eval_report.get("fairnessResults", {})
    if fairness.get("overallAssessment") == "fail":
        errors.append(
            "Fairness evaluation failed. Model cannot "
            "be deployed until fairness issues are resolved."
        )

    # Validate approval
    if not doc.get("approvedBy"):
        errors.append("Documentation not yet approved")

    return errors
if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python doc_validation_check.py <doc.json>")
        sys.exit(1)
    errors = validate_documentation(sys.argv[1])
    if errors:
        print(f"FAILED: {len(errors)} validation errors:")
        for error in errors:
            print(f"  - {error}")
        sys.exit(1)
    print("PASSED: Documentation meets all requirements")
    sys.exit(0)

Store model documentation alongside the model artifact in your model registry. Version the documentation with the model so that any deployed model version can be traced to the exact documentation that was approved for that version. This creates an unbreakable link between the model and its governance record.
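One way to make that link concrete is to key the documentation by the same model name and version as the artifact, and to record a content hash of the approved documentation. This is a minimal sketch; the registry path layout and helper names are assumptions, not the API of any particular registry product.

```python
import hashlib
import json

def doc_fingerprint(doc: dict) -> str:
    """Stable content hash of a documentation record.

    Canonical JSON (sorted keys) ensures the same record always
    hashes to the same value regardless of key order.
    """
    canonical = json.dumps(doc, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:12]

def registry_paths(model_name: str, version: str) -> dict:
    """Place the documentation next to the model artifact, keyed by
    the same version, so neither can drift from the other."""
    base = f"models/{model_name}/{version}"
    return {
        "artifact": f"{base}/model.bin",
        "documentation": f"{base}/docs.json",
    }
```

Recording the fingerprint in the deployment manifest lets an auditor verify, byte for byte, that the documentation on file is the version that was approved.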
Documentation Review Process
Model documentation should be reviewed at three points in the model lifecycle:
- Initial review, before the first production deployment: a full review of all six sections.
- Update review, before any model retraining or version update: a focused review of the changed sections plus the evaluation report.
- Periodic review, on a scheduled cadence: an annual review of all sections to confirm accuracy and currency.

Each review produces an approval record that is stored alongside the documentation.
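An approval record can be as simple as the following sketch; the field names and reviewer identity are illustrative assumptions, since this standard does not prescribe a specific record format.

```python
# Hypothetical approval record produced by a documentation review.
# Field names and values are illustrative, not prescribed by the standard.
approval_record = {
    "reviewType": "initial",  # one of: "initial", "update", "periodic"
    "sectionsReviewed": [
        "modelCard", "datasheet", "evaluationReport",
        "deploymentSpec", "monitoringPlan", "lifecyclePlan",
    ],
    "approvedBy": "governance-review-board",
    "approvalDate": "2026-03-01",
    "nextReviewDue": "2027-03-01",  # drives the periodic-review cadence
}

# An initial review must cover all six required sections.
assert len(approval_record["sectionsReviewed"]) == 6
```

Storing `nextReviewDue` in the record itself means a scheduled job can flag overdue reviews without consulting a separate calendar.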
Version History
1.0.0 · 2026-03-01
- Initial release with six-section documentation standard
- Complete TypeScript schema definition for machine-readable model documentation
- CI/CD validation check implementation in Python
- EU AI Act Article 11 alignment throughout
- Review process and enforcement guidance