Key Takeaway
AI audit trails must capture model versions, input features, confidence scores, human overrides, and data lineage for every consequential decision. Designing audit trails as a first-class concern from the start avoids expensive retrofitting and ensures you can answer regulatory inquiries about any past AI decision.
Prerequisites
- AI systems in production or nearing deployment that make decisions affecting users or business outcomes
- Understanding of which AI decisions require audit trails (regulatory, contractual, or internal policy-driven)
- A logging infrastructure capable of handling structured, high-volume event data
- Storage capacity planning for audit trail retention (typically 3-10 years, depending on regulation)
- Familiarity with applicable audit requirements (EU AI Act Article 12, SOC 2, industry-specific regulations)
What AI Audit Trails Must Capture
Traditional application audit trails log access events, configuration changes, and errors. AI audit trails must go further: they must reconstruct the reasoning behind every consequential AI decision. This means capturing not just what the model output was, but what inputs it received, which model version produced the output, how confident the model was, whether a human reviewed or overrode the decision, and what the downstream impact was. The goal is to answer the question a regulator, auditor, or litigator will inevitably ask: why did your AI system make this specific decision for this specific person at this specific time?
The EU AI Act's Article 12 explicitly requires automatic logging of events during the operation of high-risk AI systems. These logs must be sufficient to enable monitoring, incident investigation, and post-market surveillance, and must include timestamps, reference datasets, input data (or references to it), and relevant operational parameters. While the Act's requirements apply only to high-risk systems operating in the EU, the logging standard it establishes is a useful baseline for any production AI system.
Decision Log Schema
The decision log captures every inference event where the AI system produces an output that influences a user experience, business decision, or automated action. The schema must be comprehensive enough for regulatory compliance but efficient enough for high-throughput inference workloads. The following schema balances these concerns by logging core fields for every decision and extended fields conditionally based on the risk tier of the system.
```typescript
/**
 * AI Decision Audit Log Schema
 *
 * Every AI inference that affects users or business outcomes
 * must produce a record conforming to this schema.
 */
interface AIDecisionLog {
  // Identity
  traceId: string;               // Distributed trace ID for request correlation
  decisionId: string;            // Unique identifier for this decision
  timestamp: string;             // ISO 8601 with timezone

  // Model context
  modelId: string;               // Model identifier (e.g., "fraud-detector")
  modelVersion: string;          // Semantic version of the model
  modelChecksum: string;         // SHA-256 of model artifact
  riskTier: "low" | "medium" | "high" | "critical";

  // Input context
  inputHash: string;             // SHA-256 of raw input (for privacy)
  inputFeatures: Record<string, number | string | boolean>;
  featureStoreVersion?: string;  // Feature store snapshot version
  contextData?: Record<string, string>; // Additional context

  // Output
  prediction: string | number | Record<string, unknown>;
  confidence: number;            // 0.0 - 1.0
  alternativePredictions?: Array<{
    prediction: string | number;
    confidence: number;
  }>;
  explanationRef?: string;       // Reference to detailed SHAP/LIME explanation

  // Human oversight
  humanReviewRequired: boolean;
  humanReviewed: boolean;
  humanOverride: boolean;
  humanReviewerId?: string;
  humanDecision?: string;
  overrideReason?: string;

  // Impact
  affectedUserId?: string;       // Hashed user identifier
  decisionCategory: string;      // "recommendation", "scoring", "classification", etc.
  consequenceLevel: "informational" | "material" | "significant";

  // Metadata
  latencyMs: number;
  tokenUsage?: { input: number; output: number };
  environment: "production" | "staging" | "shadow";
}

/**
 * Model Lifecycle Event Log
 * Tracks training, evaluation, deployment, and retirement events.
 */
interface ModelLifecycleEvent {
  eventId: string;
  eventType:
    | "training_started"
    | "training_completed"
    | "evaluation_completed"
    | "deployment_approved"
    | "deployed"
    | "rollback"
    | "deprecated"
    | "retired";
  modelId: string;
  modelVersion: string;
  timestamp: string;
  actor: string;                 // Who triggered the event
  details: Record<string, unknown>;
  trainingDataRef?: string;      // Reference to training data lineage
  evaluationResults?: {
    metric: string;
    value: number;
    threshold: number;
    passed: boolean;
  }[];
}
```

Storage Architecture
Audit trail storage must satisfy three competing requirements: high write throughput (logging every inference event), fast query capability (investigating specific decisions), and long-term retention (years of storage at reasonable cost). The standard pattern is a three-tier storage architecture: hot storage (last 30 days) in a fast query engine like Elasticsearch or ClickHouse, warm storage (30 days to 1 year) in a columnar store like Parquet on object storage, and cold storage (1-7 years) in compressed archives on object storage with a metadata index for retrieval.
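The three-tier routing rule can be expressed directly as a function of record age. This is a minimal sketch: the tier names and the 30-day/1-year boundaries come from the text above, while the function name and signature are illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Tier boundaries from the text: hot (last 30 days),
# warm (30 days to 1 year), cold (1 to 7 years).
HOT_WINDOW = timedelta(days=30)
WARM_WINDOW = timedelta(days=365)


def storage_tier(record_timestamp: datetime,
                 now: Optional[datetime] = None) -> str:
    """Return which storage tier an audit record belongs in, by age."""
    now = now or datetime.now(timezone.utc)
    age = now - record_timestamp
    if age <= HOT_WINDOW:
        return "hot"    # fast query engine, e.g. Elasticsearch / ClickHouse
    if age <= WARM_WINDOW:
        return "warm"   # columnar files, e.g. Parquet on object storage
    return "cold"       # compressed archives with a metadata index
```

In practice a scheduled job (or the storage engine's own TTL/tiering feature) would apply this rule to move records between tiers.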
Do not log raw input data containing PII in the decision log. Instead, log a hash of the input and store the mapping from hash to input in a separate, access-controlled store with encryption at rest. This limits PII exposure in the audit trail while preserving the ability to reconstruct the full decision context when needed for a specific inquiry.
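The hash-and-vault pattern above can be sketched as follows. The `InputVault` class is hypothetical; a plain dict stands in for what would, in production, be a separate access-controlled store with encryption at rest.

```python
import hashlib
import json
from typing import Any, Dict


class InputVault:
    """Maps input hashes to raw inputs, kept separate from the audit log.

    Only the hash appears in the decision log; resolving a hash back to
    the raw input requires access to this store.
    """

    def __init__(self) -> None:
        self._store: Dict[str, Dict[str, Any]] = {}  # stand-in for an encrypted store

    def put(self, input_data: Dict[str, Any]) -> str:
        """Store the raw input and return the hash to log."""
        canonical = json.dumps(input_data, sort_keys=True).encode()
        input_hash = hashlib.sha256(canonical).hexdigest()
        self._store[input_hash] = input_data
        return input_hash

    def resolve(self, input_hash: str) -> Dict[str, Any]:
        """Reconstruct the full decision context for a specific inquiry."""
        return self._store[input_hash]
```

Canonicalizing the input with `sort_keys=True` before hashing ensures the same logical input always produces the same hash, so decision logs and the vault stay consistent.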
Retention Policies by Regulation
| Regulation | Retention Requirement | What to Retain | Access Controls |
|---|---|---|---|
| EU AI Act | Duration proportionate to intended purpose; minimum while system is on market + 10 years for high-risk | Automatic event logs, technical documentation, quality management records | Available to market surveillance authorities on request |
| GDPR | No longer than necessary for processing purpose; must accommodate data subject access requests | Records of processing activities, consent records, data protection impact assessments | Data Protection Officer access; data subject access on request |
| SOC 2 / ISO 27001 | Typically 1 year minimum, varies by control objective | Access logs, change management records, incident records | Auditor access during examination periods |
| Financial Services (OCC/SEC) | 5-7 years depending on record type and jurisdiction | All records of automated decisions affecting customers, model validation records | Regulator access on examination; internal audit access |
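When several regimes apply to the same system, the safe default is to retain for the longest applicable period. A sketch of that rule, with illustrative retention floors loosely derived from the table (actual obligations depend on jurisdiction, record type, and risk tier, and should be confirmed with counsel; note that GDPR imposes a *maximum*, not a floor, and is therefore omitted):

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RetentionPolicy:
    regulation: str
    min_retention_days: int  # illustrative floor, not legal advice
    notes: str


# Hypothetical policy registry keyed by regime.
RETENTION_POLICIES = {
    "eu_ai_act_high_risk": RetentionPolicy(
        "EU AI Act", 10 * 365, "on-market period + 10 years for high-risk"),
    "soc2_iso27001": RetentionPolicy(
        "SOC 2 / ISO 27001", 365, "varies by control objective"),
    "financial_services": RetentionPolicy(
        "OCC/SEC", 7 * 365, "5-7 years by record type and jurisdiction"),
}


def retention_days(applicable: List[str]) -> int:
    """Retain for the longest requirement among applicable regimes."""
    return max(RETENTION_POLICIES[k].min_retention_days for k in applicable)
```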
Explainability Logging
For high-risk AI systems, the audit trail must include not just what the model decided but why. Explainability logging captures feature attributions, attention patterns, or other interpretability outputs alongside each decision. Because explainability artifacts can be large (a full SHAP explanation for a model with hundreds of features produces significant data), the practical approach is to log a reference ID in the decision log and store the full explanation in a separate store. Pre-compute explanations for high-risk decisions; compute them on demand for low-risk ones.
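The pre-compute-versus-on-demand split can be sketched as below. The `ExplanationLogger` class and `ExplainFn` callback are hypothetical; any SHAP/LIME wrapper that maps features to attributions would fit the callback shape, and the two dicts stand in for a separate explanation store.

```python
import uuid
from typing import Any, Callable, Dict

# Callback that turns a feature dict into per-feature attributions,
# e.g. a wrapper around a SHAP explainer.
ExplainFn = Callable[[Dict[str, Any]], Dict[str, float]]


class ExplanationLogger:
    def __init__(self, explain: ExplainFn) -> None:
        self._explain = explain
        self._store: Dict[str, Dict[str, float]] = {}   # computed explanations
        self._pending: Dict[str, Dict[str, Any]] = {}   # inputs awaiting on-demand compute

    def record(self, features: Dict[str, Any], risk_tier: str) -> str:
        """Return an explanation_ref; pre-compute only for high-risk tiers."""
        ref = uuid.uuid4().hex
        if risk_tier in ("high", "critical"):
            self._store[ref] = self._explain(features)  # pre-computed now
        else:
            self._pending[ref] = features               # deferred until fetched
        return ref

    def fetch(self, ref: str) -> Dict[str, float]:
        """Resolve a reference, computing the explanation lazily if deferred."""
        if ref not in self._store and ref in self._pending:
            self._store[ref] = self._explain(self._pending.pop(ref))
        return self._store[ref]
```

The returned `ref` is what goes into the decision log's `explanationRef` field; the heavy attribution payload never touches the high-throughput log path for low-risk decisions.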
"""Audit trail logger for AI decisions.
Provides a structured logging interface that captures
all required fields for regulatory compliance and
internal governance. Designed for high-throughput
inference pipelines.
"""
import json
import hashlib
import time
from typing import Dict, Any, Optional, List
from dataclasses import dataclass, asdict
@dataclass
class AuditEntry:
"""A single audit trail entry."""
decision_id: str
trace_id: str
model_id: str
model_version: str
input_hash: str
prediction: Any
confidence: float
risk_tier: str
human_review_required: bool
timestamp: float
latency_ms: float
environment: str
class AIAuditLogger:
"""Structured audit logger for AI decisions.
Writes to a structured log sink (stdout as JSON by default).
In production, configure to write to your log aggregation
system (e.g., Elasticsearch, ClickHouse, BigQuery).
"""
def __init__(
self,
model_id: str,
model_version: str,
risk_tier: str = "medium",
environment: str = "production",
):
self.model_id = model_id
self.model_version = model_version
self.risk_tier = risk_tier
self.environment = environment
def log_decision(
self,
trace_id: str,
input_data: Dict[str, Any],
prediction: Any,
confidence: float,
latency_ms: float,
human_review_required: bool = False,
explanation_ref: Optional[str] = None,
) -> AuditEntry:
"""Log an AI decision to the audit trail."""
# Hash input to avoid logging raw PII
input_hash = hashlib.sha256(
json.dumps(input_data, sort_keys=True).encode()
).hexdigest()
entry = AuditEntry(
decision_id=hashlib.sha256(
f"{trace_id}:{time.time_ns()}".encode()
).hexdigest()[:24],
trace_id=trace_id,
model_id=self.model_id,
model_version=self.model_version,
input_hash=input_hash,
prediction=prediction,
confidence=confidence,
risk_tier=self.risk_tier,
human_review_required=human_review_required,
timestamp=time.time(),
latency_ms=latency_ms,
environment=self.environment,
)
# Emit structured log entry
log_record = asdict(entry)
if explanation_ref:
log_record["explanation_ref"] = explanation_ref
print(json.dumps(log_record))
return entryQuery Capabilities
Audit trail data is useless if you cannot query it effectively. Your audit trail system must support five query patterns: point lookup (find the specific decision record for a given decision ID or trace ID), user history (find all decisions made about a specific user within a time range), model investigation (find all decisions made by a specific model version), anomaly investigation (find decisions with unusually low confidence or high latency), and aggregate analysis (compute decision distribution, accuracy trends, and fairness metrics over time periods).
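The five patterns can be sketched as parameterized queries. Everything here is illustrative: the `ai_decision_log` table name, the snake_case column names (mirroring the schema fields above), and the `%(name)s` placeholder style are assumptions, not a specific engine's syntax.

```python
# Illustrative parameterized queries over a hypothetical
# ai_decision_log table whose columns mirror the decision log schema.
QUERY_PATTERNS = {
    # Point lookup: one specific decision by ID.
    "point_lookup": (
        "SELECT * FROM ai_decision_log WHERE decision_id = %(decision_id)s"
    ),
    # User history: all decisions about one (hashed) user in a window.
    "user_history": (
        "SELECT * FROM ai_decision_log "
        "WHERE affected_user_id = %(user_hash)s "
        "AND timestamp BETWEEN %(start)s AND %(end)s "
        "ORDER BY timestamp"
    ),
    # Model investigation: everything a given model version decided.
    "model_investigation": (
        "SELECT * FROM ai_decision_log "
        "WHERE model_id = %(model_id)s AND model_version = %(version)s"
    ),
    # Anomaly investigation: low-confidence or slow decisions.
    "anomaly_investigation": (
        "SELECT * FROM ai_decision_log "
        "WHERE confidence < %(min_confidence)s "
        "OR latency_ms > %(max_latency_ms)s"
    ),
    # Aggregate analysis: decision distribution and confidence trends.
    "aggregate_analysis": (
        "SELECT model_version, decision_category, "
        "count(*) AS decisions, avg(confidence) AS mean_confidence "
        "FROM ai_decision_log "
        "WHERE timestamp >= %(since)s "
        "GROUP BY model_version, decision_category"
    ),
}
```

Indexing strategy follows directly from these patterns: the hot tier needs indexes (or sort keys) on `decision_id`, `affected_user_id` plus `timestamp`, and `model_id` plus `model_version` for the first three patterns to stay fast.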
Version History
1.0.0 · 2026-03-01
- Initial release with comprehensive audit trail schema in TypeScript
- Decision log, model lifecycle event, and explainability logging specifications
- Retention policy comparison table across EU AI Act, GDPR, SOC 2, and financial services regulations
- Python audit logger implementation for production use
- Query capability requirements and readiness checklist