Key Takeaway
AI audit trails must capture model versions, input features, confidence scores, human overrides, and data lineage for every consequential decision. Designing audit trails as a first-class concern from the start avoids expensive retrofitting and ensures you can answer regulatory inquiries about any past AI decision.
Prerequisites
- AI systems in production or nearing deployment that make decisions affecting users or business outcomes
- Understanding of which AI decisions require audit trails (regulatory, contractual, or internal policy-driven)
- A logging infrastructure capable of handling structured, high-volume event data
- Storage capacity planning for audit trail retention (typically 3-10 years, depending on regulation)
- Familiarity with applicable audit requirements (EU AI Act Article 12, SOC 2, industry-specific regulations)
What AI Audit Trails Must Capture
Traditional application audit trails log access events, configuration changes, and errors. AI audit trails must go further: they must reconstruct the reasoning behind every consequential AI decision. This means capturing not just what the model output was, but what inputs it received, which model version produced the output, how confident the model was, whether a human reviewed or overrode the decision, and what the downstream impact was. The goal is to answer the question a regulator, auditor, or litigator will inevitably ask: why did your AI system make this specific decision for this specific person at this specific time?
The EU AI Act's Article 12 explicitly requires automatic logging of events during the operation of high-risk AI systems. These logs must be sufficient to enable monitoring, incident investigation, and post-market surveillance, and must include timestamps, reference datasets, input data (or references to it), and relevant operational parameters. While the Act's requirements apply only to high-risk systems operating in the EU, the logging standard it establishes is a useful baseline for any production AI system.
Decision Log Schema
The decision log captures every inference event where the AI system produces an output that influences a user experience, business decision, or automated action. The schema must be comprehensive enough for regulatory compliance but efficient enough for high-throughput inference workloads. The following schema balances these concerns by logging core fields for every decision and extended fields conditionally based on the risk tier of the system.
```typescript
/**
 * AI Decision Audit Log Schema
 *
 * Every AI inference that affects users or business outcomes
 * must produce a record conforming to this schema.
 */
interface AIDecisionLog {
  // Identity
  traceId: string;               // Distributed trace ID for request correlation
  decisionId: string;            // Unique identifier for this decision
  timestamp: string;             // ISO 8601 with timezone

  // Model context
  modelId: string;               // Model identifier (e.g., "fraud-detector")
  modelVersion: string;          // Semantic version of the model
  modelChecksum: string;         // SHA-256 of model artifact
  riskTier: "low" | "medium" | "high" | "critical";

  // Input context
  inputHash: string;             // SHA-256 of raw input (for privacy)
  inputFeatures: Record<string, number | string | boolean>;
  featureStoreVersion?: string;  // Feature store snapshot version
  contextData?: Record<string, string>; // Additional context

  // Output
  prediction: string | number | Record<string, unknown>;
  confidence: number;            // 0.0 - 1.0
  alternativePredictions?: Array<{
    prediction: string | number;
    confidence: number;
  }>;
  explanationRef?: string;       // Reference to detailed SHAP/LIME explanation

  // Human oversight
  humanReviewRequired: boolean;
  humanReviewed: boolean;
  humanOverride: boolean;
  humanReviewerId?: string;
  humanDecision?: string;
  overrideReason?: string;

  // Impact
  affectedUserId?: string;       // Hashed user identifier
  decisionCategory: string;      // "recommendation", "scoring", "classification", etc.
  consequenceLevel: "informational" | "material" | "significant";

  // Metadata
  latencyMs: number;
  tokenUsage?: { input: number; output: number };
  environment: "production" | "staging" | "shadow";
}

/**
 * Model Lifecycle Event Log
 * Tracks training, evaluation, deployment, and retirement events.
 */
interface ModelLifecycleEvent {
  eventId: string;
  eventType:
    | "training_started"
    | "training_completed"
    | "evaluation_completed"
    | "deployment_approved"
    | "deployed"
    | "rollback"
    | "deprecated"
    | "retired";
  modelId: string;
  modelVersion: string;
  timestamp: string;
  actor: string;                 // Who triggered the event
  details: Record<string, unknown>;
  trainingDataRef?: string;      // Reference to training data lineage
  evaluationResults?: {
    metric: string;
    value: number;
    threshold: number;
    passed: boolean;
  }[];
}
```

Storage Architecture
Audit trail storage must satisfy three competing requirements: high write throughput (logging every inference event), fast query capability (investigating specific decisions), and long-term retention (years of storage at reasonable cost). The standard pattern is a three-tier storage architecture: hot storage (last 30 days) in a fast query engine like Elasticsearch or ClickHouse, warm storage (30 days to 1 year) in a columnar store like Parquet on object storage, and cold storage (1-7 years) in compressed archives on object storage with a metadata index for retrieval.
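The three-tier routing rule can be expressed directly as a function of record age. This is a minimal sketch: the tier names and the 30-day/1-year boundaries come from the text above, while the function name and signature are illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Tier boundaries from the text: hot (last 30 days),
# warm (30 days to 1 year), cold (1 to 7 years).
HOT_WINDOW = timedelta(days=30)
WARM_WINDOW = timedelta(days=365)


def storage_tier(record_timestamp: datetime,
                 now: Optional[datetime] = None) -> str:
    """Return which storage tier an audit record belongs in, by age."""
    now = now or datetime.now(timezone.utc)
    age = now - record_timestamp
    if age <= HOT_WINDOW:
        return "hot"    # fast query engine, e.g. Elasticsearch / ClickHouse
    if age <= WARM_WINDOW:
        return "warm"   # columnar files, e.g. Parquet on object storage
    return "cold"       # compressed archives with a metadata index
```

In practice a scheduled job (or the storage engine's own TTL/tiering feature) would apply this rule to move records between tiers.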
Do not log raw input data containing PII in the decision log. Instead, log a hash of the input and store the mapping from hash to input in a separate, access-controlled store with encryption at rest. This limits PII exposure in the audit trail while preserving the ability to reconstruct the full decision context when needed for a specific inquiry.
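The hash-and-vault pattern above can be sketched as follows. The `InputVault` class is hypothetical; a plain dict stands in for what would, in production, be a separate access-controlled store with encryption at rest.

```python
import hashlib
import json
from typing import Any, Dict


class InputVault:
    """Maps input hashes to raw inputs, kept separate from the audit log.

    Only the hash appears in the decision log; resolving a hash back to
    the raw input requires access to this store.
    """

    def __init__(self) -> None:
        self._store: Dict[str, Dict[str, Any]] = {}  # stand-in for an encrypted store

    def put(self, input_data: Dict[str, Any]) -> str:
        """Store the raw input and return the hash to log."""
        canonical = json.dumps(input_data, sort_keys=True).encode()
        input_hash = hashlib.sha256(canonical).hexdigest()
        self._store[input_hash] = input_data
        return input_hash

    def resolve(self, input_hash: str) -> Dict[str, Any]:
        """Reconstruct the full decision context for a specific inquiry."""
        return self._store[input_hash]
```

Canonicalizing the input with `sort_keys=True` before hashing ensures the same logical input always produces the same hash, so decision logs and the vault stay consistent.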
Retention Policies by Regulation
| Regulation | Retention Requirement | What to Retain | Access Controls |
|---|---|---|---|
| EU AI Act | Duration proportionate to intended purpose; minimum while system is on market + 10 years for high-risk | Automatic event logs, technical documentation, quality management records | Available to market surveillance authorities on request |
| GDPR | No longer than necessary for processing purpose; must accommodate data subject access requests | Records of processing activities, consent records, data protection impact assessments | Data Protection Officer access; data subject access on request |
| SOC 2 / ISO 27001 | Typically 1 year minimum, varies by control objective | Access logs, change management records, incident records | Auditor access during examination periods |
| Financial Services (OCC/SEC) | 5-7 years depending on record type and jurisdiction | All records of automated decisions affecting customers, model validation records | Regulator access on examination; internal audit access |
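When several regimes apply to the same system, the safe default is to retain for the longest applicable period. A sketch of that rule, with illustrative retention floors loosely derived from the table (actual obligations depend on jurisdiction, record type, and risk tier, and should be confirmed with counsel; note that GDPR imposes a *maximum*, not a floor, and is therefore omitted):

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RetentionPolicy:
    regulation: str
    min_retention_days: int  # illustrative floor, not legal advice
    notes: str


# Hypothetical policy registry keyed by regime.
RETENTION_POLICIES = {
    "eu_ai_act_high_risk": RetentionPolicy(
        "EU AI Act", 10 * 365, "on-market period + 10 years for high-risk"),
    "soc2_iso27001": RetentionPolicy(
        "SOC 2 / ISO 27001", 365, "varies by control objective"),
    "financial_services": RetentionPolicy(
        "OCC/SEC", 7 * 365, "5-7 years by record type and jurisdiction"),
}


def retention_days(applicable: List[str]) -> int:
    """Retain for the longest requirement among applicable regimes."""
    return max(RETENTION_POLICIES[k].min_retention_days for k in applicable)
```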
Explainability Logging
For high-risk AI systems, the audit trail must include not just what the model decided but why. Explainability logging captures feature attributions, attention patterns, or other interpretability outputs alongside each decision. Because explainability artifacts can be large (a full SHAP explanation for a model with hundreds of features produces significant data), the practical approach is to log a reference ID in the decision log and store the full explanation in a separate store. Pre-compute explanations for high-risk decisions; compute them on demand for low-risk ones.
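The pre-compute-versus-on-demand split can be sketched as below. The `ExplanationLogger` class and `ExplainFn` callback are hypothetical; any SHAP/LIME wrapper that maps features to attributions would fit the callback shape, and the two dicts stand in for a separate explanation store.

```python
import uuid
from typing import Any, Callable, Dict

# Callback that turns a feature dict into per-feature attributions,
# e.g. a wrapper around a SHAP explainer.
ExplainFn = Callable[[Dict[str, Any]], Dict[str, float]]


class ExplanationLogger:
    def __init__(self, explain: ExplainFn) -> None:
        self._explain = explain
        self._store: Dict[str, Dict[str, float]] = {}   # computed explanations
        self._pending: Dict[str, Dict[str, Any]] = {}   # inputs awaiting on-demand compute

    def record(self, features: Dict[str, Any], risk_tier: str) -> str:
        """Return an explanation_ref; pre-compute only for high-risk tiers."""
        ref = uuid.uuid4().hex
        if risk_tier in ("high", "critical"):
            self._store[ref] = self._explain(features)  # pre-computed now
        else:
            self._pending[ref] = features               # deferred until fetched
        return ref

    def fetch(self, ref: str) -> Dict[str, float]:
        """Resolve a reference, computing the explanation lazily if deferred."""
        if ref not in self._store and ref in self._pending:
            self._store[ref] = self._explain(self._pending.pop(ref))
        return self._store[ref]
```

The returned `ref` is what goes into the decision log's `explanationRef` field; the heavy attribution payload never touches the high-throughput log path for low-risk decisions.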
"""Audit trail logger for AI decisions.
Provides a structured logging interface that captures
all required fields for regulatory compliance and
internal governance. Designed for high-throughput
inference pipelines.
"""
import json
import hashlib
import time
from typing import Dict, Any, Optional, List
from dataclasses import dataclass, asdict
@dataclass
class AuditEntry:
"""A single audit trail entry."""
decision_id: str
trace_id: str
model_id: str
model_version: str
input_hash: str
prediction: Any
confidence: float
risk_tier: str
human_review_required: bool
timestamp: float
latency_ms: float
environment: str
class AIAuditLogger:
"""Structured audit logger for AI decisions.
Writes to a structured log sink (stdout as JSON by default).
In production, configure to write to your log aggregation
system (e.g., Elasticsearch, ClickHouse, BigQuery).
"""
def __init__(
self,
model_id: str,
model_version: str,
risk_tier: str = "medium",
environment: str = "production",
):
self.model_id = model_id
self.model_version = model_version
self.risk_tier = risk_tier
self.environment = environment
def log_decision(
self,
trace_id: str,
input_data: Dict[str, Any],
prediction: Any,
confidence: float,
latency_ms: float,
human_review_required: bool = False,
explanation_ref: Optional[str] = None,
) -> AuditEntry:
"""Log an AI decision to the audit trail."""
# Hash input to avoid logging raw PII
input_hash = hashlib.sha256(
json.dumps(input_data, sort_keys=True).encode()
).hexdigest()
entry = AuditEntry(
decision_id=hashlib.sha256(
f"{trace_id}:{time.time_ns()}".encode()
).hexdigest()[:24],
trace_id=trace_id,
model_id=self.model_id,
model_version=self.model_version,
input_hash=input_hash,
prediction=prediction,
confidence=confidence,
risk_tier=self.risk_tier,
human_review_required=human_review_required,
timestamp=time.time(),
latency_ms=latency_ms,
environment=self.environment,
)
# Emit structured log entry
log_record = asdict(entry)
if explanation_ref:
log_record["explanation_ref"] = explanation_ref
print(json.dumps(log_record))
return entryQuery Capabilities
Audit trail data is useless if you cannot query it effectively. Your audit trail system must support five query patterns: point lookup (find the specific decision record for a given decision ID or trace ID), user history (find all decisions made about a specific user within a time range), model investigation (find all decisions made by a specific model version), anomaly investigation (find decisions with unusually low confidence or high latency), and aggregate analysis (compute decision distribution, accuracy trends, and fairness metrics over time periods).
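The five patterns can be sketched as parameterized queries. Everything here is illustrative: the `ai_decision_log` table name, the snake_case column names (mirroring the schema fields above), and the `%(name)s` placeholder style are assumptions, not a specific engine's syntax.

```python
# Illustrative parameterized queries over a hypothetical
# ai_decision_log table whose columns mirror the decision log schema.
QUERY_PATTERNS = {
    # Point lookup: one specific decision by ID.
    "point_lookup": (
        "SELECT * FROM ai_decision_log WHERE decision_id = %(decision_id)s"
    ),
    # User history: all decisions about one (hashed) user in a window.
    "user_history": (
        "SELECT * FROM ai_decision_log "
        "WHERE affected_user_id = %(user_hash)s "
        "AND timestamp BETWEEN %(start)s AND %(end)s "
        "ORDER BY timestamp"
    ),
    # Model investigation: everything a given model version decided.
    "model_investigation": (
        "SELECT * FROM ai_decision_log "
        "WHERE model_id = %(model_id)s AND model_version = %(version)s"
    ),
    # Anomaly investigation: low-confidence or slow decisions.
    "anomaly_investigation": (
        "SELECT * FROM ai_decision_log "
        "WHERE confidence < %(min_confidence)s "
        "OR latency_ms > %(max_latency_ms)s"
    ),
    # Aggregate analysis: decision distribution and confidence trends.
    "aggregate_analysis": (
        "SELECT model_version, decision_category, "
        "count(*) AS decisions, avg(confidence) AS mean_confidence "
        "FROM ai_decision_log "
        "WHERE timestamp >= %(since)s "
        "GROUP BY model_version, decision_category"
    ),
}
```

Indexing strategy follows directly from these patterns: the hot tier needs indexes (or sort keys) on `decision_id`, `affected_user_id` plus `timestamp`, and `model_id` plus `model_version` for the first three patterns to stay fast.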
Version History
1.0.0 · 2026-03-01
- Initial release with comprehensive audit trail schema in TypeScript
- Decision log, model lifecycle event, and explainability logging specifications
- Retention policy comparison table across EU AI Act, GDPR, SOC 2, and financial services regulations
- Python audit logger implementation for production use
- Query capability requirements and readiness checklist