Key Takeaway
Third-party AI risk assessment must evaluate dimensions that traditional vendor assessments miss: model versioning policies, training data provenance, output quality SLAs, and the shared responsibility gap for AI-generated decisions. This framework provides a structured evaluation process, contractual requirement templates, and ongoing monitoring procedures specific to AI vendor relationships.
Prerequisites
- An existing vendor management or procurement process
- Inventory of current and planned third-party AI services (APIs, models, platforms)
- Understanding of your organization's risk appetite and AI governance policies
- Access to legal counsel for contract review with AI-specific provisions
- Defined data classification scheme for data sent to third-party AI services
Why AI Vendors Are Different
Traditional vendor risk assessment evaluates security posture, uptime SLAs, data handling practices, and financial stability. These remain important for AI vendors, but they miss the risks unique to AI services. An LLM provider can change model behavior overnight through a version update, degrading the quality of your application without any change to your code. A computer vision API can produce biased outputs that create legal liability for your organization, not the vendor. A model provider can use your input data to train their models, potentially leaking your proprietary information into outputs served to other customers.
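One illustration of this version-drift risk: an integration can compare the model identifier a provider reports in each response against the version it has pinned, and alert when they diverge. The response shape and field name below are hypothetical; real providers expose the served model identifier in different places.

```typescript
// Hypothetical response metadata; real providers report the served
// model identifier under different field names in the response body.
interface ModelResponseMeta {
  model: string; // identifier the provider reports for this response
}

// Returns a warning string when the served model differs from the
// pinned version, so callers can alert and run regression tests.
function detectModelDrift(
  pinned: string,
  meta: ModelResponseMeta,
): string | null {
  if (meta.model === pinned) return null;
  return `Model drift detected: pinned "${pinned}" but provider served "${meta.model}"`;
}
```

Running this check on every response turns a silent vendor-side change into an explicit signal that your regression suite should be re-run.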
The shared responsibility model for AI is immature. When a traditional SaaS vendor's service fails, responsibility is clear: the vendor is responsible for uptime, you are responsible for how you use the output. With AI services, the line is blurred. If the vendor's model produces a discriminatory output that you serve to your users, who is liable? If the vendor's training data includes copyrighted material that appears in outputs you distribute, who bears the legal risk? These questions must be addressed contractually before integration, not after an incident.
Risk Assessment Framework
The assessment evaluates five risk domains, each scored on a five-point scale from minimal risk (1) to critical risk (5). The composite score determines the approval level required: low-risk integrations can be approved by the engineering team lead, medium-risk integrations require architecture review, high-risk integrations require governance committee approval, and critical-risk integrations require joint CTO, Legal, and CISO approval.
| Risk Domain | Key Questions | Red Flags | Mitigations |
|---|---|---|---|
| Model Risk | What is the model versioning policy? How much deprecation notice is provided? Are outputs deterministic for identical inputs? | No version pinning, <30 days deprecation notice, non-deterministic outputs without disclosure | Pin model versions, automated regression testing on version changes, output quality monitoring |
| Data Risk | How is input data handled? Is it used for training? What is the data residency? How is PII processed? | Input data used for training by default, no data residency guarantees, PII processing without BAA | Opt out of training data use, contractual data residency, PII redaction before API calls |
| Operational Risk | What are the uptime SLAs? What are the rate limits? Is there a status page? What is the incident communication process? | No SLA or <99.5% availability, low rate limits without burst capacity, no status page | Multi-provider failover, request queuing, cached fallback responses |
| Compliance Risk | What certifications does the vendor hold? Do they cooperate with audits? Do they align with applicable AI regulations? | No SOC 2 or equivalent, refuses audit cooperation, no EU AI Act readiness plan | Independent security assessment, contractual audit rights, compliance roadmap review |
| Strategic Risk | What is the vendor lock-in risk? How portable are integrations? Is pricing predictable? | Proprietary APIs with no standard equivalent, opaque pricing, no export capability | Abstraction layers, multi-provider architecture, contractual pricing caps |
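The PII-redaction mitigation in the Data Risk row can be approximated with pattern-based scrubbing before any payload leaves your boundary. The patterns below are a sketch covering only emails and US-style SSN and phone formats; a production system should use a dedicated, locale-aware PII-detection service.

```typescript
// Minimal pattern-based PII redaction applied before calling a
// third-party AI API. Illustrative only: these patterns cover emails
// and US-style SSN/phone formats, not PII detection in general.
const PII_PATTERNS: Array<[RegExp, string]> = [
  [/[\w.+-]+@[\w-]+\.[\w.]+/g, "[EMAIL]"],
  [/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]"],
  [/\b\d{3}[-.]\d{3}[-.]\d{4}\b/g, "[PHONE]"],
];

// Replace each matched pattern with its label, in order.
function redactPII(text: string): string {
  return PII_PATTERNS.reduce(
    (acc, [pattern, label]) => acc.replace(pattern, label),
    text,
  );
}
```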
Vendor Evaluation Scorecard
The following TypeScript code provides a structured vendor evaluation scorecard that teams can use during procurement. Each risk domain is scored against documented evidence from the vendor's responses to your assessment questionnaire, publicly available documentation, and independent security reports.
interface RiskScore {
  domain: string;
  score: 1 | 2 | 3 | 4 | 5; // 1 = minimal, 5 = critical
  evidence: string;
  mitigations: string[];
}

interface VendorAssessment {
  vendorName: string;
  assessmentDate: string;
  assessor: string;
  serviceName: string;
  riskScores: RiskScore[];
  overallRisk: "low" | "medium" | "high" | "critical";
  approvalRequired: string;
  conditions: string[];
}

function calculateOverallRisk(
  scores: RiskScore[],
): VendorAssessment["overallRisk"] {
  const avg = scores.reduce((sum, s) => sum + s.score, 0) / scores.length;
  const maxScore = Math.max(...scores.map((s) => s.score));
  // Any critical domain score overrides the average
  if (maxScore >= 5) return "critical";
  if (maxScore >= 4 || avg >= 3.5) return "high";
  if (avg >= 2.5) return "medium";
  return "low";
}

function getApprovalLevel(
  risk: VendorAssessment["overallRisk"],
): string {
  const approvals: Record<string, string> = {
    low: "Engineering team lead",
    medium: "Architecture review board",
    high: "AI governance committee",
    critical: "CTO + Legal + CISO joint approval",
  };
  return approvals[risk];
}

Contractual Requirements
Standard vendor contracts are insufficient for AI services. Your contracts must include AI-specific provisions covering model versioning commitments, data usage restrictions, output quality guarantees, liability allocation for AI-generated outputs, and audit rights for model behavior. The following provisions should be treated as minimum requirements for any AI vendor contract.
1. Model Version Pinning and Deprecation Notice. The contract must specify the ability to pin model versions, a minimum deprecation notice period (90 days minimum recommended), and access to the previous model version for at least 30 days after a new version is released.
2. Data Usage and Training Restrictions. The contract must explicitly prohibit the use of your input and output data for model training unless you opt in. Data residency must be specified. Subprocessor notification and approval rights must be included.
3. Output Quality SLAs. Beyond uptime SLAs, negotiate output quality commitments: a maximum hallucination rate, consistency guarantees for identical inputs, latency percentile targets (p50, p95, p99), and remediation obligations when quality degrades.
4. Liability and Indemnification. Address liability for AI-generated outputs, including discriminatory outputs, copyright-infringing outputs, and outputs that cause harm to your users. Mutual indemnification provisions should cover AI-specific scenarios.
5. Audit and Compliance Cooperation. Include rights to audit model behavior, request fairness evaluations, obtain model cards or documentation, and receive cooperation during regulatory inquiries or compliance audits.
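One way to make these provisions auditable during procurement is to track them as structured checklist data rather than prose in a review document. The shape below is a sketch; the provision names mirror the list above.

```typescript
// A contract provision tracked during procurement review.
interface ContractProvision {
  name: string;
  required: boolean; // is this a minimum requirement for this vendor?
  satisfied: boolean; // does the current draft contract satisfy it?
  notes: string;
}

// Returns the required provisions the draft contract still fails to
// satisfy, so legal review can focus on the gaps.
function missingProvisions(provisions: ContractProvision[]): string[] {
  return provisions
    .filter((p) => p.required && !p.satisfied)
    .map((p) => p.name);
}
```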
Many AI vendors include terms-of-service provisions that grant them broad rights to use your data for model improvement. These provisions are often buried in ToS updates that take effect automatically. Ensure your contract explicitly supersedes ToS provisions on data usage, and set up monitoring for ToS changes.
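Monitoring for ToS changes can be as simple as hashing the fetched ToS text and comparing it against the digest of the version your legal team last reviewed. The fetch step is omitted here; these functions operate on already-retrieved text.

```typescript
import { createHash } from "crypto";

// Digest of a ToS document; persist the digest of the last version
// legal reviewed, and alert when a freshly fetched copy differs.
function tosDigest(tosText: string): string {
  return createHash("sha256").update(tosText).digest("hex");
}

// True when the current ToS text no longer matches the reviewed digest.
function tosChanged(lastReviewedDigest: string, currentText: string): boolean {
  return tosDigest(currentText) !== lastReviewedDigest;
}
```

A scheduled job running this comparison turns a silent ToS update into a ticket for legal review.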
Ongoing Monitoring
Vendor assessment is not a one-time event. AI vendors change their models, policies, and pricing more frequently than traditional software vendors. Establish a continuous monitoring program that tracks model version changes, output quality trends, pricing changes, and policy updates. Reassess vendors annually or whenever a material change occurs.
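The annual-or-on-material-change cadence can be encoded as a simple trigger check; the 365-day window follows the recommendation above, and the material-change flag stands in for whatever change-detection feeds your monitoring program.

```typescript
// Reassessment state for a single vendor.
interface ReassessmentState {
  lastAssessed: Date;
  materialChangeSince: boolean; // model, policy, or pricing change flagged
}

const REASSESSMENT_WINDOW_DAYS = 365;

// A vendor is due for reassessment after a year, or immediately when a
// material change has been flagged since the last assessment.
function reassessmentDue(state: ReassessmentState, now: Date): boolean {
  const days =
    (now.getTime() - state.lastAssessed.getTime()) / (1000 * 60 * 60 * 24);
  return state.materialChangeSince || days >= REASSESSMENT_WINDOW_DAYS;
}
```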
Version History
1.0.0 · 2026-03-01
- Initial release with five-domain risk assessment framework
- Vendor evaluation scorecard with TypeScript implementation
- Contractual requirements checklist for AI vendor agreements
- Risk-tiered approval process aligned with governance structure
- Ongoing monitoring and reassessment guidance