Key Takeaway
Third-party AI risk assessment must evaluate dimensions that traditional vendor assessments miss: model versioning policies, training data provenance, output quality SLAs, and the shared responsibility gap for AI-generated decisions. This framework provides a structured evaluation process, contractual requirement templates, and ongoing monitoring procedures specific to AI vendor relationships.
Prerequisites
- An existing vendor management or procurement process
- Inventory of current and planned third-party AI services (APIs, models, platforms)
- Understanding of your organization's risk appetite and AI governance policies
- Access to legal counsel for contract review with AI-specific provisions
- Defined data classification scheme for data sent to third-party AI services
Why AI Vendors Are Different
Traditional vendor risk assessment evaluates security posture, uptime SLAs, data handling practices, and financial stability. These remain important for AI vendors, but they miss the risks unique to AI services. An LLM provider can change model behavior overnight through a version update, degrading the quality of your application without any change to your code. A computer vision API can produce biased outputs that create legal liability for your organization, not the vendor. A model provider can use your input data to train their models, potentially leaking your proprietary information into outputs served to other customers.
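One illustration of this version-drift risk: an integration can compare the model identifier a provider reports in each response against the version it has pinned, and alert when they diverge. The response shape and field name below are hypothetical; real providers expose the served model identifier in different places.

```typescript
// Hypothetical response metadata; real providers report the served
// model identifier under different field names in the response body.
interface ModelResponseMeta {
  model: string; // identifier the provider reports for this response
}

// Returns a warning string when the served model differs from the
// pinned version, so callers can alert and run regression tests.
function detectModelDrift(
  pinned: string,
  meta: ModelResponseMeta,
): string | null {
  if (meta.model === pinned) return null;
  return `Model drift detected: pinned "${pinned}" but provider served "${meta.model}"`;
}
```

Running this check on every response turns a silent vendor-side change into an explicit signal that your regression suite should be re-run.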
The shared responsibility model for AI is immature. When a traditional SaaS vendor's service fails, responsibility is clear: the vendor is responsible for uptime, you are responsible for how you use the output. With AI services, the line is blurred. If the vendor's model produces a discriminatory output that you serve to your users, who is liable? If the vendor's training data includes copyrighted material that appears in outputs you distribute, who bears the legal risk? These questions must be addressed contractually before integration, not after an incident.
Risk Assessment Framework
The assessment evaluates five risk domains, each scored on a five-point scale from minimal risk (1) to critical risk (5). The composite score determines the approval level required: low-risk integrations can be approved by the engineering team lead, medium-risk integrations require architecture review, high-risk integrations require governance committee approval, and critical-risk integrations require joint CTO, Legal, and CISO approval.
| Risk Domain | Key Questions | Red Flags | Mitigations |
|---|---|---|---|
| Model Risk | What is the model versioning policy? How much deprecation notice is provided? Are outputs deterministic for identical inputs? | No version pinning, <30 days deprecation notice, non-deterministic outputs without disclosure | Pin model versions, automated regression testing on version changes, output quality monitoring |
| Data Risk | How is input data handled? Is it used for training? What is the data residency? How is PII processed? | Input data used for training by default, no data residency guarantees, PII processing without BAA | Opt out of training data use, contractual data residency, PII redaction before API calls |
| Operational Risk | What are the uptime SLAs? What are the rate limits? Is there a status page? What is the incident communication process? | No SLA or <99.5% availability, low rate limits without burst capacity, no status page | Multi-provider failover, request queuing, cached fallback responses |
| Compliance Risk | What certifications does the vendor hold? Do they cooperate with audits? Do they align with applicable AI regulations? | No SOC 2 or equivalent, refuses audit cooperation, no EU AI Act readiness plan | Independent security assessment, contractual audit rights, compliance roadmap review |
| Strategic Risk | What is the vendor lock-in risk? How portable are integrations? Is pricing predictable? | Proprietary APIs with no standard equivalent, opaque pricing, no export capability | Abstraction layers, multi-provider architecture, contractual pricing caps |
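The PII-redaction mitigation in the Data Risk row can be approximated with pattern-based scrubbing before any payload leaves your boundary. The patterns below are a sketch covering only emails and US-style SSN and phone formats; a production system should use a dedicated, locale-aware PII-detection service.

```typescript
// Minimal pattern-based PII redaction applied before calling a
// third-party AI API. Illustrative only: these patterns cover emails
// and US-style SSN/phone formats, not PII detection in general.
const PII_PATTERNS: Array<[RegExp, string]> = [
  [/[\w.+-]+@[\w-]+\.[\w.]+/g, "[EMAIL]"],
  [/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]"],
  [/\b\d{3}[-.]\d{3}[-.]\d{4}\b/g, "[PHONE]"],
];

// Replace each matched pattern with its label, in order.
function redactPII(text: string): string {
  return PII_PATTERNS.reduce(
    (acc, [pattern, label]) => acc.replace(pattern, label),
    text,
  );
}
```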
Vendor Evaluation Scorecard
The following TypeScript code provides a structured vendor evaluation scorecard that teams can use during procurement. Each risk domain is scored against documented evidence from the vendor's responses to your assessment questionnaire, publicly available documentation, and independent security reports.
interface RiskScore {
  domain: string;
  score: 1 | 2 | 3 | 4 | 5; // 1 = minimal, 5 = critical
  evidence: string;
  mitigations: string[];
}

interface VendorAssessment {
  vendorName: string;
  assessmentDate: string;
  assessor: string;
  serviceName: string;
  riskScores: RiskScore[];
  overallRisk: "low" | "medium" | "high" | "critical";
  approvalRequired: string;
  conditions: string[];
}

function calculateOverallRisk(
  scores: RiskScore[],
): VendorAssessment["overallRisk"] {
  const avg = scores.reduce((sum, s) => sum + s.score, 0) / scores.length;
  const maxScore = Math.max(...scores.map((s) => s.score));
  // Any critical domain score overrides the average
  if (maxScore >= 5) return "critical";
  if (maxScore >= 4 || avg >= 3.5) return "high";
  if (avg >= 2.5) return "medium";
  return "low";
}

function getApprovalLevel(
  risk: VendorAssessment["overallRisk"],
): string {
  const approvals: Record<string, string> = {
    low: "Engineering team lead",
    medium: "Architecture review board",
    high: "AI governance committee",
    critical: "CTO + Legal + CISO joint approval",
  };
  return approvals[risk];
}

Contractual Requirements
Standard vendor contracts are insufficient for AI services. Your contracts must include AI-specific provisions covering model versioning commitments, data usage restrictions, output quality guarantees, liability allocation for AI-generated outputs, and audit rights for model behavior. The following provisions should be treated as minimum requirements for any AI vendor contract.
1. Model Version Pinning and Deprecation Notice. The contract must specify the ability to pin model versions, a minimum deprecation notice period (90 days minimum recommended), and access to the previous model version for at least 30 days after a new version is released.
2. Data Usage and Training Restrictions. The contract must explicitly prohibit the use of your input and output data for model training unless you opt in. Data residency must be specified. Subprocessor notification and approval rights must be included.
3. Output Quality SLAs. Beyond uptime SLAs, negotiate output quality commitments: a maximum hallucination rate, consistency guarantees for identical inputs, latency percentile targets (p50, p95, p99), and remediation obligations when quality degrades.
4. Liability and Indemnification. Address liability for AI-generated outputs, including discriminatory outputs, copyright-infringing outputs, and outputs that cause harm to your users. Mutual indemnification provisions should cover AI-specific scenarios.
5. Audit and Compliance Cooperation. Include rights to audit model behavior, request fairness evaluations, obtain model cards or documentation, and receive cooperation during regulatory inquiries or compliance audits.
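One way to make these provisions auditable during procurement is to track them as structured checklist data rather than prose in a review document. The shape below is a sketch; the provision names mirror the list above.

```typescript
// A contract provision tracked during procurement review.
interface ContractProvision {
  name: string;
  required: boolean; // is this a minimum requirement for this vendor?
  satisfied: boolean; // does the current draft contract satisfy it?
  notes: string;
}

// Returns the required provisions the draft contract still fails to
// satisfy, so legal review can focus on the gaps.
function missingProvisions(provisions: ContractProvision[]): string[] {
  return provisions
    .filter((p) => p.required && !p.satisfied)
    .map((p) => p.name);
}
```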
Many AI vendors include terms-of-service provisions that grant them broad rights to use your data for model improvement. These provisions are often buried in ToS updates that take effect automatically. Ensure your contract explicitly supersedes ToS provisions on data usage, and set up monitoring for ToS changes.
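Monitoring for ToS changes can be as simple as hashing the fetched ToS text and comparing it against the digest of the version your legal team last reviewed. The fetch step is omitted here; these functions operate on already-retrieved text.

```typescript
import { createHash } from "crypto";

// Digest of a ToS document; persist the digest of the last version
// legal reviewed, and alert when a freshly fetched copy differs.
function tosDigest(tosText: string): string {
  return createHash("sha256").update(tosText).digest("hex");
}

// True when the current ToS text no longer matches the reviewed digest.
function tosChanged(lastReviewedDigest: string, currentText: string): boolean {
  return tosDigest(currentText) !== lastReviewedDigest;
}
```

A scheduled job running this comparison turns a silent ToS update into a ticket for legal review.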
Ongoing Monitoring
Vendor assessment is not a one-time event. AI vendors change their models, policies, and pricing more frequently than traditional software vendors. Establish a continuous monitoring program that tracks model version changes, output quality trends, pricing changes, and policy updates. Reassess vendors annually or whenever a material change occurs.
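The annual-or-on-material-change cadence can be encoded as a simple trigger check; the 365-day window follows the recommendation above, and the material-change flag stands in for whatever change-detection feeds your monitoring program.

```typescript
// Reassessment state for a single vendor.
interface ReassessmentState {
  lastAssessed: Date;
  materialChangeSince: boolean; // model, policy, or pricing change flagged
}

const REASSESSMENT_WINDOW_DAYS = 365;

// A vendor is due for reassessment after a year, or immediately when a
// material change has been flagged since the last assessment.
function reassessmentDue(state: ReassessmentState, now: Date): boolean {
  const days =
    (now.getTime() - state.lastAssessed.getTime()) / (1000 * 60 * 60 * 24);
  return state.materialChangeSince || days >= REASSESSMENT_WINDOW_DAYS;
}
```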
Version History
1.0.0 · 2026-03-01
- Initial release with five-domain risk assessment framework
- Vendor evaluation scorecard with TypeScript implementation
- Contractual requirements checklist for AI vendor agreements
- Risk-tiered approval process aligned with governance structure
- Ongoing monitoring and reassessment guidance