Key Takeaway
Embedding strategy decisions are hard to reverse once data is indexed, so documenting the chunking approach and model choice upfront prevents costly re-indexing.
When to Use This Template
Use this ADR when designing a new RAG system, migrating to a new embedding model, or changing your document chunking strategy. Embedding decisions affect retrieval quality, storage costs, and re-indexing effort. Because changing an embedding model requires re-embedding your entire corpus, these decisions have significant downstream cost implications and should be documented before implementation begins.
ADR Template
# ADR: Embedding Strategy
## Status
[Proposed | Accepted | Deprecated | Superseded by ADR-XXX]
## Date
YYYY-MM-DD
## Decision Makers
- [Name, Role]
## Context
### Content Characteristics
- Content types: [e.g., "technical documentation, support tickets, product descriptions"]
- Languages: [e.g., "English primary, Spanish and French secondary"]
- Average document length: [e.g., "2,000-5,000 tokens"]
- Corpus size: [e.g., "50,000 documents, growing ~1,000/month"]
- Update frequency: [e.g., "documents updated weekly, new docs daily"]
### Retrieval Requirements
- Query types: [e.g., "natural language questions, keyword-heavy searches"]
- Retrieval accuracy target: [e.g., "relevant document in top-5 results 90% of the time"]
- Latency budget for embedding: [e.g., "< 200ms per query embedding"]
- Hybrid search needed: [yes/no — combining vector + keyword search]
## Embedding Model Options
| Criterion | Model A | Model B | Model C |
|-----------|---------|---------|---------|
| Dimensions | | | |
| Max input tokens | | | |
| Multilingual support | | | |
| Cost per 1M tokens | | | |
| Latency (p50) | | | |
| Matryoshka support | | | |
| Self-host option | | | |
## Chunking Strategy Options
| Strategy | Chunk Size | Overlap | Pros | Cons |
|----------|-----------|---------|------|------|
| Fixed-size | [e.g., 512 tokens] | [e.g., 50 tokens] | Simple, predictable | Breaks mid-sentence |
| Sentence-based | Variable | Per-sentence | Semantic boundaries | Uneven chunk sizes |
| Recursive/semantic | Variable | Context-aware | Best retrieval quality | Complex implementation |
| Parent-child | Small + large | Hierarchical | Precise retrieval + full context | Storage overhead |
## Decision
### Embedding Model
We will use [model] because [rationale].
### Chunking Strategy
We will use [strategy] with [chunk size] and [overlap] because [rationale].
### Refresh Policy
- Full re-index trigger: [e.g., "new embedding model adoption"]
- Incremental update: [e.g., "daily for new/modified documents"]
- Staleness tolerance: [e.g., "24 hours maximum"]
## Consequences
- Re-indexing cost: [estimated time and compute cost for full re-index]
- Storage requirement: [estimated storage for embeddings at target scale]
- Ongoing cost: [monthly embedding generation cost at projected volume]
## Review Trigger
- [ ] Retrieval accuracy drops below [threshold] on evaluation set
- [ ] New embedding model with >5% improvement on relevant benchmarks
- [ ] Corpus size exceeds [threshold] requiring storage optimization
- [ ] New language support requirement

Section-by-Section Guidance
Chunking Strategy
Chunking strategy has a larger impact on retrieval quality than most teams expect. Fixed-size chunking is the simplest to implement but often produces chunks that break mid-thought, degrading retrieval quality. Recursive or semantic chunking respects content boundaries but adds implementation complexity. The parent-child approach (small chunks for retrieval, large chunks for context) offers the best of both worlds but doubles storage requirements. Start with recursive chunking as a default unless you have a strong reason to use fixed-size.
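A recursive splitter can be sketched in a few lines. The version below is a minimal illustration, not a production implementation: it uses word count as a crude stand-in for a real tokenizer, and the separator list and function names are assumptions for the example.

```python
def count_tokens(text: str) -> int:
    # Crude proxy: ~1 token per whitespace-delimited word.
    # Swap in a real tokenizer (e.g. tiktoken) for production use.
    return len(text.split())

def recursive_chunk(text: str, max_tokens: int = 512,
                    separators: tuple = ("\n\n", "\n", ". ")) -> list[str]:
    """Split at the coarsest semantic boundary that fits the token budget."""
    if count_tokens(text) <= max_tokens:
        return [text]
    for sep in separators:
        parts = text.split(sep)
        if len(parts) > 1:
            chunks, buf = [], ""
            for part in parts:
                candidate = buf + sep + part if buf else part
                if count_tokens(candidate) <= max_tokens:
                    buf = candidate  # greedily pack parts into the budget
                else:
                    if buf:
                        chunks.append(buf)
                    buf = part
            if buf:
                chunks.append(buf)
            # Recurse: a single part may still exceed the budget.
            return [c for chunk in chunks
                    for c in recursive_chunk(chunk, max_tokens, separators)]
    # No separator found: hard-split on words as a last resort.
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]
```

Because the splitter tries paragraph breaks before sentence breaks, chunks tend to end at natural boundaries, which is the property that fixed-size chunking lacks.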
Dimensionality Trade-offs
Higher dimensions generally capture more semantic nuance but increase storage cost and query latency. If your embedding model supports Matryoshka representations (variable-dimension outputs), you can start with lower dimensions and increase later without re-embedding. This is a significant advantage when you are uncertain about your dimensionality needs. For most text retrieval use cases, 768 to 1024 dimensions provide a strong balance of quality and efficiency.
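The mechanics of Matryoshka truncation are simple: keep a prefix of the vector and re-normalize so cosine similarity stays meaningful. This is a minimal sketch assuming your model was trained with Matryoshka representation learning; the function name is illustrative.

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` components of a Matryoshka embedding and
    re-normalize to unit length so cosine similarity remains valid."""
    truncated = vec[:dims]
    norm = np.linalg.norm(truncated)
    return truncated / norm if norm > 0 else truncated
```

Note that this only preserves quality for models explicitly trained to front-load information into the leading dimensions; truncating a conventional embedding this way will degrade retrieval sharply.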
Build an evaluation set before choosing your embedding model. Create a set of 50-100 representative queries with known relevant documents, then measure retrieval accuracy (recall@5, recall@10) across candidate models with your actual data. This investment pays for itself many times over by preventing a poor model choice.
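The accuracy target in the template ("relevant document in top-5 results 90% of the time") is the hit-rate form of recall@k, which can be computed with a few lines. The data shapes here are assumptions for illustration: retrieval results and relevance judgments keyed by query ID.

```python
def recall_at_k(results: dict[str, list[str]],
                relevant: dict[str, set[str]],
                k: int) -> float:
    """Fraction of queries whose top-k results contain at least one
    known-relevant document (the hit-rate variant of recall@k)."""
    if not results:
        return 0.0
    hits = sum(
        1 for query, docs in results.items()
        if relevant.get(query, set()) & set(docs[:k])
    )
    return hits / len(results)
```

Run this over each candidate embedding model with the same query set and corpus; the model comparison table above then gets an accuracy column grounded in your own data rather than public benchmarks.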
Do not skip the refresh policy section. Teams that fail to plan for embedding updates end up with stale embeddings that silently degrade retrieval quality. Document exactly when and how embeddings are refreshed, including the cost and time required for a full re-index.
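One common way to implement the incremental-update half of the refresh policy is content hashing: store a hash of each document at embedding time and re-embed only when it changes. A minimal sketch, with an in-memory dict standing in for whatever store holds your hashes:

```python
import hashlib

def needs_reembedding(doc_id: str, text: str,
                      index_hashes: dict[str, str]) -> bool:
    """Return True if the document is new or its content changed since it
    was last embedded, based on a stored content hash. Updates the store."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if index_hashes.get(doc_id) == digest:
        return False  # unchanged since last embedding run
    index_hashes[doc_id] = digest
    return True
```

Running this check in a daily job keeps embedding costs proportional to churn rather than corpus size, while the full re-index trigger (e.g. model change) remains a separate, explicitly budgeted event.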
Version History
1.0.0 · 2026-03-01
- Initial ADR template for embedding strategy