mirror of
https://github.com/ksyasuda/dotfiles.git
synced 2026-03-20 06:11:27 -07:00
134 lines
4.2 KiB
Markdown
134 lines
4.2 KiB
Markdown
# Cloudflare Vectorize
|
|
|
|
Globally distributed vector database for AI applications. Store and query vector embeddings for semantic search, recommendations, RAG, and classification.
|
|
|
|
**Status:** Generally Available (GA) | **Last Updated:** 2026-01-27
|
|
|
|
## Quick Start
|
|
|
|
```typescript
|
|
// 1. Create index
|
|
// npx wrangler vectorize create my-index --dimensions=768 --metric=cosine
|
|
|
|
// 2. Configure binding (wrangler.jsonc)
|
|
// { "vectorize": [{ "binding": "VECTORIZE", "index_name": "my-index" }] }
|
|
|
|
// 3. Query vectors
|
|
const matches = await env.VECTORIZE.query(queryVector, { topK: 5 });
|
|
```
|
|
|
|
## Key Features
|
|
|
|
- **10M vectors per index** (V2)
|
|
- Dimensions up to 1536 (32-bit float)
|
|
- Three distance metrics: cosine, euclidean, dot-product
|
|
- Metadata filtering (up to 10 indexes)
|
|
- Namespace support (50K namespaces paid, 1K free)
|
|
- Seamless Workers AI integration
|
|
- Global distribution
|
|
|
|
## Reading Order
|
|
|
|
| Task | Files to Read |
|
|
|------|---------------|
|
|
| New to Vectorize | README only |
|
|
| Implement feature | README + api + patterns |
|
|
| Setup/configure | README + configuration |
|
|
| Debug issues | gotchas |
|
|
| Integrate with AI | README + patterns |
|
|
| RAG implementation | README + patterns |
|
|
|
|
## File Guide
|
|
|
|
- **README.md** (this file): Overview, quick decisions
|
|
- **api.md**: Runtime API, types, operations (query/insert/upsert)
|
|
- **configuration.md**: Setup, CLI, metadata indexes
|
|
- **patterns.md**: RAG, Workers AI, OpenAI, LangChain, multi-tenant
|
|
- **gotchas.md**: Limits, pitfalls, troubleshooting
|
|
|
|
## Distance Metric Selection
|
|
|
|
Choose based on your use case:
|
|
|
|
```
|
|
What are you building?
|
|
├─ Text/semantic search → cosine (most common)
|
|
├─ Image similarity → euclidean
|
|
├─ Recommendation system → dot-product
|
|
└─ Pre-normalized vectors → dot-product
|
|
```
|
|
|
|
| Metric | Best For | Score Interpretation |
|
|
|--------|----------|---------------------|
|
|
| `cosine` | Text embeddings, semantic similarity | Higher = closer (1.0 = identical) |
|
|
| `euclidean` | Absolute distance, spatial data | Lower = closer (0.0 = identical) |
|
|
| `dot-product` | Recommendations, normalized vectors | Higher = closer |
|
|
|
|
**Note:** Index configuration is immutable. Cannot change dimensions or metric after creation.
|
|
|
|
## Multi-Tenancy Strategy
|
|
|
|
```
|
|
How many tenants?
|
|
├─ < 50K tenants → Use namespaces (recommended)
|
|
│ ├─ Fastest (filter before vector search)
|
|
│ └─ Strict isolation
|
|
├─ > 50K tenants → Use metadata filtering
|
|
│ ├─ Slower (post-filter after vector search)
|
|
│ └─ Requires metadata index
|
|
└─ Per-tenant indexes → Only if compliance mandated
|
|
└─ 50K index limit per account (paid plan)
|
|
```
|
|
|
|
## Common Workflows
|
|
|
|
### Semantic Search
|
|
|
|
```typescript
|
|
// 1. Generate embedding
|
|
const result = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [query] });
|
|
|
|
// 2. Query Vectorize
|
|
const matches = await env.VECTORIZE.query(result.data[0], {
|
|
topK: 5,
|
|
returnMetadata: "indexed"
|
|
});
|
|
```
|
|
|
|
### RAG Pattern
|
|
|
|
```typescript
|
|
// 1. Generate query embedding
|
|
const embedding = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [query] });
|
|
|
|
// 2. Search Vectorize
|
|
const matches = await env.VECTORIZE.query(embedding.data[0], { topK: 5 });
|
|
|
|
// 3. Fetch full documents from R2/D1/KV
|
|
const docs = await Promise.all(matches.matches.map(m =>
|
|
env.R2.get(m.metadata.key).then(obj => obj?.text())
|
|
));
|
|
|
|
// 4. Generate LLM response with context
|
|
const answer = await env.AI.run("@cf/meta/llama-3-8b-instruct", {
|
|
prompt: `Context: ${docs.join("\n\n")}\n\nQuestion: ${query}\n\nAnswer:`
|
|
});
|
|
```
|
|
|
|
## Critical Gotchas
|
|
|
|
See `gotchas.md` for details. Most important:
|
|
|
|
1. **Async mutations**: Inserts take 5-10s to be queryable
|
|
2. **500 batch limit**: Workers API enforces 500 vectors per call (undocumented)
|
|
3. **Metadata truncation**: `"indexed"` returns first 64 bytes only
|
|
4. **topK with metadata**: Max 20 (not 100) when using returnValues or returnMetadata: "all"
|
|
5. **Metadata indexes first**: Must create before inserting vectors
|
|
|
|
## Resources
|
|
|
|
- [Official Docs](https://developers.cloudflare.com/vectorize/)
|
|
- [Client API Reference](https://developers.cloudflare.com/vectorize/reference/client-api/)
|
|
- [Workers AI Models](https://developers.cloudflare.com/workers-ai/models/#text-embeddings)
|
|
- [Discord: #vectorize](https://discord.cloudflare.com)
|