update skills

2026-03-17 16:53:22 -07:00
parent 0b0783ef8e
commit f9a530667e
389 changed files with 54512 additions and 1 deletions


@@ -0,0 +1,197 @@
# Cloudflare Workers AI
Expert guidance for Cloudflare Workers AI - serverless GPU-powered AI inference at the edge.
## Overview
Workers AI provides:
- 50+ pre-trained models (LLMs, embeddings, image generation, speech-to-text, translation)
- Native Workers binding (no external API calls)
- Pay-per-use pricing (neurons consumed per inference)
- OpenAI-compatible REST API
- Streaming support for text generation
- Function calling with compatible models
**Architecture**: Inference runs on Cloudflare's GPU network. Models load on first request (cold start 1-3s); subsequent requests are faster.
## Quick Start
```typescript
interface Env {
AI: Ai;
}
export default {
async fetch(request: Request, env: Env) {
const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
messages: [{ role: 'user', content: 'What is Cloudflare?' }]
});
return Response.json(response);
}
};
```
```bash
# Setup - add binding to wrangler.jsonc
wrangler dev --remote # Must use --remote for AI
wrangler deploy
```
## Model Selection Decision Tree
### Text Generation (Chat/Completion)
**Quality Priority**:
- **Best quality**: `@cf/meta/llama-3.1-70b-instruct` (expensive, ~2000 neurons)
- **Balanced**: `@cf/meta/llama-3.1-8b-instruct` (good quality, ~200 neurons)
- **Fastest/cheapest**: `@cf/mistral/mistral-7b-instruct-v0.1` (~50 neurons)
**Function Calling**:
- Use `@cf/meta/llama-3.1-8b-instruct` or `@cf/meta/llama-3.1-70b-instruct` (native tool support)
**Code Generation**:
- Use `@cf/deepseek-ai/deepseek-coder-6.7b-instruct` (specialized for code)
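
The text-generation choices above can be captured in a small lookup table. This is a minimal sketch: the `TextPriority` type and the `pickTextModel` helper are hypothetical names used for illustration, not part of Workers AI; only the model IDs come from the list above.

```typescript
// Hypothetical helper: maps a priority to one of the model IDs listed above.
type TextPriority = 'quality' | 'balanced' | 'speed' | 'tools' | 'code';

const TEXT_MODELS: Record<TextPriority, string> = {
  quality: '@cf/meta/llama-3.1-70b-instruct',
  balanced: '@cf/meta/llama-3.1-8b-instruct',
  speed: '@cf/mistral/mistral-7b-instruct-v0.1',
  tools: '@cf/meta/llama-3.1-8b-instruct',
  code: '@cf/deepseek-ai/deepseek-coder-6.7b-instruct',
};

// Defaults to the balanced model when no priority is given.
function pickTextModel(priority: TextPriority = 'balanced'): string {
  return TEXT_MODELS[priority];
}
```

Keeping the mapping in one place makes it easier to swap a model ID when the catalog changes.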
### Embeddings (Semantic Search/RAG)
**English text**:
- **Best**: `@cf/baai/bge-large-en-v1.5` (1024 dims, highest quality)
- **Balanced**: `@cf/baai/bge-base-en-v1.5` (768 dims, good quality)
- **Fast**: `@cf/baai/bge-small-en-v1.5` (384 dims, lower quality but fast)
**Multilingual**:
- Use `@hf/sentence-transformers/paraphrase-multilingual-minilm-l12-v2`
### Image Generation
- **Stable Diffusion**: `@cf/stabilityai/stable-diffusion-xl-base-1.0` (~10,000 neurons)
- **Portraits**: `@cf/lykon/dreamshaper-8-lcm` (optimized for faces)
### Other Tasks
- **Speech-to-text**: `@cf/openai/whisper`
- **Translation**: `@cf/meta/m2m100-1.2b` (100 languages)
- **Image classification**: `@cf/microsoft/resnet-50`
## SDK Approach Decision Tree
### Native Binding (Recommended)
**When**: Building Workers/Pages with TypeScript
**Why**: Zero external dependencies, best performance, native types
```typescript
await env.AI.run(model, input);
```
### REST API
**When**: External services, non-Workers environments, testing
**Why**: Standard HTTP, works anywhere
```bash
curl https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/ai/run/@cf/meta/llama-3.1-8b-instruct \
-H "Authorization: Bearer <API_TOKEN>" \
-d '{"messages":[{"role":"user","content":"Hello"}]}'
```
### Vercel AI SDK Integration
**When**: Using Vercel AI SDK features (streaming UI, tool calling abstractions)
**Why**: Unified interface across providers
```typescript
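import { createOpenAI } from '@ai-sdk/openai';
// Create a provider instance pointed at the OpenAI-compatible Workers AI endpoint
const workersai = createOpenAI({
  baseURL: 'https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/ai/v1',
  apiKey: '<API_TOKEN>'
});
// Use the chat completions API with a Workers AI model ID
const model = workersai.chat('@cf/meta/llama-3.1-8b-instruct');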
```
## RAG vs Direct Generation
### Use RAG (Vectorize + Workers AI) When:
- Answering questions about specific documents/data
- Need factual accuracy from known corpus
- Context exceeds model's window (>4K tokens)
- Building knowledge base chat
### Use Direct Generation When:
- Creative writing, brainstorming
- General knowledge questions
- Small context fits in prompt (<4K tokens)
- Cost optimization (RAG adds embedding + vector search costs)
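
The context-size criterion above can be approximated in code. This is a rough sketch under stated assumptions: the 4-characters-per-token ratio is a common heuristic for English text, not a real tokenizer, and `estimateTokens` and `shouldUseRag` are hypothetical helper names. The 4K threshold is the one quoted in the lists above.

```typescript
// Assumption: ~4 characters per token is a rough heuristic, not a tokenizer.
const CHARS_PER_TOKEN = 4;
// Threshold taken from the guidance above (4K tokens).
const DIRECT_CONTEXT_LIMIT = 4000;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Returns true when the context is too large to place directly in the prompt.
function shouldUseRag(contextText: string, limit: number = DIRECT_CONTEXT_LIMIT): boolean {
  return estimateTokens(contextText) > limit;
}
```

The estimate only covers the size criterion; the other criteria (factual accuracy, cost) still need a judgment call.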
## Platform Limits
| Limit | Free Tier | Paid Plans |
|-------|-----------|------------|
| Neurons/day | 10,000 | Pay per use |
| Rate limit | Varies by model | Higher (contact support) |
| Context window | Model dependent (2K-8K) | Same |
| Streaming | ✅ Supported | ✅ Supported |
| Function calling | ✅ Supported (select models) | ✅ Supported |
**Pricing**: Free 10K neurons/day, then pay per neuron consumed (varies by model)
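
To make the free-tier budget concrete, the arithmetic can be sketched as below. The per-request neuron costs are the approximate figures quoted in this guide, not measured values, and `requestsPerDay` is a hypothetical helper name.

```typescript
// Free-tier allowance quoted above.
const FREE_NEURONS_PER_DAY = 10000;

// Number of whole requests that fit in a daily neuron budget.
function requestsPerDay(neuronsPerRequest: number, budget: number = FREE_NEURONS_PER_DAY): number {
  if (neuronsPerRequest <= 0) throw new RangeError('neuronsPerRequest must be positive');
  return Math.floor(budget / neuronsPerRequest);
}

console.log(requestsPerDay(200));   // 50 (llama-3.1-8b-instruct at ~200 neurons)
console.log(requestsPerDay(2000));  // 5  (llama-3.1-70b-instruct at ~2000 neurons)
console.log(requestsPerDay(10000)); // 1  (one SDXL image at ~10,000 neurons)
```

Actual consumption depends on input and output length, so treat these counts as order-of-magnitude estimates.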
## Common Tasks
```typescript
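// Streaming text generation: the result is a ReadableStream of SSE-formatted bytes,
// so each chunk must be decoded before it can be read as text
const stream = await env.AI.run(model, { messages, stream: true });
const decoder = new TextDecoder();
for await (const chunk of stream) {
  console.log(decoder.decode(chunk, { stream: true }));
}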
// Embeddings for RAG
const { data } = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
text: ['Query text', 'Document 1', 'Document 2']
});
// Function calling
const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
messages: [{ role: 'user', content: 'What is the weather?' }],
tools: [{
type: 'function',
function: { name: 'getWeather', parameters: { ... } }
}]
});
```
## Development Workflow
```bash
# Always use --remote for AI (local doesn't have models)
wrangler dev --remote
# Deploy to production
wrangler deploy
# View model catalog
# https://developers.cloudflare.com/workers-ai/models/
```
## Reading Order
**Start here**: Quick Start above → configuration.md (setup)
**Common tasks**:
- First time setup: configuration.md → Add binding + deploy
- Choose model: Model Selection Decision Tree (above) → api.md
- Build RAG: patterns.md → Vectorize integration
- Optimize costs: Model Selection + gotchas.md (rate limits)
- Debugging: gotchas.md → Common errors
## In This Reference
- [configuration.md](./configuration.md) - wrangler.jsonc setup, TypeScript types, bindings, environment variables
- [api.md](./api.md) - env.AI.run(), streaming, function calling, REST API, response types
- [patterns.md](./patterns.md) - RAG with Vectorize, prompt engineering, batching, error handling, caching
- [gotchas.md](./gotchas.md) - Deprecated @cloudflare/ai package, rate limits, pricing, common errors
## See Also
- [vectorize](../vectorize/) - Vector database for RAG patterns
- [ai-gateway](../ai-gateway/) - Caching, rate limiting, analytics for AI requests
- [workers](../workers/) - Worker runtime and fetch handler patterns


@@ -0,0 +1,112 @@
# Workers AI API Reference
## Core Method
```typescript
const response = await env.AI.run(model, input);
```
## Text Generation
```typescript
const result = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
messages: [
{ role: 'system', content: 'You are helpful' },
{ role: 'user', content: 'Hello' }
],
temperature: 0.7, // 0-1
max_tokens: 100
});
console.log(result.response);
```
**Streaming:**
```typescript
const stream = await env.AI.run(model, { messages, stream: true });
return new Response(stream, { headers: { 'Content-Type': 'text/event-stream' } });
```
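When the stream is consumed in code rather than forwarded to a browser, the event text has to be parsed. The sketch below assumes each event is a line of the form `data: {"response":"..."}` and that the stream ends with `data: [DONE]`; `extractSseText` is a hypothetical helper name, and it operates on already-decoded text rather than on the stream itself.

```typescript
// Assumption: events look like `data: {"response":"..."}` and end with `data: [DONE]`.
function extractSseText(sse: string): string {
  let text = '';
  for (const line of sse.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed.startsWith('data:')) continue; // Skip blank lines and other fields
    const payload = trimmed.slice('data:'.length).trim();
    if (payload === '' || payload === '[DONE]') continue;
    try {
      const event = JSON.parse(payload) as { response?: string };
      text += event.response ?? '';
    } catch {
      // Ignore incomplete JSON; a real client would buffer until the line is complete.
    }
  }
  return text;
}
```

A production client would also need to handle events split across chunk boundaries, which this sketch deliberately leaves out.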
## Embeddings
```typescript
const result = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
text: ['Query', 'Doc 1', 'Doc 2'] // Batch for efficiency
});
const [queryEmbed, doc1Embed, doc2Embed] = result.data; // 768-dim vectors
```
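Once the vectors are available, documents can be ranked against the query with cosine similarity. This is a minimal in-memory sketch for small document sets; `cosineSimilarity` and `rankBySimilarity` are hypothetical helper names, and for larger corpora a vector database such as Vectorize is the more appropriate tool.

```typescript
// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // Avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns document indices ordered from most to least similar to the query.
function rankBySimilarity(query: number[], docs: number[][]): number[] {
  return docs
    .map((doc, index) => ({ index, score: cosineSimilarity(query, doc) }))
    .sort((x, y) => y.score - x.score)
    .map(({ index }) => index);
}
```

With the destructured embeddings from the snippet above, a call might look like `rankBySimilarity(queryEmbed, [doc1Embed, doc2Embed])`.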
## Function Calling
```typescript
const tools = [{
type: 'function',
function: {
name: 'getWeather',
description: 'Get weather for location',
parameters: {
type: 'object',
properties: { location: { type: 'string' } },
required: ['location']
}
}
}];
const response = await env.AI.run(model, { messages, tools });
if (response.tool_calls) {
const args = JSON.parse(response.tool_calls[0].function.arguments);
// Execute function, send result back
}
```
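The "execute function" step can be sketched as a dispatch table keyed by tool name. The `ToolCall` shape below is assumed from the snippet above and may differ between models, so verify it against the actual response. The `handlers` map, the stubbed forecast data, and `executeToolCall` are hypothetical.

```typescript
// Shape assumed from the snippet above; verify against the model's actual output.
interface ToolCall {
  function: { name: string; arguments: string };
}
type ToolHandler = (args: Record<string, unknown>) => unknown;

// Hypothetical local implementations keyed by tool name (stub data only).
const handlers: Record<string, ToolHandler> = {
  getWeather: (args) => ({ location: args.location, forecast: 'sunny' }),
};

// Looks up the handler, parses the JSON arguments, and returns a JSON result string.
function executeToolCall(call: ToolCall): string {
  const handler = handlers[call.function.name];
  if (!handler) throw new Error(`Unknown tool: ${call.function.name}`);
  const args = JSON.parse(call.function.arguments) as Record<string, unknown>;
  return JSON.stringify(handler(args));
}
```

The returned string would then be appended to `messages` as a tool-result message for a follow-up `env.AI.run` call; the exact message format depends on the model, so check its documentation.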
## Image Generation
```typescript
const image = await env.AI.run('@cf/stabilityai/stable-diffusion-xl-base-1.0', {
prompt: 'Mountain sunset',
num_steps: 20, // 1-20
guidance: 7.5 // 1-20
});
return new Response(image, { headers: { 'Content-Type': 'image/png' } });
```
## Speech Recognition
```typescript
const audioArray = Array.from(new Uint8Array(await request.arrayBuffer()));
const result = await env.AI.run('@cf/openai/whisper', { audio: audioArray });
console.log(result.text);
```
## Translation
```typescript
const result = await env.AI.run('@cf/meta/m2m100-1.2b', {
text: 'Hello',
source_lang: 'en',
target_lang: 'es'
});
console.log(result.translated_text);
```
## REST API
```bash
curl https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/@cf/meta/llama-3.1-8b-instruct \
-H "Authorization: Bearer $TOKEN" \
-d '{"messages":[{"role":"user","content":"Hello"}]}'
```
## Error Codes
| Code | Meaning | Fix |
|------|---------|-----|
| 7502 | Model not found | Check spelling |
| 7504 | Validation failed | Verify input schema |
| 7505 | Rate limited | Reduce rate or upgrade |
| 7506 | Context exceeded | Reduce input size |
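
The table above can be turned into a lookup that decides whether an error is worth retrying. The codes and meanings come from the table; matching on the error message text is an assumption about how the code surfaces in thrown errors, and `classifyAiError` is a hypothetical helper name.

```typescript
// Codes and meanings from the table above; only rate limiting is treated as retryable.
const AI_ERRORS: Record<string, { meaning: string; retryable: boolean }> = {
  '7502': { meaning: 'Model not found', retryable: false },
  '7504': { meaning: 'Validation failed', retryable: false },
  '7505': { meaning: 'Rate limited', retryable: true },
  '7506': { meaning: 'Context exceeded', retryable: false },
};

// Assumption: the numeric code appears somewhere in the error message.
function classifyAiError(error: unknown): { code: string | null; meaning: string; retryable: boolean } {
  const message = error instanceof Error ? error.message : String(error);
  const code = Object.keys(AI_ERRORS).find((c) => message.includes(c));
  return code
    ? { code, ...AI_ERRORS[code] }
    : { code: null, meaning: 'Unknown', retryable: false };
}
```

Separating classification from retry logic keeps the retry loop small and makes the policy easy to adjust.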
## Performance Tips
1. **Batch embeddings** - single request for multiple texts
2. **Stream long responses** - reduce perceived latency
3. **Accept cold starts** - first request ~1-3s, subsequent ~100-500ms


@@ -0,0 +1,97 @@
# Workers AI Configuration
## wrangler.jsonc
```jsonc
{
"name": "my-ai-worker",
"main": "src/index.ts",
"compatibility_date": "2024-01-01",
"ai": {
"binding": "AI"
}
}
```
## TypeScript
```bash
npm install --save-dev @cloudflare/workers-types
```
```typescript
interface Env {
AI: Ai;
}
export default {
async fetch(request: Request, env: Env) {
const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
messages: [{ role: 'user', content: 'Hello' }]
});
return Response.json(response);
}
};
```
## Local Development
```bash
wrangler dev --remote # Required for AI - no local inference
```
## REST API
```typescript
const response = await fetch(
`https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/run/@cf/meta/llama-3.1-8b-instruct`,
{
method: 'POST',
headers: { 'Authorization': `Bearer ${API_TOKEN}`, 'Content-Type': 'application/json' },
body: JSON.stringify({ messages: [{ role: 'user', content: 'Hello' }] })
}
);
```
Create API token at: dash.cloudflare.com/profile/api-tokens (Workers AI - Read permission)
## SDK Compatibility
**OpenAI SDK:**
```typescript
import OpenAI from 'openai';
const client = new OpenAI({
apiKey: env.CLOUDFLARE_API_TOKEN,
baseURL: `https://api.cloudflare.com/client/v4/accounts/${env.ACCOUNT_ID}/ai/v1`
});
```
## Multi-Model Setup
```typescript
const MODELS = {
chat: '@cf/meta/llama-3.1-8b-instruct',
embed: '@cf/baai/bge-base-en-v1.5',
image: '@cf/stabilityai/stable-diffusion-xl-base-1.0'
};
```
## RAG Setup (with Vectorize)
```jsonc
{
"ai": { "binding": "AI" },
"vectorize": {
"bindings": [{ "binding": "VECTORIZE", "index_name": "embeddings-index" }]
}
}
```
## Troubleshooting
| Error | Fix |
|-------|-----|
| `env.AI is undefined` | Check `ai` binding in wrangler.jsonc |
| Local AI doesn't work | Use `wrangler dev --remote` |
| Type 'Ai' not found | Install `@cloudflare/workers-types` |
| @cloudflare/ai package error | Don't install - use native binding |


@@ -0,0 +1,114 @@
# Workers AI Gotchas
## Critical: @cloudflare/ai is DEPRECATED
```typescript
// ❌ WRONG - Don't install @cloudflare/ai
import Ai from '@cloudflare/ai';
// ✅ CORRECT - Use native binding
export default {
async fetch(request: Request, env: Env) {
await env.AI.run('@cf/meta/llama-3.1-8b-instruct', { messages: [...] });
}
}
```
## Development
### "AI inference doesn't work locally"
```bash
# ❌ Local AI doesn't work
wrangler dev
# ✅ Use remote
wrangler dev --remote
```
### "env.AI is undefined"
Add binding to wrangler.jsonc:
```jsonc
{ "ai": { "binding": "AI" } }
```
## API Responses
### Embedding response shape varies
```typescript
// @cf/baai/bge-base-en-v1.5 returns: { data: [[0.1, 0.2, ...]] }
const embedding = response.data[0]; // Get first element
```
### Stream returns ReadableStream
```typescript
const stream = await env.AI.run(model, { messages: [...], stream: true });
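// Chunks are SSE-formatted bytes, not parsed objects, so decode them before reading
const decoder = new TextDecoder();
for await (const chunk of stream) { console.log(decoder.decode(chunk, { stream: true })); }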
```
## Rate Limits & Pricing
| Model Type | Neurons/Request |
|------------|-----------------|
| Small text (7B) | ~50-200 |
| Large text (70B) | ~500-2000 |
| Embeddings | ~5-20 |
| Image gen | ~10,000+ |
**Free tier**: 10,000 neurons/day
```typescript
// ❌ EXPENSIVE - 70B model
await env.AI.run('@cf/meta/llama-3.1-70b-instruct', ...);
// ✅ CHEAPER - Use smallest that works
await env.AI.run('@cf/meta/llama-3.1-8b-instruct', ...);
```
## Model-Specific
### Function calling
Only `@cf/meta/llama-3.1-*` and `mistral-7b-instruct-v0.2` support tools.
### Empty response
Check context limits (2K-8K tokens). Validate input structure.
### Inconsistent responses
Set `temperature: 0` for deterministic outputs.
### Cold start latency
First request: 1-3s. Use AI Gateway caching for frequent prompts.
## TypeScript
```typescript
interface Env {
AI: Ai; // From @cloudflare/workers-types
}
interface TextGenerationResponse { response: string; }
interface EmbeddingResponse { data: number[][]; shape: number[]; }
```
## Common Errors
### 7502: Model not found
Check exact model name at developers.cloudflare.com/workers-ai/models/
### 7504: Input validation failed
```typescript
// Text gen requires messages array
await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
messages: [{ role: 'user', content: 'Hello' }] // ✅
});
// Embeddings require text
await env.AI.run('@cf/baai/bge-base-en-v1.5', { text: 'Hello' }); // ✅
```
## Vercel AI SDK Integration
```typescript
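import { createOpenAI } from '@ai-sdk/openai';
// Configure the provider instance; pass a Workers AI model ID, not an OpenAI model name
const workersai = createOpenAI({
  baseURL: 'https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/ai/v1',
  apiKey: '<API_TOKEN>'
});
const model = workersai.chat('@cf/meta/llama-3.1-8b-instruct');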
```


@@ -0,0 +1,120 @@
# Workers AI Patterns
## RAG (Retrieval-Augmented Generation)
```typescript
// 1. Embed query
const embedding = await env.AI.run('@cf/baai/bge-base-en-v1.5', { text: query });
// 2. Search vectors
const results = await env.VECTORIZE.query(embedding.data[0], {
topK: 5, returnMetadata: true
});
// 3. Build context
const context = results.matches.map(m => m.metadata?.text).join('\n\n');
// 4. Generate with context
const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
messages: [
{ role: 'system', content: `Answer based on:\n\n${context}` },
{ role: 'user', content: query }
]
});
```
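The query flow above assumes documents were indexed beforehand with their text stored as metadata. A minimal sketch of that ingestion step follows. The `EmbeddingModel` and `VectorIndex` interfaces are local stand-ins for the AI and Vectorize bindings so the example is self-contained, and `indexDocuments` is a hypothetical helper name.

```typescript
// Minimal local interfaces; in a Worker these roles are played by env.AI and env.VECTORIZE.
interface EmbeddingModel {
  run(model: string, input: { text: string[] }): Promise<{ data: number[][] }>;
}
interface VectorIndex {
  upsert(vectors: { id: string; values: number[]; metadata?: Record<string, string> }[]): Promise<unknown>;
}

// Embeds all documents in one batch and stores the source text as metadata,
// so the query step can rebuild context from `m.metadata?.text`.
async function indexDocuments(
  ai: EmbeddingModel,
  index: VectorIndex,
  docs: { id: string; text: string }[]
): Promise<number> {
  if (docs.length === 0) return 0;
  const { data } = await ai.run('@cf/baai/bge-base-en-v1.5', { text: docs.map((d) => d.text) });
  await index.upsert(
    docs.map((d, i) => ({ id: d.id, values: data[i], metadata: { text: d.text } }))
  );
  return docs.length;
}
```

Using the same embedding model for ingestion and querying matters: vectors from different models are not comparable.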
## Streaming (SSE)
```typescript
const stream = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
messages, stream: true
});
// The stream is already SSE-formatted bytes (`data: {...}` events ending with
// `data: [DONE]`), so forward it as-is instead of re-encoding each chunk.
return new Response(stream, {
  headers: { 'Content-Type': 'text/event-stream' }
});
```
## Error Handling & Retry
```typescript
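// Structural type keeps the helper independent of the exact binding typings
type AiBinding = { run(model: string, input: unknown): Promise<unknown> };
async function runWithRetry(env: { AI: AiBinding }, model: string, input: unknown, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await env.AI.run(model, input);
    } catch (error) {
      // Retry only on rate limiting (7505), with exponential backoff: 1s, 2s, 4s...
      const rateLimited = error instanceof Error && error.message.includes('7505');
      if (rateLimited && attempt < maxRetries - 1) {
        await new Promise(r => setTimeout(r, Math.pow(2, attempt) * 1000));
        continue;
      }
      throw error; // Other errors, or retries exhausted
    }
  }
}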
```
## Model Fallback
```typescript
try {
return await env.AI.run('@cf/meta/llama-3.1-70b-instruct', { messages });
} catch {
return await env.AI.run('@cf/meta/llama-3.1-8b-instruct', { messages });
}
```
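The same idea generalizes to an ordered list of models. This is a sketch, not the binding's API: `AiRunner` is a local stand-in for the AI binding so the example is self-contained, and `runWithFallback` is a hypothetical helper name.

```typescript
// Minimal stand-in for the AI binding so the sketch is self-contained.
interface AiRunner {
  run(model: string, input: unknown): Promise<unknown>;
}

// Tries each model in order and returns the first successful result,
// together with the model that produced it.
async function runWithFallback(ai: AiRunner, models: string[], input: unknown) {
  let lastError: unknown = new Error('No models provided');
  for (const model of models) {
    try {
      return { model, result: await ai.run(model, input) };
    } catch (error) {
      lastError = error; // Remember the failure and move on to the next model
    }
  }
  throw lastError;
}
```

Returning the model name alongside the result makes it possible to log how often the fallback path is taken.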
## Prompt Patterns
```typescript
// System prompts
const PROMPTS = {
json: 'Respond with valid JSON only.',
concise: 'Keep responses brief.',
cot: 'Think step by step before answering.'
};
// Few-shot
const messages = [
{ role: 'system', content: 'Extract as JSON' },
{ role: 'user', content: 'John bought 3 apples for $5' },
{ role: 'assistant', content: '{"name":"John","item":"apples","qty":3}' },
{ role: 'user', content: actualInput }
]
```
## Parallel Execution
```typescript
const [sentiment, summary, embedding] = await Promise.all([
env.AI.run('@cf/mistral/mistral-7b-instruct-v0.1', { messages: sentimentPrompt }),
env.AI.run('@cf/meta/llama-3.1-8b-instruct', { messages: summaryPrompt }),
env.AI.run('@cf/baai/bge-base-en-v1.5', { text })
]);
```
## Cost Optimization
| Task | Model | Neurons |
|------|-------|---------|
| Classify | `@cf/mistral/mistral-7b-instruct-v0.1` | ~50 |
| Chat | `@cf/meta/llama-3.1-8b-instruct` | ~200 |
| Complex | `@cf/meta/llama-3.1-70b-instruct` | ~2000 |
| Embed | `@cf/baai/bge-base-en-v1.5` | ~10 |
```typescript
// Batch embeddings
const response = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
text: textsArray // Process multiple at once
});
```
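For large inputs, the array can be split into fixed-size batches before each call. This is a minimal sketch: `chunk` is a hypothetical helper name, and the batch size is left to the caller because the per-request input limit should be taken from the model's documentation rather than assumed here.

```typescript
// Splits an array into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch can then be passed as the `text` array of a separate `env.AI.run` call.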