update skills

2026-05-09 00:41:27 -07:00 · 2026-03-17 16:53:22 -07:00
parent 0b0783ef8e
commit f9a530667e
389 changed files with 54512 additions and 1 deletions
@@ -0,0 +1,197 @@
+# Cloudflare Workers AI
+
+Expert guidance for Cloudflare Workers AI - serverless GPU-powered AI inference at the edge.
+
+## Overview
+
+Workers AI provides:
+- 50+ pre-trained models (LLMs, embeddings, image generation, speech-to-text, translation)
+- Native Workers binding (no external API calls)
+- Pay-per-use pricing (neurons consumed per inference)
+- OpenAI-compatible REST API
+- Streaming support for text generation
+- Function calling with compatible models
+
+**Architecture**: Inference runs on Cloudflare's GPU network. Models load on first request (cold start of roughly 1-3s); subsequent requests are faster.
+
+## Quick Start
+
+```typescript
+interface Env {
+  AI: Ai;
+}
+
+export default {
+  async fetch(request: Request, env: Env) {
+    const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
+      messages: [{ role: 'user', content: 'What is Cloudflare?' }]
+    });
+    return Response.json(response);
+  }
+};
+```
+
+```bash
+# Setup - add binding to wrangler.jsonc
+wrangler dev --remote  # Must use --remote for AI
+wrangler deploy
+```
+
+## Model Selection Decision Tree
+
+### Text Generation (Chat/Completion)
+
+**Quality Priority**:
+- **Best quality**: `@cf/meta/llama-3.1-70b-instruct` (expensive, ~2000 neurons)
+- **Balanced**: `@cf/meta/llama-3.1-8b-instruct` (good quality, ~200 neurons)
+- **Fastest/cheapest**: `@cf/mistral/mistral-7b-instruct-v0.1` (~50 neurons)
+
+**Function Calling**:
+- Use `@cf/meta/llama-3.1-8b-instruct` or `@cf/meta/llama-3.1-70b-instruct` (native tool support)
+
+**Code Generation**:
+- Use `@cf/deepseek-ai/deepseek-coder-6.7b-instruct` (specialized for code)
+
+### Embeddings (Semantic Search/RAG)
+
+**English text**:
+- **Best**: `@cf/baai/bge-large-en-v1.5` (1024 dims, highest quality)
+- **Balanced**: `@cf/baai/bge-base-en-v1.5` (768 dims, good quality)
+- **Fast**: `@cf/baai/bge-small-en-v1.5` (384 dims, lower quality but fast)
+
+**Multilingual**:
+- Use `@hf/sentence-transformers/paraphrase-multilingual-minilm-l12-v2`
+
+### Image Generation
+
+- **Stable Diffusion**: `@cf/stabilityai/stable-diffusion-xl-base-1.0` (~10,000 neurons)
+- **Portraits**: `@cf/lykon/dreamshaper-8-lcm` (optimized for faces)
+
+### Other Tasks
+
+- **Speech-to-text**: `@cf/openai/whisper`
+- **Translation**: `@cf/meta/m2m100-1.2b` (100 languages)
+- **Image classification**: `@cf/microsoft/resnet-50`
+
+## SDK Approach Decision Tree
+
+### Native Binding (Recommended)
+
+**When**: Building Workers/Pages with TypeScript  
+**Why**: Zero external dependencies, best performance, native types
+
+```typescript
+await env.AI.run(model, input);
+```
+
+### REST API
+
+**When**: External services, non-Workers environments, testing  
+**Why**: Standard HTTP, works anywhere
+
+```bash
+curl https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/ai/run/@cf/meta/llama-3.1-8b-instruct \
+  -H "Authorization: Bearer <API_TOKEN>" \
+  -d '{"messages":[{"role":"user","content":"Hello"}]}'
+```
+
+### Vercel AI SDK Integration
+
+**When**: Using Vercel AI SDK features (streaming UI, tool calling abstractions)  
+**Why**: Unified interface across providers
+
+```typescript
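+import { createOpenAI } from '@ai-sdk/openai';
+
+// Point the OpenAI-compatible provider at the Workers AI endpoint
+const workersai = createOpenAI({
+  baseURL: 'https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/ai/v1',
+  apiKey: '<API_TOKEN>' // sent as "Authorization: Bearer <API_TOKEN>"
+});
+// .chat() selects the Chat Completions endpoint; pass a Workers AI model ID
+const model = workersai.chat('@cf/meta/llama-3.1-8b-instruct');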
+```
+
+## RAG vs Direct Generation
+
+### Use RAG (Vectorize + Workers AI) When:
+- Answering questions about specific documents/data
+- Need factual accuracy from known corpus
+- Context exceeds model's window (>4K tokens)
+- Building knowledge base chat
+
+### Use Direct Generation When:
+- Creative writing, brainstorming
+- General knowledge questions
+- Small context fits in prompt (<4K tokens)
+- Cost optimization (RAG adds embedding + vector search costs)
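+
+The two lists above boil down to a rough rule of thumb. The sketch below is one way to encode it, assuming about 4 characters per token and the 4K-token threshold mentioned above; `estimateTokens` and `chooseStrategy` are hypothetical helper names, not part of Workers AI.
+
+```typescript
+type Strategy = 'rag' | 'direct';
+
+// Rough heuristic: about 4 characters per token for English text (assumption).
+function estimateTokens(text: string): number {
+  return Math.ceil(text.length / 4);
+}
+
+// Hypothetical helper: choose RAG when answers must come from a known corpus
+// or when the context would not fit in the assumed token window.
+function chooseStrategy(
+  contextText: string,
+  needsCorpusFacts: boolean,
+  windowTokens = 4096
+): Strategy {
+  if (needsCorpusFacts) return 'rag';
+  return estimateTokens(contextText) > windowTokens ? 'rag' : 'direct';
+}
+```
+
+A real tokenizer would give a better estimate; the character-based heuristic is only meant for a quick routing decision.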
+
+## Platform Limits
+
+| Limit | Free Tier | Paid Plans |
+|-------|-----------|------------|
+| Neurons/day | 10,000 | Pay per use |
+| Rate limit | Varies by model | Higher (contact support) |
+| Context window | Model dependent (2K-8K) | Same |
+| Streaming | ✅ Supported | ✅ Supported |
+| Function calling | ✅ Supported (select models) | ✅ Supported |
+
+**Pricing**: Free 10K neurons/day, then pay per neuron consumed (varies by model)
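+
+As a back-of-the-envelope check, the free allowance can be divided by the approximate per-request neuron figures quoted in this guide. The numbers below are those rough estimates, not official pricing, and `freeRequestsPerDay` is an illustrative helper.
+
+```typescript
+const FREE_NEURONS_PER_DAY = 10_000;
+
+// Approximate per-request costs quoted in this guide (not official pricing).
+const APPROX_NEURONS: Record<string, number> = {
+  '@cf/mistral/mistral-7b-instruct-v0.1': 50,
+  '@cf/meta/llama-3.1-8b-instruct': 200,
+  '@cf/meta/llama-3.1-70b-instruct': 2000,
+};
+
+// How many requests fit in the free daily allowance for a given model.
+function freeRequestsPerDay(model: string): number {
+  const cost = APPROX_NEURONS[model];
+  if (cost === undefined) throw new Error(`No estimate for ${model}`);
+  return Math.floor(FREE_NEURONS_PER_DAY / cost);
+}
+```
+
+Under these estimates the free tier covers about 50 requests per day to the 8B model but only about 5 to the 70B model.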
+
+## Common Tasks
+
+```typescript
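+// Streaming text generation: the result is a ReadableStream of SSE-formatted bytes
+const stream = await env.AI.run(model, { messages, stream: true });
+const decoder = new TextDecoder();
+for await (const chunk of stream) {
+  // Each chunk is a Uint8Array, so decode it before logging
+  console.log(decoder.decode(chunk, { stream: true }));
+}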
+
+// Embeddings for RAG
+const { data } = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
+  text: ['Query text', 'Document 1', 'Document 2']
+});
+
+// Function calling
+const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
+  messages: [{ role: 'user', content: 'What is the weather?' }],
+  tools: [{
+    type: 'function',
+    function: { name: 'getWeather', parameters: { ... } }
+  }]
+});
+```
+
+## Development Workflow
+
+```bash
+# Always use --remote for AI (local doesn't have models)
+wrangler dev --remote
+
+# Deploy to production
+wrangler deploy
+
+# View model catalog
+# https://developers.cloudflare.com/workers-ai/models/
+```
+
+## Reading Order
+
+**Start here**: Quick Start above → configuration.md (setup)
+
+**Common tasks**:
+- First time setup: configuration.md → Add binding + deploy
+- Choose model: Model Selection Decision Tree (above) → api.md
+- Build RAG: patterns.md → Vectorize integration
+- Optimize costs: Model Selection + gotchas.md (rate limits)
+- Debugging: gotchas.md → Common errors
+
+## In This Reference
+
+- [configuration.md](./configuration.md) - wrangler.jsonc setup, TypeScript types, bindings, environment variables
+- [api.md](./api.md) - env.AI.run(), streaming, function calling, REST API, response types
+- [patterns.md](./patterns.md) - RAG with Vectorize, prompt engineering, batching, error handling, caching
+- [gotchas.md](./gotchas.md) - Deprecated @cloudflare/ai package, rate limits, pricing, common errors
+
+## See Also
+
+- [vectorize](../vectorize/) - Vector database for RAG patterns
+- [ai-gateway](../ai-gateway/) - Caching, rate limiting, analytics for AI requests
+- [workers](../workers/) - Worker runtime and fetch handler patterns
@@ -0,0 +1,112 @@
+# Workers AI API Reference
+
+## Core Method
+
+```typescript
+const response = await env.AI.run(model, input);
+```
+
+## Text Generation
+
+```typescript
+const result = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
+  messages: [
+    { role: 'system', content: 'You are helpful' },
+    { role: 'user', content: 'Hello' }
+  ],
+  temperature: 0.7,  // 0-1
+  max_tokens: 100
+});
+console.log(result.response);
+```
+
+**Streaming:**
+```typescript
+const stream = await env.AI.run(model, { messages, stream: true });
+return new Response(stream, { headers: { 'Content-Type': 'text/event-stream' } });
+```
+
+## Embeddings
+
+```typescript
+const result = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
+  text: ['Query', 'Doc 1', 'Doc 2'] // Batch for efficiency
+});
+const [queryEmbed, doc1Embed, doc2Embed] = result.data; // 768-dim vectors
+```
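+
+Once the batch comes back, the query vector can be compared against each document vector. The following is a minimal sketch of cosine-similarity ranking; `cosineSimilarity` and `rankBySimilarity` are illustrative helpers rather than part of the Workers AI API, and for anything beyond a handful of documents Vectorize would normally handle the search.
+
+```typescript
+// Cosine similarity between two equal-length vectors; returns 0 for zero vectors.
+function cosineSimilarity(a: number[], b: number[]): number {
+  if (a.length !== b.length) throw new Error('Vectors must have the same length');
+  let dot = 0;
+  let normA = 0;
+  let normB = 0;
+  a.forEach((x, i) => {
+    const y = b[i] ?? 0;
+    dot += x * y;
+    normA += x * x;
+    normB += y * y;
+  });
+  const denom = Math.sqrt(normA) * Math.sqrt(normB);
+  return denom === 0 ? 0 : dot / denom;
+}
+
+// Return document indices ordered from most to least similar to the query.
+function rankBySimilarity(query: number[], docs: number[][]): number[] {
+  return docs
+    .map((doc, index) => ({ index, score: cosineSimilarity(query, doc) }))
+    .sort((x, y) => y.score - x.score)
+    .map(entry => entry.index);
+}
+```
+
+With the variables from the snippet above, `rankBySimilarity(queryEmbed, [doc1Embed, doc2Embed])` would return the document positions in relevance order.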
+
+## Function Calling
+
+```typescript
+const tools = [{
+  type: 'function',
+  function: {
+    name: 'getWeather',
+    description: 'Get weather for location',
+    parameters: {
+      type: 'object',
+      properties: { location: { type: 'string' } },
+      required: ['location']
+    }
+  }
+}];
+
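+const response = await env.AI.run(model, { messages, tools });
+if (response.tool_calls?.length) {
+  // The binding returns each call as { name, arguments }; arguments may
+  // already be a parsed object, so only JSON.parse when it is a string
+  const call = response.tool_calls[0];
+  const args = typeof call.arguments === 'string' ? JSON.parse(call.arguments) : call.arguments;
+  // Execute the function named call.name with args, then send the result back
+}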
+```
+
+## Image Generation
+
+```typescript
+const image = await env.AI.run('@cf/stabilityai/stable-diffusion-xl-base-1.0', {
+  prompt: 'Mountain sunset',
+  num_steps: 20,   // 1-20
+  guidance: 7.5    // 1-20
+});
+return new Response(image, { headers: { 'Content-Type': 'image/png' } });
+```
+
+## Speech Recognition
+
+```typescript
+const audioArray = Array.from(new Uint8Array(await request.arrayBuffer()));
+const result = await env.AI.run('@cf/openai/whisper', { audio: audioArray });
+console.log(result.text);
+```
+
+## Translation
+
+```typescript
+const result = await env.AI.run('@cf/meta/m2m100-1.2b', {
+  text: 'Hello',
+  source_lang: 'en',
+  target_lang: 'es'
+});
+console.log(result.translated_text);
+```
+
+## REST API
+
+```bash
+curl https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/@cf/meta/llama-3.1-8b-instruct \
+  -H "Authorization: Bearer $TOKEN" \
+  -d '{"messages":[{"role":"user","content":"Hello"}]}'
+```
+
+## Error Codes
+
+| Code | Meaning | Fix |
+|------|---------|-----|
+| 7502 | Model not found | Check spelling |
+| 7504 | Validation failed | Verify input schema |
+| 7505 | Rate limited | Reduce rate or upgrade |
+| 7506 | Context exceeded | Reduce input size |
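+
+The table above can be turned into a small lookup so that calling code can decide whether a retry is worthwhile. This is a sketch under one assumption: that the numeric code appears somewhere in the error message. `AI_ERRORS` and `classifyAiError` are hypothetical names.
+
+```typescript
+// Codes and meanings copied from the table above.
+const AI_ERRORS: Record<number, { meaning: string; retryable: boolean }> = {
+  7502: { meaning: 'Model not found', retryable: false },
+  7504: { meaning: 'Validation failed', retryable: false },
+  7505: { meaning: 'Rate limited', retryable: true },
+  7506: { meaning: 'Context exceeded', retryable: false },
+};
+
+// Hypothetical helper: assumes the numeric code is embedded in the message text.
+function classifyAiError(
+  error: unknown
+): { code: number; meaning: string; retryable: boolean } | null {
+  const message = error instanceof Error ? error.message : String(error);
+  const match = message.match(/\b(75\d{2})\b/);
+  if (!match) return null;
+  const code = Number(match[1]);
+  const known = AI_ERRORS[code];
+  return known ? { code, ...known } : null;
+}
+```
+
+Only rate limiting is marked retryable here; the other three codes point at a request that has to change before it can succeed.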
+
+## Performance Tips
+
+1. **Batch embeddings** - single request for multiple texts
+2. **Stream long responses** - reduce perceived latency
+3. **Accept cold starts** - first request ~1-3s, subsequent ~100-500ms
@@ -0,0 +1,97 @@
+# Workers AI Configuration
+
+## wrangler.jsonc
+
+```jsonc
+{
+  "name": "my-ai-worker",
+  "main": "src/index.ts",
+  "compatibility_date": "2024-01-01",
+  "ai": {
+    "binding": "AI"
+  }
+}
+```
+
+## TypeScript
+
+```bash
+npm install --save-dev @cloudflare/workers-types
+```
+
+```typescript
+interface Env {
+  AI: Ai;
+}
+
+export default {
+  async fetch(request: Request, env: Env) {
+    const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
+      messages: [{ role: 'user', content: 'Hello' }]
+    });
+    return Response.json(response);
+  }
+};
+```
+
+## Local Development
+
+```bash
+wrangler dev --remote  # Required for AI - no local inference
+```
+
+## REST API
+
+```typescript
+const response = await fetch(
+  `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/run/@cf/meta/llama-3.1-8b-instruct`,
+  {
+    method: 'POST',
+    headers: { 'Authorization': `Bearer ${API_TOKEN}`, 'Content-Type': 'application/json' },
+    body: JSON.stringify({ messages: [{ role: 'user', content: 'Hello' }] })
+  }
+);
+```
+
+Create an API token at dash.cloudflare.com/profile/api-tokens with the Workers AI Read permission.
+
+## SDK Compatibility
+
+**OpenAI SDK:**
+```typescript
+import OpenAI from 'openai';
+const client = new OpenAI({
+  apiKey: env.CLOUDFLARE_API_TOKEN,
+  baseURL: `https://api.cloudflare.com/client/v4/accounts/${env.ACCOUNT_ID}/ai/v1`
+});
+```
+
+## Multi-Model Setup
+
+```typescript
+const MODELS = {
+  chat: '@cf/meta/llama-3.1-8b-instruct',
+  embed: '@cf/baai/bge-base-en-v1.5',
+  image: '@cf/stabilityai/stable-diffusion-xl-base-1.0'
+};
+```
+
+## RAG Setup (with Vectorize)
+
+```jsonc
+{
+  "ai": { "binding": "AI" },
+  "vectorize": {
+    "bindings": [{ "binding": "VECTORIZE", "index_name": "embeddings-index" }]
+  }
+}
+```
+
+## Troubleshooting
+
+| Error | Fix |
+|-------|-----|
+| `env.AI is undefined` | Check `ai` binding in wrangler.jsonc |
+| Local AI doesn't work | Use `wrangler dev --remote` |
+| Type 'Ai' not found | Install `@cloudflare/workers-types` |
+| @cloudflare/ai package error | Don't install - use native binding |
@@ -0,0 +1,114 @@
+# Workers AI Gotchas
+
+## Critical: @cloudflare/ai is DEPRECATED
+
+```typescript
+// ❌ WRONG - Don't install @cloudflare/ai
+import { Ai } from '@cloudflare/ai';
+
+// ✅ CORRECT - Use native binding
+export default {
+  async fetch(request: Request, env: Env) {
+    await env.AI.run('@cf/meta/llama-3.1-8b-instruct', { messages: [...] });
+  }
+}
+```
+
+## Development
+
+### "AI inference doesn't work locally"
+```bash
+# ❌ Local AI doesn't work
+wrangler dev
+# ✅ Use remote
+wrangler dev --remote
+```
+
+### "env.AI is undefined"
+Add binding to wrangler.jsonc:
+```jsonc
+{ "ai": { "binding": "AI" } }
+```
+
+## API Responses
+
+### Embedding response shape varies
+```typescript
+// @cf/baai/bge-base-en-v1.5 returns: { data: [[0.1, 0.2, ...]] }
+const embedding = response.data[0]; // Get first element
+```
+
+### Stream returns ReadableStream
+```typescript
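+const stream = await env.AI.run(model, { messages: [...], stream: true });
+// Chunks are raw SSE bytes, not objects: decode them or return the stream as a Response body
+const decoder = new TextDecoder();
+for await (const chunk of stream) { console.log(decoder.decode(chunk, { stream: true })); }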
+```
+
+## Rate Limits & Pricing
+
+| Model Type | Neurons/Request |
+|------------|-----------------|
+| Small text (7B) | ~50-200 |
+| Large text (70B) | ~500-2000 |
+| Embeddings | ~5-20 |
+| Image gen | ~10,000+ |
+
+**Free tier**: 10,000 neurons/day
+
+```typescript
+// ❌ EXPENSIVE - 70B model
+await env.AI.run('@cf/meta/llama-3.1-70b-instruct', ...);
+// ✅ CHEAPER - Use smallest that works
+await env.AI.run('@cf/meta/llama-3.1-8b-instruct', ...);
+```
+
+## Model-Specific
+
+### Function calling
+Only `@cf/meta/llama-3.1-*` and `mistral-7b-instruct-v0.2` support tools.
+
+### Empty response
+Check context limits (2K-8K tokens). Validate input structure.
+
+### Inconsistent responses
+Set `temperature: 0` for deterministic outputs.
+
+### Cold start latency
+First request: 1-3s. Use AI Gateway caching for frequent prompts.
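+
+Assuming an AI Gateway already exists for the account, caching can be requested through the optional third argument of `env.AI.run`. In the sketch below the gateway name `my-gateway` and the one-hour TTL are placeholders, and `withGatewayCache` is a hypothetical helper that only builds the options object.
+
+```typescript
+// Subset of the options accepted as the third argument to env.AI.run.
+interface GatewayRunOptions {
+  gateway: { id: string; cacheTtl?: number; skipCache?: boolean };
+}
+
+// Build gateway options that cache responses for the given number of seconds.
+function withGatewayCache(id: string, cacheTtlSeconds: number): GatewayRunOptions {
+  return { gateway: { id, cacheTtl: cacheTtlSeconds, skipCache: false } };
+}
+
+// Usage inside a Worker (not executed here):
+// await env.AI.run(model, { messages }, withGatewayCache('my-gateway', 3600));
+```
+
+Caching only helps when the same request is repeated, so it suits fixed prompts better than free-form chat.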
+
+## TypeScript
+
+```typescript
+interface Env {
+  AI: Ai; // From @cloudflare/workers-types
+}
+
+interface TextGenerationResponse { response: string; }
+interface EmbeddingResponse { data: number[][]; shape: number[]; }
+```
+
+## Common Errors
+
+### 7502: Model not found
+Check exact model name at developers.cloudflare.com/workers-ai/models/
+
+### 7504: Input validation failed
+```typescript
+// Text gen requires messages array
+await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
+  messages: [{ role: 'user', content: 'Hello' }]  // ✅
+});
+
+// Embeddings require text
+await env.AI.run('@cf/baai/bge-base-en-v1.5', { text: 'Hello' });  // ✅
+```
+
+## Vercel AI SDK Integration
+
+```typescript
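+import { createOpenAI } from '@ai-sdk/openai';
+// Use createOpenAI to set the base URL; pass a Workers AI model ID, not an OpenAI one
+const workersai = createOpenAI({
+  baseURL: 'https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/ai/v1',
+  apiKey: '<API_TOKEN>'
+});
+const model = workersai.chat('@cf/meta/llama-3.1-8b-instruct');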
+```
@@ -0,0 +1,120 @@
+# Workers AI Patterns
+
+## RAG (Retrieval-Augmented Generation)
+
+```typescript
+// 1. Embed query
+const embedding = await env.AI.run('@cf/baai/bge-base-en-v1.5', { text: query });
+
+// 2. Search vectors
+const results = await env.VECTORIZE.query(embedding.data[0], {
+  topK: 5, returnMetadata: true
+});
+
+// 3. Build context
+const context = results.matches.map(m => m.metadata?.text).join('\n\n');
+
+// 4. Generate with context
+const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
+  messages: [
+    { role: 'system', content: `Answer based on:\n\n${context}` },
+    { role: 'user', content: query }
+  ]
+});
+```
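+
+Step 3 above joins every match regardless of relevance or length. A slightly more defensive variant is sketched below; the score threshold and character budget are illustrative assumptions, and `buildContext` is a hypothetical helper operating on the `score` and `metadata` fields of each match.
+
+```typescript
+interface Match {
+  score: number;
+  metadata?: Record<string, unknown>;
+}
+
+// Keep matches above a relevance threshold and stop before the context
+// exceeds a character budget. Both defaults are illustrative assumptions.
+function buildContext(matches: Match[], minScore = 0.7, maxChars = 8000): string {
+  const parts: string[] = [];
+  let used = 0;
+  for (const m of matches) {
+    const text = m.metadata?.['text'];
+    if (m.score < minScore || typeof text !== 'string') continue;
+    if (used + text.length > maxChars) break;
+    parts.push(text);
+    used += text.length;
+  }
+  return parts.join('\n\n');
+}
+```
+
+Calling `buildContext(results.matches)` in place of the `map(...).join(...)` line keeps weak or oversized matches out of the system prompt.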
+
+## Streaming (SSE)
+
+```typescript
+const stream = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
+  messages, stream: true
+});
+
+// The binding already emits SSE-formatted bytes ("data: {...}" events followed
+// by "data: [DONE]"), so the stream can be returned directly. Re-wrapping each
+// chunk with JSON.stringify would serialize raw bytes instead of text.
+return new Response(stream, {
+  headers: { 'Content-Type': 'text/event-stream' }
+});
+```
+
+## Error Handling & Retry
+
+```typescript
+async function runWithRetry(env, model, input, maxRetries = 3) {
+  for (let attempt = 0; attempt < maxRetries; attempt++) {
+    try {
+      return await env.AI.run(model, input);
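+    } catch (error) {
+      // Retry only on rate limiting (7505), backing off exponentially (1s, 2s, ...)
+      const message = error instanceof Error ? error.message : String(error);
+      if (message.includes('7505') && attempt < maxRetries - 1) {
+        await new Promise(r => setTimeout(r, Math.pow(2, attempt) * 1000));
+        continue;
+      }
+      throw error;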
+    }
+  }
+}
+```
+
+## Model Fallback
+
+```typescript
+try {
+  return await env.AI.run('@cf/meta/llama-3.1-70b-instruct', { messages });
+} catch {
+  return await env.AI.run('@cf/meta/llama-3.1-8b-instruct', { messages });
+}
+```
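+
+The same idea generalizes to an ordered list of models. The sketch below uses a structural stand-in for the part of `env.AI` it needs, so `ModelRunner` and `runWithFallback` are hypothetical names rather than Workers AI types.
+
+```typescript
+// Structural stand-in for the part of env.AI used here.
+interface ModelRunner<I, O> {
+  run(model: string, input: I): Promise<O>;
+}
+
+// Try each model in order and return the first success. If every model
+// fails, rethrow the last error so callers still see a failure.
+async function runWithFallback<I, O>(
+  ai: ModelRunner<I, O>,
+  models: string[],
+  input: I
+): Promise<O> {
+  let lastError: unknown = new Error('No models provided');
+  for (const model of models) {
+    try {
+      return await ai.run(model, input);
+    } catch (error) {
+      lastError = error;
+    }
+  }
+  throw lastError;
+}
+```
+
+Falling back on every error also hides validation mistakes, so it may be worth restricting the catch to capacity or rate-limit failures.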
+
+## Prompt Patterns
+
+```typescript
+// System prompts
+const PROMPTS = {
+  json: 'Respond with valid JSON only.',
+  concise: 'Keep responses brief.',
+  cot: 'Think step by step before answering.'
+};
+
+// Few-shot: show one worked example before the real input
+const messages = [
+  { role: 'system', content: 'Extract as JSON' },
+  { role: 'user', content: 'John bought 3 apples for $5' },
+  { role: 'assistant', content: '{"name":"John","item":"apples","qty":3}' },
+  { role: 'user', content: actualInput }
+];
+```
+
+## Parallel Execution
+
+```typescript
+const [sentiment, summary, embedding] = await Promise.all([
+  env.AI.run('@cf/mistral/mistral-7b-instruct-v0.1', { messages: sentimentPrompt }),
+  env.AI.run('@cf/meta/llama-3.1-8b-instruct', { messages: summaryPrompt }),
+  env.AI.run('@cf/baai/bge-base-en-v1.5', { text })
+]);
+```
+
+## Cost Optimization
+
+| Task | Model | Neurons |
+|------|-------|---------|
+| Classify | `@cf/mistral/mistral-7b-instruct-v0.1` | ~50 |
+| Chat | `@cf/meta/llama-3.1-8b-instruct` | ~200 |
+| Complex | `@cf/meta/llama-3.1-70b-instruct` | ~2000 |
+| Embed | `@cf/baai/bge-base-en-v1.5` | ~10 |
+
+```typescript
+// Batch embeddings
+const response = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
+  text: textsArray // Process multiple at once
+});
+```
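+
+When the list of texts is large, it can be split into fixed-size batches before embedding. A minimal sketch follows; the default batch size of 100 is an assumption for illustration (check the model page for the actual limit), and `toBatches`, `embedAll`, and `EmbeddingRunner` are hypothetical names.
+
+```typescript
+// Split a list into fixed-size batches.
+function toBatches<T>(items: T[], size = 100): T[][] {
+  if (size < 1) throw new Error('size must be >= 1');
+  const batches: T[][] = [];
+  for (let i = 0; i < items.length; i += size) {
+    batches.push(items.slice(i, i + size));
+  }
+  return batches;
+}
+
+// Minimal shape of the embedding call (structural stand-in for env.AI).
+interface EmbeddingRunner {
+  run(model: string, input: { text: string[] }): Promise<{ data: number[][] }>;
+}
+
+// Embed all texts batch by batch, preserving input order.
+async function embedAll(
+  ai: EmbeddingRunner,
+  texts: string[],
+  size = 100
+): Promise<number[][]> {
+  const vectors: number[][] = [];
+  for (const batch of toBatches(texts, size)) {
+    const { data } = await ai.run('@cf/baai/bge-base-en-v1.5', { text: batch });
+    vectors.push(...data);
+  }
+  return vectors;
+}
+```
+
+Batches run sequentially here to stay gentle on rate limits; `Promise.all` over the batches would trade that for lower latency.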