update skills

This commit is contained in:
2026-03-17 16:53:22 -07:00
parent 0b0783ef8e
commit f9a530667e
389 changed files with 54512 additions and 1 deletions


@@ -0,0 +1,175 @@
# Cloudflare AI Gateway
Expert guidance for implementing Cloudflare AI Gateway - a universal gateway for AI model providers with analytics, caching, rate limiting, and routing capabilities.
## When to Use This Reference
- Setting up AI Gateway for any AI provider (OpenAI, Anthropic, Workers AI, etc.)
- Implementing caching, rate limiting, or request retry/fallback
- Configuring dynamic routing with A/B testing or model fallbacks
- Managing provider API keys securely with BYOK
- Adding security features (guardrails, DLP)
- Setting up observability with logging and custom metadata
- Debugging AI Gateway requests or optimizing configurations
## Quick Start
**What's your setup?**
- **Using Vercel AI SDK** → Pattern 1 (recommended) - see [sdk-integration.md](./sdk-integration.md)
- **Using OpenAI SDK** → Pattern 2 - see [sdk-integration.md](./sdk-integration.md)
- **Cloudflare Worker + Workers AI** → Pattern 3 - see [sdk-integration.md](./sdk-integration.md)
- **Direct HTTP (any language)** → Pattern 4 - see [configuration.md](./configuration.md)
- **Framework (LangChain, etc.)** → See [sdk-integration.md](./sdk-integration.md)
## Pattern 1: Vercel AI SDK (Recommended)
The most modern pattern, using the official `ai-gateway-provider` package with automatic fallbacks.
```typescript
import { createAiGateway } from 'ai-gateway-provider';
import { createOpenAI } from '@ai-sdk/openai';
import { createAnthropic } from '@ai-sdk/anthropic';
import { generateText } from 'ai';

const gateway = createAiGateway({
  accountId: process.env.CF_ACCOUNT_ID,
  gateway: process.env.CF_GATEWAY_ID,
});

const openai = createOpenAI({ apiKey: process.env.OPENAI_API_KEY });
const anthropic = createAnthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

// Single model
const { text } = await generateText({
  model: gateway(openai('gpt-4o')),
  prompt: 'Hello'
});

// Automatic fallback array: providers are tried in order
const { text: fallbackText } = await generateText({
  model: gateway([
    openai('gpt-4o'),               // Try first
    anthropic('claude-sonnet-4-5'), // Fallback
  ]),
  prompt: 'Hello'
});
```
**Install:** `npm install ai-gateway-provider ai @ai-sdk/openai @ai-sdk/anthropic`
## Pattern 2: OpenAI SDK
Drop-in replacement for the OpenAI API with multi-provider support.
```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/compat`,
  defaultHeaders: {
    'cf-aig-authorization': `Bearer ${cfToken}` // For authenticated gateways
  }
});

// Switch providers by changing the model format: {provider}/{model}
const response = await client.chat.completions.create({
  model: 'openai/gpt-4o', // or 'anthropic/claude-sonnet-4-5'
  messages: [{ role: 'user', content: 'Hello!' }]
});
```
## Pattern 3: Workers AI Binding
For Cloudflare Workers using Workers AI.
```typescript
export default {
  async fetch(request, env, ctx) {
    const response = await env.AI.run(
      '@cf/meta/llama-3-8b-instruct',
      { messages: [{ role: 'user', content: 'Hello!' }] },
      {
        gateway: {
          id: 'my-gateway',
          metadata: { userId: '123', team: 'engineering' }
        }
      }
    );
    return Response.json(response);
  }
};
```
## Headers Quick Reference
| Header | Purpose | Example | Notes |
|--------|---------|---------|-------|
| `cf-aig-authorization` | Gateway auth | `Bearer {token}` | Required for authenticated gateways |
| `cf-aig-metadata` | Tracking | `{"userId":"x"}` | Max 5 entries, flat structure |
| `cf-aig-cache-ttl` | Cache duration | `3600` | Seconds, min 60, max 2592000 (30 days) |
| `cf-aig-skip-cache` | Bypass cache | `true` | - |
| `cf-aig-cache-key` | Custom cache key | `my-key` | Must be unique per response |
| `cf-aig-collect-log` | Control logging | `false` | Set `false` to skip logging (default: `true`) |
| `cf-aig-cache-status` | Cache hit/miss | Response only | `HIT` or `MISS` |
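As an illustrative sketch (not part of any SDK), the request-scoped headers above can be assembled with a small helper; names like `aigHeaders` are made up here:

```typescript
// Illustrative: build per-request cf-aig-* headers for a gateway call.
// All option names below are hypothetical; only the header names are real.
function aigHeaders(opts: {
  cfToken?: string;                    // gateway auth token
  ttl?: number;                        // cache TTL in seconds
  metadata?: Record<string, string>;   // max 5 flat entries
}): Record<string, string> {
  const h: Record<string, string> = {};
  if (opts.cfToken) h['cf-aig-authorization'] = `Bearer ${opts.cfToken}`;
  if (opts.ttl !== undefined) h['cf-aig-cache-ttl'] = String(opts.ttl);
  if (opts.metadata) h['cf-aig-metadata'] = JSON.stringify(opts.metadata);
  return h;
}
```

Pass the result as `defaultHeaders` (SDK clients) or per-request headers.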
## In This Reference
| File | Purpose |
|------|---------|
| [sdk-integration.md](./sdk-integration.md) | Vercel AI SDK, OpenAI SDK, Workers binding patterns |
| [configuration.md](./configuration.md) | Dashboard setup, wrangler, API tokens |
| [features.md](./features.md) | Caching, rate limits, guardrails, DLP, BYOK, unified billing |
| [dynamic-routing.md](./dynamic-routing.md) | Fallbacks, A/B testing, conditional routing |
| [troubleshooting.md](./troubleshooting.md) | Debugging, errors, observability, gotchas |
## Reading Order
| Task | Files |
|------|-------|
| First-time setup | README + [configuration.md](./configuration.md) |
| SDK integration | README + [sdk-integration.md](./sdk-integration.md) |
| Enable caching | README + [features.md](./features.md) |
| Setup fallbacks | README + [dynamic-routing.md](./dynamic-routing.md) |
| Debug errors | README + [troubleshooting.md](./troubleshooting.md) |
## Architecture
AI Gateway acts as a proxy between your application and AI providers:
```
Your App → AI Gateway → AI Provider (OpenAI, Anthropic, etc.)
              │
              └─ Analytics, Caching, Rate Limiting, Logging
```
**Key URL patterns:**
- Unified API (OpenAI-compatible): `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/compat/chat/completions`
- Provider-specific: `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/{provider}/{endpoint}`
- Dynamic routes: Use route name instead of model: `dynamic/{route-name}`
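The URL patterns above differ only in the path segment after the gateway ID; a minimal sketch (account and gateway IDs are placeholders):

```typescript
// Illustrative helper for the documented AI Gateway URL patterns.
const BASE = 'https://gateway.ai.cloudflare.com/v1';

function gatewayUrl(accountId: string, gatewayId: string, path: string): string {
  return `${BASE}/${accountId}/${gatewayId}/${path}`;
}

// Unified (OpenAI-compatible) endpoint
const unified = gatewayUrl('acct123', 'my-gateway', 'compat/chat/completions');

// Provider-specific endpoint
const providerUrl = gatewayUrl('acct123', 'my-gateway', 'openai/chat/completions');
```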
## Gateway Types
1. **Unauthenticated Gateway**: Open access (not recommended for production)
2. **Authenticated Gateway**: Requires `cf-aig-authorization` header with Cloudflare API token (recommended)
## Provider Authentication Options
1. **Unified Billing**: Use AI Gateway billing to pay for inference (keyless mode - no provider API key needed)
2. **BYOK (Store Keys)**: Store provider API keys in Cloudflare dashboard
3. **Request Headers**: Include provider API key in each request
## Related Skills
- [Workers AI](../workers-ai/README.md) - For `env.AI.run()` details
- [Agents SDK](../agents-sdk/README.md) - For stateful AI patterns
- [Vectorize](../vectorize/README.md) - For RAG patterns with embeddings
## Resources
- [Official Docs](https://developers.cloudflare.com/ai-gateway/)
- [API Reference](https://developers.cloudflare.com/api/resources/ai_gateway/)
- [Provider Guides](https://developers.cloudflare.com/ai-gateway/usage/providers/)
- [Discord Community](https://discord.cloudflare.com)


@@ -0,0 +1,111 @@
# Configuration & Setup
## Creating a Gateway
### Dashboard
AI > AI Gateway > Create Gateway > Configure (auth, caching, rate limiting, logging)
### API
```bash
curl -X POST https://api.cloudflare.com/client/v4/accounts/{account_id}/ai-gateway/gateways \
-H "Authorization: Bearer $CF_API_TOKEN" -H "Content-Type: application/json" \
-d '{"id":"my-gateway","cache_ttl":3600,"rate_limiting_interval":60,"rate_limiting_limit":100,"collect_logs":true}'
```
**Naming:** lowercase alphanumeric + hyphens (e.g., `prod-api`, `dev-chat`)
## Wrangler Integration
```toml
[ai]
binding = "AI"
[[ai.gateway]]
id = "my-gateway"
```
```bash
wrangler secret put CF_API_TOKEN
wrangler secret put OPENAI_API_KEY # If not using BYOK
```
## Authentication
### Gateway Auth (protects gateway access)
```typescript
const client = new OpenAI({
  baseURL: `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/openai`,
  defaultHeaders: { 'cf-aig-authorization': `Bearer ${cfToken}` }
});
```
### Provider Auth Options
**1. Unified Billing (keyless)** - pay through Cloudflare, no provider key:
```typescript
const client = new OpenAI({
  baseURL: `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/openai`,
  defaultHeaders: { 'cf-aig-authorization': `Bearer ${cfToken}` }
});
```
Supports: OpenAI, Anthropic, Google AI Studio
**2. BYOK** - store keys in dashboard (Provider Keys > Add), no key in code
**3. Request Headers** - pass provider key per request:
```typescript
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/openai`,
  defaultHeaders: { 'cf-aig-authorization': `Bearer ${cfToken}` }
});
```
## API Token Permissions
- **Gateway management:** AI Gateway - Read + Edit
- **Gateway access:** AI Gateway - Read (minimum)
## Gateway Management API
```bash
# List
curl https://api.cloudflare.com/client/v4/accounts/{account_id}/ai-gateway/gateways \
-H "Authorization: Bearer $CF_API_TOKEN"
# Get
curl .../gateways/{gateway_id}
# Update
curl -X PUT .../gateways/{gateway_id} \
-d '{"cache_ttl":7200,"rate_limiting_limit":200}'
# Delete
curl -X DELETE .../gateways/{gateway_id}
```
## Getting IDs
- **Account ID:** Dashboard > Overview > Copy
- **Gateway ID:** AI Gateway > Gateway name column
## Python Example
```python
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url=f"https://gateway.ai.cloudflare.com/v1/{os.environ['CF_ACCOUNT_ID']}/{os.environ['GATEWAY_ID']}/openai",
    default_headers={"cf-aig-authorization": f"Bearer {os.environ['CF_API_TOKEN']}"}
)
```
## Best Practices
1. **Always authenticate gateways in production**
2. **Use BYOK or unified billing** - secrets out of code
3. **Environment-specific gateways** - separate dev/staging/prod
4. **Set rate limits** - prevent runaway costs
5. **Enable logging** - track usage, debug issues


@@ -0,0 +1,82 @@
# Dynamic Routing
Configure complex routing in the dashboard without code changes. Use route names instead of model names.
## Usage
```typescript
const response = await client.chat.completions.create({
  model: 'dynamic/smart-chat', // Route name from dashboard
  messages: [{ role: 'user', content: 'Hello!' }]
});
```
## Node Types
| Node | Purpose | Use Case |
|------|---------|----------|
| **Conditional** | Branch on metadata | Paid vs free users, geo routing |
| **Percentage** | A/B split traffic | Model testing, gradual rollouts |
| **Rate Limit** | Enforce quotas | Per-user/team limits |
| **Budget Limit** | Cost quotas | Per-user spending caps |
| **Model** | Call provider | Final destination |
## Metadata
Pass via header (max 5 entries, flat only):
```typescript
headers: {
  'cf-aig-metadata': JSON.stringify({
    userId: 'user-123',
    tier: 'pro',
    region: 'us-east'
  })
}
```
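The documented limits (max 5 entries, flat values only) can be checked client-side before sending; this validator is an illustrative sketch, not part of any official SDK:

```typescript
// Illustrative check mirroring the documented metadata limits:
// at most 5 entries, flat values of type string/number/boolean/null.
function isValidAigMetadata(meta: Record<string, unknown>): boolean {
  const entries = Object.entries(meta);
  if (entries.length > 5) return false;
  return entries.every(([, v]) =>
    v === null || ['string', 'number', 'boolean'].includes(typeof v)
  );
}
```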
## Common Patterns
**Multi-model fallback:**
```
Start → GPT-4 → On error: Claude → On error: Llama
```
**Tiered access:**
```
Conditional: tier == 'enterprise' → GPT-4 (no limit)
Conditional: tier == 'pro' → Rate Limit 1000/hr → GPT-4o
Conditional: tier == 'free' → Rate Limit 10/hr → GPT-4o-mini
```
**Gradual rollout:**
```
Percentage: 10% → New model, 90% → Old model
```
**Cost-based fallback:**
```
Budget Limit: $100/day per teamId
< 80%: GPT-4
>= 80%: GPT-4o-mini
>= 100%: Error
```
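The budget thresholds above are evaluated inside the gateway's routing graph, not in your code; purely as an illustration, the decision logic amounts to something like:

```typescript
// Illustrative only: dynamic routing runs in the gateway, not client-side.
// Picks a model (or errors) from the fraction of the daily budget spent.
function routeByBudget(spentFraction: number): string {
  if (spentFraction >= 1.0) throw new Error('Budget exhausted');
  return spentFraction >= 0.8 ? 'gpt-4o-mini' : 'gpt-4';
}
```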
## Version Management
- Save changes as new version
- Test with `model: 'dynamic/route@v2'`
- Roll back by deploying previous version
## Monitoring
Dashboard → Gateway → Dynamic Routes:
- Request count per path
- Success/error rates
- Latency/cost by path
## Limitations
- Max 5 metadata entries
- Values: string/number/boolean/null only
- No nested objects
- Route names: alphanumeric + hyphens


@@ -0,0 +1,96 @@
# Features & Capabilities
## Caching
Dashboard: Settings → Cache Responses → Enable
```typescript
// Custom TTL (1 hour)
headers: { 'cf-aig-cache-ttl': '3600' }
// Skip cache
headers: { 'cf-aig-skip-cache': 'true' }
// Custom cache key
headers: { 'cf-aig-cache-key': 'greeting-en' }
```
**Limits:** TTL 60s - 30 days. **Does NOT work with streaming.**
## Rate Limiting
Dashboard: Settings → Rate-limiting → Enable
- **Fixed window:** Resets at intervals
- **Sliding window:** Rolling window (more accurate)
- Returns `429` when exceeded
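Limiting is enforced gateway-side, so no client code is needed; the "rolling window" semantics of sliding-window mode can be sketched with a hypothetical in-memory model (not the gateway's actual implementation):

```typescript
// Hypothetical in-memory sliding-window counter illustrating why the
// sliding mode is more accurate: old requests age out continuously
// instead of all at once at an interval boundary.
class SlidingWindowLimiter {
  private timestamps: number[] = [];
  constructor(private limit: number, private windowMs: number) {}

  allow(now: number): boolean {
    // Drop requests older than the window, then check the remaining count.
    this.timestamps = this.timestamps.filter(t => now - t < this.windowMs);
    if (this.timestamps.length >= this.limit) return false; // gateway would return 429
    this.timestamps.push(now);
    return true;
  }
}
```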
## Guardrails
Dashboard: Settings → Guardrails → Enable
Filter prompts/responses for inappropriate content. Actions: Flag (log) or Block (reject).
## Data Loss Prevention (DLP)
Dashboard: Settings → DLP → Enable
Detect PII (emails, SSNs, credit cards). Actions: Flag, Block, or Redact.
## Billing Modes
| Mode | Description | Setup |
|------|-------------|-------|
| **Unified Billing** | Pay through Cloudflare, no provider keys | Use `cf-aig-authorization` header only |
| **BYOK** | Store provider keys in dashboard | Add keys in Provider Keys section |
| **Pass-through** | Send provider key with each request | Include provider's auth header |
## Zero Data Retention
Dashboard: Settings → Privacy → Zero Data Retention
No prompts/responses stored. Request counts and costs still tracked.
## Logging
Dashboard: Settings → Logs → Enable (up to 10M logs)
Each entry: prompt, response, provider, model, tokens, cost, duration, cache status, metadata.
```typescript
// Skip logging for request
headers: { 'cf-aig-collect-log': 'false' }
```
**Export:** Use Logpush to S3, GCS, Datadog, Splunk, etc.
## Custom Cost Tracking
For models not in Cloudflare's pricing database:
Dashboard: Gateway → Settings → Custom Costs
Or via API: set `model`, `input_cost`, `output_cost`.
## Supported Providers (22+)
| Provider | Unified API | Notes |
|----------|-------------|-------|
| OpenAI | `openai/gpt-4o` | Full support |
| Anthropic | `anthropic/claude-sonnet-4-5` | Full support |
| Google AI | `google-ai-studio/gemini-2.0-flash` | Full support |
| Workers AI | `workersai/@cf/meta/llama-3` | Native |
| Azure OpenAI | `azure-openai/*` | Deployment names |
| AWS Bedrock | Provider endpoint only | `/bedrock/*` |
| Groq | `groq/*` | Fast inference |
| Mistral, Cohere, Perplexity, xAI, DeepSeek, Cerebras | `{provider}/{model}` | Full support |
## Best Practices
1. Enable caching for deterministic prompts
2. Set rate limits to prevent abuse
3. Use guardrails for user-facing AI
4. Enable DLP for sensitive data
5. Use unified billing or BYOK for simpler key management
6. Enable logging for debugging
7. Use zero data retention when privacy required


@@ -0,0 +1,114 @@
# AI Gateway SDK Integration
## Vercel AI SDK (Recommended)
```typescript
import { createAiGateway } from 'ai-gateway-provider';
import { createOpenAI } from '@ai-sdk/openai';
import { createAnthropic } from '@ai-sdk/anthropic';
import { generateText } from 'ai';

const gateway = createAiGateway({
  accountId: process.env.CF_ACCOUNT_ID,
  gateway: process.env.CF_GATEWAY_ID,
  apiKey: process.env.CF_API_TOKEN // Optional: for authenticated gateways
});

const openai = createOpenAI({ apiKey: process.env.OPENAI_API_KEY });
const anthropic = createAnthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

// Single model
const { text } = await generateText({
  model: gateway(openai('gpt-4o')),
  prompt: 'Hello'
});

// Automatic fallback array: providers are tried in order
const { text: fallbackText } = await generateText({
  model: gateway([
    openai('gpt-4o'),
    anthropic('claude-sonnet-4-5'),
    openai('gpt-4o-mini')
  ]),
  prompt: 'Complex task'
});
```
### Options
```typescript
model: gateway(openai('gpt-4o'), {
  cacheKey: 'my-key',
  cacheTtl: 3600,
  metadata: { userId: 'u123', team: 'eng' }, // Max 5 entries
  retries: { maxAttempts: 3, backoff: 'exponential' }
})
```
## OpenAI SDK
```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/openai`,
  defaultHeaders: { 'cf-aig-authorization': `Bearer ${cfToken}` }
});

// Unified API - switch providers via the model name:
// model: 'openai/gpt-4o' or 'anthropic/claude-sonnet-4-5'
```
## Anthropic SDK
```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
  baseURL: `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/anthropic`,
  defaultHeaders: { 'cf-aig-authorization': `Bearer ${cfToken}` }
});
```
## Workers AI Binding
```toml
# wrangler.toml
[ai]
binding = "AI"
[[ai.gateway]]
id = "my-gateway"
```
```typescript
await env.AI.run(
  '@cf/meta/llama-3-8b-instruct',
  { messages: [...] },
  { gateway: { id: 'my-gateway', metadata: { userId: '123' } } }
);
```
## LangChain / LlamaIndex
```typescript
// Use the OpenAI SDK pattern with a custom baseURL
import { ChatOpenAI } from '@langchain/openai';

const model = new ChatOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  configuration: {
    baseURL: `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/openai`,
    defaultHeaders: { 'cf-aig-authorization': `Bearer ${cfToken}` } // Authenticated gateways
  }
});
```
## HTTP / cURL
```bash
curl https://gateway.ai.cloudflare.com/v1/{account}/{gateway}/openai/chat/completions \
-H "Authorization: Bearer $OPENAI_KEY" \
-H "cf-aig-authorization: Bearer $CF_TOKEN" \
-H "cf-aig-metadata: {\"userId\":\"123\"}" \
-d '{"model":"gpt-4o","messages":[...]}'
```
## Headers Reference
| Header | Purpose |
|--------|---------|
| `cf-aig-authorization` | Gateway auth token |
| `cf-aig-metadata` | JSON object (max 5 keys) |
| `cf-aig-cache-ttl` | Cache TTL in seconds |
| `cf-aig-skip-cache` | `true` to bypass cache |


@@ -0,0 +1,88 @@
# AI Gateway Troubleshooting
## Common Errors
| Error | Cause | Fix |
|-------|-------|-----|
| 401 | Missing `cf-aig-authorization` header | Add header with CF API token |
| 403 | Invalid provider key / BYOK expired | Check provider key in dashboard |
| 429 | Rate limit exceeded | Increase limit or implement backoff |
### 401 Fix
```typescript
const client = new OpenAI({
  baseURL: `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/openai`,
  defaultHeaders: { 'cf-aig-authorization': `Bearer ${CF_API_TOKEN}` }
});
```
### 429 Retry Pattern
```typescript
async function requestWithRetry(fn, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (e) {
      if (e.status === 429 && i < maxRetries - 1) {
        // Exponential backoff: 1s, 2s, 4s, ...
        await new Promise(r => setTimeout(r, Math.pow(2, i) * 1000));
        continue;
      }
      throw e;
    }
  }
}
```
## Gotchas
| Issue | Reality |
|-------|---------|
| Metadata limits | Max 5 entries, flat only (no nesting) |
| Cache key collision | Use unique keys per expected response |
| BYOK + Unified Billing | Mutually exclusive |
| Rate limit scope | Per-gateway, not per-user (use dynamic routing for per-user) |
| Log delay | 30-60 seconds normal |
| Streaming + caching | **Incompatible** |
| Model name (unified API) | Prefix required: `openai/gpt-4o`, not `gpt-4o` |
## Cache Not Working
**Causes:**
- Different request params (temperature, etc.)
- Streaming enabled
- Caching disabled in settings
**Check:** `response.headers.get('cf-aig-cache-status')` → HIT or MISS
## Logs Not Appearing
1. Check logging enabled: Dashboard → Gateway → Settings
2. Remove `cf-aig-collect-log: false` header
3. Wait 30-60 seconds
4. Check log limit (10M default)
## Debugging
```bash
# Test connectivity
curl -v https://gateway.ai.cloudflare.com/v1/{account}/{gateway}/openai/models \
-H "Authorization: Bearer $OPENAI_KEY" \
-H "cf-aig-authorization: Bearer $CF_TOKEN"
```
```typescript
// Check response headers
console.log('Cache:', response.headers.get('cf-aig-cache-status'));
console.log('Request ID:', response.headers.get('cf-ray'));
```
## Analytics
Dashboard → AI Gateway → Select gateway
**Metrics:** Requests, tokens, latency (p50/p95/p99), cache hit rate, costs
**Log filters:** `status: error`, `provider: openai`, `cost > 0.01`, `duration > 1000`
**Export:** Logpush to S3/GCS/Datadog/Splunk