update skills

2026-03-17 16:53:22 -07:00
parent 0b0783ef8e
commit f9a530667e
389 changed files with 54512 additions and 1 deletions


@@ -0,0 +1,87 @@
# Cloudflare Observability Skill Reference
**Purpose**: Comprehensive guidance for implementing observability in Cloudflare Workers, covering traces, logs, metrics, and analytics.
**Scope**: Cloudflare Observability features ONLY - Workers Logs, Traces, Analytics Engine, Logpush, Metrics & Analytics, and OpenTelemetry exports.
---
## Decision Tree: Which File to Load?
Use this to route to the correct file without loading all content:
```
├─ "How do I enable/configure X?" → configuration.md
├─ "What's the API/method/binding for X?" → api.md
├─ "How do I implement X pattern?" → patterns.md
│ ├─ Usage tracking/billing → patterns.md
│ ├─ Error tracking → patterns.md
│ ├─ Performance monitoring → patterns.md
│ ├─ Multi-tenant tracking → patterns.md
│ ├─ Tail Worker filtering → patterns.md
│ └─ OpenTelemetry export → patterns.md
└─ "Why isn't X working?" / "Limits?" → gotchas.md
```
## Reading Order
Load files in this order based on task:
| Task Type | Load Order | Reason |
|-----------|------------|--------|
| **Initial setup** | configuration.md → gotchas.md | Setup first, avoid pitfalls |
| **Implement feature** | patterns.md → api.md → gotchas.md | Pattern → API details → edge cases |
| **Debug issue** | gotchas.md → configuration.md | Common issues first |
| **Query data** | api.md → patterns.md | API syntax → query examples |
## Product Overview
### Workers Logs
- **What:** Console output from Workers (console.log/warn/error)
- **Access:** Dashboard (Real-time Logs), Logpush, Tail Workers
- **Cost:** Free (included with all Workers)
- **Retention:** Real-time only (no historical storage in dashboard)
### Workers Traces
- **What:** Execution traces with timing, CPU usage, outcome
- **Access:** Dashboard (Workers Analytics → Traces), Logpush
- **Cost:** $0.10/1M spans (GA pricing starts March 1, 2026), 10M free/month
- **Retention:** 14 days included
### Analytics Engine
- **What:** High-cardinality event storage and SQL queries
- **Access:** SQL API, Dashboard (Analytics → Analytics Engine)
- **Cost:** $0.25/1M writes beyond 10M free/month
- **Retention:** 90 days (configurable up to 1 year)
### Tail Workers
- **What:** Workers that receive logs/traces from other Workers
- **Use Cases:** Log filtering, transformation, external export
- **Cost:** Standard Workers pricing
### Logpush
- **What:** Stream logs to external storage (S3, R2, Datadog, etc.)
- **Access:** Dashboard, API
- **Cost:** Requires Business/Enterprise plan
## Pricing Summary (2026)
| Feature | Free Tier | Cost Beyond Free Tier | Plan Requirement |
|---------|-----------|----------------------|------------------|
| Workers Logs | Unlimited | Free | Any |
| Workers Traces | 10M spans/month | $0.10/1M spans | Paid Workers (GA: March 1, 2026) |
| Analytics Engine | 10M writes/month | $0.25/1M writes | Paid Workers |
| Logpush | N/A | Included in plan | Business/Enterprise |
## In This Reference
- **[configuration.md](configuration.md)** - Setup, deployment, configuration (Logs, Traces, Analytics Engine, Tail Workers, Logpush)
- **[api.md](api.md)** - API endpoints, methods, interfaces (GraphQL, SQL, bindings, types)
- **[patterns.md](patterns.md)** - Common patterns, use cases, examples (billing, monitoring, error tracking, exports)
- **[gotchas.md](gotchas.md)** - Troubleshooting, best practices, limitations (common errors, performance gotchas, pricing)
## See Also
- [Cloudflare Workers Docs](https://developers.cloudflare.com/workers/)
- [Analytics Engine Docs](https://developers.cloudflare.com/analytics/analytics-engine/)
- [Workers Traces Docs](https://developers.cloudflare.com/workers/observability/traces/)


@@ -0,0 +1,164 @@
## API Reference
### GraphQL Analytics API
**Endpoint**: `https://api.cloudflare.com/client/v4/graphql`
**Query Workers Metrics**:
```graphql
query {
viewer {
accounts(filter: { accountTag: $accountId }) {
workersInvocationsAdaptive(
limit: 100
filter: {
datetime_geq: "2025-01-01T00:00:00Z"
datetime_leq: "2025-01-31T23:59:59Z"
scriptName: "my-worker"
}
) {
sum {
requests
errors
subrequests
}
quantiles {
cpuTimeP50
cpuTimeP99
wallTimeP50
wallTimeP99
}
}
}
}
}
```
### Analytics Engine SQL API
**Endpoint**: `https://api.cloudflare.com/client/v4/accounts/{account_id}/analytics_engine/sql`
**Authentication**: `Authorization: Bearer <API_TOKEN>` (Account Analytics Read permission)
**Common Queries**:
```sql
-- List all datasets
SHOW TABLES;
-- Time-series aggregation (5-minute buckets)
SELECT
intDiv(toUInt32(timestamp), 300) * 300 AS time_bucket,
blob1 AS endpoint,
SUM(_sample_interval) AS total_requests,
AVG(double1) AS avg_response_time_ms
FROM api_metrics
WHERE timestamp >= NOW() - INTERVAL '24' HOUR
GROUP BY time_bucket, endpoint
ORDER BY time_bucket DESC;
-- Top customers by usage
SELECT
index1 AS customer_id,
SUM(_sample_interval * double1) AS total_api_calls,
AVG(double2) AS avg_response_time_ms
FROM api_usage
WHERE timestamp >= NOW() - INTERVAL '7' DAY
GROUP BY customer_id
ORDER BY total_api_calls DESC
LIMIT 100;
-- Error rate analysis
SELECT
blob1 AS error_type,
COUNT(*) AS occurrences,
MAX(timestamp) AS last_seen
FROM error_tracking
WHERE timestamp >= NOW() - INTERVAL '1' HOUR
GROUP BY error_type
ORDER BY occurrences DESC;
```
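The endpoint and Bearer-token auth above can be used from any runtime with `fetch`. A minimal sketch of a client; the JSON response shape is not specified in this reference, so the return type is left as `unknown`:

```typescript
// Minimal sketch: querying the Analytics Engine SQL API.
// The query is sent as the raw POST body; auth is a Bearer token
// with Account Analytics Read permission (see above).
function buildSqlRequest(accountId: string, apiToken: string, sql: string): Request {
  return new Request(
    `https://api.cloudflare.com/client/v4/accounts/${accountId}/analytics_engine/sql`,
    {
      method: 'POST',
      headers: { Authorization: `Bearer ${apiToken}` },
      body: sql, // the SQL text itself, e.g. "SHOW TABLES;"
    }
  );
}

async function querySql(accountId: string, apiToken: string, sql: string): Promise<unknown> {
  const res = await fetch(buildSqlRequest(accountId, apiToken, sql));
  if (!res.ok) throw new Error(`SQL API error ${res.status}: ${await res.text()}`);
  return res.json();
}
```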
### Console Logging API
**Methods**:
```typescript
// Standard methods (all appear in Workers Logs)
console.log('info message');
console.info('info message');
console.warn('warning message');
console.error('error message');
console.debug('debug message');
// Structured logging (recommended)
console.log({
level: 'info',
user_id: '123',
action: 'checkout',
amount: 99.99,
currency: 'USD'
});
```
**Log Levels**: All console methods produce logs; use structured fields for filtering:
```typescript
console.log({
level: 'error',
message: 'Payment failed',
error_code: 'CARD_DECLINED'
});
```
### Analytics Engine Binding Types
```typescript
interface AnalyticsEngineDataset {
writeDataPoint(event: AnalyticsEngineDataPoint): void;
}
interface AnalyticsEngineDataPoint {
// Indexed strings (use for filtering/grouping)
indexes?: string[];
// Non-indexed strings (metadata, IDs, URLs)
blobs?: string[];
// Numeric values (counts, durations, amounts)
doubles?: number[];
}
```
**Field Limits**:
- Max 1 index (96 bytes)
- Max 20 blobs (5 KB combined)
- Max 20 doubles
- Max 25 `writeDataPoint` calls per invocation
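Because out-of-limit writes are dropped without an error, it can help to validate client-side before writing. A guard sketch; the limits encoded here (1 index, 20 blobs, 20 doubles, 25 writes per invocation) follow Cloudflare's published Analytics Engine limits:

```typescript
// Validate a data point before calling writeDataPoint, since
// out-of-limit writes are silently dropped by Analytics Engine.
interface DataPoint {
  indexes?: string[];
  blobs?: string[];
  doubles?: number[];
}

const LIMITS = { indexes: 1, blobs: 20, doubles: 20, writesPerInvocation: 25 };

function validateDataPoint(point: DataPoint, writesSoFar: number): string[] {
  const problems: string[] = [];
  if ((point.indexes?.length ?? 0) > LIMITS.indexes)
    problems.push(`too many indexes: ${point.indexes!.length}`);
  if ((point.blobs?.length ?? 0) > LIMITS.blobs)
    problems.push(`too many blobs: ${point.blobs!.length}`);
  if ((point.doubles?.length ?? 0) > LIMITS.doubles)
    problems.push(`too many doubles: ${point.doubles!.length}`);
  if (writesSoFar >= LIMITS.writesPerInvocation)
    problems.push('write budget for this invocation exhausted');
  return problems; // empty array = safe to write
}
```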
### Tail Consumer Event Type
```typescript
interface TraceItem {
event: TraceEvent;
logs: TraceLog[];
exceptions: TraceException[];
scriptName?: string;
}
interface TraceEvent {
outcome: 'ok' | 'exception' | 'exceededCpu' | 'exceededMemory' | 'unknown';
cpuTime: number; // microseconds
wallTime: number; // microseconds
}
interface TraceLog {
timestamp: number;
level: 'log' | 'info' | 'debug' | 'warn' | 'error';
message: any; // string or structured object
}
interface TraceException {
name: string;
message: string;
timestamp: number;
}
```
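A pure helper over the `TraceItem` shape above: partition tail events into failures and successes, so a Tail Worker can forward only the failures. This is a self-contained sketch with a trimmed-down local type for testability:

```typescript
// Partition tail events by outcome, using the TraceItem shape
// defined in this reference (outcome lives on item.event).
type Outcome = 'ok' | 'exception' | 'exceededCpu' | 'exceededMemory' | 'unknown';

interface MiniTraceItem {
  event: { outcome: Outcome; cpuTime: number; wallTime: number };
  exceptions: { name: string; message: string; timestamp: number }[];
  scriptName?: string;
}

function splitByOutcome(items: MiniTraceItem[]): { failed: MiniTraceItem[]; ok: MiniTraceItem[] } {
  // Treat any non-'ok' outcome, or any recorded exception, as a failure
  const failed = items.filter(i => i.event.outcome !== 'ok' || i.exceptions.length > 0);
  const ok = items.filter(i => !failed.includes(i));
  return { failed, ok };
}
```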


@@ -0,0 +1,169 @@
## Configuration Patterns
### Enable Workers Logs
```jsonc
{
"observability": {
"enabled": true,
"head_sampling_rate": 1 // 100% sampling (default)
}
}
```
**Best Practice**: Use structured JSON logging for better indexing
```typescript
// Good - structured logging
console.log({
user_id: 123,
action: "login",
status: "success",
duration_ms: 45
});
// Avoid - unstructured string
console.log("user_id: 123 logged in successfully in 45ms");
```
### Enable Workers Traces
```jsonc
{
"observability": {
"traces": {
"enabled": true,
"head_sampling_rate": 0.05 // 5% sampling
}
}
}
```
**Note**: Default sampling is 100%. For high-traffic Workers, use lower sampling (0.01-0.1).
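To pick a sampling rate, it helps to estimate span volume and cost up front. A back-of-envelope sketch using the pricing stated in this reference ($0.10 per 1M spans beyond a 10M/month free tier):

```typescript
// Estimate monthly trace cost for a given traffic level and
// head_sampling_rate, under the pricing quoted in this reference.
function estimateTraceCostUSD(
  requestsPerMonth: number,
  spansPerRequest: number,
  headSamplingRate: number // 0.0 - 1.0
): number {
  const sampledSpans = requestsPerMonth * spansPerRequest * headSamplingRate;
  const billableSpans = Math.max(0, sampledSpans - 10_000_000); // 10M free/month
  return (billableSpans / 1_000_000) * 0.10; // $0.10 per 1M billable spans
}
```

For example, 100M requests/month at 3 spans each with 5% sampling yields 15M sampled spans, 5M of them billable, i.e. roughly $0.50/month under this pricing.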
### Configure Analytics Engine
**Bind to Worker**:
```toml
# wrangler.toml
analytics_engine_datasets = [
{ binding = "ANALYTICS", dataset = "api_metrics" }
]
```
**Write Data Points**:
```typescript
export interface Env {
ANALYTICS: AnalyticsEngineDataset;
}
export default {
async fetch(request: Request, env: Env): Promise<Response> {
// Track metrics
env.ANALYTICS.writeDataPoint({
blobs: ['customer_123', 'POST', '/api/v1/users'],
doubles: [1, 245.5], // request_count, response_time_ms
indexes: ['customer_123'] // for efficient filtering
});
return new Response('OK');
}
}
```
### Configure Tail Workers
Tail Workers receive logs/traces from other Workers for filtering, transformation, or export.
**Setup** (declared on the *producer* Worker being observed, not on the Tail Worker itself):
```toml
# wrangler.toml of the Worker being observed
name = "my-worker"
tail_consumers = [
  { service = "log-processor" }  # the Tail Worker that receives events
]
```
The Tail Worker (`log-processor`) is deployed separately with its own `wrangler.toml` (e.g. `main = "src/tail.ts"`) and exports a `tail()` handler.
**Tail Worker Example**:
```typescript
export default {
async tail(events: TraceItem[], env: Env, ctx: ExecutionContext) {
// Filter errors only
const errors = events.filter(event =>
event.event.outcome === 'exception' || event.event.outcome === 'exceededCpu'
);
if (errors.length > 0) {
// Send to external monitoring
ctx.waitUntil(
fetch('https://monitoring.example.com/errors', {
method: 'POST',
body: JSON.stringify(errors)
})
);
}
}
}
```
### Configure Logpush
Send logs to external storage (S3, R2, GCS, Azure, Datadog, etc.). Requires Business/Enterprise plan.
**Via Dashboard**:
1. Navigate to Analytics → Logs → Logpush
2. Select destination type
3. Provide credentials and bucket/endpoint
4. Choose dataset (e.g., Workers Trace Events)
5. Configure filters and fields
**Via API**:
```bash
curl -X POST "https://api.cloudflare.com/client/v4/accounts/{account_id}/logpush/jobs" \
-H "Authorization: Bearer <API_TOKEN>" \
-H "Content-Type: application/json" \
-d '{
"name": "workers-logs-to-s3",
"destination_conf": "s3://my-bucket/logs?region=us-east-1",
"dataset": "workers_trace_events",
"enabled": true,
"frequency": "high",
"filter": "{\"where\":{\"and\":[{\"key\":\"ScriptName\",\"operator\":\"eq\",\"value\":\"my-worker\"}]}}"
}'
```
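The `filter` field in the curl example is a JSON string escaped by hand, which is error-prone. A small sketch that builds it programmatically; the clause shape (`where`/`and`, keys like `ScriptName`, operator `eq`) follows the example above:

```typescript
// Build the Logpush `filter` field without hand-escaping JSON.
interface FilterClause {
  key: string;                          // e.g. "ScriptName"
  operator: 'eq' | 'neq' | 'contains';  // operators assumed from the example
  value: string;
}

function buildLogpushFilter(clauses: FilterClause[]): string {
  // Logpush expects the filter as a JSON *string* nested under where/and
  return JSON.stringify({ where: { and: clauses } });
}
```

The returned string can be dropped into the job-creation payload as the `filter` value.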
### Environment-Specific Configuration
**Development** (verbose logs, full sampling):
```jsonc
// wrangler.dev.jsonc
{
"observability": {
"enabled": true,
"head_sampling_rate": 1.0,
"traces": {
"enabled": true
}
}
}
```
**Production** (reduced sampling, structured logs):
```jsonc
// wrangler.prod.jsonc
{
"observability": {
"enabled": true,
"head_sampling_rate": 0.1, // 10% sampling
"traces": {
"enabled": true
}
}
}
```
Deploy with env-specific config:
```bash
wrangler deploy --config wrangler.prod.jsonc --env production
```


@@ -0,0 +1,115 @@
## Common Errors
### "Logs not appearing"
**Cause:** Observability disabled, Worker not redeployed, no traffic, low sampling rate, or log size exceeds 256 KB
**Solution:**
```bash
# Verify config (note: jq cannot parse JSONC comments; grep is a quick check)
grep -A 5 '"observability"' wrangler.jsonc
# Check deployment
wrangler deployments list <WORKER_NAME>
# Test with curl
curl https://your-worker.workers.dev
```
Ensure `observability.enabled = true`, redeploy Worker, check `head_sampling_rate`, verify traffic
### "Traces not being captured"
**Cause:** Traces not enabled, incorrect sampling rate, Worker not redeployed, or destination unavailable
**Solution:**
```jsonc
// Temporarily set to 100% sampling for debugging
{
"observability": {
"enabled": true,
"head_sampling_rate": 1.0,
"traces": {
"enabled": true
}
}
}
```
Ensure `observability.traces.enabled = true`, set `head_sampling_rate` to 1.0 for testing, redeploy, check destination status
## Limits
| Resource/Limit | Value | Notes |
|----------------|-------|-------|
| Max log size | 256 KB | Logs exceeding this are truncated |
| Default sampling rate | 1.0 (100%) | Reduce for high-traffic Workers |
| Max destinations | Varies by plan | Check dashboard |
| Trace context propagation | 100 spans max | Deep call chains may lose spans |
| Analytics Engine write rate | 25 writes/request | Excess writes dropped silently |
## Performance Gotchas
### Frozen Timers (Spectre Mitigation)
**Problem:** `Date.now()` and `performance.now()` do not advance during CPU-bound execution; they only update after I/O completes (e.g. across an awaited subrequest)
**Cause:** Timing-attack (Spectre) mitigation in the Workers runtime
**Solution:** Measure durations only across I/O boundaries, or use Workers Traces (`cpuTime`/`wallTime`) for accurate timing
```typescript
// Date.now() is frozen during CPU work but advances across I/O - trace spans are accurate
export default {
async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
// For user-facing timing, Date.now() is fine
const start = Date.now();
const response = await processRequest(request);
const duration = Date.now() - start;
// For detailed performance analysis, use Workers Traces instead
return response;
}
}
```
### Analytics Engine _sample_interval Aggregation
**Problem:** Queries return incorrect totals when not multiplying by `_sample_interval`
**Cause:** Analytics Engine stores sampled data points, each representing multiple events
**Solution:** Always multiply counts/sums by `_sample_interval` in aggregations
```sql
-- WRONG: Undercounts actual events
SELECT blob1 AS customer_id, COUNT(*) AS total_calls
FROM api_usage GROUP BY customer_id;
-- CORRECT: Accounts for sampling
SELECT blob1 AS customer_id, SUM(_sample_interval) AS total_calls
FROM api_usage GROUP BY customer_id;
```
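The same correction applies client-side: when post-processing raw rows returned by the SQL API, weight every row by its `_sample_interval` instead of counting rows. A sketch with a simplified, assumed row shape:

```typescript
// Aggregate sampled Analytics Engine rows client-side.
// Each row represents _sample_interval real events, so we sum
// the weights rather than counting rows.
interface SampledRow {
  customer_id: string;
  _sample_interval: number; // how many real events this row represents
}

function totalsByCustomer(rows: SampledRow[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const row of rows) {
    // WRONG would be: +1 per row. CORRECT: add the sample weight.
    totals.set(row.customer_id, (totals.get(row.customer_id) ?? 0) + row._sample_interval);
  }
  return totals;
}
```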
### Trace Context Propagation Limits
**Problem:** Deep call chains lose trace context after 100 spans
**Cause:** Cloudflare limits trace depth to prevent performance impact
**Solution:** Design for flatter architectures or use custom correlation IDs for deep chains
```typescript
// For deep call chains, add custom correlation ID
const correlationId = crypto.randomUUID();
console.log({ correlationId, event: 'request_start' });
// Pass correlationId through headers to downstream services
await fetch('https://api.example.com', {
headers: { 'X-Correlation-ID': correlationId }
});
```
## Pricing (2026)
### Workers Traces
- **GA Pricing (starts March 1, 2026):**
- $0.10 per 1M trace spans captured
- Retention: 14 days included
- **Free tier:** 10M trace spans/month
- **Note:** Beta usage (before March 1, 2026) is free
### Workers Logs
- **Included:** Free for all Workers
- **Logpush:** Requires Business/Enterprise plan
### Analytics Engine
- **Included:** 10M writes/month on Paid Workers plan
- **Additional:** $0.25 per 1M writes beyond included quota


@@ -0,0 +1,105 @@
# Observability Patterns
## Usage-Based Billing
```typescript
env.ANALYTICS.writeDataPoint({
blobs: [customerId, request.url, request.method],
doubles: [1], // request_count
indexes: [customerId]
});
```
```sql
SELECT blob1 AS customer_id, SUM(_sample_interval * double1) AS total_calls
FROM api_usage WHERE timestamp >= DATE_TRUNC('month', NOW())
GROUP BY customer_id
```
## Performance Monitoring
```typescript
const start = Date.now();
const response = await fetch(url);
env.ANALYTICS.writeDataPoint({
blobs: [url, response.status.toString()],
doubles: [Date.now() - start, response.status]
});
```
```sql
SELECT blob1 AS url, AVG(double1) AS avg_ms, percentile(double1, 0.95) AS p95_ms
FROM fetch_metrics WHERE timestamp >= NOW() - INTERVAL '1' HOUR
GROUP BY url
```
## Error Tracking
```typescript
env.ANALYTICS.writeDataPoint({
blobs: [error.name, request.url, request.method],
doubles: [1],
indexes: [error.name]
});
```
## Multi-Tenant Tracking
```typescript
env.ANALYTICS.writeDataPoint({
indexes: [tenantId], // efficient filtering
blobs: [tenantId, url.pathname, method, status],
doubles: [1, duration, bytesSize]
});
```
## Tail Worker Log Filtering
```typescript
export default {
async tail(events, env, ctx) {
const critical = events.filter(e =>
e.exceptions.length > 0 || e.event.wallTime > 1_000_000 // wallTime is in microseconds; >1s
);
if (critical.length === 0) return;
ctx.waitUntil(
fetch('https://logging.example.com/ingest', {
method: 'POST',
headers: { 'Authorization': `Bearer ${env.API_KEY}` },
body: JSON.stringify(critical.map(e => ({
outcome: e.event.outcome,
cpu_ms: e.event.cpuTime / 1000,
errors: e.exceptions
})))
})
);
}
};
```
## OpenTelemetry Export
```typescript
export default {
async tail(events, env, ctx) {
const otelSpans = events.map(e => ({
traceId: generateId(32),
spanId: generateId(16),
name: e.scriptName || 'worker.request',
attributes: [
{ key: 'worker.outcome', value: { stringValue: e.event.outcome } },
{ key: 'worker.cpu_time_us', value: { intValue: String(e.event.cpuTime) } }
]
}));
ctx.waitUntil(
fetch('https://api.honeycomb.io/v1/traces', {
method: 'POST',
headers: { 'X-Honeycomb-Team': env.HONEYCOMB_KEY, 'Content-Type': 'application/json' },
body: JSON.stringify({ resourceSpans: [{ scopeSpans: [{ spans: otelSpans }] }] })
})
);
}
};
```