update skills

2026-03-17 16:53:22 -07:00
parent 0b0783ef8e
commit f9a530667e
389 changed files with 54512 additions and 1 deletion

# Cloudflare Workers Analytics Engine Reference
Expert guidance for implementing unlimited-cardinality analytics at scale using Cloudflare Workers Analytics Engine.
## What is Analytics Engine?
Time-series analytics database designed for high-cardinality data (millions of unique dimensions). Write data points from Workers, query via SQL API. Use for:
- Custom user-facing analytics dashboards
- Usage-based billing & metering
- Per-customer/per-feature monitoring
- High-frequency instrumentation without performance impact
**Key Capability:** Track metrics with unlimited unique values (e.g., millions of user IDs, API keys) without performance degradation.
## Core Concepts
| Concept | Description | Example |
|---------|-------------|---------|
| **Dataset** | Logical table for related metrics | `api_requests`, `user_events` |
| **Data Point** | Single measurement with timestamp | One API request's metrics |
| **Blobs** | String dimensions (max 20) | endpoint, method, status, user_id |
| **Doubles** | Numeric values (max 20) | latency_ms, request_count, bytes |
| **Indexes** | Indexed dimension for efficient filtering (max 1) | customer_id, api_key |
## Reading Order
| Task | Start Here | Then Read |
|------|------------|-----------|
| **First-time setup** | [configuration.md](configuration.md) → [api.md](api.md) → [patterns.md](patterns.md) | |
| **Writing data** | [api.md](api.md) → [gotchas.md](gotchas.md) (sampling) | |
| **Querying data** | [api.md](api.md) (SQL API) → [patterns.md](patterns.md) (examples) | |
| **Debugging** | [gotchas.md](gotchas.md) → [api.md](api.md) (limits) | |
| **Optimization** | [patterns.md](patterns.md) (anti-patterns) → [gotchas.md](gotchas.md) | |
## When to Use Analytics Engine
```
Need to track metrics? → Yes
Millions of unique dimension values? → Yes
Need real-time queries? → Yes
Use Analytics Engine ✓
Alternative scenarios:
- Low cardinality (<10k unique values) → Workers Analytics (free tier)
- Complex joins/relations → D1 Database
- Logs/debugging → Tail Workers (logpush)
- External tools → Send to external analytics (Datadog, etc.)
```
## Quick Start
1. Add binding to `wrangler.jsonc`:
```jsonc
{
  "analytics_engine_datasets": [
    { "binding": "ANALYTICS", "dataset": "my_events" }
  ]
}
```
2. Write data points (fire-and-forget, no await):
```typescript
env.ANALYTICS.writeDataPoint({
  blobs: ["/api/users", "GET", "200"],
  doubles: [145.2, 1], // latency_ms, count
  indexes: [customerId]
});
```
3. Query via SQL API (HTTP):
```sql
SELECT blob1, SUM(double2) AS total_requests
FROM my_events
WHERE index1 = 'customer_123'
AND timestamp >= NOW() - INTERVAL '7' DAY
GROUP BY blob1
ORDER BY total_requests DESC
```
## In This Reference
- **[configuration.md](configuration.md)** - Setup, bindings, TypeScript types, limits
- **[api.md](api.md)** - `writeDataPoint()`, SQL API, query syntax
- **[patterns.md](patterns.md)** - Use cases, examples, anti-patterns
- **[gotchas.md](gotchas.md)** - Sampling, index selection, troubleshooting
## See Also
- [Cloudflare Analytics Engine Docs](https://developers.cloudflare.com/analytics/analytics-engine/)

# Analytics Engine API Reference
## Writing Data
### `writeDataPoint()`
Fire-and-forget (returns `void`, not Promise). Writes happen asynchronously.
```typescript
interface AnalyticsEngineDataPoint {
  blobs?: string[];   // Up to 20 strings (dimensions), 16KB each
  doubles?: number[]; // Up to 20 numbers (metrics)
  indexes?: string[]; // 1 indexed string for high-cardinality filtering
}

env.ANALYTICS.writeDataPoint({
  blobs: ["/api/users", "GET", "200"],
  doubles: [145.2, 1], // latency_ms, count
  indexes: ["customer_abc123"]
});
```
**Behaviors:** No await needed, no error thrown (check tail logs), auto-sampled at high volumes, auto-timestamped.
**Blob vs Index:** Blob for GROUP BY (<100k unique), Index for filter-only (millions unique).
### Full Example
```typescript
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const start = Date.now();
    const url = new URL(request.url);
    try {
      const response = await handleRequest(request);
      env.ANALYTICS.writeDataPoint({
        blobs: [url.pathname, request.method, response.status.toString()],
        doubles: [Date.now() - start, 1],
        indexes: [request.headers.get("x-api-key") || "anonymous"]
      });
      return response;
    } catch (error) {
      env.ANALYTICS.writeDataPoint({
        blobs: [url.pathname, request.method, "500"],
        doubles: [Date.now() - start, 1], // same schema as the success path
        indexes: [request.headers.get("x-api-key") || "anonymous"]
      });
      throw error;
    }
  }
};
```
## SQL API (External Only)
```bash
curl -X POST https://api.cloudflare.com/client/v4/accounts/{account_id}/analytics_engine/sql \
-H "Authorization: Bearer $TOKEN" \
-d "SELECT blob1 AS endpoint, COUNT(*) AS requests FROM dataset WHERE timestamp >= NOW() - INTERVAL '1' HOUR GROUP BY blob1"
```
### Column References
```sql
-- blob1..blob20, double1..double20, index1, timestamp
SELECT blob1 AS endpoint, SUM(double1) AS latency, COUNT(*) AS requests
FROM my_dataset
WHERE index1 = 'customer_123' AND timestamp >= NOW() - INTERVAL '7' DAY
GROUP BY blob1
HAVING COUNT(*) > 100
ORDER BY requests DESC LIMIT 100
```
**Aggregations:** `SUM()`, `AVG()`, `COUNT()`, `MIN()`, `MAX()`, `quantile(0.95)(col)`
**Time ranges:** `NOW() - INTERVAL '1' HOUR`, `BETWEEN '2026-01-01' AND '2026-01-31'`
### Query Examples
```sql
-- Top endpoints
SELECT blob1, COUNT(*) AS requests, AVG(double1) AS avg_latency
FROM api_requests WHERE timestamp >= NOW() - INTERVAL '24' HOUR
GROUP BY blob1 ORDER BY requests DESC LIMIT 20
-- Error rate
SELECT blob1, COUNT(*) AS total,
SUM(CASE WHEN blob3 LIKE '5%' THEN 1 ELSE 0 END) AS errors
FROM api_requests WHERE timestamp >= NOW() - INTERVAL '1' HOUR
GROUP BY blob1 HAVING total > 50
-- P95 latency
SELECT blob1, quantile(0.95)(double1) AS p95
FROM api_requests GROUP BY blob1
```
## Response Format
```json
{"data": [{"endpoint": "/api/users", "requests": 1523}], "rows": 1}
```
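A sketch of consuming this shape in TypeScript (row field names follow your SELECT aliases; `SqlApiResponse` and `topEndpoint` are illustrative names, not a provided API):

```typescript
// Generic wrapper matching the response shape above.
interface SqlApiResponse<Row> {
  data: Row[];
  rows: number;
}

// Pull the first endpoint out of a "top endpoints" result, if any.
function topEndpoint(res: SqlApiResponse<{ endpoint: string; requests: number }>): string | null {
  return res.data.length > 0 ? res.data[0].endpoint : null;
}

const sample: SqlApiResponse<{ endpoint: string; requests: number }> = {
  data: [{ endpoint: "/api/users", requests: 1523 }],
  rows: 1,
};
// topEndpoint(sample) → "/api/users"
```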
## Limits
| Resource | Limit |
|----------|-------|
| Blobs/Doubles per point | 20 each |
| Indexes per point | 1 |
| Blob/Index size | 16KB |
| Data retention | 90 days |
| Query timeout | 30s |
**Critical:** High write volumes (>1M/min) trigger automatic sampling.

# Analytics Engine Configuration
## Setup
1. Add binding to `wrangler.jsonc`
2. Deploy Worker
3. Dataset created automatically on first write
4. Query via SQL API
## wrangler.jsonc
```jsonc
{
  "name": "my-worker",
  "analytics_engine_datasets": [
    { "binding": "ANALYTICS", "dataset": "my_events" }
  ]
}
```
Multiple datasets for separate concerns:
```jsonc
{
  "analytics_engine_datasets": [
    { "binding": "API_ANALYTICS", "dataset": "api_requests" },
    { "binding": "USER_EVENTS", "dataset": "user_activity" }
  ]
}
```
## TypeScript
```typescript
interface Env {
  ANALYTICS: AnalyticsEngineDataset;
}

export default {
  async fetch(request: Request, env: Env) {
    // No await - returns void, fire-and-forget
    env.ANALYTICS.writeDataPoint({
      blobs: [pathname, method, status], // String dimensions (max 20)
      doubles: [latency, 1],             // Numeric metrics (max 20)
      indexes: [apiKey]                  // High-cardinality filter (max 1)
    });
    return response;
  }
};
```
## Data Point Limits
| Field | Limit | SQL Access |
|-------|-------|------------|
| blobs | 20 strings, 16KB each | `blob1`...`blob20` |
| doubles | 20 numbers | `double1`...`double20` |
| indexes | 1 string, 16KB | `index1` |
## Write Behavior
| Scenario | Behavior |
|----------|----------|
| <1M writes/min | All accepted |
| >1M writes/min | Automatic sampling |
| Invalid data | Silent failure (check tail logs) |
**Mitigate sampling:** Pre-aggregate, use multiple datasets, write only critical metrics.
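One way to pre-aggregate is a small per-isolate buffer; a sketch (`MetricBuffer` is an illustrative helper, not part of the platform API):

```typescript
// Accumulate per-request metrics and emit one data point per interval.
class MetricBuffer {
  private count = 0;
  private totalLatency = 0;
  private lastFlush = Date.now();

  record(latencyMs: number): void {
    this.count++;
    this.totalLatency += latencyMs;
  }

  // Returns a data point when the interval has elapsed, else null.
  flush(intervalMs = 1000): { doubles: number[] } | null {
    if (Date.now() - this.lastFlush < intervalMs || this.count === 0) return null;
    const point = { doubles: [this.count, this.totalLatency] };
    this.count = 0;
    this.totalLatency = 0;
    this.lastFlush = Date.now();
    return point;
  }
}
```

In a Worker, call `record()` on each request and pass any non-null `flush()` result to `env.ANALYTICS.writeDataPoint()`. The buffer is per-isolate, so query-time SUMs combine totals across isolates.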
## Query Limits
| Resource | Limit |
|----------|-------|
| Query timeout | 30 seconds |
| Data retention | 90 days (default) |
| Result size | ~10MB |
## Cost
**Free tier:** 10M writes/month, 1M reads/month
**Paid:** $0.05 per 1M writes, $1.00 per 1M reads
## Environment-Specific
```jsonc
{
  "analytics_engine_datasets": [
    { "binding": "ANALYTICS", "dataset": "prod_events" }
  ],
  "env": {
    "staging": {
      "analytics_engine_datasets": [
        { "binding": "ANALYTICS", "dataset": "staging_events" }
      ]
    }
  }
}
```
## Monitoring
```bash
npx wrangler tail # Check for sampling/write errors
```
```sql
-- Check write activity
SELECT DATE_TRUNC('hour', timestamp) AS hour, COUNT(*) AS writes
FROM my_dataset
WHERE timestamp >= NOW() - INTERVAL '24' HOUR
GROUP BY hour
```

# Analytics Engine Gotchas
## Critical Issues
### Sampling at High Volumes
**Problem:** Queries return fewer points than written at >1M writes/min.
**Solution:**
```typescript
// Pre-aggregate before writing (module-scope, per-isolate buffer)
let buffer = { count: 0, total: 0, lastFlush: Date.now() };
buffer.count++;
buffer.total += value;
// Flush at most once per second instead of once per request
if (Date.now() - buffer.lastFlush >= 1000) {
  env.ANALYTICS.writeDataPoint({ doubles: [buffer.count, buffer.total] });
  buffer = { count: 0, total: 0, lastFlush: Date.now() };
}
```
**Detection:** `npx wrangler tail` → look for "sampling enabled"
### writeDataPoint Returns void
```typescript
// ❌ Pointless await
await env.ANALYTICS.writeDataPoint({...});
// ✅ Fire-and-forget
env.ANALYTICS.writeDataPoint({...});
```
Writes can fail silently. Check tail logs.
### Index vs Blob
| Cardinality | Use | Example |
|-------------|-----|---------|
| Millions | **Index** | user_id, api_key |
| Hundreds | **Blob** | endpoint, status_code, country |
```typescript
// ✅ Correct
{ blobs: [method, path, status], indexes: [userId] }
```
### Can't Query from Workers
Query API requires HTTP auth. Use external service or cache in KV/D1.
### No Custom Timestamps
Auto-generated at write time. Store original in blob if needed.
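If the original event time matters, a sketch of carrying it as a string blob (the blob position and helper name are assumptions; match your own schema):

```typescript
// Build a data point that preserves the event's original time as ISO-8601.
function eventDataPoint(eventTime: Date, name: string, value: number) {
  return {
    blobs: [name, eventTime.toISOString()], // blob2 holds the original timestamp
    doubles: [value, 1],                    // value, count
  };
}
```

Queries can then filter or group on `blob2` as a string; the platform `timestamp` column still reflects write time.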
## Common Errors
| Error | Fix |
|-------|-----|
| Binding not found | Check wrangler.jsonc, redeploy |
| No data in query | Wait 30s; check dataset name; check time range |
| Query timeout | Add time filter; use index for filtering |
## Limits
| Resource | Limit |
|----------|-------|
| Blobs per point | 20 |
| Doubles per point | 20 |
| Indexes per point | 1 |
| Blob/Index size | 16KB |
| Write rate (no sampling) | ~1M/min |
| Retention | 90 days |
| Query timeout | 30s |
## Best Practices
✅ Pre-aggregate at high volumes
✅ Use index for high-cardinality (millions)
✅ Always include time filter in queries
✅ Design schema before coding
❌ Don't await writeDataPoint
❌ Don't use index for low-cardinality
❌ Don't query without time range
❌ Don't assume all writes succeed

# Analytics Engine Patterns
## Use Cases
| Use Case | Key Metrics | Index On |
|----------|-------------|----------|
| API Metering | requests, bytes, compute_units | api_key |
| Feature Usage | feature, action, duration | user_id |
| Error Tracking | error_type, endpoint, count | customer_id |
| Performance | latency_ms, cache_status | endpoint |
| A/B Testing | variant, conversions | user_id |
## API Metering (Billing)
```typescript
env.ANALYTICS.writeDataPoint({
  blobs: [pathname, method, status, tier],
  doubles: [1, computeUnits, bytes, latencyMs],
  indexes: [apiKey]
});
// Query: Monthly usage by customer
// SELECT index1 AS api_key, SUM(double2) AS compute_units
// FROM usage WHERE timestamp >= DATE_TRUNC('month', NOW()) GROUP BY index1
```
## Error Tracking
```typescript
env.ANALYTICS.writeDataPoint({
  blobs: [endpoint, method, errorName, errorMessage.slice(0, 1000)],
  doubles: [1, timeToErrorMs],
  indexes: [customerId]
});
```
## Performance Monitoring
```typescript
env.ANALYTICS.writeDataPoint({
  blobs: [pathname, method, cacheStatus, status],
  doubles: [latencyMs, 1],
  indexes: [userId]
});
// Query: P95 latency by endpoint
// SELECT blob1, quantile(0.95)(double1) AS p95_ms FROM perf GROUP BY blob1
```
## Anti-Patterns
| ❌ Wrong | ✅ Correct |
|----------|-----------|
| `await writeDataPoint()` | `writeDataPoint()` (fire-and-forget) |
| `indexes: [method]` (low cardinality) | `blobs: [method]`, `indexes: [userId]` |
| `blobs: [JSON.stringify(obj)]` | Store ID in blob, full object in D1/KV |
| Write every request at 10M/min | Pre-aggregate per second |
| Query from Worker | Query from external service/API |
## Best Practices
1. **Design schema upfront** - Document blob/double/index assignments
2. **Always include count metric** - `doubles: [latency, 1]` for AVG calculations
3. **Use enums for blobs** - Consistent values like `Status.SUCCESS`
4. **Handle sampling** - Use ratios (avg_latency = SUM(latency)/SUM(count))
5. **Test queries early** - Validate schema before heavy writes
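Practice 4 in code: a sketch of recovering an average from SUM columns so sampling cancels out (row shape assumes a query selecting `SUM(double1) AS sum_latency, SUM(double2) AS sum_count`; names are illustrative):

```typescript
// Weighted average across result rows: total latency over total count.
// Dividing SUM by SUM is sampling-safe; AVG over sampled rows is not.
function avgLatency(rows: { sum_latency: number; sum_count: number }[]): number {
  const latency = rows.reduce((acc, r) => acc + r.sum_latency, 0);
  const count = rows.reduce((acc, r) => acc + r.sum_count, 0);
  return count > 0 ? latency / count : 0;
}
// avgLatency([{ sum_latency: 300, sum_count: 2 }]) → 150
```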
## Schema Template
```typescript
/**
 * Dataset: my_metrics
 *
 * Blobs:
 *   blob1: endpoint, blob2: method, blob3: status
 *
 * Doubles:
 *   double1: latency_ms, double2: count (always 1)
 *
 * Indexes:
 *   index1: customer_id (high cardinality)
 */
```