mirror of
https://github.com/ksyasuda/dotfiles.git
synced 2026-03-20 18:11:27 -07:00
update skills
# Cloudflare Workers Analytics Engine Reference

Expert guidance for implementing unlimited-cardinality analytics at scale using Cloudflare Workers Analytics Engine.

## What is Analytics Engine?

Time-series analytics database designed for high-cardinality data (millions of unique dimensions). Write data points from Workers, query via SQL API. Use for:

- Custom user-facing analytics dashboards
- Usage-based billing & metering
- Per-customer/per-feature monitoring
- High-frequency instrumentation without performance impact

**Key Capability:** Track metrics with unlimited unique values (e.g., millions of user IDs, API keys) without performance degradation.

## Core Concepts

| Concept | Description | Example |
|---------|-------------|---------|
| **Dataset** | Logical table for related metrics | `api_requests`, `user_events` |
| **Data Point** | Single measurement with timestamp | One API request's metrics |
| **Blobs** | String dimensions (max 20) | endpoint, method, status, user_id |
| **Doubles** | Numeric values (max 20) | latency_ms, request_count, bytes |
| **Indexes** | Filtered blobs for efficient queries | customer_id, api_key |

## Reading Order

| Task | Start Here | Then Read |
|------|------------|-----------|
| **First-time setup** | [configuration.md](configuration.md) | [api.md](api.md), then [patterns.md](patterns.md) |
| **Writing data** | [api.md](api.md) | [gotchas.md](gotchas.md) (sampling) |
| **Querying data** | [api.md](api.md) (SQL API) | [patterns.md](patterns.md) (examples) |
| **Debugging** | [gotchas.md](gotchas.md) | [api.md](api.md) (limits) |
| **Optimization** | [patterns.md](patterns.md) (anti-patterns) | [gotchas.md](gotchas.md) |

## When to Use Analytics Engine

```
Need to track metrics? → Yes
  ↓
Millions of unique dimension values? → Yes
  ↓
Need real-time queries? → Yes
  ↓
Use Analytics Engine ✓

Alternative scenarios:
- Low cardinality (<10k unique values) → Workers Analytics (free tier)
- Complex joins/relations → D1 Database
- Logs/debugging → Tail Workers (logpush)
- External tools → Send to external analytics (Datadog, etc.)
```

## Quick Start

1. Add binding to `wrangler.jsonc`:

```jsonc
{
  "analytics_engine_datasets": [
    { "binding": "ANALYTICS", "dataset": "my_events" }
  ]
}
```

2. Write data points (fire-and-forget, no await):

```typescript
env.ANALYTICS.writeDataPoint({
  blobs: ["/api/users", "GET", "200"],
  doubles: [145.2, 1], // latency_ms, count
  indexes: [customerId]
});
```

3. Query via SQL API (HTTP):

```sql
SELECT blob1, SUM(double2) AS total_requests
FROM my_events
WHERE index1 = 'customer_123'
  AND timestamp >= NOW() - INTERVAL '7' DAY
GROUP BY blob1
ORDER BY total_requests DESC
```

## In This Reference

- **[configuration.md](configuration.md)** - Setup, bindings, TypeScript types, limits
- **[api.md](api.md)** - `writeDataPoint()`, SQL API, query syntax
- **[patterns.md](patterns.md)** - Use cases, examples, anti-patterns
- **[gotchas.md](gotchas.md)** - Sampling, index selection, troubleshooting

## See Also

- [Cloudflare Analytics Engine Docs](https://developers.cloudflare.com/analytics/analytics-engine/)
# Analytics Engine API Reference

## Writing Data

### `writeDataPoint()`

Fire-and-forget (returns `void`, not a Promise). Writes happen asynchronously.

```typescript
interface AnalyticsEngineDataPoint {
  blobs?: string[];   // Up to 20 strings (dimensions), 16KB each
  doubles?: number[]; // Up to 20 numbers (metrics)
  indexes?: string[]; // 1 indexed string for high-cardinality filtering
}

env.ANALYTICS.writeDataPoint({
  blobs: ["/api/users", "GET", "200"],
  doubles: [145.2, 1], // latency_ms, count
  indexes: ["customer_abc123"]
});
```

**Behaviors:** No await needed, no error thrown (check tail logs), auto-sampled at high volumes, auto-timestamped.

**Blob vs Index:** Blob for GROUP BY (<100k unique values), Index for filter-only (millions of unique values).

### Full Example

```typescript
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const start = Date.now();
    const url = new URL(request.url);
    try {
      const response = await handleRequest(request); // your routing logic
      env.ANALYTICS.writeDataPoint({
        blobs: [url.pathname, request.method, response.status.toString()],
        doubles: [Date.now() - start, 1],
        indexes: [request.headers.get("x-api-key") || "anonymous"]
      });
      return response;
    } catch (error) {
      // Keep the schema consistent with the success path:
      // same double positions, same index
      env.ANALYTICS.writeDataPoint({
        blobs: [url.pathname, request.method, "500"],
        doubles: [Date.now() - start, 1],
        indexes: [request.headers.get("x-api-key") || "anonymous"]
      });
      throw error;
    }
  }
};
```

## SQL API (External Only)

```bash
curl -X POST https://api.cloudflare.com/client/v4/accounts/{account_id}/analytics_engine/sql \
  -H "Authorization: Bearer $TOKEN" \
  -d "SELECT blob1 AS endpoint, COUNT(*) AS requests FROM dataset WHERE timestamp >= NOW() - INTERVAL '1' HOUR GROUP BY blob1"
```
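The same request can be issued from any external service. A minimal TypeScript sketch, assuming you supply the account ID and API token (both placeholders here):

```typescript
const API_BASE = "https://api.cloudflare.com/client/v4/accounts";

// Pure helper: builds the SQL endpoint URL for an account.
function sqlEndpoint(accountId: string): string {
  return `${API_BASE}/${accountId}/analytics_engine/sql`;
}

// POST raw SQL; the response body has the shape { data: [...], rows: n }.
async function querySql(accountId: string, apiToken: string, sql: string): Promise<unknown> {
  const res = await fetch(sqlEndpoint(accountId), {
    method: "POST",
    headers: { Authorization: `Bearer ${apiToken}` },
    body: sql,
  });
  if (!res.ok) throw new Error(`SQL API error: ${res.status}`);
  return res.json();
}
```

Run this from a backend or script, not from the Worker itself — the SQL API is external-only.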
### Column References

```sql
-- blob1..blob20, double1..double20, index1, timestamp
SELECT blob1 AS endpoint, SUM(double1) AS latency, COUNT(*) AS requests
FROM my_dataset
WHERE index1 = 'customer_123' AND timestamp >= NOW() - INTERVAL '7' DAY
GROUP BY blob1
HAVING COUNT(*) > 100
ORDER BY requests DESC LIMIT 100
```

**Aggregations:** `SUM()`, `AVG()`, `COUNT()`, `MIN()`, `MAX()`, `quantile(0.95)()`

**Time ranges:** `NOW() - INTERVAL '1' HOUR`, `BETWEEN '2026-01-01' AND '2026-01-31'`

### Query Examples

```sql
-- Top endpoints
SELECT blob1, COUNT(*) AS requests, AVG(double1) AS avg_latency
FROM api_requests WHERE timestamp >= NOW() - INTERVAL '24' HOUR
GROUP BY blob1 ORDER BY requests DESC LIMIT 20

-- Error rate
SELECT blob1, COUNT(*) AS total,
       SUM(CASE WHEN blob3 LIKE '5%' THEN 1 ELSE 0 END) AS errors
FROM api_requests WHERE timestamp >= NOW() - INTERVAL '1' HOUR
GROUP BY blob1 HAVING total > 50

-- P95 latency
SELECT blob1, quantile(0.95)(double1) AS p95
FROM api_requests GROUP BY blob1
```

## Response Format

```json
{"data": [{"endpoint": "/api/users", "requests": 1523}], "rows": 1}
```

## Limits

| Resource | Limit |
|----------|-------|
| Blobs/Doubles per point | 20 each |
| Indexes per point | 1 |
| Blob/Index size | 16KB |
| Data retention | 90 days |
| Query timeout | 30s |

**Critical:** High write volumes (>1M/min) trigger automatic sampling.
# Analytics Engine Configuration

## Setup

1. Add binding to `wrangler.jsonc`
2. Deploy Worker
3. Dataset created automatically on first write
4. Query via SQL API

## wrangler.jsonc

```jsonc
{
  "name": "my-worker",
  "analytics_engine_datasets": [
    { "binding": "ANALYTICS", "dataset": "my_events" }
  ]
}
```

Multiple datasets for separate concerns:

```jsonc
{
  "analytics_engine_datasets": [
    { "binding": "API_ANALYTICS", "dataset": "api_requests" },
    { "binding": "USER_EVENTS", "dataset": "user_activity" }
  ]
}
```

## TypeScript

```typescript
interface Env {
  ANALYTICS: AnalyticsEngineDataset;
}

export default {
  async fetch(request: Request, env: Env) {
    // No await - returns void, fire-and-forget
    env.ANALYTICS.writeDataPoint({
      blobs: [pathname, method, status],  // String dimensions (max 20)
      doubles: [latency, 1],              // Numeric metrics (max 20)
      indexes: [apiKey]                   // High-cardinality filter (max 1)
    });
    return response;
  }
};
```

## Data Point Limits

| Field | Limit | SQL Access |
|-------|-------|------------|
| blobs | 20 strings, 16KB each | `blob1`...`blob20` |
| doubles | 20 numbers | `double1`...`double20` |
| indexes | 1 string, 16KB | `index1` |

## Write Behavior

| Scenario | Behavior |
|----------|----------|
| <1M writes/min | All accepted |
| >1M writes/min | Automatic sampling |
| Invalid data | Silent failure (check tail logs) |

**Mitigate sampling:** Pre-aggregate, use multiple datasets, write only critical metrics.
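The pre-aggregation idea can be sketched as a small in-memory accumulator. This is a sketch only: the per-endpoint schema is an assumption, and a per-isolate buffer is lost when the isolate is evicted, so treat it as best-effort.

```typescript
// Accumulates per-endpoint counts and latency sums, then emits one
// data point per endpoint instead of one per request.
class Aggregator {
  private buckets = new Map<string, { count: number; totalMs: number }>();

  record(endpoint: string, latencyMs: number): void {
    const b = this.buckets.get(endpoint) ?? { count: 0, totalMs: 0 };
    b.count++;
    b.totalMs += latencyMs;
    this.buckets.set(endpoint, b);
  }

  // Drain buckets into writeDataPoint-shaped objects and reset.
  flush(): { blobs: string[]; doubles: number[] }[] {
    const points = [...this.buckets].map(([endpoint, b]) => ({
      blobs: [endpoint],
      doubles: [b.totalMs, b.count], // double1: total latency, double2: count
    }));
    this.buckets.clear();
    return points;
  }
}

// In a Worker you might flush periodically:
// for (const p of agg.flush()) env.ANALYTICS.writeDataPoint(p);
```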
## Query Limits

| Resource | Limit |
|----------|-------|
| Query timeout | 30 seconds |
| Data retention | 90 days (default) |
| Result size | ~10MB |

## Cost

**Free tier:** 10M writes/month, 1M reads/month

**Paid:** $0.05 per 1M writes, $1.00 per 1M reads

## Environment-Specific

```jsonc
{
  "analytics_engine_datasets": [
    { "binding": "ANALYTICS", "dataset": "prod_events" }
  ],
  "env": {
    "staging": {
      "analytics_engine_datasets": [
        { "binding": "ANALYTICS", "dataset": "staging_events" }
      ]
    }
  }
}
```

## Monitoring

```bash
npx wrangler tail  # Check for sampling/write errors
```

```sql
-- Check write activity
SELECT DATE_TRUNC('hour', timestamp) AS hour, COUNT(*) AS writes
FROM my_dataset
WHERE timestamp >= NOW() - INTERVAL '24' HOUR
GROUP BY hour
```
# Analytics Engine Gotchas

## Critical Issues

### Sampling at High Volumes

**Problem:** Queries return fewer points than written at >1M writes/min.

**Solution:**

```typescript
// Pre-aggregate in module scope; flush at most once per second
let buffer = { count: 0, total: 0 };
let lastFlush = Date.now();

function record(env: Env, value: number) {
  buffer.count++;
  buffer.total += value;
  // Write once per second instead of per request
  if (Date.now() - lastFlush >= 1000) {
    env.ANALYTICS.writeDataPoint({ doubles: [buffer.count, buffer.total] });
    buffer = { count: 0, total: 0 };
    lastFlush = Date.now();
  }
}
```

**Detection:** `npx wrangler tail` → look for "sampling enabled"

### writeDataPoint Returns void

```typescript
// ❌ Pointless await
await env.ANALYTICS.writeDataPoint({...});

// ✅ Fire-and-forget
env.ANALYTICS.writeDataPoint({...});
```

Writes can fail silently. Check tail logs.

### Index vs Blob

| Cardinality | Use | Example |
|-------------|-----|---------|
| Millions | **Index** | user_id, api_key |
| Hundreds | **Blob** | endpoint, status_code, country |

```typescript
// ✅ Correct
{ blobs: [method, path, status], indexes: [userId] }
```

### Can't Query from Workers

The query API requires HTTP auth. Use an external service, or cache query results in KV/D1.
### No Custom Timestamps

Timestamps are auto-generated at write time. Store the original event time in a blob if needed.
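For delayed or batched events, the original time can ride along as a blob. A sketch; the blob position and ISO formatting are assumptions:

```typescript
// Hypothetical helper: the event's own time goes in a blob, since
// Analytics Engine stamps the data point at write time, not event time.
function delayedEventPoint(event: { name: string; occurredAt: Date }) {
  return {
    blobs: [event.name, event.occurredAt.toISOString()], // blob2: original event time
    doubles: [1],
  };
}
// env.ANALYTICS.writeDataPoint(delayedEventPoint({ name: "signup", occurredAt: new Date() }));
```

Filter on that blob in SQL when the original time matters, rather than on `timestamp`.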
## Common Errors

| Error | Fix |
|-------|-----|
| Binding not found | Check wrangler.jsonc, redeploy |
| No data in query | Wait 30s; check dataset name; check time range |
| Query timeout | Add time filter; use index for filtering |

## Limits

| Resource | Limit |
|----------|-------|
| Blobs per point | 20 |
| Doubles per point | 20 |
| Indexes per point | 1 |
| Blob/Index size | 16KB |
| Write rate (no sampling) | ~1M/min |
| Retention | 90 days |
| Query timeout | 30s |

## Best Practices

✅ Pre-aggregate at high volumes
✅ Use index for high-cardinality (millions)
✅ Always include time filter in queries
✅ Design schema before coding

❌ Don't await writeDataPoint
❌ Don't use index for low-cardinality
❌ Don't query without time range
❌ Don't assume all writes succeed
# Analytics Engine Patterns

## Use Cases

| Use Case | Key Metrics | Index On |
|----------|-------------|----------|
| API Metering | requests, bytes, compute_units | api_key |
| Feature Usage | feature, action, duration | user_id |
| Error Tracking | error_type, endpoint, count | customer_id |
| Performance | latency_ms, cache_status | endpoint |
| A/B Testing | variant, conversions | user_id |

## API Metering (Billing)

```typescript
env.ANALYTICS.writeDataPoint({
  blobs: [pathname, method, status, tier],
  doubles: [1, computeUnits, bytes, latencyMs],
  indexes: [apiKey]
});

// Query: Monthly usage by customer
// SELECT index1 AS api_key, SUM(double2) AS compute_units
// FROM usage WHERE timestamp >= DATE_TRUNC('month', NOW()) GROUP BY index1
```

## Error Tracking

```typescript
env.ANALYTICS.writeDataPoint({
  blobs: [endpoint, method, errorName, errorMessage.slice(0, 1000)],
  doubles: [1, timeToErrorMs],
  indexes: [customerId]
});
```

## Performance Monitoring

```typescript
env.ANALYTICS.writeDataPoint({
  blobs: [pathname, method, cacheStatus, status],
  doubles: [latencyMs, 1],
  indexes: [userId]
});

// Query: P95 latency by endpoint
// SELECT blob1, quantile(0.95)(double1) AS p95_ms FROM perf GROUP BY blob1
```
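## A/B Testing

The A/B testing row in the table above can be sketched the same way as the other use cases. The experiment name, blob layout, and `abTestPoint` helper are illustrative assumptions, not a fixed schema:

```typescript
// Record one exposure per request; double2 marks whether it converted.
function abTestPoint(variant: string, converted: boolean, userId: string) {
  return {
    blobs: ["checkout_flow", variant],  // blob1: experiment, blob2: variant
    doubles: [1, converted ? 1 : 0],    // double1: exposures, double2: conversions
    indexes: [userId],
  };
}
// env.ANALYTICS.writeDataPoint(abTestPoint("treatment", true, userId));

// Query: conversion rate by variant (ratio of sums survives sampling)
// SELECT blob2 AS variant, SUM(double2) / SUM(double1) AS conversion_rate
// FROM ab_tests GROUP BY blob2
```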
## Anti-Patterns

| ❌ Wrong | ✅ Correct |
|----------|-----------|
| `await writeDataPoint()` | `writeDataPoint()` (fire-and-forget) |
| `indexes: [method]` (low cardinality) | `blobs: [method]`, `indexes: [userId]` |
| `blobs: [JSON.stringify(obj)]` | Store ID in blob, full object in D1/KV |
| Write every request at 10M/min | Pre-aggregate per second |
| Query from Worker | Query from external service/API |

## Best Practices

1. **Design schema upfront** - Document blob/double/index assignments
2. **Always include count metric** - `doubles: [latency, 1]` for AVG calculations
3. **Use enums for blobs** - Consistent values like `Status.SUCCESS`
4. **Handle sampling** - Use ratios (avg_latency = SUM(latency)/SUM(count))
5. **Test queries early** - Validate schema before heavy writes

## Schema Template

```typescript
/**
 * Dataset: my_metrics
 *
 * Blobs:
 *   blob1: endpoint, blob2: method, blob3: status
 *
 * Doubles:
 *   double1: latency_ms, double2: count (always 1)
 *
 * Indexes:
 *   index1: customer_id (high cardinality)
 */
```