mirror of
https://github.com/ksyasuda/dotfiles.git
synced 2026-03-21 18:11:27 -07:00
106 lines
3.3 KiB
Markdown
106 lines
3.3 KiB
Markdown
# Cloudflare Pipelines
|
|
|
|
ETL streaming platform for ingesting, transforming, and loading data into R2 with SQL transformations.
|
|
|
|
## Overview
|
|
|
|
Pipelines provides:
|
|
- **Streams**: Durable event buffers (HTTP/Workers ingestion)
|
|
- **Pipelines**: SQL-based transformations
|
|
- **Sinks**: R2 destinations (Iceberg tables or Parquet/JSON files)
|
|
|
|
**Status**: Open beta (Workers Paid plan)
|
|
**Pricing**: No charge beyond standard R2 storage/operations
|
|
|
|
## Architecture
|
|
|
|
```
|
|
Data Sources → Streams → Pipelines (SQL) → Sinks → R2
|
|
↑ ↓ ↓
|
|
HTTP/Workers Transform Iceberg/Parquet
|
|
```
|
|
|
|
| Component | Purpose | Key Feature |
|
|
|-----------|---------|-------------|
|
|
| Streams | Event ingestion | Structured (validated) or unstructured |
|
|
| Pipelines | Transform with SQL | Immutable after creation |
|
|
| Sinks | Write to R2 | Exactly-once delivery |
|
|
|
|
## Quick Start
|
|
|
|
```bash
|
|
# Interactive setup (recommended)
|
|
npx wrangler pipelines setup
|
|
```
|
|
|
|
**Minimal Worker example:**
|
|
```typescript
|
|
interface Env {
|
|
STREAM: Pipeline;
|
|
}
|
|
|
|
export default {
|
|
async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
|
|
const event = { user_id: "123", event_type: "purchase", amount: 29.99 };
|
|
|
|
// Fire-and-forget pattern
|
|
ctx.waitUntil(env.STREAM.send([event]));
|
|
|
|
return new Response('OK');
|
|
}
|
|
} satisfies ExportedHandler<Env>;
|
|
```
|
|
|
|
## Which Sink Type?
|
|
|
|
```
|
|
Need SQL queries on data?
|
|
→ R2 Data Catalog (Iceberg)
|
|
✅ ACID transactions, time-travel, schema evolution
|
|
❌ More setup complexity (namespace, table, catalog token)
|
|
|
|
Just file storage/archival?
|
|
→ R2 Storage (Parquet)
|
|
✅ Simple, direct file access
|
|
❌ No built-in SQL queries
|
|
|
|
Using external tools (Spark/Athena)?
|
|
→ R2 Storage (Parquet with partitioning)
|
|
✅ Standard format, partition pruning for performance
|
|
❌ Must manage schema compatibility yourself
|
|
```
|
|
|
|
## Common Use Cases
|
|
|
|
- **Analytics pipelines**: Clickstream, telemetry, server logs
|
|
- **Data warehousing**: ETL into queryable Iceberg tables
|
|
- **Event processing**: Mobile/IoT with enrichment
|
|
- **Ecommerce analytics**: User events, purchases, views
|
|
|
|
## Reading Order
|
|
|
|
**New to Pipelines?** Start here:
|
|
1. [configuration.md](./configuration.md) - Setup streams, sinks, pipelines
|
|
2. [api.md](./api.md) - Send events, TypeScript types, SQL functions
|
|
3. [patterns.md](./patterns.md) - Best practices, integrations, complete example
|
|
4. [gotchas.md](./gotchas.md) - Critical warnings, troubleshooting
|
|
|
|
**Task-based routing:**
|
|
- Setup pipeline → [configuration.md](./configuration.md)
|
|
- Send/query data → [api.md](./api.md)
|
|
- Implement pattern → [patterns.md](./patterns.md)
|
|
- Debug issue → [gotchas.md](./gotchas.md)
|
|
|
|
## In This Reference
|
|
|
|
- [configuration.md](./configuration.md) - wrangler.jsonc bindings, schema definition, sink options, CLI commands
|
|
- [api.md](./api.md) - Pipeline binding interface, send() method, HTTP ingest, SQL function reference
|
|
- [patterns.md](./patterns.md) - Fire-and-forget, schema validation with Zod, integrations, performance tuning
|
|
- [gotchas.md](./gotchas.md) - Silent validation failures, immutable pipelines, latency expectations, limits
|
|
|
|
## See Also
|
|
|
|
- [r2](../r2/) - R2 storage backend for sinks
|
|
- [queues](../queues/) - Compare with Queues for async processing
|
|
- [workers](../workers/) - Worker runtime for event ingestion
|