# Cloudflare Pipelines ETL streaming platform for ingesting, transforming, and loading data into R2 with SQL transformations. ## Overview Pipelines provides: - **Streams**: Durable event buffers (HTTP/Workers ingestion) - **Pipelines**: SQL-based transformations - **Sinks**: R2 destinations (Iceberg tables or Parquet/JSON files) **Status**: Open beta (Workers Paid plan) **Pricing**: No charge beyond standard R2 storage/operations ## Architecture ``` Data Sources → Streams → Pipelines (SQL) → Sinks → R2 ↑ ↓ ↓ HTTP/Workers Transform Iceberg/Parquet ``` | Component | Purpose | Key Feature | |-----------|---------|-------------| | Streams | Event ingestion | Structured (validated) or unstructured | | Pipelines | Transform with SQL | Immutable after creation | | Sinks | Write to R2 | Exactly-once delivery | ## Quick Start ```bash # Interactive setup (recommended) npx wrangler pipelines setup ``` **Minimal Worker example:** ```typescript interface Env { STREAM: Pipeline; } export default { async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise { const event = { user_id: "123", event_type: "purchase", amount: 29.99 }; // Fire-and-forget pattern ctx.waitUntil(env.STREAM.send([event])); return new Response('OK'); } } satisfies ExportedHandler; ``` ## Which Sink Type? ``` Need SQL queries on data? → R2 Data Catalog (Iceberg) ✅ ACID transactions, time-travel, schema evolution ❌ More setup complexity (namespace, table, catalog token) Just file storage/archival? → R2 Storage (Parquet) ✅ Simple, direct file access ❌ No built-in SQL queries Using external tools (Spark/Athena)? → R2 Storage (Parquet with partitioning) ✅ Standard format, partition pruning for performance ❌ Must manage schema compatibility yourself ``` ## Common Use Cases - **Analytics pipelines**: Clickstream, telemetry, server logs - **Data warehousing**: ETL into queryable Iceberg tables - **Event processing**: Mobile/IoT with enrichment - **Ecommerce analytics**: User events, purchases, views ## Reading Order **New to Pipelines?** Start here: 1. [configuration.md](./configuration.md) - Setup streams, sinks, pipelines 2. [api.md](./api.md) - Send events, TypeScript types, SQL functions 3. [patterns.md](./patterns.md) - Best practices, integrations, complete example 4. [gotchas.md](./gotchas.md) - Critical warnings, troubleshooting **Task-based routing:** - Setup pipeline → [configuration.md](./configuration.md) - Send/query data → [api.md](./api.md) - Implement pattern → [patterns.md](./patterns.md) - Debug issue → [gotchas.md](./gotchas.md) ## In This Reference - [configuration.md](./configuration.md) - wrangler.jsonc bindings, schema definition, sink options, CLI commands - [api.md](./api.md) - Pipeline binding interface, send() method, HTTP ingest, SQL function reference - [patterns.md](./patterns.md) - Fire-and-forget, schema validation with Zod, integrations, performance tuning - [gotchas.md](./gotchas.md) - Silent validation failures, immutable pipelines, latency expectations, limits ## See Also - [r2](../r2/) - R2 storage backend for sinks - [queues](../queues/) - Compare with Queues for async processing - [workers](../workers/) - Worker runtime for event ingestion