# Configuration

How to enable R2 Data Catalog and configure authentication.

## Prerequisites

- Cloudflare account with [R2 subscription](https://developers.cloudflare.com/r2/pricing/)
- R2 bucket created
- Access to Cloudflare dashboard or Wrangler CLI

## Enable Catalog on Bucket

Choose one method:

### Via Wrangler (Recommended)

```bash
npx wrangler r2 bucket catalog enable <BUCKET_NAME>
```

**Output:**

```
✅ Data Catalog enabled for bucket 'my-bucket'
Catalog URI: https://<ACCOUNT_ID>.r2.cloudflarestorage.com/iceberg/my-bucket
Warehouse: my-bucket
```

### Via Dashboard

1. Navigate to **R2** → Select your bucket → **Settings** tab
2. Scroll to "R2 Data Catalog" section → Click **Enable**
3. Note the **Catalog URI** and **Warehouse name** shown

**Result:**

- Catalog URI: `https://<ACCOUNT_ID>.r2.cloudflarestorage.com/iceberg/<BUCKET_NAME>`
- Warehouse: `<BUCKET_NAME>` (same as bucket name)

### Via API (Programmatic)

```bash
curl -X POST \
  "https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/r2/buckets/<BUCKET_NAME>/catalog" \
  -H "Authorization: Bearer <API_TOKEN>" \
  -H "Content-Type: application/json"
```

**Response:**

```json
{
  "result": {
    "catalog_uri": "https://<ACCOUNT_ID>.r2.cloudflarestorage.com/iceberg/<BUCKET_NAME>",
    "warehouse": "<BUCKET_NAME>"
  },
  "success": true
}
```

## Check Catalog Status

```bash
npx wrangler r2 bucket catalog status <BUCKET_NAME>
```

**Output:**

```
Catalog Status: enabled
Catalog URI: https://<ACCOUNT_ID>.r2.cloudflarestorage.com/iceberg/my-bucket
Warehouse: my-bucket
```

## Disable Catalog (If Needed)

```bash
npx wrangler r2 bucket catalog disable <BUCKET_NAME>
```

⚠️ **Warning:** Disabling does NOT delete tables or data. Files remain in the bucket, but catalog metadata is inaccessible until the catalog is re-enabled.

## API Token Creation

R2 Data Catalog requires an API token with **both** R2 Storage and R2 Data Catalog permissions.

### Dashboard Method (Recommended)

1. Go to **R2** → **Manage R2 API Tokens** → **Create API Token**
2. Select permission level:
   - **Admin Read & Write** - Full catalog + storage access (read/write)
   - **Admin Read only** - Read-only access (for query engines)
3. Copy the token value immediately (it is shown only once)

**Permission groups included:**

- `Workers R2 Data Catalog Write` (or Read)
- `Workers R2 Storage Bucket Item Write` (or Read)

### API Method (Programmatic)

Use the Cloudflare API to create tokens programmatically. Required permissions:

- `Workers R2 Data Catalog Write` (or Read)
- `Workers R2 Storage Bucket Item Write` (or Read)

## Client Configuration

### PyIceberg

```python
from pyiceberg.catalog.rest import RestCatalog

catalog = RestCatalog(
    name="my_catalog",
    warehouse="<BUCKET_NAME>",  # Same as bucket name
    uri="<CATALOG_URI>",        # From enable command
    token="<API_TOKEN>",        # From token creation
)
```

**Full example with credentials:**

```python
import os
from pyiceberg.catalog.rest import RestCatalog

# Store credentials in environment variables
WAREHOUSE = os.getenv("R2_WAREHOUSE")      # e.g., "my-bucket"
CATALOG_URI = os.getenv("R2_CATALOG_URI")  # e.g., "https://abc123.r2.cloudflarestorage.com/iceberg/my-bucket"
TOKEN = os.getenv("R2_TOKEN")              # API token

catalog = RestCatalog(
    name="r2_catalog",
    warehouse=WAREHOUSE,
    uri=CATALOG_URI,
    token=TOKEN,
)

# Test connection
print(catalog.list_namespaces())
```

### Spark / Trino / DuckDB

See [patterns.md](patterns.md) for integration examples with other query engines.

## Connection String Format

For quick reference:

```
Catalog URI: https://<ACCOUNT_ID>.r2.cloudflarestorage.com/iceberg/<BUCKET_NAME>
Warehouse:   <BUCKET_NAME>
Token:       <API_TOKEN>
```

**Where to find values:**

| Value | Source |
|-------|--------|
| `<ACCOUNT_ID>` | Dashboard URL or `wrangler whoami` |
| `<BUCKET_NAME>` | R2 bucket name |
| Catalog URI | Output from `wrangler r2 bucket catalog enable` |
| Token | R2 API Token creation page |

## Security Best Practices

1. **Store tokens securely** - Use environment variables or secret managers, never hardcode
2. **Use least privilege** - Read-only tokens for query engines, write tokens only where needed
3. **Rotate tokens regularly** - Create new tokens, test, then revoke old ones
4. **One token per application** - Easier to track and revoke if compromised
5. **Monitor token usage** - Check R2 analytics for unexpected patterns
6. **Bucket-scoped tokens** - Create tokens per bucket, not account-wide

## Environment Variables Pattern

```bash
# .env (never commit)
R2_CATALOG_URI=https://<ACCOUNT_ID>.r2.cloudflarestorage.com/iceberg/<BUCKET_NAME>
R2_WAREHOUSE=<BUCKET_NAME>
R2_TOKEN=<API_TOKEN>
```

```python
import os
from pyiceberg.catalog.rest import RestCatalog

catalog = RestCatalog(
    name="r2",
    uri=os.getenv("R2_CATALOG_URI"),
    warehouse=os.getenv("R2_WAREHOUSE"),
    token=os.getenv("R2_TOKEN"),
)
```

## Troubleshooting

| Problem | Solution |
|---------|----------|
| 404 "catalog not found" | Run `wrangler r2 bucket catalog enable <BUCKET_NAME>` |
| 401 "unauthorized" | Check token has both Catalog + Storage permissions |
| 403 on data files | Token needs both permission groups |

See [gotchas.md](gotchas.md) for detailed troubleshooting.
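
The environment-variable pattern and the troubleshooting table can be combined into a small pre-flight check that fails early when a variable is missing. The following is a minimal sketch, not part of any Cloudflare or PyIceberg API: the helper names `load_r2_settings` and `hint_for_status` are hypothetical, the variable names come from the `.env` example, and the `https` check is an assumption based on the URI examples in this document.

```python
import os
from urllib.parse import urlparse

# Variable names taken from the .env example in this document.
REQUIRED_VARS = ("R2_CATALOG_URI", "R2_WAREHOUSE", "R2_TOKEN")

# Hints paraphrased from the troubleshooting table, keyed by HTTP status.
STATUS_HINTS = {
    404: "Catalog not found: enable the catalog on the bucket.",
    401: "Unauthorized: check the token has both Catalog and Storage permissions.",
    403: "Forbidden on data files: the token needs both permission groups.",
}


def load_r2_settings(env=None):
    """Return keyword arguments for RestCatalog, or raise ValueError.

    Hypothetical helper: reads from os.environ unless a mapping is passed in.
    """
    env = os.environ if env is None else env

    # Treat unset and empty variables the same way.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))

    uri = env["R2_CATALOG_URI"]
    parsed = urlparse(uri)
    # Assumption: the catalog URI is an https URL, as in the examples here.
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"R2_CATALOG_URI does not look like an https URL: {uri!r}")

    return {
        "uri": uri,
        "warehouse": env["R2_WAREHOUSE"],
        "token": env["R2_TOKEN"],
    }


def hint_for_status(status_code):
    """Map an HTTP status code to a troubleshooting hint (hypothetical helper)."""
    return STATUS_HINTS.get(status_code, "See gotchas.md for detailed troubleshooting.")
```

The returned dictionary uses the same keyword names as the PyIceberg examples, so it could be passed as `RestCatalog(name="r2", **load_r2_settings())`. Accepting an optional mapping instead of reading `os.environ` directly keeps the check easy to exercise in tests without touching real credentials.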