Configuration
How to enable R2 Data Catalog and configure authentication.
Prerequisites
- Cloudflare account with R2 subscription
- R2 bucket created
- Access to Cloudflare dashboard or Wrangler CLI
Enable Catalog on Bucket
Choose one method:
Via Wrangler (Recommended)
npx wrangler r2 bucket catalog enable <BUCKET_NAME>
Output:
✅ Data Catalog enabled for bucket 'my-bucket'
Catalog URI: https://<account-id>.r2.cloudflarestorage.com/iceberg/my-bucket
Warehouse: my-bucket
Via Dashboard
- Navigate to R2 → Select your bucket → Settings tab
- Scroll to "R2 Data Catalog" section → Click Enable
- Note the Catalog URI and Warehouse name shown
Result:
- Catalog URI: https://<account-id>.r2.cloudflarestorage.com/iceberg/<bucket-name>
- Warehouse: <bucket-name> (same as bucket name)
Via API (Programmatic)
curl -X POST \
"https://api.cloudflare.com/client/v4/accounts/<account-id>/r2/buckets/<bucket>/catalog" \
-H "Authorization: Bearer <api-token>" \
-H "Content-Type: application/json"
Response:
{
"result": {
"catalog_uri": "https://<account-id>.r2.cloudflarestorage.com/iceberg/<bucket>",
"warehouse": "<bucket>"
},
"success": true
}
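A minimal sketch of reading the two values out of that response in Python. It assumes the response body has exactly the shape shown above; the helper name `parse_enable_response` is hypothetical, and the `errors` field it reports on failure is an assumption, since the example response does not show a failure case.

```python
import json


def parse_enable_response(body: str) -> tuple[str, str]:
    """Return (catalog_uri, warehouse) from an enable-catalog response body."""
    payload = json.loads(body)
    if not payload.get("success"):
        # "errors" is an assumed field name; it is not shown in the example above.
        raise RuntimeError(f"catalog enable failed: {payload.get('errors')}")
    result = payload["result"]
    return result["catalog_uri"], result["warehouse"]


# Sample body copied from the response shown above (placeholders left as-is).
sample = """
{
  "result": {
    "catalog_uri": "https://<account-id>.r2.cloudflarestorage.com/iceberg/<bucket>",
    "warehouse": "<bucket>"
  },
  "success": true
}
"""
catalog_uri, warehouse = parse_enable_response(sample)
print(catalog_uri)
print(warehouse)
```

The two returned values are what the client configuration further down expects as `uri` and `warehouse`.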
Check Catalog Status
npx wrangler r2 bucket catalog status <BUCKET_NAME>
Output:
Catalog Status: enabled
Catalog URI: https://<account-id>.r2.cloudflarestorage.com/iceberg/my-bucket
Warehouse: my-bucket
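If a script needs these values, the status text can be split into key/value pairs. This is only a sketch that assumes the output keeps the `Key: value` layout shown above; `parse_catalog_status` is a hypothetical helper, not part of Wrangler.

```python
def parse_catalog_status(output: str) -> dict[str, str]:
    """Split 'Key: value' lines into a dict, ignoring lines without a colon."""
    fields: dict[str, str] = {}
    for line in output.splitlines():
        # Split on the first colon only, so the "https://" in the URI stays intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Sample text copied from the output shown above.
sample = """Catalog Status: enabled
Catalog URI: https://<account-id>.r2.cloudflarestorage.com/iceberg/my-bucket
Warehouse: my-bucket"""

status = parse_catalog_status(sample)
print(status["Catalog Status"])  # prints: enabled
```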
Disable Catalog (If Needed)
npx wrangler r2 bucket catalog disable <BUCKET_NAME>
⚠️ Warning: Disabling does NOT delete tables/data. Files remain in bucket. Metadata becomes inaccessible until re-enabled.
API Token Creation
R2 Data Catalog requires an API token with both R2 Storage and R2 Data Catalog permissions.
Dashboard Method (Recommended)
- Go to R2 → Manage R2 API Tokens → Create API Token
- Select permission level:
  - Admin Read & Write - Full catalog + storage access (read/write)
  - Admin Read only - Read-only access (for query engines)
- Copy token value immediately (shown only once)
Permission groups included:
- Workers R2 Data Catalog Write (or Read)
- Workers R2 Storage Bucket Item Write (or Read)
API Method (Programmatic)
Use Cloudflare API to create tokens programmatically. Required permissions:
- Workers R2 Data Catalog Write (or Read)
- Workers R2 Storage Bucket Item Write (or Read)
Client Configuration
PyIceberg
from pyiceberg.catalog.rest import RestCatalog
catalog = RestCatalog(
name="my_catalog",
warehouse="<bucket-name>", # Same as bucket name
uri="<catalog-uri>", # From enable command
token="<api-token>", # From token creation
)
Full example with credentials:
import os
from pyiceberg.catalog.rest import RestCatalog
# Store credentials in environment variables
WAREHOUSE = os.getenv("R2_WAREHOUSE") # e.g., "my-bucket"
CATALOG_URI = os.getenv("R2_CATALOG_URI") # e.g., "https://abc123.r2.cloudflarestorage.com/iceberg/my-bucket"
TOKEN = os.getenv("R2_TOKEN") # API token
catalog = RestCatalog(
name="r2_catalog",
warehouse=WAREHOUSE,
uri=CATALOG_URI,
token=TOKEN,
)
# Test connection
print(catalog.list_namespaces())
Spark / Trino / DuckDB
See patterns.md for integration examples with other query engines.
Connection String Format
For quick reference:
Catalog URI: https://<account-id>.r2.cloudflarestorage.com/iceberg/<bucket>
Warehouse: <bucket-name>
Token: <r2-api-token>
Where to find values:
| Value | Source |
|---|---|
| <account-id> | Dashboard URL or wrangler whoami |
| <bucket> | R2 bucket name |
| Catalog URI | Output from wrangler r2 bucket catalog enable |
| Token | R2 API Token creation page |
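As an illustration, the format above can be assembled from an account ID and bucket name. This sketch simply follows the URI pattern as written in this section; `catalog_settings` is a hypothetical helper, and the values printed when the catalog is enabled should be treated as authoritative if they differ.

```python
def catalog_settings(account_id: str, bucket: str) -> dict[str, str]:
    """Build the catalog URI and warehouse name using the pattern shown above."""
    if not account_id or not bucket:
        raise ValueError("account_id and bucket must be non-empty")
    return {
        # Pattern taken from this section; verify against the enable command output.
        "uri": f"https://{account_id}.r2.cloudflarestorage.com/iceberg/{bucket}",
        "warehouse": bucket,
    }


settings = catalog_settings("abc123", "my-bucket")
print(settings["uri"])
print(settings["warehouse"])
```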
Security Best Practices
- Store tokens securely - Use environment variables or secret managers, never hardcode
- Use least privilege - Read-only tokens for query engines, write tokens only where needed
- Rotate tokens regularly - Create new tokens, test, then revoke old ones
- One token per application - Easier to track and revoke if compromised
- Monitor token usage - Check R2 analytics for unexpected patterns
- Bucket-scoped tokens - Create tokens per bucket, not account-wide
Environment Variables Pattern
# .env (never commit)
R2_CATALOG_URI=https://<account-id>.r2.cloudflarestorage.com/iceberg/<bucket>
R2_WAREHOUSE=<bucket-name>
R2_TOKEN=<api-token>
Then read them in Python:
import os
from pyiceberg.catalog.rest import RestCatalog
catalog = RestCatalog(
name="r2",
uri=os.getenv("R2_CATALOG_URI"),
warehouse=os.getenv("R2_WAREHOUSE"),
token=os.getenv("R2_TOKEN"),
)
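Because `os.getenv` returns `None` for an unset variable, a missing value only surfaces later as a connection error. One possible way to fail earlier is sketched below; `load_r2_settings` is a hypothetical helper, and the variable names are the ones used in this section.

```python
import os
from collections.abc import Mapping

REQUIRED_VARS = ("R2_CATALOG_URI", "R2_WAREHOUSE", "R2_TOKEN")


def load_r2_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect catalog settings, raising if any variable is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "uri": env["R2_CATALOG_URI"],
        "warehouse": env["R2_WAREHOUSE"],
        "token": env["R2_TOKEN"],
    }


# Demonstration with an explicit mapping instead of the real environment.
example_env = {
    "R2_CATALOG_URI": "https://<account-id>.r2.cloudflarestorage.com/iceberg/<bucket>",
    "R2_WAREHOUSE": "<bucket-name>",
    "R2_TOKEN": "<api-token>",
}
print(sorted(load_r2_settings(example_env)))
```

The returned keys match the keyword arguments used above, so the dict could be passed along as `RestCatalog(name="r2", **load_r2_settings())`.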
Troubleshooting
| Problem | Solution |
|---|---|
| 404 "catalog not found" | Run wrangler r2 bucket catalog enable <bucket> |
| 401 "unauthorized" | Check token has both Catalog + Storage permissions |
| 403 on data files | Token needs both permission groups |
See gotchas.md for detailed troubleshooting.
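The table above can also be carried into a script as a simple lookup, for example to print a hint when a catalog request fails. This is a sketch only: `troubleshooting_hint` is a hypothetical helper, and the messages restate the table rather than anything returned by the service.

```python
# Hints restate the troubleshooting table; they are not service error messages.
HINTS = {
    404: "Catalog not found: run wrangler r2 bucket catalog enable <bucket>.",
    401: "Unauthorized: check the token has both Catalog and Storage permissions.",
    403: "Forbidden on data files: the token needs both permission groups.",
}


def troubleshooting_hint(status_code: int) -> str:
    """Return the matching hint, or point to gotchas.md for anything else."""
    return HINTS.get(status_code, "See gotchas.md for detailed troubleshooting.")


print(troubleshooting_hint(401))
```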