mirror of
https://github.com/ksyasuda/dotfiles.git
synced 2026-03-22 06:11:27 -07:00
89 lines
1.7 KiB
Markdown
89 lines
1.7 KiB
Markdown
# AI Search Configuration
|
|
|
|
## Worker Setup
|
|
|
|
```jsonc
|
|
// wrangler.jsonc
|
|
{
|
|
"ai": { "binding": "AI" }
|
|
}
|
|
```
|
|
|
|
```typescript
|
|
interface Env {
|
|
AI: Ai;
|
|
}
|
|
|
|
const answer = await env.AI.autorag("my-instance").aiSearch({
|
|
query: "How do I configure caching?",
|
|
model: "@cf/meta/llama-3.3-70b-instruct-fp8-fast"
|
|
});
|
|
```
|
|
|
|
## Data Sources
|
|
|
|
### R2 Bucket
|
|
|
|
Dashboard: AI Search → Create Instance → Select R2 bucket
|
|
|
|
**Supported formats:** `.md`, `.txt`, `.html`, `.pdf`, `.doc`, `.docx`, `.csv`, `.json`
|
|
|
|
**Auto-indexed metadata:** `filename`, `folder`, `timestamp`
|
|
|
|
### Website Crawler
|
|
|
|
Requirements:
|
|
- Domain on Cloudflare
|
|
- `sitemap.xml` at root
|
|
- Bot protection must allow `CloudflareAISearch` user agent
|
|
|
|
## Path Filtering (R2)
|
|
|
|
```
|
|
docs/**/*.md # All .md in docs/ recursively
|
|
**/*.draft.md # Exclude (use in exclude patterns)
|
|
```
|
|
|
|
## Indexing
|
|
|
|
- **Automatic:** Every 6 hours
|
|
- **Force Sync:** Dashboard button (30s rate limit between syncs)
|
|
- **Pause:** Settings → Pause Indexing (existing index remains searchable)
|
|
|
|
## Service API Token
|
|
|
|
Dashboard: AI Search → Instance → Use AI Search → API → Create Token
|
|
|
|
Permissions:
|
|
- **Read** - search operations
|
|
- **Edit** - instance management
|
|
|
|
Store securely:
|
|
```bash
|
|
wrangler secret put AI_SEARCH_TOKEN
|
|
```
|
|
|
|
## Multi-Environment
|
|
|
|
```toml
|
|
# wrangler.toml
|
|
[env.production.vars]
|
|
AI_SEARCH_INSTANCE = "prod-docs"
|
|
|
|
[env.staging.vars]
|
|
AI_SEARCH_INSTANCE = "staging-docs"
|
|
```
|
|
|
|
```typescript
|
|
const answer = await env.AI.autorag(env.AI_SEARCH_INSTANCE).aiSearch({ query });
|
|
```
|
|
|
|
## Monitoring
|
|
|
|
```typescript
|
|
const instances = await env.AI.autorag("_").listInstances();
|
|
console.log(instances.find(i => i.name === "docs"));
|
|
```
|
|
|
|
Dashboard shows: files indexed, status, last index time, storage usage.
|