Workers AI API Reference

Core Method

const response = await env.AI.run(model, input);

Text Generation

const result = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [
    { role: 'system', content: 'You are helpful' },
    { role: 'user', content: 'Hello' }
  ],
  temperature: 0.7,  // 0-1
  max_tokens: 100
});
console.log(result.response);

Streaming:

const stream = await env.AI.run(model, { messages, stream: true });
return new Response(stream, { headers: { 'Content-Type': 'text/event-stream' } });

Embeddings

const result = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: ['Query', 'Doc 1', 'Doc 2'] // Batch for efficiency
});
const [queryEmbed, doc1Embed, doc2Embed] = result.data; // 768-dim vectors

Function Calling

const tools = [{
  type: 'function',
  function: {
    name: 'getWeather',
    description: 'Get weather for location',
    parameters: {
      type: 'object',
      properties: { location: { type: 'string' } },
      required: ['location']
    }
  }
}];

const response = await env.AI.run(model, { messages, tools });
if (response.tool_calls) {
  const args = JSON.parse(response.tool_calls[0].function.arguments);
  // Execute function, send result back
}

Image Generation

const image = await env.AI.run('@cf/stabilityai/stable-diffusion-xl-base-1.0', {
  prompt: 'Mountain sunset',
  num_steps: 20,   // 1-20
  guidance: 7.5    // 1-20
});
return new Response(image, { headers: { 'Content-Type': 'image/png' } });

Speech Recognition

const audioArray = Array.from(new Uint8Array(await request.arrayBuffer()));
const result = await env.AI.run('@cf/openai/whisper', { audio: audioArray });
console.log(result.text);

Translation

const result = await env.AI.run('@cf/meta/m2m100-1.2b', {
  text: 'Hello',
  source_lang: 'en',
  target_lang: 'es'
});
console.log(result.translated_text);

REST API

curl https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/@cf/meta/llama-3.1-8b-instruct \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"messages":[{"role":"user","content":"Hello"}]}'

Error Codes

Code	Meaning	Fix
7502	Model not found	Check spelling
7504	Validation failed	Verify input schema
7505	Rate limited	Reduce rate or upgrade
7506	Context exceeded	Reduce input size

Performance Tips

Batch embeddings - single request for multiple texts
Stream long responses - reduce perceived latency
Accept cold starts - first request ~1-3s, subsequent ~100-500ms

2.7 KiB Raw Blame History

Workers AI API Reference

Core Method

Text Generation

Embeddings

Function Calling

Image Generation

Speech Recognition

Translation

REST API

Error Codes

Performance Tips

2.7 KiB

Raw Blame History