mirror of
https://github.com/ksyasuda/dotfiles.git
synced 2026-03-21 06:11:27 -07:00
100 lines
3.2 KiB
Markdown
# CLI reference (`scripts/text_to_speech.py`)

This file contains the "command catalog" for the bundled speech generation CLI. Keep `SKILL.md` as overview-first; put verbose CLI details here.

## What this CLI does

- `speak`: generate a single audio file
- `speak-batch`: run many jobs from a JSONL file (one job per line)
- `list-voices`: list supported voices

Real API calls require network access and `OPENAI_API_KEY`; `--dry-run` needs neither.

## Quick start (works from any repo)

Set a stable path to the skill CLI (the default `CODEX_HOME` is `~/.codex`):

```
export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export TTS_GEN="$CODEX_HOME/skills/speech/scripts/text_to_speech.py"
```

Dry-run (no API call, no network, and no `openai` package required):

```
python "$TTS_GEN" speak --input "Test" --dry-run
```

Generate (requires `OPENAI_API_KEY` and network access):

```
uv run --with openai python "$TTS_GEN" speak \
  --input "Today is a wonderful day to build something people love!" \
  --voice cedar \
  --instructions "Voice Affect: Warm and composed. Tone: upbeat and encouraging." \
  --response-format mp3 \
  --out speech.mp3
```

No `uv` installed? Use your active Python environment instead. It must have the `openai` package installed for real API calls:

```
python "$TTS_GEN" speak --input "Hello" --voice cedar --out speech.mp3
```

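Before running the commands above, it can help to confirm which of these paths applies. The following is a minimal sketch of a hypothetical `tts_preflight` shell function (not part of the skill). It assumes `TTS_GEN` was exported as in the quick start. It only inspects the environment, so it makes no API call.

```shell
# Hypothetical preflight helper (not part of the skill). Reports which
# launcher and mode the quick-start commands would use; makes no API call.
tts_preflight() {
  if [ ! -f "${TTS_GEN:-}" ]; then
    echo "error: TTS_GEN does not point to a file: '${TTS_GEN:-}'" >&2
    return 1
  fi
  # Prefer uv (it supplies the openai package on the fly); otherwise the
  # active Python env is used, and it then needs openai installed.
  if command -v uv >/dev/null 2>&1; then
    echo "launcher: uv run --with openai python"
  else
    echo "launcher: python"
  fi
  # Real calls need OPENAI_API_KEY (plus network); --dry-run needs neither.
  if [ -n "${OPENAI_API_KEY:-}" ]; then
    echo "mode: live"
  else
    echo "mode: dry-run only (set OPENAI_API_KEY for real calls)"
  fi
}
```

Run `tts_preflight` after the two `export` lines. If it reports dry-run only, stick to `--dry-run` until `OPENAI_API_KEY` is set.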
## Guardrails (important)

- Use `python "$TTS_GEN" ...` (or the equivalent full path) for all TTS work.
- Do **not** create one-off runners (e.g., `gen_audio.py`) unless the user explicitly asks.
- **Never modify** `scripts/text_to_speech.py`. If something is missing, ask the user before doing anything else.

## Defaults (unless overridden by flags)

- Model: `gpt-4o-mini-tts-2025-12-15`
- Voice: `cedar`
- Response format: `mp3`
- Speed: `1.0`
- Batch rate limit (`--rpm`): `50` requests per minute

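The batch notes under Common recipes list per-job JSONL overrides for these same settings. The example below assumes that each default maps one-to-one onto its override key, that `speed` is written as a JSON number, and that no other flags override the defaults. Under those assumptions, the two jobs should request identical settings. The second job simply spells the defaults out.

```shell
# Illustrative sketch: two equivalent jobs. The first relies on the defaults;
# the second repeats them explicitly via the per-job override keys.
# Assumption: "speed" is accepted as a JSON number.
mkdir -p tmp/speech
cat > tmp/speech/defaults.jsonl << 'JSONL'
{"input":"Hello from the defaults.","out":"implicit.mp3"}
{"input":"Hello from the defaults.","model":"gpt-4o-mini-tts-2025-12-15","voice":"cedar","response_format":"mp3","speed":1.0,"out":"explicit.mp3"}
JSONL
```

Pass the file to `speak-batch` as in the batch recipe, then delete it with `rm -f tmp/speech/defaults.jsonl`.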
## Input limits

- Input text must be at most 4096 characters per request.
- For longer text, split it into smaller chunks, either manually or as separate jobs in a batch JSONL file.

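One way to do the JSONL split is sketched below. `text_to_jobs` and `emit_job` are hypothetical shell functions (not part of the skill), written for bash. They assume the source file has one sentence or paragraph per line, and they use `python3` only to JSON-escape the text. Lines are joined greedily, so each job carries as much text as fits under the limit.

```shell
# Hypothetical helpers (not part of the skill). ${#var} counts characters in
# bash under a UTF-8 locale; shells that count bytes only make the check
# stricter, since a character is at least one byte.

# Print one JSONL job: $1 = text, $2 = 1-based job number.
emit_job() {
  python3 -c 'import json, sys; print(json.dumps({"input": sys.argv[1], "out": sys.argv[2]}))' \
    "$1" "$(printf 'part-%03d.mp3' "$2")"
}

# Usage: text_to_jobs FILE [LIMIT]   (LIMIT defaults to 4096)
text_to_jobs() {
  local src=$1 limit=${2:-4096} chunk="" candidate line n=0
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue
    if [ "${#line}" -gt "$limit" ]; then
      echo "error: a line exceeds $limit characters; split it first" >&2
      return 1
    fi
    # Append with a space; flush the current chunk first if it would overflow.
    candidate="${chunk:+$chunk }$line"
    if [ "${#candidate}" -gt "$limit" ]; then
      n=$((n + 1))
      emit_job "$chunk" "$n"
      chunk=$line
    else
      chunk=$candidate
    fi
  done < "$src"
  if [ -n "$chunk" ]; then
    n=$((n + 1))
    emit_job "$chunk" "$n"
  fi
}
```

For example, `text_to_jobs long.txt > tmp/speech/jobs.jsonl` produces a jobs file that can be passed to `speak-batch` as in the batch recipe; delete it afterwards. Greedy packing keeps the number of jobs, and therefore requests under the rpm cap, low.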
## Instructions compatibility

- `instructions` are supported by the GPT-4o mini TTS models, including the default `gpt-4o-mini-tts-2025-12-15`.
- `tts-1` and `tts-1-hd` ignore instructions; the CLI warns and drops them.

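When assembling jobs programmatically, you can mirror the rule above so that instructions are attached only where they will be used, which avoids the CLI's warning. The sketch below uses a hypothetical `supports_instructions` helper. Any model not named above is reported as unknown rather than guessed.

```shell
# Hypothetical helper (not part of the skill) mirroring the compatibility
# rule above. Exit status: 0 = supported, 1 = ignored by the model,
# 2 = model not covered by this reference.
supports_instructions() {
  case "$1" in
    gpt-4o-mini-tts*) return 0 ;;
    tts-1|tts-1-hd)   return 1 ;;
    *)                return 2 ;;
  esac
}
```

For example, add an `instructions` key to a JSONL job only when `supports_instructions` succeeds for that job's `model`.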
## Common recipes

List voices:

```
python "$TTS_GEN" list-voices
```

Generate with explicit pacing:

```
python "$TTS_GEN" speak \
  --input "Welcome to the demo. We'll show how it works." \
  --instructions "Tone: friendly and confident. Pacing: steady and moderate." \
  --out demo.mp3
```

Batch generation (JSONL):

```
mkdir -p tmp/speech
cat > tmp/speech/jobs.jsonl << 'JSONL'
{"input":"Thank you for calling. Please hold.","voice":"cedar","response_format":"mp3","out":"hold.mp3"}
{"input":"For sales, press 1. For support, press 2.","voice":"marin","instructions":"Tone: clear and neutral. Pacing: slow.","response_format":"wav"}
JSONL

python "$TTS_GEN" speak-batch --input tmp/speech/jobs.jsonl --out-dir out --rpm 50

# Cleanup (recommended)
rm -f tmp/speech/jobs.jsonl
```

Notes:

- Use `--rpm` to control rate limiting in requests per minute (default `50`, max `50`).
- Per-job overrides are supported in JSONL (`model`, `voice`, `response_format`, `speed`, `instructions`, `out`).
- Treat the JSONL file as temporary: write it under `tmp/` and delete it after the run (do not commit it).

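A quick local check before `speak-batch` can catch malformed jobs without making any API call. The sketch below is a hypothetical `validate_jobs` shell function (not part of the skill) that embeds a short Python script. It checks that each non-blank line is a JSON object with a string `input` of 1 to 4096 characters. It also fails on keys other than `input` and the overrides listed above. This reference does not say whether the CLI itself rejects unknown keys, so treat those reports as likely typos.

```shell
# Hypothetical pre-flight check (not part of the skill). Exit status 0 means
# every non-blank line passed; problems are printed to stderr.
# Requires python3 on PATH.
validate_jobs() {
  python3 - "$1" << 'PY'
import json
import sys

# "input" plus the per-job overrides documented in the notes above.
ALLOWED = {"input", "model", "voice", "response_format", "speed", "instructions", "out"}
errors = 0
with open(sys.argv[1], encoding="utf-8") as fh:
    for lineno, raw in enumerate(fh, start=1):
        if not raw.strip():
            continue  # blank lines are skipped by this check
        try:
            job = json.loads(raw)
        except json.JSONDecodeError as exc:
            print(f"line {lineno}: invalid JSON ({exc.msg})", file=sys.stderr)
            errors += 1
            continue
        if not isinstance(job, dict):
            print(f"line {lineno}: expected a JSON object", file=sys.stderr)
            errors += 1
            continue
        text = job.get("input")
        if not isinstance(text, str) or not 1 <= len(text) <= 4096:
            print(f"line {lineno}: 'input' must be a string of 1-4096 characters", file=sys.stderr)
            errors += 1
        unknown = sorted(set(job) - ALLOWED)
        if unknown:
            print(f"line {lineno}: unrecognized keys {unknown}", file=sys.stderr)
            errors += 1
sys.exit(1 if errors else 0)
PY
}
```

For example, `validate_jobs tmp/speech/jobs.jsonl && python "$TTS_GEN" speak-batch --input tmp/speech/jobs.jsonl --out-dir out` starts the batch only after the check passes.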
## See also

- API parameter quick reference: `references/audio-api.md`
- Instruction patterns and examples: `references/voice-directions.md`