CLI reference (scripts/text_to_speech.py)
This file contains the "command catalog" for the bundled speech generation CLI. Keep SKILL.md as overview-first; put verbose CLI details here.
What this CLI does
- speak: generate a single audio file
- speak-batch: run many jobs from a JSONL file (one job per line)
- list-voices: list supported voices
Real API calls require network access and OPENAI_API_KEY; --dry-run requires neither.
Quick start (works from any repo)
Set a stable path to the skill CLI (default CODEX_HOME is ~/.codex):
export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export TTS_GEN="$CODEX_HOME/skills/speech/scripts/text_to_speech.py"
Dry-run (no API call; no network required; does not require the openai package):
python "$TTS_GEN" speak --input "Test" --dry-run
Generate (requires OPENAI_API_KEY + network):
uv run --with openai python "$TTS_GEN" speak \
--input "Today is a wonderful day to build something people love!" \
--voice cedar \
--instructions "Voice Affect: Warm and composed. Tone: upbeat and encouraging." \
--response-format mp3 \
--out speech.mp3
No uv installed? Use your active Python env (the openai package must be installed in it):
python "$TTS_GEN" speak --input "Hello" --voice cedar --out speech.mp3
Guardrails (important)
- Use python "$TTS_GEN" ... (or equivalent full path) for all TTS work.
- Do not create one-off runners (e.g., gen_audio.py) unless the user explicitly asks.
- Never modify scripts/text_to_speech.py. If something is missing, ask the user before doing anything else.
Defaults (unless overridden by flags)
- Model: gpt-4o-mini-tts-2025-12-15
- Voice: cedar
- Response format: mp3
- Speed: 1.0
- Batch rpm cap: 50
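To make the batch rpm cap concrete: 50 requests per minute works out to one request every 60 / 50 = 1.2 seconds on average. The sketch below estimates a lower bound on how long a batch takes. It assumes requests are spaced evenly, which this reference does not state; RPM_CAP and min_batch_seconds are illustrative names, not part of the CLI.

```python
RPM_CAP = 50  # batch rpm cap from the defaults above


def min_batch_seconds(num_jobs: int, rpm: int = RPM_CAP) -> float:
    """Estimate the minimum wall-clock time for a batch, assuming even spacing."""
    # Requested rates above the cap are clamped to it (assumed behavior).
    rpm = max(1, min(rpm, RPM_CAP))
    # The first request starts immediately; each later one waits 60 / rpm seconds.
    return max(0, num_jobs - 1) * 60.0 / rpm
```

Under these assumptions, a 51-job batch needs at least one minute, regardless of how quickly each request completes.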
Input limits
- Input text must be <= 4096 characters per request.
- For longer text, split into smaller chunks (manual or via batch JSONL).
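One way to do the manual split is sketched below: it breaks text at sentence boundaries and keeps every chunk within the 4096-character limit. The helper name split_text and the sentence-boundary heuristic are assumptions for illustration; the CLI itself does not provide this function.

```python
import re

MAX_CHARS = 4096  # per-request input limit stated above


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Prefers sentence boundaries; falls back to a hard cut for any
    single sentence longer than `limit`.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-cut a sentence that cannot fit in one request on its own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then become the input of one speak call or one line in a batch JSONL file.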
Instructions compatibility
- instructions are supported for GPT-4o mini TTS models.
- tts-1 and tts-1-hd ignore instructions (the CLI will warn and drop them).
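The compatibility rule above can be pictured with the following sketch. It is not the CLI's actual code: effective_instructions and MODELS_IGNORING_INSTRUCTIONS are hypothetical names, and the warning text is invented for illustration.

```python
import sys
from typing import Optional

# Models that ignore instructions, per the rule above.
MODELS_IGNORING_INSTRUCTIONS = {"tts-1", "tts-1-hd"}


def effective_instructions(model: str, instructions: Optional[str]) -> Optional[str]:
    """Return the instructions to send, or None if the model ignores them."""
    if instructions and model in MODELS_IGNORING_INSTRUCTIONS:
        # Warn on stderr and drop the instructions rather than failing.
        print(f"warning: {model} ignores instructions; dropping them", file=sys.stderr)
        return None
    return instructions
```

A pre-check like this can be useful when preparing jobs, so that instructions are not written for a model that would discard them.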
Common recipes
List voices:
python "$TTS_GEN" list-voices
Generate with explicit pacing:
python "$TTS_GEN" speak \
--input "Welcome to the demo. We'll show how it works." \
--instructions "Tone: friendly and confident. Pacing: steady and moderate." \
--out demo.mp3
Batch generation (JSONL):
mkdir -p tmp/speech
cat > tmp/speech/jobs.jsonl << 'JSONL'
{"input":"Thank you for calling. Please hold.","voice":"cedar","response_format":"mp3","out":"hold.mp3"}
{"input":"For sales, press 1. For support, press 2.","voice":"marin","instructions":"Tone: clear and neutral. Pacing: slow.","response_format":"wav"}
JSONL
python "$TTS_GEN" speak-batch --input tmp/speech/jobs.jsonl --out-dir out --rpm 50
# Cleanup (recommended)
rm -f tmp/speech/jobs.jsonl
Notes:
- Use --rpm to control rate limiting (default 50, max 50).
- Per-job overrides are supported in JSONL (model, voice, response_format, speed, instructions, out).
- Treat the JSONL file as temporary: write it under tmp/ and delete it after the run (do not commit it).
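When the job list is generated rather than typed by hand, a small helper can write the JSONL file and catch mistakes before the batch runs. The sketch below is one possible approach, not part of the CLI: write_jobs is a hypothetical name, and the key check assumes the per-job fields listed in the notes above (plus input) are the only ones accepted.

```python
import json
from pathlib import Path

# "input" plus the per-job override keys listed in the notes above.
ALLOWED_KEYS = {"input", "model", "voice", "response_format", "speed", "instructions", "out"}
MAX_CHARS = 4096  # per-request input limit


def write_jobs(jobs: list[dict], path: str) -> int:
    """Validate jobs and write them as JSONL (one JSON object per line)."""
    lines = []
    for index, job in enumerate(jobs, start=1):
        unknown = set(job) - ALLOWED_KEYS
        if unknown:
            raise ValueError(f"job {index}: unknown keys {sorted(unknown)}")
        text = job.get("input", "")
        if not text:
            raise ValueError(f"job {index}: missing input")
        if len(text) > MAX_CHARS:
            raise ValueError(f"job {index}: input exceeds {MAX_CHARS} characters")
        lines.append(json.dumps(job, ensure_ascii=False))
    target = Path(path)
    # Create the temporary directory (e.g. tmp/speech) if it does not exist.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)
```

The resulting file is passed to speak-batch via --input as in the recipe above, and should still be deleted after the run.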
See also
- API parameter quick reference: references/audio-api.md
- Instruction patterns and examples: references/voice-directions.md