---
name: "transcribe"
description: "Transcribe audio files to text with optional diarization and known-speaker hints. Use when a user asks to transcribe speech from audio/video, extract text from recordings, or label speakers in interviews or meetings."
---

# Audio Transcribe

Transcribe audio using OpenAI, with optional speaker diarization when requested. Prefer the bundled CLI for deterministic, repeatable runs.

## Workflow

1. Collect inputs: audio file path(s), desired response format (text/json/diarized_json), optional language hint, and any known speaker references.
2. Verify `OPENAI_API_KEY` is set. If missing, ask the user to set it locally (do not ask them to paste the key).
3. Run the bundled `transcribe_diarize.py` CLI with sensible defaults (fast text transcription).
4. Validate the output: transcription quality, speaker labels, and segment boundaries; iterate with a single targeted change if needed.
5. Save outputs under `output/transcribe/` when working in this repo.

## Decision rules

- Default to `gpt-4o-mini-transcribe` with `--response-format text` for fast transcription.
- If the user wants speaker labels or diarization, use `--model gpt-4o-transcribe-diarize --response-format diarized_json`.
- If audio is longer than ~30 seconds, keep `--chunking-strategy auto`.
- Prompting is not supported for `gpt-4o-transcribe-diarize`.

## Output conventions

- Use `output/transcribe//` for evaluation runs.
- Use `--out-dir` for multiple files to avoid overwriting.

## Dependencies (install if missing)

Prefer `uv` for dependency management.

```
uv pip install openai
```

If `uv` is unavailable:

```
python3 -m pip install openai
```

## Environment

- `OPENAI_API_KEY` must be set for live API calls.
- If the key is missing, instruct the user to create one in the OpenAI platform UI and export it in their shell.
- Never ask the user to paste the full key in chat.
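The key check in the workflow can be scripted without ever echoing the secret itself. A minimal sketch (this helper is an illustration, not part of the bundled CLI):

```shell
# Confirm OPENAI_API_KEY is set; print only a status message, never the key.
if [ -z "${OPENAI_API_KEY:-}" ]; then
  echo "OPENAI_API_KEY is not set; export it in your shell first." >&2
else
  echo "OPENAI_API_KEY is set."
fi
```

Because the branch tests emptiness with `-z` and the `:-` default, the snippet is safe under `set -u` even when the variable was never exported.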
## Skill path (set once)

```bash
export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export TRANSCRIBE_CLI="$CODEX_HOME/skills/transcribe/scripts/transcribe_diarize.py"
```

User-scoped skills install under `$CODEX_HOME/skills` (default: `~/.codex/skills`).

## CLI quick start

Single file (fast text default):

```
python3 "$TRANSCRIBE_CLI" \
  path/to/audio.wav \
  --out transcript.txt
```

Diarization with known speakers (up to 4):

```
python3 "$TRANSCRIBE_CLI" \
  meeting.m4a \
  --model gpt-4o-transcribe-diarize \
  --known-speaker "Alice=refs/alice.wav" \
  --known-speaker "Bob=refs/bob.wav" \
  --response-format diarized_json \
  --out-dir output/transcribe/meeting
```

Plain text output (explicit):

```
python3 "$TRANSCRIBE_CLI" \
  interview.mp3 \
  --response-format text \
  --out interview.txt
```

## Reference map

- `references/api.md`: supported formats, limits, response formats, and known-speaker notes.
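For many files, the "use `--out-dir` to avoid overwriting" rule can be applied with a loop that derives one output directory per input. A hedged sketch — the `recordings/` directory is a hypothetical example, and the flags mirror the quick-start invocations above:

```shell
# Batch-transcribe every .wav under recordings/ (hypothetical input dir),
# writing each file's results into its own output directory.
for f in recordings/*.wav; do
  # Strip the directory and the .wav suffix to name the output dir.
  name="$(basename "$f" .wav)"
  python3 "$TRANSCRIBE_CLI" "$f" \
    --response-format text \
    --out-dir "output/transcribe/$name"
done
```

Quoting `"$f"` and `"$name"` keeps the loop correct for filenames that contain spaces.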