mirror of
https://github.com/ksyasuda/dotfiles.git
synced 2026-03-20 06:11:27 -07:00
82 lines
2.8 KiB
Markdown
82 lines
2.8 KiB
Markdown
---
|
|
name: "transcribe"
|
|
description: "Transcribe audio files to text with optional diarization and known-speaker hints. Use when a user asks to transcribe speech from audio/video, extract text from recordings, or label speakers in interviews or meetings."
|
|
---
|
|
|
|
|
|
# Audio Transcribe
|
|
|
|
Transcribe audio using OpenAI, with optional speaker diarization when requested. Prefer the bundled CLI for deterministic, repeatable runs.
|
|
|
|
## Workflow
|
|
1. Collect inputs: audio file path(s), desired response format (text/json/diarized_json), optional language hint, and any known speaker references.
|
|
2. Verify `OPENAI_API_KEY` is set. If missing, ask the user to set it locally (do not ask them to paste the key).
|
|
3. Run the bundled `transcribe_diarize.py` CLI with sensible defaults (fast text transcription).
|
|
4. Validate the output: transcription quality, speaker labels, and segment boundaries; iterate with a single targeted change if needed.
|
|
5. Save outputs under `output/transcribe/` when working in this repo.
|
|
|
|
## Decision rules
|
|
- Default to `gpt-4o-mini-transcribe` with `--response-format text` for fast transcription.
|
|
- If the user wants speaker labels or diarization, use `--model gpt-4o-transcribe-diarize --response-format diarized_json`.
|
|
- If audio is longer than ~30 seconds, keep `--chunking-strategy auto`.
|
|
- Prompting is not supported for `gpt-4o-transcribe-diarize`.
|
|
|
|
## Output conventions
|
|
- Use `output/transcribe/<job-id>/` for evaluation runs.
|
|
- Use `--out-dir` for multiple files to avoid overwriting.
|
|
|
|
## Dependencies (install if missing)
|
|
Prefer `uv` for dependency management.
|
|
|
|
```
|
|
uv pip install openai
|
|
```
|
|
If `uv` is unavailable:
|
|
```
|
|
python3 -m pip install openai
|
|
```
|
|
|
|
## Environment
|
|
- `OPENAI_API_KEY` must be set for live API calls.
|
|
- If the key is missing, instruct the user to create one in the OpenAI platform UI and export it in their shell.
|
|
- Never ask the user to paste the full key in chat.
|
|
|
|
## Skill path (set once)
|
|
|
|
```bash
|
|
export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
|
|
export TRANSCRIBE_CLI="$CODEX_HOME/skills/transcribe/scripts/transcribe_diarize.py"
|
|
```
|
|
|
|
User-scoped skills install under `$CODEX_HOME/skills` (default: `~/.codex/skills`).
|
|
|
|
## CLI quick start
|
|
Single file (fast text default):
|
|
```
|
|
python3 "$TRANSCRIBE_CLI" \
|
|
path/to/audio.wav \
|
|
--out transcript.txt
|
|
```
|
|
|
|
Diarization with known speakers (up to 4):
|
|
```
|
|
python3 "$TRANSCRIBE_CLI" \
|
|
meeting.m4a \
|
|
--model gpt-4o-transcribe-diarize \
|
|
--known-speaker "Alice=refs/alice.wav" \
|
|
--known-speaker "Bob=refs/bob.wav" \
|
|
--response-format diarized_json \
|
|
--out-dir output/transcribe/meeting
|
|
```
|
|
|
|
Plain text output (explicit):
|
|
```
|
|
python3 "$TRANSCRIBE_CLI" \
|
|
interview.mp3 \
|
|
--response-format text \
|
|
--out interview.txt
|
|
```
|
|
|
|
## Reference map
|
|
- `references/api.md`: supported formats, limits, response formats, and known-speaker notes.
|