update skills

2026-05-09 00:41:27 -07:00 · 2026-03-17 16:53:22 -07:00
parent 0b0783ef8e
commit f9a530667e
389 changed files with 54512 additions and 1 deletions
@@ -0,0 +1,32 @@
+# Accessibility read defaults
+
+## Suggested defaults
+- Voice: `cedar`
+- Format: `mp3` or `wav`
+- Speed: `0.95` to `1.0`
+
+## Guidance
+- Keep delivery steady and neutral.
+- Enunciate acronyms and numbers.
+- Avoid dramatic or stylized delivery.
+
+## Instruction template
+```
+Voice Affect: Neutral and clear.
+Tone: Informational and steady.
+Pacing: Slow and consistent.
+Pronunciation: Enunciate acronyms and numbers.
+Emphasis: Stress key warnings or labels.
+```
+
+## Example (short)
+Input text:
+"Warning: High voltage. Keep hands clear."
+
+Instructions:
+```
+Voice Affect: Neutral and clear.
+Tone: Informational and steady.
+Pacing: Slow and consistent.
+Emphasis: Stress "Warning" and "High voltage".
+```
@@ -0,0 +1,31 @@
+# Audio Speech API quick reference
+
+## Endpoint
+- Create speech: `POST /v1/audio/speech`
+
+## Default model
+- `gpt-4o-mini-tts-2025-12-15`
+
+## Other speech models (if requested)
+- `gpt-4o-mini-tts`
+- `tts-1`
+- `tts-1-hd`
+
+## Core parameters
+- `model`: speech model
+- `input`: text to synthesize (max 4096 characters)
+- `voice`: built-in voice name
+- `instructions`: optional style directions (not supported for `tts-1` or `tts-1-hd`)
+- `response_format`: `mp3`, `opus`, `aac`, `flac`, `wav`, or `pcm`
+- `speed`: 0.25 to 4.0
+
+## Built-in voices
+- `alloy`, `ash`, `ballad`, `cedar`, `coral`, `echo`, `fable`, `marin`, `nova`, `onyx`, `sage`, `shimmer`, `verse`
+
+## Output notes
+- Default format is `mp3`.
+- `pcm` is raw 24 kHz 16-bit little-endian samples (no header).
+- `wav` includes a header (better for quick playback).
+
+## Compliance note
+- Provide a clear disclosure that the voice is AI-generated.
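
The `pcm` output note above means a raw response will not open in most players until it is given a header. A minimal Python sketch of wrapping such a payload in a WAV container with the standard-library `wave` module. The sample rate, bit depth, and byte order come from the note; the mono channel count and the helper name `pcm_to_wav_bytes` are assumptions for illustration.

```python
import io
import wave

SAMPLE_RATE_HZ = 24_000   # from the output notes: 24 kHz
SAMPLE_WIDTH_BYTES = 2    # from the output notes: 16-bit samples
CHANNELS = 1              # assumption: mono (not stated in the reference)


def pcm_to_wav_bytes(pcm: bytes) -> bytes:
    """Wrap headerless little-endian PCM samples in a WAV container."""
    frame_size = SAMPLE_WIDTH_BYTES * CHANNELS
    if len(pcm) % frame_size:
        raise ValueError("PCM payload is not a whole number of frames")
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav_file.setframerate(SAMPLE_RATE_HZ)
        wav_file.writeframes(pcm)  # WAV stores 16-bit PCM little-endian
    return buffer.getvalue()
```

If quick playback is the only goal, requesting `wav` directly avoids this step.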
@@ -0,0 +1,99 @@
+# CLI reference (`scripts/text_to_speech.py`)
+
+This file contains the "command catalog" for the bundled speech generation CLI. Keep `SKILL.md` as overview-first; put verbose CLI details here.
+
+## What this CLI does
+- `speak`: generate a single audio file
+- `speak-batch`: run many jobs from a JSONL file (one job per line)
+- `list-voices`: list supported voices
+
+Real API calls require network access and `OPENAI_API_KEY`; `--dry-run` requires neither.
+
+## Quick start (works from any repo)
+Set a stable path to the skill CLI (default `CODEX_HOME` is `~/.codex`):
+
+```
+export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
+export TTS_GEN="$CODEX_HOME/skills/speech/scripts/text_to_speech.py"
+```
+
+Dry-run (no API call; no network required; does not require the `openai` package):
+
+```
+python "$TTS_GEN" speak --input "Test" --dry-run
+```
+
+Generate (requires `OPENAI_API_KEY` + network):
+
+```
+uv run --with openai python "$TTS_GEN" speak \
+  --input "Today is a wonderful day to build something people love!" \
+  --voice cedar \
+  --instructions "Voice Affect: Warm and composed. Tone: upbeat and encouraging." \
+  --response-format mp3 \
+  --out speech.mp3
+```
+
+No `uv` installed? Use your active Python env:
+
+```
+python "$TTS_GEN" speak --input "Hello" --voice cedar --out speech.mp3
+```
+
+## Guardrails (important)
+- Use `python "$TTS_GEN" ...` (or equivalent full path) for all TTS work.
+- Do **not** create one-off runners (e.g., `gen_audio.py`) unless the user explicitly asks.
+- **Never modify** `scripts/text_to_speech.py`. If something is missing, ask the user before doing anything else.
+
+## Defaults (unless overridden by flags)
+- Model: `gpt-4o-mini-tts-2025-12-15`
+- Voice: `cedar`
+- Response format: `mp3`
+- Speed: `1.0`
+- Batch rpm cap: `50`
+
+## Input limits
+- Input text must be <= 4096 characters per request.
+- For longer text, split it into smaller chunks (manually or via batch JSONL).
+
+## Instructions compatibility
+- `instructions` are supported for GPT-4o mini TTS models.
+- `tts-1` and `tts-1-hd` ignore instructions (the CLI will warn and drop them).
+
+## Common recipes
+
+List voices:
+```
+python "$TTS_GEN" list-voices
+```
+
+Generate with explicit pacing:
+```
+python "$TTS_GEN" speak \
+  --input "Welcome to the demo. We'll show how it works." \
+  --instructions "Tone: friendly and confident. Pacing: steady and moderate." \
+  --out demo.mp3
+```
+
+Batch generation (JSONL):
+```
+mkdir -p tmp/speech
+cat > tmp/speech/jobs.jsonl << 'JSONL'
+{"input":"Thank you for calling. Please hold.","voice":"cedar","response_format":"mp3","out":"hold.mp3"}
+{"input":"For sales, press 1. For support, press 2.","voice":"marin","instructions":"Tone: clear and neutral. Pacing: slow.","response_format":"wav"}
+JSONL
+
+python "$TTS_GEN" speak-batch --input tmp/speech/jobs.jsonl --out-dir out --rpm 50
+
+# Cleanup (recommended)
+rm -f tmp/speech/jobs.jsonl
+```
+
+Notes:
+- Use `--rpm` to control rate limiting (default `50`, max `50`).
+- Per-job overrides are supported in JSONL (`model`, `voice`, `response_format`, `speed`, `instructions`, `out`).
+- Treat the JSONL file as temporary: write it under `tmp/` and delete it after the run (do not commit it).
+
+## See also
+- API parameter quick reference: `references/audio-api.md`
+- Instruction patterns and examples: `references/voice-directions.md`
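
The input limit and the batch recipe above combine naturally: split a long script into chunks of at most 4096 characters and write one JSONL job per chunk. A minimal Python sketch. The helper names (`split_text`, `write_jobs`), the sentence-boundary heuristic, and the `part-NNN.mp3` naming are assumptions for illustration and are not part of the bundled CLI; only job keys listed in the notes above are used.

```python
import json
import re
from pathlib import Path

MAX_INPUT_CHARS = 4096  # per-request limit stated under "Input limits"


def split_text(text: str, limit: int = MAX_INPUT_CHARS) -> list[str]:
    """Greedily pack sentences into chunks of at most `limit` characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that exceeds the limit on its own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def write_jobs(chunks: list[str], path: Path, voice: str = "cedar") -> None:
    """Write one JSON object per line, one job per chunk."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as handle:
        for index, chunk in enumerate(chunks, start=1):
            job = {
                "input": chunk,
                "voice": voice,
                "response_format": "mp3",
                "out": f"part-{index:03d}.mp3",  # hypothetical naming scheme
            }
            handle.write(json.dumps(job, ensure_ascii=False) + "\n")
```

The resulting file can be passed to `speak-batch --input` as in the batch recipe, then deleted after the run.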
@@ -0,0 +1,28 @@
+# Codex network approvals / sandbox notes
+
+This guidance is intentionally isolated from `SKILL.md` because it can vary by environment and may become stale. Prefer the defaults in your environment when in doubt.
+
+## Why am I asked to approve every speech generation call?
+Speech generation uses the OpenAI Audio API, so the CLI needs outbound network access. In many Codex setups, network access is disabled by default (especially under stricter sandbox modes), and/or the approval policy may require confirmation before networked commands run.
+
+## How do I reduce repeated approval prompts (network)?
+If you trust the repo and want fewer prompts, enable network access for the relevant sandbox mode and relax the approval policy.
+
+Example `~/.codex/config.toml` pattern:
+
+```
+approval_policy = "never"
+sandbox_mode = "workspace-write"
+
+[sandbox_workspace_write]
+network_access = true
+```
+
+Or for a single session:
+
+```
+codex --sandbox workspace-write --ask-for-approval never
+```
+
+## Safety note
+Use caution: enabling network and disabling approvals reduces friction but increases risk if you run untrusted code or work in an untrusted repository.
@@ -0,0 +1,32 @@
+# IVR / phone prompt defaults
+
+## Suggested defaults
+- Voice: `cedar` (clear) or `marin` (brighter)
+- Format: `wav`
+- Speed: `0.9` to `1.0`
+
+## Guidance
+- Prioritize clarity and slower pacing.
+- Enunciate numbers and menu options.
+- Keep sentences short and consistent.
+
+## Instruction template
+```
+Voice Affect: Clear and neutral.
+Tone: Professional and concise.
+Pacing: Slow and even.
+Pronunciation: Enunciate numbers and menu options.
+Emphasis: Stress the option numbers.
+```
+
+## Example (short)
+Input text:
+"For sales, press 1. For support, press 2."
+
+Instructions:
+```
+Voice Affect: Clear and neutral.
+Tone: Professional and concise.
+Pacing: Slow and even.
+Emphasis: Stress "press 1" and "press 2".
+```
@@ -0,0 +1,31 @@
+# Narration / explainer defaults
+
+## Suggested defaults
+- Voice: `cedar`
+- Format: `mp3`
+- Speed: `1.0`
+
+## Guidance
+- Keep pacing steady and clear.
+- Emphasize section headings and key transitions.
+- If the script is long, chunk it into logical paragraphs.
+
+## Instruction template
+```
+Voice Affect: Warm and composed.
+Tone: Friendly and confident.
+Pacing: Steady and moderate.
+Emphasis: Stress section titles and key terms.
+Pauses: Brief pause after each section.
+```
+
+## Example (short)
+Input text:
+"Welcome to the demo. Today we'll show how it works."
+
+Instructions:
+```
+Voice Affect: Warm and composed.
+Tone: Friendly and confident.
+Pacing: Steady and moderate.
+```
@@ -0,0 +1,38 @@
+# Instruction-writing best practices (TTS)
+
+## Contents
+- Structure
+- Specificity
+- Avoiding conflicts
+- Pronunciation and names
+- Pauses and pacing
+- Iterate deliberately
+- Where to find copy/paste recipes
+
+## Structure
+- Use a consistent order: affect -> tone -> pacing -> emotion -> pronunciation/pauses -> emphasis -> delivery.
+- For complex requests, use short labeled lines instead of a long paragraph.
+
+## Specificity
+- Name the delivery you want precisely ("calm and steady" rather than just "friendly").
+- If you need a specific cadence, call it out explicitly ("slow and measured", "brisk and energetic").
+
+## Avoiding conflicts
+- Do not mix opposing instructions ("fast and slow", "formal and casual").
+- Keep instructions short: 4 to 8 lines are usually enough.
+
+## Pronunciation and names
+- For acronyms, write the pronunciation hint in text ("A-I" instead of "AI").
+- For names or brands, add a simple phonetic guide in the input text if clarity matters.
+- If a word must be emphasized, add an Emphasis line and repeat the word exactly.
+
+## Pauses and pacing
+- Use punctuation or short line breaks in the input text to create natural pauses.
+- Use the Pauses line for intentional pauses ("pause after the greeting").
+
+## Iterate deliberately
+- Start with a clean base instruction set, then make one change at a time.
+- Repeat critical constraints on each iteration ("keep pacing steady").
+
+## Where to find copy/paste recipes
+For copy/paste instruction templates, see `references/sample-prompts.md`. This file focuses on principles, structure, and iteration patterns.
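
The ordering rule under "Structure" can be applied mechanically. A minimal Python sketch that renders labeled lines in that order regardless of how they were supplied. `build_instructions` and its dictionary input are hypothetical, and the label spellings are borrowed from the instruction templates elsewhere in this commit.

```python
# Label order following the "Structure" guidance:
# affect -> tone -> pacing -> emotion -> pronunciation/pauses -> emphasis -> delivery
LABEL_ORDER = [
    "Voice Affect", "Tone", "Pacing", "Emotion",
    "Pronunciation", "Pauses", "Emphasis", "Delivery",
]


def build_instructions(directions: dict[str, str]) -> str:
    """Render labeled lines in the recommended order, skipping unused labels."""
    unknown = set(directions) - set(LABEL_ORDER)
    if unknown:
        raise ValueError(f"Unknown labels: {sorted(unknown)}")
    lines = [
        f"{label}: {directions[label].strip()}"
        for label in LABEL_ORDER
        if directions.get(label, "").strip()
    ]
    return "\n".join(lines)


# Keys are given out of order on purpose; the output is reordered.
print(build_instructions({
    "Emphasis": "Stress the option numbers.",
    "Pacing": "Slow and even.",
    "Voice Affect": "Clear and neutral.",
    "Tone": "Professional and concise.",
}))
```

The returned string can be passed as the `--instructions` value or as the `instructions` field of a JSONL job.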
@@ -0,0 +1,44 @@
+# Sample instruction templates (copy/paste)
+
+These are short instruction blocks. Use only the lines you need and keep them consistent with the input text.
+
+## Friendly product demo
+```
+Voice Affect: Warm and composed.
+Tone: Friendly and confident.
+Pacing: Steady and moderate.
+Emphasis: Stress key product benefits.
+```
+
+## Calm support update
+```
+Voice Affect: Calm and reassuring.
+Tone: Sincere and empathetic.
+Pacing: Slow and steady.
+Emotion: Warmth and care.
+Pauses: Brief pause after apologies.
+```
+
+## IVR menu
+```
+Voice Affect: Clear and neutral.
+Tone: Professional and concise.
+Pacing: Slow and even.
+Emphasis: Stress menu options and numbers.
+```
+
+## Accessibility readout
+```
+Voice Affect: Neutral and clear.
+Tone: Informational and steady.
+Pacing: Slow and consistent.
+Pronunciation: Enunciate acronyms and numbers.
+```
+
+## Energetic intro
+```
+Voice Affect: Bright and upbeat.
+Tone: Enthusiastic and welcoming.
+Pacing: Brisk but clear.
+Emphasis: Stress the opening greeting.
+```
@@ -0,0 +1,80 @@
+# Voice directions
+
+## Template
+Use only the lines you need. Keep directions concise and aligned to the input text.
+
+```
+Voice Affect: <overall character and texture>
+Tone: <attitude, formality, warmth>
+Pacing: <slow, steady, brisk>
+Emotion: <key emotions to convey>
+Pronunciation: <words to enunciate or emphasize>
+Pauses: <where to insert brief pauses>
+Emphasis: <key phrases to stress>
+Delivery: <cadence or rhythm notes>
+```
+
+## Best practices
+- Keep 4 to 8 short lines. Avoid conflicting instructions.
+- Prefer concrete guidance over adjectives alone.
+- Do not rewrite the input text in the instructions; only guide delivery.
+- If you need a language or accent, write the input text in that language.
+- Repeat critical constraints (for example: "slow and steady") when iterating.
+
+## Examples (short)
+
+### Calm support
+```
+Voice Affect: Calm and composed, reassuring.
+Tone: Sincere and empathetic.
+Pacing: Steady and moderate.
+Emotion: Warmth and genuine care.
+Pronunciation: Clear, with emphasis on key reassurances.
+Pauses: Brief pauses after apologies and before requests.
+```
+
+### Dramatic narrator
+```
+Voice Affect: Low and suspenseful.
+Tone: Serious and mysterious.
+Pacing: Slow and deliberate.
+Emotion: Restrained intensity.
+Emphasis: Highlight sensory details and cliffhanger lines.
+Pauses: Add pauses after suspenseful moments.
+```
+
+### Fitness instructor
+```
+Voice Affect: High energy and upbeat.
+Tone: Motivational and encouraging.
+Pacing: Fast and dynamic.
+Emotion: Enthusiasm and momentum.
+Emphasis: Stress action verbs and countdowns.
+```
+
+### Serene guide
+```
+Voice Affect: Soft and soothing.
+Tone: Calm and reassuring.
+Pacing: Slow and unhurried.
+Emotion: Peaceful warmth.
+Pauses: Gentle pauses after breathing cues.
+```
+
+### Robot agent
+```
+Voice Affect: Monotone and mechanical.
+Tone: Neutral and formal.
+Pacing: Even and controlled.
+Emotion: None; strictly informational.
+Pronunciation: Precise and consistent.
+```
+
+### Old-time announcer
+```
+Voice Affect: Refined and theatrical.
+Tone: Formal and welcoming.
+Pacing: Steady with a classic cadence.
+Emotion: Warm enthusiasm.
+Pronunciation: Crisp enunciation with vintage flair.
+```
@@ -0,0 +1,31 @@
+# Product demo / voiceover defaults
+
+## Suggested defaults
+- Voice: `cedar` (neutral) or `marin` (brighter)
+- Format: `wav` for video sync, `mp3` for quick review
+- Speed: `1.0`
+
+## Guidance
+- Keep tone confident and helpful.
+- Emphasize product benefits and call-to-action phrases.
+- Avoid overly dramatic delivery unless requested.
+
+## Instruction template
+```
+Voice Affect: Confident and composed.
+Tone: Helpful and upbeat.
+Pacing: Steady, slightly brisk.
+Emphasis: Stress product benefits and the call to action.
+```
+
+## Example (short)
+Input text:
+"Meet the new dashboard. Find insights faster and act with confidence."
+
+Instructions:
+```
+Voice Affect: Confident and composed.
+Tone: Helpful and upbeat.
+Pacing: Steady, slightly brisk.
+Emphasis: Stress "insights" and "confidence".
+```