mirror of
https://github.com/ksyasuda/dotfiles.git
synced 2026-03-20 18:11:27 -07:00
update skills
32
.agents/skills/speech/references/accessibility.md
Normal file
@@ -0,0 +1,32 @@
# Accessibility read defaults

## Suggested defaults
- Voice: `cedar`
- Format: `mp3` or `wav`
- Speed: `0.95` to `1.0`

## Guidance
- Keep delivery steady and neutral.
- Enunciate acronyms and numbers.
- Avoid dramatic or stylized delivery.

## Instruction template
```
Voice Affect: Neutral and clear.
Tone: Informational and steady.
Pacing: Slow and consistent.
Pronunciation: Enunciate acronyms and numbers.
Emphasis: Stress key warnings or labels.
```

## Example (short)
Input text:
"Warning: High voltage. Keep hands clear."

Instructions:
```
Voice Affect: Neutral and clear.
Tone: Informational and steady.
Pacing: Slow and consistent.
Emphasis: Stress "Warning" and "High voltage".
```
31
.agents/skills/speech/references/audio-api.md
Normal file
@@ -0,0 +1,31 @@
# Audio Speech API quick reference

## Endpoint
- Create speech: `POST /v1/audio/speech`

## Default model
- `gpt-4o-mini-tts-2025-12-15`

## Other speech models (if requested)
- `gpt-4o-mini-tts`
- `tts-1`
- `tts-1-hd`

## Core parameters
- `model`: speech model
- `input`: text to synthesize (max 4096 characters)
- `voice`: built-in voice name
- `instructions`: optional style directions (not supported for `tts-1` or `tts-1-hd`)
- `response_format`: `mp3`, `opus`, `aac`, `flac`, `wav`, or `pcm`
- `speed`: 0.25 to 4.0
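
The parameter limits above can be checked before any request is sent. The following is a minimal sketch, not part of the skill: `build_speech_payload` and its constants are hypothetical names chosen for illustration, and the function only builds a request body without calling the API.

```python
# Hypothetical helper for illustration; not part of scripts/text_to_speech.py.
ALLOWED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096


def build_speech_payload(text, voice="cedar", model="gpt-4o-mini-tts-2025-12-15",
                         response_format="mp3", speed=1.0, instructions=None):
    """Validate arguments against the documented limits and return a request body."""
    if not text or len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input must be 1 to {MAX_INPUT_CHARS} characters")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    # Only include instructions when the caller supplied some.
    if instructions:
        payload["instructions"] = instructions
    return payload
```

Validating locally keeps obviously invalid jobs from consuming a network round trip.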

## Built-in voices
- `alloy`, `ash`, `ballad`, `cedar`, `coral`, `echo`, `fable`, `marin`, `nova`, `onyx`, `sage`, `shimmer`, `verse`

## Output notes
- Default format is `mp3`.
- `pcm` is raw 24 kHz 16-bit little-endian samples (no header).
- `wav` includes a header (better for quick playback).
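
Because `pcm` output has no header, most players need it wrapped before playback. Here is a minimal sketch using Python's standard `wave` module; `pcm_to_wav_bytes` is a hypothetical helper, and the single-channel (mono) default is an assumption, since the notes above state only the sample rate and bit depth.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 24000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit little-endian PCM samples in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)   # assumption: mono output
        wav_file.setsampwidth(2)          # 16-bit samples are 2 bytes each
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()
```

Requesting `wav` directly avoids this step; the conversion is mainly useful when `pcm` was chosen for streaming or low-latency handling.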

## Compliance note
- Provide a clear disclosure that the voice is AI-generated.
99
.agents/skills/speech/references/cli.md
Normal file
@@ -0,0 +1,99 @@
# CLI reference (`scripts/text_to_speech.py`)

This file contains the "command catalog" for the bundled speech generation CLI. Keep `SKILL.md` as overview-first; put verbose CLI details here.

## What this CLI does
- `speak`: generate a single audio file
- `speak-batch`: run many jobs from a JSONL file (one job per line)
- `list-voices`: list supported voices

Real API calls require network access + `OPENAI_API_KEY`. `--dry-run` does not.

## Quick start (works from any repo)
Set a stable path to the skill CLI (default `CODEX_HOME` is `~/.codex`):

```
export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export TTS_GEN="$CODEX_HOME/skills/speech/scripts/text_to_speech.py"
```

Dry-run (no API call; no network required; does not require the `openai` package):

```
python "$TTS_GEN" speak --input "Test" --dry-run
```

Generate (requires `OPENAI_API_KEY` + network):

```
uv run --with openai python "$TTS_GEN" speak \
  --input "Today is a wonderful day to build something people love!" \
  --voice cedar \
  --instructions "Voice Affect: Warm and composed. Tone: upbeat and encouraging." \
  --response-format mp3 \
  --out speech.mp3
```

No `uv` installed? Use your active Python env:

```
python "$TTS_GEN" speak --input "Hello" --voice cedar --out speech.mp3
```

## Guardrails (important)
- Use `python "$TTS_GEN" ...` (or equivalent full path) for all TTS work.
- Do **not** create one-off runners (e.g., `gen_audio.py`) unless the user explicitly asks.
- **Never modify** `scripts/text_to_speech.py`. If something is missing, ask the user before doing anything else.

## Defaults (unless overridden by flags)
- Model: `gpt-4o-mini-tts-2025-12-15`
- Voice: `cedar`
- Response format: `mp3`
- Speed: `1.0`
- Batch rpm cap: `50`

## Input limits
- Input text must be <= 4096 characters per request.
- For longer text, split into smaller chunks (manual or via batch JSONL).
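
One way to split longer text is to break at sentence boundaries and pack sentences into chunks that stay under the limit. The sketch below is illustrative only: `chunk_text` is a hypothetical helper, not something the CLI provides, and the sentence-splitting regex is a simple heuristic.

```python
import re

MAX_INPUT_CHARS = 4096


def chunk_text(text, limit=MAX_INPUT_CHARS):
    """Split text into chunks of at most `limit` characters, preferring sentence breaks."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then become one `speak` call or one line in a batch JSONL file.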

## Instructions compatibility
- `instructions` are supported for GPT-4o mini TTS models.
- `tts-1` and `tts-1-hd` ignore instructions (the CLI will warn and drop them).
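
The warn-and-drop behavior described above could look roughly like the following. This is a sketch of the idea under stated assumptions, not the CLI's actual code: `resolve_instructions` and `MODELS_WITHOUT_INSTRUCTIONS` are hypothetical names.

```python
import warnings

# Models that, per the notes above, do not support instructions.
MODELS_WITHOUT_INSTRUCTIONS = {"tts-1", "tts-1-hd"}


def resolve_instructions(model, instructions):
    """Return the instructions to send, or None when the model cannot use them."""
    if instructions and model in MODELS_WITHOUT_INSTRUCTIONS:
        warnings.warn(f"{model} does not support instructions; dropping them")
        return None
    return instructions or None
```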

## Common recipes

List voices:
```
python "$TTS_GEN" list-voices
```

Generate with explicit pacing:
```
python "$TTS_GEN" speak \
  --input "Welcome to the demo. We'll show how it works." \
  --instructions "Tone: friendly and confident. Pacing: steady and moderate." \
  --out demo.mp3
```

Batch generation (JSONL):
```
mkdir -p tmp/speech
cat > tmp/speech/jobs.jsonl << 'JSONL'
{"input":"Thank you for calling. Please hold.","voice":"cedar","response_format":"mp3","out":"hold.mp3"}
{"input":"For sales, press 1. For support, press 2.","voice":"marin","instructions":"Tone: clear and neutral. Pacing: slow.","response_format":"wav"}
JSONL

python "$TTS_GEN" speak-batch --input tmp/speech/jobs.jsonl --out-dir out --rpm 50

# Cleanup (recommended)
rm -f tmp/speech/jobs.jsonl
```

Notes:
- Use `--rpm` to control rate limiting (default `50`, max `50`).
- Per-job overrides are supported in JSONL (`model`, `voice`, `response_format`, `speed`, `instructions`, `out`).
- Treat the JSONL file as temporary: write it under `tmp/` and delete it after the run (do not commit it).
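
When jobs are assembled programmatically, the JSONL file can be written from Python instead of a heredoc. A minimal sketch follows, with assumptions labeled: `write_jobs` is a hypothetical helper, and the key check (requiring `input` and rejecting keys outside the list above) is this example's own strictness, not documented CLI behavior.

```python
import json
from pathlib import Path

# "input" plus the per-job override keys listed in the notes above.
JOB_KEYS = {"input", "model", "voice", "response_format", "speed", "instructions", "out"}


def write_jobs(jobs, path):
    """Write one JSON object per line and return the resulting path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    lines = []
    for index, job in enumerate(jobs, start=1):
        unknown = set(job) - JOB_KEYS
        # Assumption: treat a missing "input" or an unknown key as a mistake.
        if "input" not in job or unknown:
            raise ValueError(f"job {index} is invalid (unknown keys: {sorted(unknown)})")
        lines.append(json.dumps(job, ensure_ascii=False))
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

After the batch run finishes, calling `path.unlink()` on the returned path matches the cleanup step recommended above.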

## See also
- API parameter quick reference: `references/audio-api.md`
- Instruction patterns and examples: `references/voice-directions.md`
28
.agents/skills/speech/references/codex-network.md
Normal file
@@ -0,0 +1,28 @@
# Codex network approvals / sandbox notes

This guidance is intentionally isolated from `SKILL.md` because it can vary by environment and may become stale. Prefer the defaults in your environment when in doubt.

## Why am I asked to approve every speech generation call?
Speech generation uses the OpenAI Audio API, so the CLI needs outbound network access. In many Codex setups, network access is disabled by default (especially under stricter sandbox modes), and/or the approval policy may require confirmation before networked commands run.

## How do I reduce repeated approval prompts (network)?
If you trust the repo and want fewer prompts, enable network access for the relevant sandbox mode and relax the approval policy.

Example `~/.codex/config.toml` pattern:

```
approval_policy = "never"
sandbox_mode = "workspace-write"

[sandbox_workspace_write]
network_access = true
```

Or for a single session:

```
codex --sandbox workspace-write --ask-for-approval never
```

## Safety note
Use caution: enabling network and disabling approvals reduces friction but increases risk if you run untrusted code or work in an untrusted repository.
32
.agents/skills/speech/references/ivr.md
Normal file
@@ -0,0 +1,32 @@
# IVR / phone prompt defaults

## Suggested defaults
- Voice: `cedar` (clear) or `marin` (brighter)
- Format: `wav`
- Speed: `0.9` to `1.0`

## Guidance
- Prioritize clarity and slower pacing.
- Enunciate numbers and menu options.
- Keep sentences short and consistent.

## Instruction template
```
Voice Affect: Clear and neutral.
Tone: Professional and concise.
Pacing: Slow and even.
Pronunciation: Enunciate numbers and menu options.
Emphasis: Stress the option numbers.
```

## Example (short)
Input text:
"For sales, press 1. For support, press 2."

Instructions:
```
Voice Affect: Clear and neutral.
Tone: Professional and concise.
Pacing: Slow and even.
Emphasis: Stress "press 1" and "press 2".
```
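
To try the example above with the bundled CLI, the command line can be assembled from Python. This is a minimal sketch under stated assumptions: `ivr_command` is a hypothetical helper, the script path follows the `CODEX_HOME` convention from `references/cli.md`, and only flags shown in that file are used. It returns the argument list without running anything.

```python
import os


def ivr_command(text, out_path, voice="cedar", dry_run=True):
    """Build the argument list for one IVR prompt using the suggested defaults."""
    codex_home = os.environ.get("CODEX_HOME", os.path.expanduser("~/.codex"))
    script = os.path.join(codex_home, "skills", "speech", "scripts", "text_to_speech.py")
    # The labeled lines from the example, joined into a single-line instruction string.
    instructions = " ".join([
        "Voice Affect: Clear and neutral.",
        "Tone: Professional and concise.",
        "Pacing: Slow and even.",
        'Emphasis: Stress "press 1" and "press 2".',
    ])
    argv = [
        "python", script, "speak",
        "--input", text,
        "--voice", voice,
        "--instructions", instructions,
        "--response-format", "wav",
        "--out", out_path,
    ]
    if dry_run:
        argv.append("--dry-run")  # no API call and no network required
    return argv
```

Passing the list to `subprocess.run` avoids shell quoting problems with the quotation marks inside the instructions.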
31
.agents/skills/speech/references/narration.md
Normal file
@@ -0,0 +1,31 @@
# Narration / explainer defaults

## Suggested defaults
- Voice: `cedar`
- Format: `mp3`
- Speed: `1.0`

## Guidance
- Keep pacing steady and clear.
- Emphasize section headings and key transitions.
- If the script is long, chunk it into logical paragraphs.

## Instruction template
```
Voice Affect: Warm and composed.
Tone: Friendly and confident.
Pacing: Steady and moderate.
Emphasis: Stress section titles and key terms.
Pauses: Brief pause after each section.
```

## Example (short)
Input text:
"Welcome to the demo. Today we'll show how it works."

Instructions:
```
Voice Affect: Warm and composed.
Tone: Friendly and confident.
Pacing: Steady and moderate.
```
38
.agents/skills/speech/references/prompting.md
Normal file
@@ -0,0 +1,38 @@
# Instruction best practices (TTS)

## Contents
- Structure
- Specificity
- Avoiding conflicts
- Pronunciation and names
- Pauses and pacing
- Iterate deliberately
- Where to find copy/paste recipes

## Structure
- Use a consistent order: affect -> tone -> pacing -> emotion -> pronunciation/pauses -> emphasis -> delivery.
- For complex requests, use short labeled lines instead of a long paragraph.
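
The ordering rule above can be enforced mechanically. The sketch below is illustrative only: `build_instructions` and `FIELD_ORDER` are hypothetical names, and the labels are taken from the template in `references/voice-directions.md`.

```python
# Labels in the recommended order: affect, tone, pacing, emotion,
# pronunciation/pauses, emphasis, delivery.
FIELD_ORDER = [
    "Voice Affect", "Tone", "Pacing", "Emotion",
    "Pronunciation", "Pauses", "Emphasis", "Delivery",
]


def build_instructions(directions):
    """Render a dict of directions as short labeled lines in a consistent order."""
    unknown = set(directions) - set(FIELD_ORDER)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return "\n".join(
        f"{label}: {directions[label]}"
        for label in FIELD_ORDER
        if directions.get(label)
    )
```

For example, passing `{"Pacing": "Slow and even.", "Voice Affect": "Clear and neutral."}` yields the `Voice Affect` line first, regardless of the order in which the keys were supplied.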

## Specificity
- Name the delivery you want ("calm and steady" vs "friendly").
- If you need a specific cadence, call it out explicitly ("slow and measured", "brisk and energetic").

## Avoiding conflicts
- Do not mix opposing instructions ("fast and slow", "formal and casual").
- Keep instructions short: 4 to 8 lines are usually enough.

## Pronunciation and names
- For acronyms, write the pronunciation hint in text ("A-I" instead of "AI").
- For names or brands, add a simple phonetic guide in the input text if clarity matters.
- If a word must be emphasized, add an Emphasis line and repeat the word exactly.
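
The acronym hint above ("A-I" instead of "AI") can be applied to input text automatically. This is a minimal sketch with a hypothetical helper, `spell_out_acronyms`; it rewrites only the acronyms passed in explicitly, since guessing which capitalized words are acronyms is error-prone.

```python
import re


def spell_out_acronyms(text, acronyms):
    """Replace each listed acronym with a hyphen-separated spelling hint."""
    if not acronyms:
        return text
    # Match whole words only so that "AI" inside "AIR" is left alone.
    alternatives = "|".join(re.escape(acronym) for acronym in acronyms)
    pattern = re.compile(rf"\b({alternatives})\b")
    return pattern.sub(lambda match: "-".join(match.group(0)), text)
```

Calling `spell_out_acronyms("The AI reads the FAQ.", ["AI", "FAQ"])` rewrites both acronyms and leaves the rest of the sentence unchanged.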

## Pauses and pacing
- Use punctuation or short line breaks in the input text to create natural pauses.
- Use the Pauses line for intentional pauses ("pause after the greeting").

## Iterate deliberately
- Start with a clean base instruction set, then make one change at a time.
- Repeat critical constraints on each iteration ("keep pacing steady").

## Where to find copy/paste recipes
For copy/paste instruction templates, see `references/sample-prompts.md`. This file focuses on principles, structure, and iteration patterns.
44
.agents/skills/speech/references/sample-prompts.md
Normal file
@@ -0,0 +1,44 @@
# Sample instruction templates (copy/paste)

These are short instruction blocks. Use only the lines you need and keep them consistent with the input text.

## Friendly product demo
```
Voice Affect: Warm and composed.
Tone: Friendly and confident.
Pacing: Steady and moderate.
Emphasis: Stress key product benefits.
```

## Calm support update
```
Voice Affect: Calm and reassuring.
Tone: Sincere and empathetic.
Pacing: Slow and steady.
Emotion: Warmth and care.
Pauses: Brief pause after apologies.
```

## IVR menu
```
Voice Affect: Clear and neutral.
Tone: Professional and concise.
Pacing: Slow and even.
Emphasis: Stress menu options and numbers.
```

## Accessibility readout
```
Voice Affect: Neutral and clear.
Tone: Informational and steady.
Pacing: Slow and consistent.
Pronunciation: Enunciate acronyms and numbers.
```

## Energetic intro
```
Voice Affect: Bright and upbeat.
Tone: Enthusiastic and welcoming.
Pacing: Brisk but clear.
Emphasis: Stress the opening greeting.
```
80
.agents/skills/speech/references/voice-directions.md
Normal file
@@ -0,0 +1,80 @@
# Voice directions

## Template
Use only the lines you need. Keep directions concise and aligned to the input text.

```
Voice Affect: <overall character and texture>
Tone: <attitude, formality, warmth>
Pacing: <slow, steady, brisk>
Emotion: <key emotions to convey>
Pronunciation: <words to enunciate or emphasize>
Pauses: <where to insert brief pauses>
Emphasis: <key phrases to stress>
Delivery: <cadence or rhythm notes>
```

## Best practices
- Keep 4 to 8 short lines. Avoid conflicting instructions.
- Prefer concrete guidance over adjectives alone.
- Do not rewrite the input text in the instructions; only guide delivery.
- If you need a language or accent, write the input text in that language.
- Repeat critical constraints (for example: "slow and steady") when iterating.

## Examples (short)

### Calm support
```
Voice Affect: Calm and composed, reassuring.
Tone: Sincere and empathetic.
Pacing: Steady and moderate.
Emotion: Warmth and genuine care.
Pronunciation: Clear, with emphasis on key reassurances.
Pauses: Brief pauses after apologies and before requests.
```

### Dramatic narrator
```
Voice Affect: Low and suspenseful.
Tone: Serious and mysterious.
Pacing: Slow and deliberate.
Emotion: Restrained intensity.
Emphasis: Highlight sensory details and cliffhanger lines.
Pauses: Add pauses after suspenseful moments.
```

### Fitness instructor
```
Voice Affect: High energy and upbeat.
Tone: Motivational and encouraging.
Pacing: Fast and dynamic.
Emotion: Enthusiasm and momentum.
Emphasis: Stress action verbs and countdowns.
```

### Serene guide
```
Voice Affect: Soft and soothing.
Tone: Calm and reassuring.
Pacing: Slow and unhurried.
Emotion: Peaceful warmth.
Pauses: Gentle pauses after breathing cues.
```

### Robot agent
```
Voice Affect: Monotone and mechanical.
Tone: Neutral and formal.
Pacing: Even and controlled.
Emotion: None; strictly informational.
Pronunciation: Precise and consistent.
```

### Old-time announcer
```
Voice Affect: Refined and theatrical.
Tone: Formal and welcoming.
Pacing: Steady with a classic cadence.
Emotion: Warm enthusiasm.
Pronunciation: Crisp enunciation with vintage flair.
```
31
.agents/skills/speech/references/voiceover.md
Normal file
@@ -0,0 +1,31 @@
# Product demo / voiceover defaults

## Suggested defaults
- Voice: `cedar` (neutral) or `marin` (brighter)
- Format: `wav` for video sync, `mp3` for quick review
- Speed: `1.0`

## Guidance
- Keep tone confident and helpful.
- Emphasize product benefits and call-to-action phrases.
- Avoid overly dramatic delivery unless requested.

## Instruction template
```
Voice Affect: Confident and composed.
Tone: Helpful and upbeat.
Pacing: Steady, slightly brisk.
Emphasis: Stress product benefits and the call to action.
```

## Example (short)
Input text:
"Meet the new dashboard. Find insights faster and act with confidence."

Instructions:
```
Voice Affect: Confident and composed.
Tone: Helpful and upbeat.
Pacing: Steady, slightly brisk.
Emphasis: Stress "insights" and "confidence".
```