feat: add app-owned YouTube subtitle flow with absPlayer-style parsing (#31)

* fix: harden preload argv parsing for popup windows * fix: align youtube playback with shared overlay startup * fix: unwrap mpv youtube streams for anki media mining * docs: update docs for youtube subtitle and mining flow * refactor: unify cli and runtime wiring for startup and youtube flow * feat: update subtitle sidebar overlay behavior * chore: add shared log-file source for diagnostics * fix(ci): add changelog fragment for immersion changes * fix: address CodeRabbit review feedback * fix: persist canonical title from youtube metadata * style: format stats library tab * fix: address latest review feedback * style: format stats library files * test: stub launcher youtube deps in CI * test: isolate launcher youtube flow deps * test: stub launcher youtube deps in failing case * test: force x11 backend in launcher ci harness * test: address latest review feedback * fix(launcher): preserve user YouTube ytdl raw options * docs(backlog): update task tracking notes * fix(immersion): special-case youtube media paths in runtime and tracking * feat(stats): improve YouTube media metadata and picker key handling * fix(ci): format stats media library hook * fix: address latest CodeRabbit review items * docs: update youtube release notes and docs * feat: auto-load youtube subtitles before manual picker * fix: restore app-owned youtube subtitle flow * docs: update youtube playback docs and config copy * refactor: remove legacy youtube launcher mode plumbing * fix: refine youtube subtitle startup binding * docs: clarify youtube subtitle startup behavior * fix: address PR #31 latest review follow-ups * fix: address PR #31 follow-up review comments * test: harden youtube picker test harness * udpate backlog * fix: add timeout to youtube metadata probe * docs: refresh youtube and stats docs * update backlog * update backlog * chore: release v0.9.0
2026-05-13 08:12:54 -07:00 · 2026-03-24 00:01:24 -07:00
parent c17f0a4080
commit 5feed360ca
219 changed files with 12778 additions and 1052 deletions
@@ -17,6 +17,11 @@ For most users, start with this minimal configuration:
  "ankiConnect": {
    "enabled": true,
    "deck": "YourDeckName",
+    "knownWords": {
+      "decks": {
+        "YourDeckName": ["Word", "Word Reading", "Expression"]
+      }
+    },
    "fields": {
      "sentence": "Sentence",
      "audio": "Audio",
@@ -26,6 +31,8 @@ For most users, start with this minimal configuration:
 }
 ```

+`ankiConnect.deck` is still accepted for backward-compatible polling scope and legacy known-word fallback behavior. For known-word cache scope, prefer `ankiConnect.knownWords.decks` with deck-to-fields mapping.
+
 Then customize as needed using the sections below.

 ## Configuration File
@@ -120,7 +127,7 @@ The configuration file includes several main sections:
 - [**Discord Rich Presence**](#discord-rich-presence) - Optional Discord activity card updates
 - [**Immersion Tracking**](#immersion-tracking) - Track subtitle sessions and mining activity in SQLite
 - [**Stats Dashboard**](#stats-dashboard) - Local dashboard and overlay for immersion progress
- [**YouTube Subtitle Generation**](#youtube-subtitle-generation) - Launcher defaults for yt-dlp + local whisper fallback
+- [**YouTube Playback Settings**](#youtube-playback-settings) - Defaults for YouTube subtitle loading

 ## Core Settings

@@ -348,7 +355,8 @@ Configure the parsed-subtitle sidebar modal.
 ```json
 {
  "subtitleSidebar": {
-    "enabled": true,
+    "enabled": false,
+    "autoOpen": false,
    "layout": "overlay",
    "toggleKey": "Backslash",
    "pauseVideoOnHover": false,
@@ -362,12 +370,13 @@ Configure the parsed-subtitle sidebar modal.
 | Option                      | Values           | Description                                                                      |
 | --------------------------- | ---------------- | -------------------------------------------------------------------------------- |
 | `enabled`                   | boolean          | Enable subtitle sidebar support (`false` by default)                             |
+| `autoOpen`                  | boolean          | Open sidebar automatically on overlay startup (`false` by default)                |
 | `layout`                    | string           | `"overlay"` floats over mpv; `"embedded"` reserves right-side player space to mimic browser-like layout |
 | `toggleKey`                 | string           | `KeyboardEvent.code` used to open/close the sidebar (default: `"Backslash"`)     |
 | `pauseVideoOnHover`         | boolean          | Pause playback while hovering the sidebar cue list                               |
 | `autoScroll`                | boolean          | Keep the active cue in view while playback advances                              |
 | `maxWidth`                  | number           | Maximum sidebar width in CSS pixels (default: `420`)                             |
-| `opacity`                   | number           | Sidebar opacity between `0` and `1` (default: `0.78`)                            |
+| `opacity`                   | number           | Sidebar opacity between `0` and `1` (default: `0.95`)                            |
 | `backgroundColor`           | string           | Sidebar shell background color                                                   |
 | `textColor`                 | hex color        | Default cue text color                                                           |
 | `fontFamily`                | string           | CSS `font-family` value applied to sidebar cue text                             |
@@ -460,6 +469,7 @@ See `config.example.jsonc` for detailed configuration options and more examples.
 | `Space`              | `["cycle", "pause"]`         | Toggle pause                          |
 | `KeyJ`               | `["cycle", "sid"]`           | Cycle primary subtitle track          |
 | `Shift+KeyJ`         | `["cycle", "secondary-sid"]` | Cycle secondary subtitle track        |
+| `Ctrl+Alt+KeyC`    | `["__youtube-picker-open"]`     | Open the manual YouTube subtitle picker |
 | `ArrowRight`         | `["seek", 5]`                | Seek forward 5 seconds                |
 | `ArrowLeft`          | `["seek", -5]`               | Seek backward 5 seconds               |
 | `ArrowUp`            | `["seek", 60]`               | Seek forward 60 seconds               |
@@ -731,7 +741,7 @@ Palette controls:
 ### Shared AI Provider

 Shared OpenAI-compatible transport settings live at the top level under `ai`.
-Anki and YouTube subtitle cleanup both read this provider, then apply feature-local overrides where supported.
+Anki reads this provider directly. Legacy subtitle fallback keeps the same provider shape for compatibility, then applies feature-local overrides where supported.

 ```json
 {
@@ -751,12 +761,14 @@ Anki and YouTube subtitle cleanup both read this provider, then apply feature-lo
 | `apiKey`           | string                | Static API key for the shared provider               |
 | `apiKeyCommand`    | string                | Shell command used to resolve the API key            |
 | `baseUrl`          | string (URL)          | OpenAI-compatible base URL                           |
+| `model`            | string                | Optional model override for shared provider workflows   |
+| `systemPrompt`     | string                | Optional system prompt override for shared provider workflows |
 | `requestTimeoutMs` | integer milliseconds  | Shared request timeout (default: `15000`)            |

-SubMiner uses the shared provider in two places:
+SubMiner uses the shared provider for:

 - Anki translation/enrichment when `ankiConnect.ai.enabled` is `true`
- YouTube whisper subtitle post-processing when `youtubeSubgen.fixWithAi` is `true`
+- Legacy subtitle fallback compatibility when `youtubeSubgen.fixWithAi` is `true`

 ### AnkiConnect

@@ -840,8 +852,8 @@ This example is intentionally compact. The option table below documents availabl
 | `proxy.port`                            | number                                  | Bind port for local AnkiConnect proxy (default: `8766`)                                                                                       |
 | `proxy.upstreamUrl`                     | string (URL)                            | Upstream AnkiConnect URL that proxy forwards to (default: `http://127.0.0.1:8765`)                                                            |
 | `tags`                                  | array of strings                        | Tags automatically added to cards mined/updated by SubMiner (default: `['SubMiner']`; set `[]` to disable automatic tagging).                 |
-| `deck`                                  | string                                  | Anki deck to monitor for new cards                                                                                                            |
-| `ankiConnect.knownWords.decks`          | array of strings                        | Decks used for known-word cache lookups. When omitted/empty, falls back to `ankiConnect.deck`.                                                 |
+| `ankiConnect.deck`                      | string                                  | Legacy Anki polling/compatibility scope. Newer known-word cache scoping should use `ankiConnect.knownWords.decks`.                               |
+| `ankiConnect.knownWords.decks`          | object                                  | Deck→fields mapping for known-word cache queries (for example `{ "Kaishi 1.5k": ["Word", "Word Reading"] }`).                                  |
 | `fields.word`                           | string                                  | Card field for mined word / expression text (default: `Expression`)                                                                            |
 | `fields.audio`                          | string                                  | Card field for audio files (default: `ExpressionAudio`)                                                                                       |
 | `fields.image`                          | string                                  | Card field for images (default: `Picture`)                                                                                                    |
@@ -862,6 +874,7 @@ This example is intentionally compact. The option table below documents availabl
 | `media.animatedMaxWidth`                | number (px)                             | Max width for animated AVIF (default: `640`)                                                                                                  |
 | `media.animatedMaxHeight`               | number (px)                             | Optional max height for animated AVIF. Unset keeps source aspect-constrained height.                                                          |
 | `media.animatedCrf`                     | number (0-63)                           | CRF quality for AVIF; lower = higher quality (default: `35`)                                                                                  |
+| `media.syncAnimatedImageToWordAudio`    | `true`, `false`                         | Whether animated AVIF includes an opening frame synced to sentence word-audio timing (default: `true`). |
 | `media.audioPadding`                    | number (seconds)                        | Padding around audio clip timing (default: `0.5`)                                                                                             |
 | `media.fallbackDuration`                | number (seconds)                        | Default duration if timing unavailable (default: `3.0`)                                                                                       |
 | `media.maxMediaDuration`                | number (seconds)                        | Max duration for generated media from multi-line copy (default: `30`, `0` to disable)                                                         |
@@ -870,10 +883,11 @@ This example is intentionally compact. The option table below documents availabl
 | `behavior.mediaInsertMode`              | `"append"`, `"prepend"`                 | Where to insert new media when overwrite is off (default: `"append"`)                                                                         |
 | `behavior.highlightWord`                | `true`, `false`                         | Highlight the word in sentence context (default: `true`)                                                                                      |
 | `ankiConnect.knownWords.highlightEnabled` | `true`, `false`                       | Enable fast local highlighting for words already known in Anki (default: `false`)                                                             |
+| `ankiConnect.knownWords.addMinedWordsImmediately` | `true`, `false`                         | Add words from successful mines into the local known-word cache immediately (default: `true`)                                                   |
 | `ankiConnect.knownWords.color`          | hex color string                        | Text color for tokens already found in the local known-word cache (default: `"#a6da95"`).                                                     |
 | `ankiConnect.knownWords.matchMode`      | `"headword"`, `"surface"`               | Matching strategy for known-word highlighting (default: `"headword"`). `headword` uses token headwords; `surface` uses visible subtitle text. |
 | `ankiConnect.knownWords.refreshMinutes` | number                                  | Minutes between known-word cache refreshes (default: `1440`)                                                                                  |
-| `ankiConnect.knownWords.decks`          | array of strings                        | Decks used by known-word cache refresh. Leave empty for compatibility with legacy `deck` scope.                                               |
+| `ankiConnect.knownWords.decks`          | object                                  | Deck→fields mapping used for known-word cache query scope (e.g. `{ "Kaishi 1.5k": ["Word", "Word Reading"] }`).                              |
 | `ankiConnect.nPlusOne.nPlusOne`         | hex color string                        | Text color for the single target token to study when exactly one unknown candidate exists in a sentence (default: `"#c6a0f6"`).               |
 | `ankiConnect.nPlusOne.minSentenceWords` | number                                  | Minimum number of words required in a sentence before single unknown-word N+1 highlighting can trigger (default: `3`).                        |
 | `behavior.notificationType`             | `"osd"`, `"system"`, `"both"`, `"none"` | Notification type on card update (default: `"osd"`)                                                                                           |
@@ -919,7 +933,7 @@ Known-word cache policy:
 - `ankiConnect.nPlusOne.nPlusOne` sets the color for the single target token when exactly one eligible unknown word exists.
 - `ankiConnect.nPlusOne.minSentenceWords` sets the minimum token count required in a sentence for N+1 highlighting (default: `3`).
 - `ankiConnect.knownWords.color` sets the known-word highlight color for tokens already in Anki.
- `ankiConnect.knownWords.decks` accepts one or more decks. If empty, it uses the legacy single `ankiConnect.deck` value as scope.
+- `ankiConnect.knownWords.decks` accepts an object keyed by deck name. If omitted or empty, it falls back to the legacy `ankiConnect.deck` single-deck scope.
 - Cache state is persisted to `known-words-cache.json` under the app `userData` directory.
 - The cache is automatically invalidated when the configured scope changes (for example, when deck changes).
 - Cache lookups are in-memory. By default, token headwords are matched against cached `Expression` / `Word` values; set `ankiConnect.knownWords.matchMode` to `"surface"` for raw subtitle text matching.
@@ -1263,6 +1277,14 @@ Enable or disable local immersion analytics stored in SQLite for mined subtitles
 | `retention.monthlyRollupsDays` | integer (`0`-`36500`)         | Monthly rollup retention window. Default `0` (keep all).                                                    |
 | `retention.vacuumIntervalDays` | integer (`0`-`3650`)          | Minimum spacing between `VACUUM` passes. `0` disables vacuum. Default `0` (disabled).                         |

+You can also disable immersion tracking for a single session using:
+
+```bash
+SUBMINER_DISABLE_IMMERSION_TRACKING=1 subminer
+```
+
+When this is set, SubMiner skips immersion-tracker startup and does not initialize or read the immersion SQLite database for that session.
+
 Default behavior keeps raw events, telemetry, sessions, and rollups forever while still maintaining lifetime summary tables and daily/monthly rollups for faster reads. If you later want bounded retention, switch `retentionMode` or set explicit `retention.*` values.

 When `dbPath` is blank or omitted, SubMiner writes telemetry and session summaries to the default app-data location:
@@ -1283,7 +1305,7 @@ Configure the local stats UI served from SubMiner and the in-app stats overlay t
 {
  "stats": {
    "toggleKey": "Backquote",
-    "serverPort": 5175,
+    "serverPort": 6969,
    "autoStartServer": true,
    "autoOpenBrowser": true
  }
@@ -1293,7 +1315,7 @@ Configure the local stats UI served from SubMiner and the in-app stats overlay t
 | Option            | Values            | Description                                                                 |
 | ----------------- | ----------------- | --------------------------------------------------------------------------- |
 | `toggleKey`       | Electron key code | Overlay-local key code used to toggle the stats overlay. Default `Backquote`. |
-| `serverPort`      | integer           | Localhost port for the browser stats UI. Default `5175`.                    |
+| `serverPort`      | integer           | Localhost port for the browser stats UI. Default `6969`.                    |
 | `autoStartServer` | `true`, `false`   | Start the local stats HTTP server automatically once immersion tracking is active. Default `true`. |
 | `autoOpenBrowser` | `true`, `false`   | When `subminer stats` starts the server on demand, also open the dashboard in your default browser. Default `true`. |

@@ -1304,22 +1326,13 @@ Usage notes:
 - The dashboard reads from the same immersion-tracking database, so keep `immersionTracking.enabled` on if you want data to appear.
 - The UI includes Overview, Library, Trends, Vocabulary, and Sessions tabs.

-### YouTube Subtitle Generation
+### YouTube Playback Settings

-Set defaults used by the `subminer` launcher for YouTube subtitle generation:
+Set defaults used by the `subminer` launcher for YouTube subtitle loading:

 ```json
 {
  "youtubeSubgen": {
-    "whisperBin": "/path/to/whisper-cli",
-    "whisperModel": "/path/to/ggml-model.bin",
-    "whisperVadModel": "/path/to/ggml-vad.bin",
-    "whisperThreads": 4,
-    "fixWithAi": false,
-    "ai": {
-      "model": "openai/gpt-4o-mini",
-      "systemPrompt": "Fix subtitle mistakes only."
-    },
    "primarySubLanguages": ["ja", "jpn"]
  }
 }
@@ -1327,27 +1340,22 @@ Set defaults used by the `subminer` launcher for YouTube subtitle generation:

 | Option                | Values               | Description                                                                                    |
 | --------------------- | -------------------- | ---------------------------------------------------------------------------------------------- |
-| `whisperBin`          | string path          | Path to `whisper.cpp` CLI binary used as fallback transcription engine                        |
-| `whisperModel`        | string path          | Path to whisper model used by fallback transcription                                          |
-| `whisperVadModel`     | string path          | Optional whisper VAD model path                                                               |
-| `whisperThreads`      | integer              | Thread count passed to whisper runs                                                           |
-| `fixWithAi`           | `true`, `false`      | Run shared AI post-processing on whisper-generated subtitles                                  |
-| `ai.model`            | string               | Optional model override for YouTube AI subtitle cleanup                                       |
-| `ai.systemPrompt`     | string               | Optional system prompt override for YouTube AI subtitle cleanup                               |
-| `primarySubLanguages` | string[]             | Primary subtitle language priority for YouTube subtitle generation (default `["ja", "jpn"]`) |
+| `primarySubLanguages` | string[]             | Primary subtitle language priority for YouTube auto-loading (default `["ja", "jpn"]`) |

-Launcher behavior:
+Current launcher behavior:

- For YouTube URLs, subtitle generation now runs before mpv launch.
- SubMiner probes manual/native YouTube subtitle tracks first.
- Missing tracks fall back to local `whisper.cpp`.
- English secondary subtitles can use whisper translate fallback when no manual track exists.
- If `fixWithAi` is enabled, only whisper-generated `.srt` output is post-processed with the shared top-level `ai` provider.
+- For YouTube URLs, SubMiner probes subtitle tracks with yt-dlp after mpv bootstrap and binds auto-selected tracks before normal playback resumes.
+- If YouTube/mpv already exposes an authoritative matching subtitle track, SubMiner reuses it; otherwise it downloads and injects only the missing side.
+- SubMiner loads the primary subtitle plus a best-effort secondary subtitle.
+- Playback waits only for primary subtitle readiness; secondary failures do not block playback.
+- English secondary subtitles are selected from `secondarySub.secondarySubLanguages` when primary language matches are unavailable.
+- Native mpv secondary subtitle rendering stays hidden during this flow so the SubMiner overlay remains the visible secondary subtitle surface.
+- If primary subtitle loading fails, use `Ctrl+Alt+C` to open the subtitle modal and pick a track.

 Language targets are derived from subtitle config:

 - primary track: `youtubeSubgen.primarySubLanguages` (falls back to `["ja","jpn"]`)
 - secondary track: `secondarySub.secondarySubLanguages` (falls back to English when empty)
- Subtitle files are generated or downloaded before mpv starts; the older launcher mode switch has been removed.
+- Tracks are resolved and loaded before mpv starts; the older launcher mode switch has been removed.

 Precedence for launcher defaults is: CLI flag > environment variable > `config.jsonc` > built-in default.