move docs back repo

2026-05-05 00:41:30 -07:00 · 2026-03-10 19:47:16 -07:00
parent f7ce3371a1
commit 5f320edab5
73 changed files with 7813 additions and 53 deletions
@@ -0,0 +1,133 @@
+# Subtitle Annotations
+
+SubMiner annotates subtitle tokens in real time as they appear in the overlay. Four annotation layers work together to surface useful context while you watch: **N+1 highlighting**, **character-name highlighting**, **frequency highlighting**, and **JLPT tagging**.
+
+N+1, frequency, and JLPT annotations are opt-in; character-name highlighting is enabled by default. All four are configured under `subtitleStyle` and `ankiConnect.nPlusOne` in your config, and they apply independently, so you can enable any combination.
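
As a rough illustration of how the four enable flags sit side by side, the sketch below models them as a typed object. Only the option paths and default values are taken from the settings tables on this page; the `AnnotationConfig` type, the `enabledLayers` helper, and the layer labels are hypothetical names used for illustration, not SubMiner's actual API or config file format.

```typescript
// Hypothetical type: mirrors the documented option paths, nothing more.
interface AnnotationConfig {
  ankiConnect: { nPlusOne: { highlightEnabled: boolean } };
  subtitleStyle: {
    nameMatchEnabled: boolean;
    enableJlpt: boolean;
    frequencyDictionary: { enabled: boolean };
  };
}

// Default values as listed in the "Key settings" tables.
const documentedDefaults: AnnotationConfig = {
  ankiConnect: { nPlusOne: { highlightEnabled: false } },
  subtitleStyle: {
    nameMatchEnabled: true,
    enableJlpt: false,
    frequencyDictionary: { enabled: false },
  },
};

// Hypothetical helper: lists which layers a given config turns on.
function enabledLayers(config: AnnotationConfig): string[] {
  const layers: string[] = [];
  if (config.ankiConnect.nPlusOne.highlightEnabled) layers.push("nPlusOne");
  if (config.subtitleStyle.nameMatchEnabled) layers.push("characterName");
  if (config.subtitleStyle.frequencyDictionary.enabled) layers.push("frequency");
  if (config.subtitleStyle.enableJlpt) layers.push("jlpt");
  return layers;
}

// Example: turn on N+1 and JLPT while leaving the other flags at their defaults.
const example: AnnotationConfig = {
  ankiConnect: { nPlusOne: { highlightEnabled: true } },
  subtitleStyle: { ...documentedDefaults.subtitleStyle, enableJlpt: true },
};
```

Because each flag is read on its own, changing one never affects the others.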
+
+## N+1 Word Highlighting
+
+N+1 highlighting identifies sentences where you know every word except one, making them ideal mining targets. When enabled, SubMiner builds a local cache of your known vocabulary from Anki and highlights tokens accordingly.
+
+**How it works:**
+
+1. SubMiner queries your Anki decks for existing `Expression` / `Word` field values.
+2. The results are cached locally (`known-words-cache.json`) and refreshed on a configurable interval.
+3. When a subtitle line appears, each token is checked against the cache.
+4. If exactly one unknown word remains and the sentence has at least `minSentenceWords` tokens, that word is highlighted with `nPlusOneColor` (default: `#c6a0f6`).
+5. Already-known tokens can optionally display in `knownWordColor` (default: `#a6da95`).
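
The check in steps 3 and 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not SubMiner's implementation: the `Token` shape and the `findNPlusOneTarget` function are hypothetical, and the known-word cache is modeled as a plain `Set` of strings.

```typescript
// Hypothetical token shape; real tokens likely carry more fields.
interface Token {
  surface: string;  // raw subtitle text
  headword: string; // dictionary form
}

type MatchMode = "headword" | "surface";

// Returns the index of the single unknown token, or null when the line
// is too short or does not contain exactly one unknown word.
function findNPlusOneTarget(
  tokens: Token[],
  knownWords: ReadonlySet<string>,
  matchMode: MatchMode = "headword",
  minSentenceWords = 3,
): number | null {
  if (tokens.length < minSentenceWords) return null;

  let unknownCount = 0;
  let targetIndex = -1;
  tokens.forEach((token, index) => {
    const key = matchMode === "headword" ? token.headword : token.surface;
    if (!knownWords.has(key)) {
      unknownCount += 1;
      targetIndex = index;
    }
  });

  return unknownCount === 1 ? targetIndex : null;
}

// Illustrative data only.
const line: Token[] = [
  { surface: "猫", headword: "猫" },
  { surface: "が", headword: "が" },
  { surface: "食べた", headword: "食べる" },
];
const target = findNPlusOneTarget(line, new Set(["猫", "が"]));
```

With `matchMode` set to `"surface"`, a known headword such as `食べる` would not match the inflected form `食べた`, which is why `"headword"` is the more forgiving default.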
+
+**Key settings:**
+
+| Option | Default | Description |
+| --- | --- | --- |
+| `ankiConnect.nPlusOne.highlightEnabled` | `false` | Enable N+1 highlighting |
+| `ankiConnect.nPlusOne.refreshMinutes` | `60` | Minutes between Anki cache refreshes |
+| `ankiConnect.nPlusOne.decks` | `[]` | Decks to query (falls back to `ankiConnect.deck`) |
+| `ankiConnect.nPlusOne.matchMode` | `"headword"` | `"headword"` (dictionary form) or `"surface"` (raw text) |
+| `ankiConnect.nPlusOne.minSentenceWords` | `3` | Minimum tokens in a sentence for N+1 to trigger |
+| `subtitleStyle.nPlusOneColor` | `#c6a0f6` | Color for the single unknown target word |
+| `subtitleStyle.knownWordColor` | `#a6da95` | Color for already-known tokens |
+
+::: tip
+Set `ankiConnect.nPlusOne.refreshMinutes` to `1440` (24 hours) to refresh once a day if your Anki collection is large.
+:::
+
+## Character-Name Highlighting
+
+Character-name matches are built from the active merged SubMiner character dictionary, which auto-syncs character data from AniList for your recently watched titles. Matching names are highlighted in subtitles and become clickable, opening a full character profile with portraits, roles, voice actors, and biographical detail.
+
+**How it works:**
+
+1. Subtitles are tokenized, then candidate name tokens are matched against the character dictionary via Yomitan's scanning pipeline.
+2. Matching tokens receive a dedicated style distinct from N+1 and frequency layers.
+3. This layer can be independently toggled with `subtitleStyle.nameMatchEnabled`.
+
+**Key settings:**
+
+| Option | Default | Description |
+| --- | --- | --- |
+| `subtitleStyle.nameMatchEnabled` | `true` | Enable character-name token highlighting |
+| `subtitleStyle.nameMatchColor` | `#f5bde6` | Color used for character-name matches |
+
+For full details on dictionary generation, name variant expansion, auto-sync lifecycle, and configuration, see the dedicated [Character Dictionary](/character-dictionary) page.
+
+## Frequency Highlighting
+
+Frequency highlighting colors tokens based on how common they are, using dictionary frequency rank data. This helps you spot high-value vocabulary at a glance.
+
+**Modes:**
+
+- **Single** — all highlighted tokens share one color (`singleColor`).
+- **Banded** — tokens are assigned to five color bands from most common to least common within the `topX` window.
+
+SubMiner looks up each token's `frequencyRank` from `term_meta_bank_*.json` files. Only tokens with a positive rank at or below `topX` are highlighted.
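
The rank filter and the two modes can be sketched like this. The option names follow the settings table below, but the function itself is hypothetical, the colors are placeholders, and the equal-width split of the `topX` window into five bands is an assumption: this page does not state how band boundaries are chosen.

```typescript
type FrequencyMode = "single" | "banded";

interface FrequencyOptions {
  topX: number;
  mode: FrequencyMode;
  singleColor: string;
  bandedColors: [string, string, string, string, string];
}

// Returns the highlight color for a rank, or null when the token is not highlighted.
function frequencyColor(rank: number | undefined, options: FrequencyOptions): string | null {
  // Only positive ranks at or below topX qualify.
  if (rank === undefined || rank <= 0 || rank > options.topX) return null;
  if (options.mode === "single") return options.singleColor;

  // ASSUMPTION: five equal-width bands across 1..topX, most common first.
  const band = Math.min(4, Math.floor(((rank - 1) * 5) / options.topX));
  return options.bandedColors[band] ?? null;
}

// Placeholder colors for illustration; these are not SubMiner defaults.
const bandedOptions: FrequencyOptions = {
  topX: 1000,
  mode: "banded",
  singleColor: "#ffffff",
  bandedColors: ["#b1", "#b2", "#b3", "#b4", "#b5"],
};
```

Under this assumption and `topX` set to `1000`, ranks 1 to 200 fall in the first band and ranks 801 to 1000 in the last.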
+
+**Key settings:**
+
+| Option | Default | Description |
+| --- | --- | --- |
+| `subtitleStyle.frequencyDictionary.enabled` | `false` | Enable frequency highlighting |
+| `subtitleStyle.frequencyDictionary.topX` | `1000` | Max frequency rank to highlight |
+| `subtitleStyle.frequencyDictionary.mode` | `"single"` | `"single"` or `"banded"` |
+| `subtitleStyle.frequencyDictionary.matchMode` | `"headword"` | `"headword"` or `"surface"` |
+| `subtitleStyle.frequencyDictionary.singleColor` | — | Color for single mode |
+| `subtitleStyle.frequencyDictionary.bandedColors` | — | Array of five hex colors for banded mode |
+| `subtitleStyle.frequencyDictionary.sourcePath` | — | Custom path to frequency dictionary root |
+
+When `sourcePath` is omitted, SubMiner searches default install/runtime locations for `frequency-dictionary` directories automatically.
+
+::: info
+Frequency highlighting skips tokens that look like non-lexical noise (kana reduplication, short kana endings like `っ`), even when dictionary ranks exist.
+:::
+
+## JLPT Tagging
+
+JLPT tagging adds colored underlines to tokens based on their JLPT level (N1–N5), giving you an at-a-glance sense of difficulty distribution in each subtitle line.
+
+**How it works:**
+
+SubMiner loads offline `term_meta_bank_*.json` files from `vendor/yomitan-jlpt-vocab` and matches each token's headword against the bank entries. Tokens with a recognized JLPT level receive a colored underline.
+
+**Default colors:**
+
+| Level | Color | Name |
+| --- | --- | --- |
+| N1 | `#ed8796` | Red |
+| N2 | `#f5a97f` | Peach |
+| N3 | `#f9e2af` | Yellow |
+| N4 | `#a6e3a1` | Green |
+| N5 | `#8aadf4` | Blue |
+
+All colors are customizable via the `subtitleStyle.jlptColors` object.
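
A minimal sketch of how per-level overrides might combine with the defaults above. The default hex values come from the table; the `jlptUnderlineColor` function, the `Map`-based level lookup, and the sample vocabulary entries are hypothetical stand-ins for whatever SubMiner builds from the term meta bank files.

```typescript
type JlptLevel = "N1" | "N2" | "N3" | "N4" | "N5";

// Default underline colors from the table above.
const defaultJlptColors: Record<JlptLevel, string> = {
  N1: "#ed8796",
  N2: "#f5a97f",
  N3: "#f9e2af",
  N4: "#a6e3a1",
  N5: "#8aadf4",
};

// Returns the underline color for a headword, or null when it has no JLPT level.
// `overrides` plays the role of a user-supplied `subtitleStyle.jlptColors` object.
function jlptUnderlineColor(
  headword: string,
  levels: ReadonlyMap<string, JlptLevel>,
  overrides: Partial<Record<JlptLevel, string>> = {},
): string | null {
  const level = levels.get(headword);
  if (level === undefined) return null;
  return overrides[level] ?? defaultJlptColors[level];
}

// Illustrative entries only; real levels come from the offline vocabulary bundle.
const sampleLevels = new Map<string, JlptLevel>([
  ["食べる", "N5"],
  ["概念", "N1"],
]);
```

Overriding a single level leaves the other four at their defaults.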
+
+**Key settings:**
+
+| Option | Default | Description |
+| --- | --- | --- |
+| `subtitleStyle.enableJlpt` | `false` | Enable JLPT underline styling |
+| `subtitleStyle.jlptColors.N1`–`N5` | see above | Per-level underline colors |
+
+::: tip
+JLPT tagging requires the offline vocabulary bundle. See [JLPT Vocabulary Bundle](jlpt-vocab-bundle) for setup instructions and file locations.
+:::
+
+## Runtime Toggles
+
+All annotation layers can be toggled at runtime via the mpv command menu without restarting:
+
+- `ankiConnect.nPlusOne.highlightEnabled` (`On` / `Off`)
+- `subtitleStyle.nameMatchEnabled` (`On` / `Off`)
+- `subtitleStyle.enableJlpt` (`On` / `Off`)
+- `subtitleStyle.frequencyDictionary.enabled` (`On` / `Off`)
+
+Toggles apply only to subtitle lines that arrive after the change; the currently displayed line is not re-tokenized in place.
+
+## Rendering Priority
+
+When multiple annotations apply to the same token, the visual priority is:
+
+1. **N+1 target** (highest) — the single unknown word in an N+1 sentence
+2. **Character-name match** — dictionary-driven character-name token styling
+3. **Known-word color** — already-learned token tint
+4. **Frequency highlight** — common-word coloring (not applied when N+1/character-name/known-word already matched)
+5. **JLPT underline** — level-based underline (stacks with the above since it uses underline rather than text color)
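
The priority order above can be sketched as a single resolution step. This is an illustration under stated assumptions: the `TokenAnnotations` and `ResolvedStyle` shapes and the `resolveStyle` function are hypothetical, and only the ordering and the default colors are taken from this page.

```typescript
// Hypothetical per-token annotation results, one field per layer.
interface TokenAnnotations {
  isNPlusOneTarget: boolean;
  isNameMatch: boolean;
  isKnown: boolean;
  frequencyColor: string | null; // null when the frequency layer did not match
  jlptColor: string | null;      // null when the token has no JLPT level
}

interface ResolvedStyle {
  textColor: string | null;
  underlineColor: string | null;
}

// Documented default colors.
const colors = {
  nPlusOneColor: "#c6a0f6",
  nameMatchColor: "#f5bde6",
  knownWordColor: "#a6da95",
};

function resolveStyle(a: TokenAnnotations): ResolvedStyle {
  // Text color: the first matching layer wins, in priority order.
  let textColor: string | null = null;
  if (a.isNPlusOneTarget) textColor = colors.nPlusOneColor;
  else if (a.isNameMatch) textColor = colors.nameMatchColor;
  else if (a.isKnown) textColor = colors.knownWordColor;
  else textColor = a.frequencyColor;

  // The JLPT underline is independent of text color, so it always stacks.
  return { textColor, underlineColor: a.jlptColor };
}

// A known, common, N5 word: known-word tint plus JLPT underline.
const resolved = resolveStyle({
  isNPlusOneTarget: false,
  isNameMatch: false,
  isKnown: true,
  frequencyColor: "#ffffff",
  jlptColor: "#8aadf4",
});
```

Keeping the underline separate from the text color is what lets JLPT tagging coexist with every other layer.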