# Subtitle Annotations

SubMiner annotates subtitle tokens in real time as they appear in the overlay. Four annotation layers work together to surface useful context while you watch: **N+1 highlighting**, **character-name highlighting**, **frequency highlighting**, and **JLPT tagging**.

All four are opt-in and configured under `subtitleStyle`, `ankiConnect.knownWords`, and `ankiConnect.nPlusOne` in your config. They apply independently, so you can enable any combination.

Before any of those layers render, SubMiner strips annotation metadata from tokens that are usually just subtitle glue or annotation noise. This covers standalone particles, auxiliaries, adnominals, common explanatory endings like `んです` / `のだ`, merged trailing quote-particle forms like `...って`, auxiliary-stem grammar tails like `そうだ` (MeCab POS3 `助動詞語幹`), repeated kana interjections, and similar non-lexical helper tokens. These tokens remain hoverable in the subtitle text, but they render as plain tokens without known-word, N+1, frequency, JLPT, or name-match annotation styling.

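The filtering idea can be illustrated with a minimal sketch. The `Token` shape, the `NOISE_POS1` set, and the helper names are hypothetical and only approximate the behavior described above; the one detail taken from this page is the `助動詞語幹` POS3 check. SubMiner's real filter covers more cases (explanatory endings, quote-particle forms, kana interjections).

```typescript
// Hypothetical token shape; SubMiner's real token type may differ.
interface Token {
  surface: string;
  pos1: string; // top-level MeCab part of speech, e.g. "助詞" (particle)
  pos3?: string; // finer-grained MeCab tag, e.g. "助動詞語幹"
  frequencyRank?: number;
  jlptLevel?: string;
}

// Assumed POS1 labels for particles, auxiliaries, and adnominals.
const NOISE_POS1: ReadonlySet<string> = new Set(["助詞", "助動詞", "連体詞"]);

function isAnnotationNoise(token: Token): boolean {
  if (NOISE_POS1.has(token.pos1)) return true;
  // Auxiliary-stem grammar tails such as the そう in そうだ.
  return token.pos3 === "助動詞語幹";
}

// Drop annotation metadata but keep the token itself, so it stays hoverable.
function stripNoiseAnnotations(tokens: Token[]): Token[] {
  return tokens.map((token) => {
    if (!isAnnotationNoise(token)) return token;
    const plain: Token = { ...token };
    delete plain.frequencyRank;
    delete plain.jlptLevel;
    return plain;
  });
}
```

The token is copied rather than removed because the text above says filtered tokens still appear in the subtitle; only their styling metadata is dropped.
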
## N+1 Word Highlighting

N+1 highlighting identifies sentences where you know every word except one, making them ideal mining targets. When enabled, SubMiner builds a local cache of your known vocabulary from Anki and highlights tokens accordingly.

**How it works:**

1. SubMiner queries your Anki decks for existing `Expression` / `Word` field values.
2. The results are cached locally (`known-words-cache.json`) and refreshed on a configurable interval.
3. When a subtitle line appears, each token is checked against the cache.
4. If exactly one unknown word remains in the sentence, it is highlighted with `nPlusOneColor` (default: `#c6a0f6`).
5. Already-known tokens can optionally display in `knownWordColor` (default: `#a6da95`).

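Steps 3 and 4 can be sketched as follows. This is a simplified illustration, not SubMiner's implementation: the `SubtitleToken` shape and `findNPlusOneTarget` are hypothetical names, matching is done on the headword only, and the sketch counts every token it is given, whereas the real pipeline first filters out non-lexical tokens.

```typescript
// Hypothetical token shape for illustration only.
interface SubtitleToken {
  surface: string; // raw text as it appears in the subtitle
  headword: string; // dictionary form
}

// Returns the index of the single unknown token, or -1 when the line is not
// an N+1 sentence (too short, no unknown word, or more than one unknown word).
function findNPlusOneTarget(
  tokens: SubtitleToken[],
  knownWords: ReadonlySet<string>,
  minSentenceWords = 3,
): number {
  if (tokens.length < minSentenceWords) return -1;
  let target = -1;
  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i];
    if (token === undefined || knownWords.has(token.headword)) continue;
    if (target !== -1) return -1; // second unknown word: not N+1
    target = i;
  }
  return target;
}

const known = new Set(["猫", "魚", "食べる"]);
const line: SubtitleToken[] = [
  { surface: "猫", headword: "猫" },
  { surface: "刺身", headword: "刺身" },
  { surface: "食べた", headword: "食べる" },
];
const targetIndex = findNPlusOneTarget(line, known); // 1: 刺身 is the only unknown word
```

Returning early on the second unknown word keeps the check to a single pass over the line.
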
**Key settings:**

| Option | Default | Description |
| --- | --- | --- |
| `ankiConnect.knownWords.highlightEnabled` | `false` | Enable known-word cache lookups used by N+1 highlighting |
| `ankiConnect.knownWords.refreshMinutes` | `1440` | Minutes between Anki cache refreshes |
| `ankiConnect.knownWords.decks` | `[]` | Decks to query (falls back to `ankiConnect.deck`) |
| `ankiConnect.knownWords.matchMode` | `"headword"` | `"headword"` (dictionary form) or `"surface"` (raw text) |
| `ankiConnect.nPlusOne.minSentenceWords` | `3` | Minimum tokens in a sentence for N+1 to trigger |
| `ankiConnect.nPlusOne.nPlusOne` | `#c6a0f6` | Color for the single unknown target word |
| `ankiConnect.knownWords.color` | `#a6da95` | Color for already-known tokens |

::: tip
Set `refreshMinutes` to `1440` (24 hours) for daily sync if your Anki collection is large.
:::

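To make `refreshMinutes` and `matchMode` concrete, here is a small sketch. The `KnownWordsCache` shape, including the `refreshedAtMs` field, is an assumption made for illustration; the actual layout of `known-words-cache.json` is not documented on this page.

```typescript
type MatchMode = "headword" | "surface";

// Hypothetical cache shape; the real known-words-cache.json layout may differ.
interface KnownWordsCache {
  refreshedAtMs: number; // timestamp of the last Anki sync, in milliseconds
  words: string[];
}

// True when at least `refreshMinutes` have passed since the last refresh.
function isCacheStale(
  cache: KnownWordsCache,
  refreshMinutes: number,
  nowMs: number,
): boolean {
  return nowMs - cache.refreshedAtMs >= refreshMinutes * 60 * 1000;
}

// Picks the string that is compared against the known-words cache.
function lookupKey(
  token: { surface: string; headword: string },
  mode: MatchMode,
): string {
  return mode === "headword" ? token.headword : token.surface;
}
```

With `"headword"` matching, an inflected form such as `食べた` is looked up as `食べる`, so one Anki card covers every conjugation of the word. With `"surface"` matching, only the exact subtitle text counts as known.
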
## Character-Name Highlighting

Character-name matches are built from the active merged SubMiner character dictionary, which auto-syncs character data from AniList for your recently-watched titles. Matching names are highlighted in subtitles and become available for hover-driven Yomitan character profiles: portraits, roles, voice actors, and biographical detail.

**How it works:**

1. Subtitles are tokenized, then candidate name tokens are matched against the character dictionary via Yomitan's scanning pipeline.
2. Matching tokens receive a dedicated style distinct from N+1 and frequency layers.
3. This layer can be independently toggled with `subtitleStyle.nameMatchEnabled`.

**Key settings:**

| Option | Default | Description |
| --- | --- | --- |
| `subtitleStyle.nameMatchEnabled` | `true` | Enable character-name token highlighting |
| `subtitleStyle.nameMatchColor` | `#f5bde6` | Color used for character-name matches |

For full details on dictionary generation, name variant expansion, auto-sync lifecycle, and configuration, see the dedicated [Character Dictionary](/character-dictionary) page.

## Frequency Highlighting

Frequency highlighting colors tokens based on how common they are, using dictionary frequency rank data. This helps you spot high-value vocabulary at a glance.

**Modes:**

- **Single**: all highlighted tokens share one color (`singleColor`).
- **Banded**: tokens are assigned to five color bands from most common to least common within the `topX` window.

SubMiner looks up each token's `frequencyRank` from `term_meta_bank_*.json` files. Only tokens with a positive rank at or below `topX` are highlighted.

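One way to picture banded mode is the sketch below. The `frequencyBand` helper is hypothetical, and the equal-width split of the `topX` window is an assumption; this page only states that there are five bands ordered from most to least common, not how the boundaries are chosen.

```typescript
// Returns a band index from 0 (most common) to bandCount - 1 (least common),
// or null when the token should not be highlighted at all.
function frequencyBand(
  rank: number | undefined,
  topX: number,
  bandCount = 5,
): number | null {
  // Only positive ranks at or below topX are highlighted.
  if (rank === undefined || rank <= 0 || rank > topX) return null;
  // Assumption: equal-width bands across the 1..topX window.
  const band = Math.floor(((rank - 1) * bandCount) / topX);
  return Math.min(band, bandCount - 1);
}
```

Under this assumption and the default `topX` of `1000`, ranks 1 to 200 fall in band 0, ranks 201 to 400 in band 1, and so on up to ranks 801 to 1000 in band 4. The band index would then select an entry from `bandedColors`.
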
**Key settings:**

| Option | Default | Description |
| --- | --- | --- |
| `subtitleStyle.frequencyDictionary.enabled` | `false` | Enable frequency highlighting |
| `subtitleStyle.frequencyDictionary.topX` | `1000` | Max frequency rank to highlight |
| `subtitleStyle.frequencyDictionary.mode` | `"single"` | `"single"` or `"banded"` |
| `subtitleStyle.frequencyDictionary.matchMode` | `"headword"` | `"headword"` or `"surface"` |
| `subtitleStyle.frequencyDictionary.singleColor` | — | Color for single mode |
| `subtitleStyle.frequencyDictionary.bandedColors` | — | Array of five hex colors for banded mode |
| `subtitleStyle.frequencyDictionary.sourcePath` | — | Custom path to frequency dictionary root |

When `sourcePath` is omitted, SubMiner searches default install/runtime locations for `frequency-dictionary` directories automatically.

::: info
Frequency highlighting skips tokens that look like non-lexical noise (kana reduplication, short kana endings like `っ`), even when dictionary ranks exist.
:::

::: info
Frequency, JLPT, and N+1 metadata are only shown for tokens that survive the subtitle-annotation noise filter. Standalone grammar tokens like `は`, `です`, and `この` are intentionally left unannotated even if a dictionary can assign them metadata.
:::

## JLPT Tagging

JLPT tagging adds colored underlines to tokens based on their JLPT level (N1–N5), giving you an at-a-glance sense of difficulty distribution in each subtitle line.

**How it works:**

SubMiner loads offline `term_meta_bank_*.json` files from `vendor/yomitan-jlpt-vocab` and matches each token's headword against the bank entries. Tokens with a recognized JLPT level receive a colored underline.

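The lookup itself is a simple headword-to-level mapping, sketched below. The in-memory `Map` index and the `jlptUnderlineColor` helper are hypothetical; how SubMiner parses the bank files into a lookup structure is not described on this page. The color values are the defaults documented for `subtitleStyle.jlptColors`.

```typescript
type JlptLevel = "N1" | "N2" | "N3" | "N4" | "N5";

// Default underline colors as documented on this page.
const DEFAULT_JLPT_COLORS: Record<JlptLevel, string> = {
  N1: "#ed8796",
  N2: "#f5a97f",
  N3: "#f9e2af",
  N4: "#a6e3a1",
  N5: "#8aadf4",
};

// `levels` stands in for an index built from the offline bank files.
// Returns null when the headword has no recognized JLPT level.
function jlptUnderlineColor(
  headword: string,
  levels: ReadonlyMap<string, JlptLevel>,
  colors: Record<JlptLevel, string> = DEFAULT_JLPT_COLORS,
): string | null {
  const level = levels.get(headword);
  return level === undefined ? null : colors[level];
}
```

Because the match is on the headword, an inflected token only receives an underline if the tokenizer supplies its dictionary form.
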
**Default colors:**

| Level | Color | Preview |
| --- | --- | --- |
| N1 | `#ed8796` | Red |
| N2 | `#f5a97f` | Peach |
| N3 | `#f9e2af` | Yellow |
| N4 | `#a6e3a1` | Green |
| N5 | `#8aadf4` | Blue |

All colors are customizable via the `subtitleStyle.jlptColors` object.

**Key settings:**

| Option | Default | Description |
| --- | --- | --- |
| `subtitleStyle.enableJlpt` | `false` | Enable JLPT underline styling |
| `subtitleStyle.jlptColors.N1`–`N5` | see above | Per-level underline colors |

::: tip
JLPT tagging requires the offline vocabulary bundle. See [JLPT Vocabulary Bundle](jlpt-vocab-bundle) for setup instructions and file locations.
:::

## Runtime Toggles

All annotation layers can be toggled at runtime via the mpv command menu without restarting:

- `ankiConnect.knownWords.highlightEnabled` (`On` / `Off`)
- `subtitleStyle.nameMatchEnabled` (`On` / `Off`)
- `subtitleStyle.enableJlpt` (`On` / `Off`)
- `subtitleStyle.frequencyDictionary.enabled` (`On` / `Off`)

Toggles only apply to new subtitle lines after the change; the currently displayed line is not re-tokenized in place.

## Rendering Priority

When multiple annotations apply to the same token, the visual priority is:

1. **N+1 target** (highest): the single unknown word in an N+1 sentence
2. **Character-name match**: dictionary-driven character-name token styling
3. **Known-word color**: already-learned token tint
4. **Frequency highlight**: common-word coloring (not applied when N+1/character-name/known-word already matched)
5. **JLPT underline**: level-based underline (stacks with the above since it uses underline rather than text color)

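The priority order can be expressed as a short resolution function. This is a sketch under assumed names (`TokenAnnotations`, `Palette`, and `resolveStyle` are hypothetical), intended only to show that the first four layers compete for the text color while the JLPT underline is resolved separately.

```typescript
// Hypothetical per-token annotation flags.
interface TokenAnnotations {
  isNPlusOneTarget: boolean;
  isNameMatch: boolean;
  isKnown: boolean;
  frequencyColor?: string; // set when the token qualifies for frequency highlighting
  jlptColor?: string; // set when the token has a recognized JLPT level
}

interface Palette {
  nPlusOne: string;
  nameMatch: string;
  knownWord: string;
}

interface ResolvedStyle {
  textColor: string | null;
  underlineColor: string | null;
}

function resolveStyle(a: TokenAnnotations, palette: Palette): ResolvedStyle {
  let textColor: string | null = null;
  if (a.isNPlusOneTarget) textColor = palette.nPlusOne;
  else if (a.isNameMatch) textColor = palette.nameMatch;
  else if (a.isKnown) textColor = palette.knownWord;
  else if (a.frequencyColor !== undefined) textColor = a.frequencyColor;
  // The underline is independent of the text color, so it always stacks.
  return { textColor, underlineColor: a.jlptColor ?? null };
}
```

For example, a token that is both the N+1 target and within the frequency window takes the N+1 color, and it still carries a JLPT underline if a level is known.
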