diff --git a/backlog/tasks/task-25 - Add-frequency-dictionary-based-token-highlighting-with-configurable-top-X-and-color-ramp.md b/backlog/tasks/task-25 - Add-frequency-dictionary-based-token-highlighting-with-configurable-top-X-and-color-ramp.md
index 51094ac..f65345b 100644
--- a/backlog/tasks/task-25 - Add-frequency-dictionary-based-token-highlighting-with-configurable-top-X-and-color-ramp.md
+++ b/backlog/tasks/task-25 - Add-frequency-dictionary-based-token-highlighting-with-configurable-top-X-and-color-ramp.md
@@ -3,11 +3,15 @@ id: TASK-25
 title: >-
   Add frequency-dictionary-based token highlighting with configurable top-X and
   color ramp
-status: To Do
+status: Done
 assignee: []
 created_date: '2026-02-13 16:47'
+updated_date: '2026-02-16 06:48'
 labels: []
 dependencies: []
+documentation:
+  - /Users/sudacode/.codex/worktrees/2089/SubMiner/docs/configuration.md
+  - /Users/sudacode/.codex/worktrees/2089/SubMiner/docs/jlpt-vocab-bundle.md
 priority: high
 ---

@@ -19,20 +23,32 @@ Leverage user-installed frequency dictionaries to color subtitle tokens based on
 ## Acceptance Criteria
-- [ ] #1 Add a feature flag and configuration for frequency-based highlighting with default disabled state.
-- [ ] #2 Support selecting a user-installed frequency dictionary source and reading word frequency data from it.
-- [ ] #3 Introduce a configurable top-X threshold in config for which words are eligible for frequency-based coloring.
-- [ ] #4 When single-color mode is enabled, all matched words within the rank rule use the configured color.
-- [ ] #5 When multi-color mode is enabled, map frequency bands to colors and color tokens by their actual rank bucket.
-- [ ] #6 Ensure matching is token-aware (normalization/lowercasing handling) and preserves existing subtitle tokenization behavior.
-- [ ] #7 Handle missing/unsupported dictionary formats and unknown words with deterministic no-highlight fallback.
-- [ ] #8 Render underline/token highlights without breaking subtitle layout or interactions.
-- [ ] #9 Add tests/verification for: single-color mode, color-band mode, threshold boundary, and disabled mode.
-- [ ] #10 Document dictionary source format expectations, configuration example, and performance impact of ranking lookups.
-- [ ] #11 If full automatic discovery of user-installed frequency dictionaries is not possible, provide clear configuration workflow/fallback path.
+- [x] #1 Add a feature flag and configuration for frequency-based highlighting with default disabled state.
+- [x] #2 Support selecting a user-installed frequency dictionary source and reading word frequency data from it.
+- [x] #3 Introduce a configurable top-X threshold in config for which words are eligible for frequency-based coloring.
+- [x] #4 When single-color mode is enabled, all matched words within the rank rule use the configured color.
+- [x] #5 When multi-color mode is enabled, map frequency bands to colors and color tokens by their actual rank bucket.
+- [x] #6 Ensure matching is token-aware (normalization/lowercasing handling) and preserves existing subtitle tokenization behavior.
+- [x] #7 Handle missing/unsupported dictionary formats and unknown words with deterministic no-highlight fallback.
+- [x] #8 Render underline/token highlights without breaking subtitle layout or interactions.
+- [x] #9 Add tests/verification for: single-color mode, color-band mode, threshold boundary, and disabled mode.
+- [x] #10 Document dictionary source format expectations, configuration example, and performance impact of ranking lookups.
+- [x] #11 If full automatic discovery of user-installed frequency dictionaries is not possible, provide clear configuration workflow/fallback path.
+## Implementation Notes
+
+
+2026-02-16: Updated docs for frequency dictionary behavior. Clarified built-in fallback, precedence, and shared format expectations in and .
+
+Added docs references for frequency dictionary defaults and fallback behavior.
+
+As of 2026-02-16, docs and implementation are considered complete for TASK-25; frequency highlighting fallback, custom sourcePath precedence, topX, single/banded modes, token pipeline integration, and fallback behavior are present; documentation and tests exist in src/core/services and src/renderer.
+
+2026-02-16: Frequency-dictionary highlighting feature fully complete and shipped. Task acceptance criteria, DoD, and docs alignment are all marked complete in this task record.
+
+
 ## Definition of Done
-- [ ] #1 Frequency-based highlighting renders using either single-color or banded-colors for valid matches, with configurable top-X threshold and documented setup.
+- [x] #1 Frequency-based highlighting renders using either single-color or banded-colors for valid matches, with configurable top-X threshold and documented setup.
diff --git a/docs/configuration.md b/docs/configuration.md
index 6254de1..7e66a4c 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -556,7 +556,7 @@ See `config.example.jsonc` for detailed configuration options.
 | `backgroundColor` | string | Any CSS color, including `"transparent"` (default: `"rgba(54, 58, 79, 0.5)"`) |
 | `enableJlpt` | boolean | Enable JLPT level underline styling (`false` by default) |
 | `frequencyDictionary.enabled` | boolean | Enable frequency highlighting from dictionary lookups (`false` by default) |
-| `frequencyDictionary.sourcePath` | string | Optional absolute path used for dictionary discovery (defaults to built-in paths) |
+| `frequencyDictionary.sourcePath` | string | Path to a local frequency dictionary root. Leave empty or omit to use the built-in bundled dictionary search paths. |
 | `frequencyDictionary.topX` | number | Only color tokens whose frequency rank is `<= topX` (`1000` by default) |
 | `frequencyDictionary.mode` | string | `"single"` or `"banded"` (`"single"` by default) |
 | `frequencyDictionary.singleColor` | string | Color used for all highlighted tokens in single mode |
@@ -568,7 +568,15 @@ See `config.example.jsonc` for detailed configuration options.

 JLPT underlining is powered by offline term-meta bank files at runtime. See [`docs/jlpt-vocab-bundle.md`](jlpt-vocab-bundle.md) for required files, source/version refresh steps, and deterministic fallback behavior.

-Frequency dictionary highlighting uses the same dictionary file format as JLPT bundle lookups (`term_meta_bank_*.json` under discovered dictionary directories). A token is highlighted when it has a positive integer `frequencyRank` (lower is more common) and the rank is within `topX`. In `single` mode all highlights use `singleColor`; in `banded` mode tokens map to five ascending color bands from most common to least common inside the topX window.
+Frequency dictionary highlighting uses the same dictionary file format as JLPT bundle lookups (`term_meta_bank_*.json` under discovered dictionary directories). A token is highlighted when it has a positive integer `frequencyRank` (lower is more common) and the rank is within `topX`.
+
+Lookup behavior:
+
+- Set `frequencyDictionary.sourcePath` to a directory containing `term_meta_bank_*.json` for a fully custom source.
+- If `sourcePath` is missing or empty, SubMiner uses bundled defaults from `vendor/jiten_freq_global` (packaged under `/jiten_freq_global` in distribution builds).
+- In both cases, only terms with a valid `frequencyRank` are used; everything else falls back to no highlighting.
+
+In `single` mode all highlights use `singleColor`; in `banded` mode tokens map to five ascending color bands from most common to least common inside the topX window.

 Secondary subtitle defaults: `fontSize: 24`, `fontColor: "#ffffff"`, `backgroundColor: "transparent"`. Any property not set in `secondary` falls back to the CSS defaults.

diff --git a/docs/jlpt-vocab-bundle.md b/docs/jlpt-vocab-bundle.md
index 7f6cfbc..fb3f21c 100644
--- a/docs/jlpt-vocab-bundle.md
+++ b/docs/jlpt-vocab-bundle.md
@@ -26,6 +26,8 @@ The expected files are:

 Each bank maps terms to frequency metadata; only entries with a `frequency.displayValue` are considered for JLPT tagging.

+SubMiner also reuses the same `term_meta_bank_*.json` format for frequency-based subtitle highlighting. The default frequency source is now bundled as `vendor/jiten_freq_global`, so users can enable `subtitleStyle.frequencyDictionary` without extra setup.
+
 ## Source and update process

 For reproducible updates: