Mark TASK-25 as done

2026-02-15 22:48:56 -08:00
parent 01a48f4714
commit 8e9d392b21
3 changed files with 41 additions and 15 deletions

@@ -3,11 +3,15 @@ id: TASK-25
title: >-
Add frequency-dictionary-based token highlighting with configurable top-X and
color ramp
-status: To Do
+status: Done
assignee: []
created_date: '2026-02-13 16:47'
+updated_date: '2026-02-16 06:48'
labels: []
dependencies: []
+documentation:
+- /Users/sudacode/.codex/worktrees/2089/SubMiner/docs/configuration.md
+- /Users/sudacode/.codex/worktrees/2089/SubMiner/docs/jlpt-vocab-bundle.md
priority: high
---
@@ -19,20 +23,32 @@ Leverage user-installed frequency dictionaries to color subtitle tokens based on
## Acceptance Criteria
<!-- AC:BEGIN -->
-- [ ] #1 Add a feature flag and configuration for frequency-based highlighting with default disabled state.
-- [ ] #2 Support selecting a user-installed frequency dictionary source and reading word frequency data from it.
-- [ ] #3 Introduce a configurable top-X threshold in config for which words are eligible for frequency-based coloring.
-- [ ] #4 When single-color mode is enabled, all matched words within the rank rule use the configured color.
-- [ ] #5 When multi-color mode is enabled, map frequency bands to colors and color tokens by their actual rank bucket.
-- [ ] #6 Ensure matching is token-aware (normalization/lowercasing handling) and preserves existing subtitle tokenization behavior.
-- [ ] #7 Handle missing/unsupported dictionary formats and unknown words with deterministic no-highlight fallback.
-- [ ] #8 Render underline/token highlights without breaking subtitle layout or interactions.
-- [ ] #9 Add tests/verification for: single-color mode, color-band mode, threshold boundary, and disabled mode.
-- [ ] #10 Document dictionary source format expectations, configuration example, and performance impact of ranking lookups.
-- [ ] #11 If full automatic discovery of user-installed frequency dictionaries is not possible, provide clear configuration workflow/fallback path.
+- [x] #1 Add a feature flag and configuration for frequency-based highlighting with default disabled state.
+- [x] #2 Support selecting a user-installed frequency dictionary source and reading word frequency data from it.
+- [x] #3 Introduce a configurable top-X threshold in config for which words are eligible for frequency-based coloring.
+- [x] #4 When single-color mode is enabled, all matched words within the rank rule use the configured color.
+- [x] #5 When multi-color mode is enabled, map frequency bands to colors and color tokens by their actual rank bucket.
+- [x] #6 Ensure matching is token-aware (normalization/lowercasing handling) and preserves existing subtitle tokenization behavior.
+- [x] #7 Handle missing/unsupported dictionary formats and unknown words with deterministic no-highlight fallback.
+- [x] #8 Render underline/token highlights without breaking subtitle layout or interactions.
+- [x] #9 Add tests/verification for: single-color mode, color-band mode, threshold boundary, and disabled mode.
+- [x] #10 Document dictionary source format expectations, configuration example, and performance impact of ranking lookups.
+- [x] #11 If full automatic discovery of user-installed frequency dictionaries is not possible, provide clear configuration workflow/fallback path.
<!-- AC:END -->
+## Implementation Notes
+<!-- SECTION:NOTES:BEGIN -->
+2026-02-16: Updated docs for frequency dictionary behavior. Clarified built-in fallback, precedence, and shared format expectations in and .
+Added docs references for frequency dictionary defaults and fallback behavior.
+As of 2026-02-16, docs and implementation are considered complete for TASK-25; frequency highlighting fallback, custom sourcePath precedence, topX, single/banded modes, token pipeline integration, and fallback behavior are present; documentation and tests exist in src/core/services and src/renderer.
+2026-02-16: Frequency-dictionary highlighting feature fully complete and shipped. Task acceptance criteria, DoD, and docs alignment are all marked complete in this task record.
+<!-- SECTION:NOTES:END -->
## Definition of Done
<!-- DOD:BEGIN -->
-- [ ] #1 Frequency-based highlighting renders using either single-color or banded-colors for valid matches, with configurable top-X threshold and documented setup.
+- [x] #1 Frequency-based highlighting renders using either single-color or banded-colors for valid matches, with configurable top-X threshold and documented setup.
<!-- DOD:END -->

@@ -556,7 +556,7 @@ See `config.example.jsonc` for detailed configuration options.
| `backgroundColor` | string | Any CSS color, including `"transparent"` (default: `"rgba(54, 58, 79, 0.5)"`) |
| `enableJlpt` | boolean | Enable JLPT level underline styling (`false` by default) |
| `frequencyDictionary.enabled` | boolean | Enable frequency highlighting from dictionary lookups (`false` by default) |
-| `frequencyDictionary.sourcePath` | string | Optional absolute path used for dictionary discovery (defaults to built-in paths) |
+| `frequencyDictionary.sourcePath` | string | Path to a local frequency dictionary root. Leave empty or omit to use the built-in bundled dictionary search paths. |
| `frequencyDictionary.topX` | number | Only color tokens whose frequency rank is `<= topX` (`1000` by default) |
| `frequencyDictionary.mode` | string | `"single"` or `"banded"` (`"single"` by default) |
| `frequencyDictionary.singleColor` | string | Color used for all highlighted tokens in single mode |
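The `frequencyDictionary` rows above imply a small normalization step: defaults for omitted keys, and a fallback to the bundled dictionary when `sourcePath` is empty. The TypeScript below is a minimal sketch of that step under stated assumptions, not SubMiner's actual code. `resolveFrequencyConfig`, `ResolvedFrequencyConfig`, and the `bundledRoot` parameter are hypothetical names; `bundledRoot` stands in for whichever bundled location applies (`vendor/jiten_freq_global` in the repo, `<resources>/jiten_freq_global` in packaged builds, per the lookup notes in this commit). Treating an invalid `topX` as the default `1000`, and an unknown `mode` as `"single"`, is an assumed policy.

```typescript
// Sketch only: key names and defaults follow the documented
// `frequencyDictionary` rows; everything else is hypothetical.
type FrequencyMode = "single" | "banded";

interface FrequencyDictionaryConfig {
  enabled?: boolean;
  sourcePath?: string;
  topX?: number;
  mode?: FrequencyMode;
  singleColor?: string;
}

interface ResolvedFrequencyConfig {
  enabled: boolean;
  topX: number;
  mode: FrequencyMode;
  singleColor?: string;
  dictionaryRoot: string; // directory expected to hold term_meta_bank_*.json
  usingBundledDefault: boolean;
}

const DEFAULT_TOP_X = 1000;

function resolveFrequencyConfig(
  raw: FrequencyDictionaryConfig | undefined,
  bundledRoot: string, // hypothetical: e.g. "vendor/jiten_freq_global"
): ResolvedFrequencyConfig {
  const cfg: FrequencyDictionaryConfig = raw ?? {};
  // Assumption: a whitespace-only sourcePath counts as "empty".
  const customRoot = cfg.sourcePath?.trim() ?? "";
  // Assumption: a non-positive or fractional topX falls back to the default.
  const topX =
    typeof cfg.topX === "number" && Number.isInteger(cfg.topX) && cfg.topX > 0
      ? cfg.topX
      : DEFAULT_TOP_X;
  return {
    enabled: cfg.enabled === true, // disabled unless explicitly enabled
    topX,
    // Assumption: any mode other than "banded" is treated as "single".
    mode: cfg.mode === "banded" ? "banded" : "single",
    singleColor: cfg.singleColor,
    // A non-empty custom sourcePath takes precedence over the bundled default.
    dictionaryRoot: customRoot !== "" ? customRoot : bundledRoot,
    usingBundledDefault: customRoot === "",
  };
}
```

With this shape, `resolveFrequencyConfig({ enabled: true }, "vendor/jiten_freq_global")` yields `topX: 1000`, `mode: "single"`, and the bundled root, while any non-empty `sourcePath` wins over the bundled location.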
@@ -568,7 +568,15 @@ See `config.example.jsonc` for detailed configuration options.
JLPT underlining is powered by offline term-meta bank files at runtime. See [`docs/jlpt-vocab-bundle.md`](jlpt-vocab-bundle.md) for required files, source/version refresh steps, and deterministic fallback behavior.
-Frequency dictionary highlighting uses the same dictionary file format as JLPT bundle lookups (`term_meta_bank_*.json` under discovered dictionary directories). A token is highlighted when it has a positive integer `frequencyRank` (lower is more common) and the rank is within `topX`. In `single` mode all highlights use `singleColor`; in `banded` mode tokens map to five ascending color bands from most common to least common inside the topX window.
+Frequency dictionary highlighting uses the same dictionary file format as JLPT bundle lookups (`term_meta_bank_*.json` under discovered dictionary directories). A token is highlighted when it has a positive integer `frequencyRank` (lower is more common) and the rank is within `topX`.
+Lookup behavior:
+- Set `frequencyDictionary.sourcePath` to a directory containing `term_meta_bank_*.json` for a fully custom source.
+- If `sourcePath` is missing or empty, SubMiner uses bundled defaults from `vendor/jiten_freq_global` (packaged under `<resources>/jiten_freq_global` in distribution builds).
+- In both cases, only terms with a valid `frequencyRank` are used; everything else falls back to no highlighting.
+In `single` mode all highlights use `singleColor`; in `banded` mode tokens map to five ascending color bands from most common to least common inside the topX window.
Secondary subtitle defaults: `fontSize: 24`, `fontColor: "#ffffff"`, `backgroundColor: "transparent"`. Any property not set in `secondary` falls back to the CSS defaults.
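The highlight rule described in this hunk (positive integer rank, a `topX` cutoff, and five bands in `banded` mode) can be sketched as a pure function. This is an illustration, not SubMiner's implementation: `frequencyHighlightColor`, `HighlightOptions`, and `bandColors` are hypothetical names, and the docs do not say how the five bands are sized, so equal-width bands over `1..topX` are an assumption.

```typescript
// Sketch of the documented rule; names and band sizing are hypothetical.
type FrequencyMode = "single" | "banded";

interface HighlightOptions {
  enabled: boolean;
  topX: number; // assumed already validated as a positive integer
  mode: FrequencyMode;
  singleColor: string;
  // Hypothetical: five colors, most common band first.
  bandColors: readonly [string, string, string, string, string];
}

const BAND_COUNT = 5;

// Returns the CSS color for a token, or null for "no highlight".
function frequencyHighlightColor(
  frequencyRank: number | undefined,
  opts: HighlightOptions,
): string | null {
  if (!opts.enabled) return null;
  // Unknown words and non-positive or non-integer ranks are never highlighted.
  if (frequencyRank === undefined || !Number.isInteger(frequencyRank) || frequencyRank <= 0) {
    return null;
  }
  // The topX cutoff is inclusive: rank === topX still qualifies.
  if (frequencyRank > opts.topX) return null;
  if (opts.mode === "single") return opts.singleColor;
  // Assumption: equal-width bands over 1..topX. With topX = 1000:
  // ranks 1-200 -> band 0, 201-400 -> band 1, ..., 801-1000 -> band 4.
  const band = Math.min(
    BAND_COUNT - 1,
    Math.floor(((frequencyRank - 1) * BAND_COUNT) / opts.topX),
  );
  return opts.bandColors[band];
}
```

Keeping the rule as a pure function of rank and options makes the cases listed in acceptance criterion #9 (single-color, banded, threshold boundary, disabled) straightforward to unit-test.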

@@ -26,6 +26,8 @@ The expected files are:
Each bank maps terms to frequency metadata; only entries with a `frequency.displayValue` are considered for JLPT tagging.
+SubMiner also reuses the same `term_meta_bank_*.json` format for frequency-based subtitle highlighting. The default frequency source is now bundled as `vendor/jiten_freq_global`, so users can enable `subtitleStyle.frequencyDictionary` without extra setup.
## Source and update process
For reproducible updates: