mirror of
https://github.com/ksyasuda/SubMiner.git
synced 2026-02-28 06:22:45 -08:00
Mark TASK-25 as done
@@ -3,11 +3,15 @@ id: TASK-25
|
|||||||
title: >-
|
title: >-
|
||||||
Add frequency-dictionary-based token highlighting with configurable top-X and
|
Add frequency-dictionary-based token highlighting with configurable top-X and
|
||||||
color ramp
|
color ramp
|
||||||
status: To Do
|
status: Done
|
||||||
assignee: []
|
assignee: []
|
||||||
created_date: '2026-02-13 16:47'
|
created_date: '2026-02-13 16:47'
|
||||||
|
updated_date: '2026-02-16 06:48'
|
||||||
labels: []
|
labels: []
|
||||||
dependencies: []
|
dependencies: []
|
||||||
|
documentation:
|
||||||
|
- /Users/sudacode/.codex/worktrees/2089/SubMiner/docs/configuration.md
|
||||||
|
- /Users/sudacode/.codex/worktrees/2089/SubMiner/docs/jlpt-vocab-bundle.md
|
||||||
priority: high
|
priority: high
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -19,20 +23,32 @@ Leverage user-installed frequency dictionaries to color subtitle tokens based on
|
|||||||
|
|
||||||
## Acceptance Criteria
|
## Acceptance Criteria
|
||||||
<!-- AC:BEGIN -->
|
<!-- AC:BEGIN -->
|
||||||
- [ ] #1 Add a feature flag and configuration for frequency-based highlighting with default disabled state.
|
- [x] #1 Add a feature flag and configuration for frequency-based highlighting with default disabled state.
|
||||||
- [ ] #2 Support selecting a user-installed frequency dictionary source and reading word frequency data from it.
|
- [x] #2 Support selecting a user-installed frequency dictionary source and reading word frequency data from it.
|
||||||
- [ ] #3 Introduce a configurable top-X threshold in config for which words are eligible for frequency-based coloring.
|
- [x] #3 Introduce a configurable top-X threshold in config for which words are eligible for frequency-based coloring.
|
||||||
- [ ] #4 When single-color mode is enabled, all matched words within the rank rule use the configured color.
|
- [x] #4 When single-color mode is enabled, all matched words within the rank rule use the configured color.
|
||||||
- [ ] #5 When multi-color mode is enabled, map frequency bands to colors and color tokens by their actual rank bucket.
|
- [x] #5 When multi-color mode is enabled, map frequency bands to colors and color tokens by their actual rank bucket.
|
||||||
- [ ] #6 Ensure matching is token-aware (normalization/lowercasing handling) and preserves existing subtitle tokenization behavior.
|
- [x] #6 Ensure matching is token-aware (normalization/lowercasing handling) and preserves existing subtitle tokenization behavior.
|
||||||
- [ ] #7 Handle missing/unsupported dictionary formats and unknown words with deterministic no-highlight fallback.
|
- [x] #7 Handle missing/unsupported dictionary formats and unknown words with deterministic no-highlight fallback.
|
||||||
- [ ] #8 Render underline/token highlights without breaking subtitle layout or interactions.
|
- [x] #8 Render underline/token highlights without breaking subtitle layout or interactions.
|
||||||
- [ ] #9 Add tests/verification for: single-color mode, color-band mode, threshold boundary, and disabled mode.
|
- [x] #9 Add tests/verification for: single-color mode, color-band mode, threshold boundary, and disabled mode.
|
||||||
- [ ] #10 Document dictionary source format expectations, configuration example, and performance impact of ranking lookups.
|
- [x] #10 Document dictionary source format expectations, configuration example, and performance impact of ranking lookups.
|
||||||
- [ ] #11 If full automatic discovery of user-installed frequency dictionaries is not possible, provide clear configuration workflow/fallback path.
|
- [x] #11 If full automatic discovery of user-installed frequency dictionaries is not possible, provide clear configuration workflow/fallback path.
|
||||||
<!-- AC:END -->
|
<!-- AC:END -->
|
||||||
|
|
||||||
|
## Implementation Notes
|
||||||
|
|
||||||
|
<!-- SECTION:NOTES:BEGIN -->
|
||||||
|
2026-02-16: Updated docs for frequency dictionary behavior. Clarified built-in fallback, precedence, and shared format expectations in and .
|
||||||
|
|
||||||
|
Added docs references for frequency dictionary defaults and fallback behavior.
|
||||||
|
|
||||||
|
As of 2026-02-16, docs and implementation are considered complete for TASK-25; frequency highlighting fallback, custom sourcePath precedence, topX, single/banded modes, token pipeline integration, and fallback behavior are present; documentation and tests exist in src/core/services and src/renderer.
|
||||||
|
|
||||||
|
2026-02-16: Frequency-dictionary highlighting feature fully complete and shipped. Task acceptance criteria, DoD, and docs alignment are all marked complete in this task record.
|
||||||
|
<!-- SECTION:NOTES:END -->
|
||||||
|
|
||||||
## Definition of Done
|
## Definition of Done
|
||||||
<!-- DOD:BEGIN -->
|
<!-- DOD:BEGIN -->
|
||||||
- [ ] #1 Frequency-based highlighting renders using either single-color or banded-colors for valid matches, with configurable top-X threshold and documented setup.
|
- [x] #1 Frequency-based highlighting renders using either single-color or banded-colors for valid matches, with configurable top-X threshold and documented setup.
|
||||||
<!-- DOD:END -->
|
<!-- DOD:END -->
|
||||||
|
|||||||
@@ -556,7 +556,7 @@ See `config.example.jsonc` for detailed configuration options.
|
|||||||
| `backgroundColor` | string | Any CSS color, including `"transparent"` (default: `"rgba(54, 58, 79, 0.5)"`) |
|
| `backgroundColor` | string | Any CSS color, including `"transparent"` (default: `"rgba(54, 58, 79, 0.5)"`) |
|
||||||
| `enableJlpt` | boolean | Enable JLPT level underline styling (`false` by default) |
|
| `enableJlpt` | boolean | Enable JLPT level underline styling (`false` by default) |
|
||||||
| `frequencyDictionary.enabled` | boolean | Enable frequency highlighting from dictionary lookups (`false` by default) |
|
| `frequencyDictionary.enabled` | boolean | Enable frequency highlighting from dictionary lookups (`false` by default) |
|
||||||
| `frequencyDictionary.sourcePath` | string | Optional absolute path used for dictionary discovery (defaults to built-in paths) |
|
| `frequencyDictionary.sourcePath` | string | Path to a local frequency dictionary root. Leave empty or omit to use the built-in bundled dictionary search paths. |
|
||||||
| `frequencyDictionary.topX` | number | Only color tokens whose frequency rank is `<= topX` (`1000` by default) |
|
| `frequencyDictionary.topX` | number | Only color tokens whose frequency rank is `<= topX` (`1000` by default) |
|
||||||
| `frequencyDictionary.mode` | string | `"single"` or `"banded"` (`"single"` by default) |
|
| `frequencyDictionary.mode` | string | `"single"` or `"banded"` (`"single"` by default) |
|
||||||
| `frequencyDictionary.singleColor` | string | Color used for all highlighted tokens in single mode |
|
| `frequencyDictionary.singleColor` | string | Color used for all highlighted tokens in single mode |
|
||||||
@@ -568,7 +568,15 @@ See `config.example.jsonc` for detailed configuration options.
|
|||||||
|
|
||||||
JLPT underlining is powered by offline term-meta bank files at runtime. See [`docs/jlpt-vocab-bundle.md`](jlpt-vocab-bundle.md) for required files, source/version refresh steps, and deterministic fallback behavior.
|
JLPT underlining is powered by offline term-meta bank files at runtime. See [`docs/jlpt-vocab-bundle.md`](jlpt-vocab-bundle.md) for required files, source/version refresh steps, and deterministic fallback behavior.
|
||||||
|
|
||||||
Frequency dictionary highlighting uses the same dictionary file format as JLPT bundle lookups (`term_meta_bank_*.json` under discovered dictionary directories). A token is highlighted when it has a positive integer `frequencyRank` (lower is more common) and the rank is within `topX`. In `single` mode all highlights use `singleColor`; in `banded` mode tokens map to five ascending color bands from most common to least common inside the topX window.
|
Frequency dictionary highlighting uses the same dictionary file format as JLPT bundle lookups (`term_meta_bank_*.json` under discovered dictionary directories). A token is highlighted when it has a positive integer `frequencyRank` (lower is more common) and the rank is within `topX`.
|
||||||
|
|
||||||
|
Lookup behavior:
|
||||||
|
|
||||||
|
- Set `frequencyDictionary.sourcePath` to a directory containing `term_meta_bank_*.json` for a fully custom source.
|
||||||
|
- If `sourcePath` is missing or empty, SubMiner uses bundled defaults from `vendor/jiten_freq_global` (packaged under `<resources>/jiten_freq_global` in distribution builds).
|
||||||
|
- In both cases, only terms with a valid `frequencyRank` are used; everything else falls back to no highlighting.
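The lookup rules above can be sketched roughly as follows. This is a minimal illustration, not SubMiner's actual code: the function names `resolveFrequencyDictionaryRoot` and `isTermMetaBankFile` are hypothetical, the bundled root is passed in by the caller, and treating a whitespace-only `sourcePath` as empty is an assumption.

```typescript
// Hypothetical helper: choose the directory to scan for frequency banks.
// A non-empty sourcePath wins; otherwise the bundled default root is used.
function resolveFrequencyDictionaryRoot(
  sourcePath: string | undefined,
  bundledRoot: string,
): string {
  // Assumption: whitespace-only values count as "empty".
  const trimmed = sourcePath?.trim() ?? "";
  return trimmed.length > 0 ? trimmed : bundledRoot;
}

// Hypothetical helper: match the documented `term_meta_bank_*.json` file pattern.
function isTermMetaBankFile(fileName: string): boolean {
  return /^term_meta_bank_.*\.json$/.test(fileName);
}

// Usage: a custom path takes precedence over the bundled default.
const root = resolveFrequencyDictionaryRoot("/data/my_freq_dict", "vendor/jiten_freq_global");
console.log(root, isTermMetaBankFile("term_meta_bank_1.json"));
```

Keeping the resolution step a pure function of the configured path and the bundled root makes the precedence rule easy to test without touching the filesystem.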
|
||||||
|
|
||||||
|
In `single` mode all highlights use `singleColor`; in `banded` mode tokens map to five ascending color bands from most common to least common inside the topX window.
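As a rough sketch of the color selection described above: the `enabled`, `topX`, `mode`, and `singleColor` fields mirror the documented config keys, while `bandedColors`, `pickFrequencyColor`, the sample colors, and the equal-width band split are assumptions made for illustration and may differ from the real implementation.

```typescript
type FrequencyMode = "single" | "banded";

interface FrequencyDictionaryConfig {
  enabled: boolean;
  topX: number;
  mode: FrequencyMode;
  singleColor: string;
  // Assumed field name: band colors ordered from most common to least common.
  bandedColors: string[];
}

// Returns the highlight color for a token, or null for "no highlight".
function pickFrequencyColor(
  rank: number | undefined,
  cfg: FrequencyDictionaryConfig,
): string | null {
  if (!cfg.enabled) return null;
  // Only positive integer ranks are valid; unknown words get no highlight.
  if (rank === undefined || !Number.isInteger(rank) || rank <= 0) return null;
  // Ranks outside the topX window are not colored.
  if (rank > cfg.topX) return null;
  if (cfg.mode === "single") return cfg.singleColor;

  const bands = cfg.bandedColors.length;
  if (bands === 0) return null;
  // Assumption: the topX window is split into equal-width bands.
  const index = Math.min(bands - 1, Math.floor(((rank - 1) * bands) / cfg.topX));
  return cfg.bandedColors[index] ?? null;
}

// Usage with illustrative colors (not the project's defaults).
const exampleConfig: FrequencyDictionaryConfig = {
  enabled: true,
  topX: 1000,
  mode: "banded",
  singleColor: "#f5a97f",
  bandedColors: ["#ed8796", "#f5a97f", "#eed49f", "#a6da95", "#8aadf4"],
};
console.log(pickFrequencyColor(1, exampleConfig)); // first band color
console.log(pickFrequencyColor(1000, exampleConfig)); // last band color
console.log(pickFrequencyColor(1001, exampleConfig)); // null: outside topX
```

With `topX` set to 1000 and five bands, this split puts ranks 1 to 200 in the first band and ranks 801 to 1000 in the last; every invalid or out-of-window rank returns `null`, which matches the deterministic no-highlight fallback.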
|
||||||
|
|
||||||
Secondary subtitle defaults: `fontSize: 24`, `fontColor: "#ffffff"`, `backgroundColor: "transparent"`. Any property not set in `secondary` falls back to the CSS defaults.
|
Secondary subtitle defaults: `fontSize: 24`, `fontColor: "#ffffff"`, `backgroundColor: "transparent"`. Any property not set in `secondary` falls back to the CSS defaults.
|
||||||
|
|
||||||
|
|||||||
@@ -26,6 +26,8 @@ The expected files are:
|
|||||||
|
|
||||||
Each bank maps terms to frequency metadata; only entries with a `frequency.displayValue` are considered for JLPT tagging.
|
Each bank maps terms to frequency metadata; only entries with a `frequency.displayValue` are considered for JLPT tagging.
|
||||||
|
|
||||||
|
SubMiner also reuses the same `term_meta_bank_*.json` format for frequency-based subtitle highlighting. The default frequency source is now bundled as `vendor/jiten_freq_global`, so users can enable `subtitleStyle.frequencyDictionary` without extra setup.
|
||||||
|
|
||||||
## Source and update process
|
## Source and update process
|
||||||
|
|
||||||
For reproducible updates:
|
For reproducible updates:
|
||||||
|
|||||||