SubMiner/backlog/tasks/task-25 - Add-frequency-dictionary-based-token-highlighting-with-configurable-top-X-and-color-ramp.md
2026-02-13 18:29:17 -08:00

id: TASK-25
title: Add frequency-dictionary-based token highlighting with configurable top-X and color ramp
status: To Do
assignee: (none)
created_date: 2026-02-13 16:47
labels: (none)
dependencies: (none)
priority: high

Description

Leverage user-installed frequency dictionaries to color subtitle tokens by word frequency rank. The coloring behavior is configurable: either one shared color for every word ranked within the top-X threshold, or a multi-color ramp that maps frequency bands to different colors. The feature should support a configurable top-X cutoff (only the X most frequent words are eligible for coloring) and integrate with the existing subtitle rendering flow.
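A minimal TypeScript sketch of how the configuration and the rank-to-color resolution described above could fit together. Every name here (`FrequencyHighlightConfig`, `colorForRank`, the `single`/`banded` mode values, the default `topX` and colors) is a hypothetical placeholder, not an existing SubMiner API. It assumes ranks are 1-based, with rank 1 as the most frequent word, and that bands are listed in ascending `maxRank` order.

```typescript
// Hypothetical config shape; field names and defaults are placeholders,
// not SubMiner's actual schema.
type HighlightMode = "single" | "banded";

interface FrequencyBand {
  maxRank: number; // inclusive upper rank bound for this band
  color: string; // CSS color for tokens in this band
}

interface FrequencyHighlightConfig {
  enabled: boolean; // feature flag, off by default
  topX: number; // only ranks 1..topX are eligible for coloring
  mode: HighlightMode;
  singleColor: string; // used when mode === "single"
  bands: FrequencyBand[]; // used when mode === "banded"; ascending maxRank
}

const defaultConfig: FrequencyHighlightConfig = {
  enabled: false,
  topX: 1000,
  mode: "single",
  singleColor: "#f5a97f",
  bands: [],
};

// Returns the color for a token's rank, or undefined for "no highlight".
function colorForRank(
  rank: number | undefined,
  config: FrequencyHighlightConfig,
): string | undefined {
  if (!config.enabled || rank === undefined) return undefined;
  // Rank 1 is the most frequent word; anything outside 1..topX is ineligible.
  if (!Number.isInteger(rank) || rank < 1 || rank > config.topX) return undefined;
  if (config.mode === "single") return config.singleColor;
  // Banded mode: the first band whose upper bound covers the rank wins.
  for (const band of config.bands) {
    if (rank <= band.maxRank) return band.color;
  }
  return undefined; // rank is within top-X but past the last band
}

// Example: two bands inside a top-5000 cutoff.
const banded: FrequencyHighlightConfig = {
  ...defaultConfig,
  enabled: true,
  mode: "banded",
  topX: 5000,
  bands: [
    { maxRank: 1000, color: "#a6da95" },
    { maxRank: 5000, color: "#eed49f" },
  ],
};

console.log(colorForRank(1000, banded)); // prints #a6da95
console.log(colorForRank(1001, banded)); // prints #eed49f
console.log(colorForRank(5001, banded)); // prints undefined (outside top-X)
```

In this sketch, a token whose rank is within top-X but beyond the last band's `maxRank` gets no color; an implementation might instead validate at config load time that the final band reaches `topX`, so the two settings cannot silently disagree.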

Acceptance Criteria

  • #1 Add a feature flag and configuration for frequency-based highlighting, disabled by default.
  • #2 Support selecting a user-installed frequency dictionary source and reading word frequency data from it.
  • #3 Introduce a configurable top-X threshold that determines which words are eligible for frequency-based coloring.
  • #4 When single-color mode is enabled, all matched words within the top-X threshold use the configured color.
  • #5 When multi-color mode is enabled, map frequency bands to colors and color each token according to the band its rank falls into.
  • #6 Ensure matching is token-aware (handling normalization and lowercasing) and preserves existing subtitle tokenization behavior.
  • #7 Handle missing or unsupported dictionary formats and unknown words with a deterministic no-highlight fallback.
  • #8 Render underline/token highlights without breaking subtitle layout or interactions.
  • #9 Add tests/verification for: single-color mode, color-band mode, threshold boundary, and disabled mode.
  • #10 Document the expected dictionary source format, a configuration example, and the performance impact of rank lookups.
  • #11 If fully automatic discovery of user-installed frequency dictionaries is not possible, provide a clear configuration workflow or fallback path.
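The lookup side of criteria #2, #6, #7, and #10 could look roughly like the sketch below. It assumes Yomitan-style frequency rows of the form `[term, "freq", value]` in which `value` is a plain number treated as a rank (lower means more frequent); object-form values and any other row shape are skipped rather than guessed at, and dictionaries that store raw occurrence counts (higher means more frequent) would need the ordering inverted. `normalizeTerm`, `buildRankIndex`, and `lookupRank` are hypothetical names, NFKC plus lowercasing is one assumed normalization choice, and the token passed to `lookupRank` is assumed to be whatever lookup form the existing tokenizer already produces (for Japanese, typically the dictionary form).

```typescript
// Sketch only: assumes Yomitan-style rows [term, "freq", value] where value
// is a plain number treated as a rank (lower = more frequent).

// Assumed normalization: NFKC (folds full-width/half-width variants), trim,
// then lowercase, applied identically to dictionary terms and tokens.
function normalizeTerm(term: string): string {
  return term.normalize("NFKC").trim().toLowerCase();
}

// Build a term -> rank index once at load time; unsupported rows are skipped
// and a repeated term keeps its lowest (most frequent) rank.
function buildRankIndex(rows: readonly unknown[]): Map<string, number> {
  const index = new Map<string, number>();
  for (const row of rows) {
    if (!Array.isArray(row) || row.length < 3) continue;
    const [term, kind, value] = row;
    if (typeof term !== "string" || kind !== "freq") continue;
    if (typeof value !== "number" || !Number.isFinite(value)) continue;
    const key = normalizeTerm(term);
    const existing = index.get(key);
    if (existing === undefined || value < existing) index.set(key, value);
  }
  return index;
}

// Deterministic fallback: no dictionary, an empty dictionary, or an unknown
// token all return undefined, which callers treat as "no highlight".
function lookupRank(
  index: Map<string, number> | undefined,
  token: string,
): number | undefined {
  if (index === undefined || index.size === 0) return undefined;
  return index.get(normalizeTerm(token));
}

const index = buildRankIndex([
  ["食べる", "freq", 120],
  ["Hello", "freq", 50],
  ["broken", "freq", { value: 7 }], // object-form value: skipped by this sketch
  ["食べる", "freq", 300], // duplicate: the lower rank (120) is kept
  "not-a-row", // unsupported shape: skipped
]);

console.log(lookupRank(index, "HELLO")); // prints 50
console.log(lookupRank(index, "食べる")); // prints 120
console.log(lookupRank(index, "broken")); // prints undefined
console.log(lookupRank(undefined, "hello")); // prints undefined
```

Building the `Map` once when the dictionary is loaded keeps the per-token cost in this sketch to a single normalization plus hash lookup, so the performance note for #10 would mostly concern the one-time parse time and the memory footprint of the index rather than per-subtitle work.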

Definition of Done

  • #1 Frequency-based highlighting renders valid matches in either single-color or banded-color mode, with a configurable top-X threshold and documented setup.