SubMiner/backlog/tasks/task-108 - Exclude-single-kana-tokens-from-frequency-highlighting.md


id: TASK-108
title: Exclude single kana tokens from frequency highlighting
status: Done
assignee:
created_date: 2026-03-07 01:18
updated_date: 2026-03-07 01:22
labels:
dependencies:
priority: medium
ordinal: 9008

Description

Suppress frequency highlighting for single-character hiragana or katakana tokens. Scope is frequency-only: known/N+1/JLPT behavior stays unchanged.

Acceptance Criteria

  • #1 Single-character hiragana tokens do not retain frequencyRank.
  • #2 Single-character katakana tokens do not retain frequencyRank.
  • #3 Regression coverage exists at annotation-stage and tokenizer levels.
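
As a rough illustration of criteria #1 and #2, the rule could look something like the sketch below. This is not the project's actual code: the `Token` shape, the helper names `isSingleKana` and `suppressSingleKanaFrequency`, and the exact code point ranges are assumptions made for the example. Only the `frequencyRank` field and the surface-based check come from the task text. Half-width katakana and the prolonged sound mark are deliberately left out here, which may differ from the real rule.

```typescript
// Hypothetical token shape; the real tokenizer type likely has more fields.
interface Token {
  surface: string;
  frequencyRank?: number;
}

// Assumed ranges: hiragana letters U+3041 to U+3096, katakana letters U+30A1 to U+30FA.
function isSingleKana(surface: string): boolean {
  const chars = Array.from(surface); // split by code point, not UTF-16 unit
  if (chars.length !== 1) return false;
  const cp = chars[0].codePointAt(0);
  if (cp === undefined) return false;
  const isHiragana = cp >= 0x3041 && cp <= 0x3096;
  const isKatakana = cp >= 0x30a1 && cp <= 0x30fa;
  return isHiragana || isKatakana;
}

// Returns a copy without frequencyRank when the surface is a single kana;
// every other token is returned unchanged.
function suppressSingleKanaFrequency(token: Token): Token {
  if (!isSingleKana(token.surface)) return token;
  const copy: Token = { ...token };
  delete copy.frequencyRank;
  return copy;
}
```

Under these assumptions, `suppressSingleKanaFrequency({ surface: "の", frequencyRank: 3 })` yields a token without `frequencyRank`, while a multi-character token such as `{ surface: "ので", frequencyRank: 40 }` passes through untouched. Because the helper only removes one field, any known/N+1/JLPT data carried on the token would be left as-is.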

Final Summary

Added a frequency-only suppression rule for single-character kana tokens, keyed on the token surface, so bogus merged fragments and standalone one-character kana no longer keep frequencyRank. Regression coverage now exists in both the annotation stage and the tokenizer path. Multi-character tokens and known/N+1/JLPT behavior remain unchanged.

Verification:

  • bun test src/core/services/tokenizer/annotation-stage.test.ts --timeout 20000
  • bun test src/core/services/tokenizer.test.ts --timeout 20000