mirror of
https://github.com/ksyasuda/SubMiner.git
synced 2026-03-07 03:22:17 -08:00
| id | title | status | assignee | created_date | updated_date | labels | dependencies | priority | ordinal |
|---|---|---|---|---|---|---|---|---|---|
| TASK-108 | Exclude single kana tokens from frequency highlighting | Done | | 2026-03-07 01:18 | 2026-03-07 01:22 | | | medium | 9008 |
Description
Suppress frequency highlighting for single-character hiragana or katakana tokens. Scope is frequency-only: known/N+1/JLPT behavior stays unchanged.
Acceptance Criteria
- #1 Single-character hiragana tokens do not retain frequencyRank.
- #2 Single-character katakana tokens do not retain frequencyRank.
- #3 Regression coverage exists at annotation-stage and tokenizer levels.
Final Summary
Added a frequency-only suppression rule for single-character kana tokens based on token surface, so bogus merged fragments like た and standalone one-character kana no longer keep frequencyRank. Regression coverage now exists both in the annotation stage and in the tokenizer path, while multi-character tokens and N+1/JLPT behavior remain unchanged.
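The rule summarized above could look roughly like the sketch below. It is an illustration only, not the code that landed: the `Token` shape, the `jlptLevel` field, the function names, and the exact kana ranges (ぁ to ゖ and ァ to ヺ, which leave out marks such as ー) are all assumptions, since this task does not show the actual implementation.

```typescript
// Hypothetical token shape; the real SubMiner token type is not shown in this task.
interface Token {
  surface: string;
  frequencyRank?: number;
  jlptLevel?: string; // stand-in for the annotations that must stay untouched
}

// Assumed ranges: hiragana letters U+3041-U+3096, katakana letters U+30A1-U+30FA.
// Anchored so that only a surface of exactly one such character matches.
const SINGLE_KANA = /^[\u3041-\u3096\u30A1-\u30FA]$/u;

function isSingleKana(surface: string): boolean {
  return SINGLE_KANA.test(surface);
}

// Frequency-only suppression: return a copy without frequencyRank when the
// surface is a single kana character; every other field is left as-is.
function suppressSingleKanaFrequency(token: Token): Token {
  if (!isSingleKana(token.surface)) {
    return token;
  }
  const copy: Token = { ...token };
  delete copy.frequencyRank;
  return copy;
}
```

Keying the check on the token surface rather than on a dictionary form means a merged fragment whose surface is just た is caught even if the tokenizer assigned it a longer headword.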
Verification:
- bun test src/core/services/tokenizer/annotation-stage.test.ts --timeout 20000
- bun test src/core/services/tokenizer.test.ts --timeout 20000