mirror of
https://github.com/ksyasuda/SubMiner.git
synced 2026-03-20 12:11:28 -07:00
1.7 KiB
| id | title | status | assignee | created_date | updated_date | labels | dependencies | references | priority | ordinal |
|---|---|---|---|---|---|---|---|---|---|---|
| TASK-203 | Restore known and JLPT annotation for reading-mismatch subtitle tokens | Done | | 2026-03-19 18:25 | 2026-03-19 18:25 | | | | medium | 105721 |
Description
Some subtitle tokens lose both known-word coloring and JLPT underline even though the popup resolves a valid dictionary term. Repro example: 大体 in 大体 僕だって困ってたんですよ! can be known via kana-only Anki data (だいたい) while JLPT lookup should still resolve from the kanji surface/headword.
Acceptance Criteria
- #1 Subtitle annotation can mark a token known via its reading when the configured headword/surface lookup misses.
- #2 JLPT eligibility no longer drops valid kanji terms just because their reading contains repeated kana patterns.
- #3 Regression coverage locks the combined known + JLPT case for 大体.
Outcome
Known-word annotation now falls back to the token reading after the configured headword/surface lookup misses, so kana-only known-card entries still light up matching subtitle tokens. JLPT eligibility now ignores repeated-kana noise checks on the reading when a real surface/headword is present, which preserves JLPT tagging for words like 大体.
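The two behaviors described above can be sketched roughly as follows. This is a minimal illustration, not SubMiner's actual code: the `Token` shape, the `matchMode` option, every function name, and the repeated-kana heuristic are hypothetical stand-ins (the real noise check is not shown in this task), and the `N3` level is a placeholder value rather than a verified JLPT listing.

```typescript
// Hypothetical shapes and names, for illustration only.
interface Token {
  surface: string; // text as it appears in the subtitle, e.g. "大体"
  headword: string; // dictionary form
  reading: string; // kana reading, e.g. "だいたい"
}

interface Annotation {
  known: boolean;
  jlptLevel: string | null;
}

// Convert katakana to hiragana so readings compare consistently.
function toHiragana(text: string): string {
  return text.replace(/[\u30a1-\u30f6]/g, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) - 0x60),
  );
}

// Stand-in noise heuristic (assumption): flags text where the same character
// recurs two positions later, which happens to match "だいたい" and "あああ".
function looksLikeRepeatedKana(text: string): boolean {
  return /(.).\1/.test(toHiragana(text));
}

// Known-word check: try the configured headword/surface form first,
// then fall back to the token reading when that lookup misses.
function isKnown(
  token: Token,
  knownWords: ReadonlySet<string>,
  matchMode: "headword" | "surface",
): boolean {
  const primary = matchMode === "headword" ? token.headword : token.surface;
  if (primary !== "" && knownWords.has(primary)) return true;
  return token.reading !== "" && knownWords.has(toHiragana(token.reading));
}

// JLPT eligibility: when a surface/headword is present, run the noise check
// on that term and ignore the reading; use the reading only as a last resort.
function isJlptEligible(token: Token): boolean {
  const term = token.headword || token.surface || token.reading;
  return term !== "" && !looksLikeRepeatedKana(term);
}

function annotate(
  token: Token,
  knownWords: ReadonlySet<string>,
  jlptLevels: ReadonlyMap<string, string>,
  matchMode: "headword" | "surface" = "headword",
): Annotation {
  const term = token.headword || token.surface;
  return {
    known: isKnown(token, knownWords, matchMode),
    jlptLevel: isJlptEligible(token) ? jlptLevels.get(term) ?? null : null,
  };
}

// The repro case: the known-word data is kana-only, the JLPT data is keyed by kanji.
const daitai: Token = { surface: "大体", headword: "大体", reading: "だいたい" };
const result = annotate(
  daitai,
  new Set(["だいたい"]),
  new Map([["大体", "N3"]]), // placeholder level
);
console.log(result); // result.known === true, result.jlptLevel === "N3"
```

In this sketch the noise check never sees the reading when a surface or headword exists, so a kanji term such as 大体 stays eligible while a kana-only token such as あああ is still filtered out.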
Verification:
bun test src/core/services/tokenizer/annotation-stage.test.ts