SubMiner/backlog/tasks/task-203 - Restore-known-and-JLPT-annotation-for-reading-mismatch-tokens.md at main - SubMiner


mirror of https://github.com/ksyasuda/SubMiner.git synced 2026-03-20 12:11:28 -07:00


sudacode 6749ff843c feat(stats): add v1 immersion stats dashboard (#19 )

2026-03-20 02:43:28 -07:00


| Field | Value |
| --- | --- |
| id | TASK-203 |
| title | Restore known and JLPT annotation for reading-mismatch subtitle tokens |
| status | Done |
| assignee | Codex |
| created_date | 2026-03-19 18:25 |
| updated_date | |
| labels | subtitle, bug |
| dependencies | |
| references | src/core/services/tokenizer/annotation-stage.ts, src/core/services/tokenizer/annotation-stage.test.ts |
| priority | medium |
| ordinal | 105721 |

Description

Some subtitle tokens lose both known-word coloring and the JLPT underline even though the popup resolves a valid dictionary term. Repro example: in 大体僕だって困ってたんですよ！, the token 大体 should be marked known via kana-only Anki data (だいたい), and its JLPT lookup should still resolve from the kanji surface/headword.

Acceptance Criteria

#1 Subtitle annotation can mark a token known via its reading when the configured headword/surface lookup misses.
#2 JLPT eligibility no longer drops valid kanji terms just because their reading contains repeated kana patterns.
#3 Regression coverage locks the combined known + JLPT case for 大体.

Outcome

Known-word annotation now falls back to the token reading after the configured headword/surface lookup misses, so kana-only known-card entries still light up matching subtitle tokens. JLPT eligibility now ignores repeated-kana noise checks on the reading when a real surface/headword is present, which preserves JLPT tagging for words like 大体.
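The two behaviors described above can be sketched roughly as follows. This is a minimal illustration, not the code in annotation-stage.ts: the `Token` shape, the `MatchMode` values, the function names, and in particular the repeated-kana heuristic (strip voicing marks, then test whether the string is a single unit repeated) are all assumptions made for the example.

```typescript
// Hypothetical token shape; the real tokenizer types may differ.
interface Token {
  surface: string;  // text as it appears in the subtitle, e.g. "大体"
  headword: string; // dictionary form, e.g. "大体"
  reading: string;  // kana reading, e.g. "だいたい"
}

// Hypothetical name for the configured known-word match mode.
type MatchMode = "headword" | "surface";

// Known-word check: try the configured field first, then fall back to the reading,
// so kana-only known-card entries still match kanji tokens.
function isKnownToken(token: Token, knownWords: ReadonlySet<string>, mode: MatchMode): boolean {
  const primary = mode === "headword" ? token.headword : token.surface;
  if (primary !== "" && knownWords.has(primary)) {
    return true;
  }
  return token.reading !== "" && knownWords.has(token.reading);
}

// Assumed noise heuristic: after stripping dakuten/handakuten, the text is one
// unit repeated two or more times ("あああ", and also "だいたい" -> "たいたい").
function looksLikeRepeatedKana(text: string): boolean {
  const unvoiced = text.normalize("NFD").replace(/[\u3099\u309A]/g, "");
  return /^(.+?)\1+$/.test(unvoiced);
}

// JLPT eligibility: run the noise check on the surface/headword when one is
// present, and only fall back to checking the reading when it is not.
function isJlptEligible(token: Token): boolean {
  const term = token.headword !== "" ? token.headword : token.surface;
  const candidate = term !== "" ? term : token.reading;
  return candidate !== "" && !looksLikeRepeatedKana(candidate);
}

// The 大体 case from the description: known only via a kana-only card.
const daitai: Token = { surface: "大体", headword: "大体", reading: "だいたい" };
const knownWords = new Set<string>(["だいたい"]);

console.log(isKnownToken(daitai, knownWords, "headword")); // true (via the reading fallback)
console.log(looksLikeRepeatedKana(daitai.reading));        // true under the assumed heuristic
console.log(isJlptEligible(daitai));                       // true (checked against 大体, not だいたい)
```

Under this sketch, checking the reading alone would have rejected 大体 as noise, which is the kind of failure the fix is meant to avoid; the actual heuristic in the repository may flag the reading for a different reason.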

Verification:

bun test src/core/services/tokenizer/annotation-stage.test.ts
