SubMiner/backlog/tasks/task-93 - Replace-subtitle-tokenizer-with-left-to-right-Yomitan-scanning-parser.md at 82bec02a362980a63a7e12d21447f726690a987f - SubMiner

sudacode/SubMiner

mirror of https://github.com/ksyasuda/SubMiner.git synced 2026-03-06 19:57:26 -08:00

sudacode 746696b1a4

fix: improve yomitan subtitle name lookup

2026-03-06 01:28:58 -08:00

1.7 KiB

id: TASK-93
title: Replace subtitle tokenizer with left-to-right Yomitan scanning parser
status: Done
assignee:
created_date: 2026-03-06 09:02
updated_date: 2026-03-06 09:14
labels: tokenizer, yomitan, refactor
dependencies:
priority: high

Description

Replace the current parseText candidate-selection tokenizer with a GSM-style left-to-right Yomitan scanning tokenizer for all subtitles. Preserve downstream token contracts for rendering, JLPT/frequency/N+1 annotation, and MeCab enrichment while improving full-term matching for names and katakana compounds.

Acceptance Criteria

#1 Subtitle tokenization uses a left-to-right Yomitan scanning strategy instead of parseText candidate selection.
#2 Token surfaces, readings, headwords, and offsets remain compatible with existing renderer and annotation stages.
#3 Known problematic name cases such as カズマ and バニール resolve to full-token dictionary matches when Yomitan can match them.
#4 Regression tests cover left-to-right exact-match scanning, unmatched text handling, and downstream tokenizeSubtitle integration.

Final Summary

Replaced the live subtitle tokenization path with a left-to-right Yomitan termsFind scanner that greedily advances through the normalized subtitle text while preserving the downstream MergedToken contracts for the renderer, MeCab enrichment, and JLPT, frequency, and N+1 annotation. Added runtime and integration coverage for exact-match scanning and for name cases like カズマ, and kept a compatibility fallback for older mocked parseText-style test payloads.
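
The greedy left-to-right scan described above can be sketched roughly as follows. This is a minimal illustration, not the SubMiner implementation: `SketchToken`, `SketchMatch`, `FindLongestMatch`, `scanLeftToRight`, and the toy dictionary are all hypothetical names, and the lookup callback only stands in for a Yomitan termsFind-style call whose real signature and result shape are not shown in this task. It assumes the lookup returns the longest dictionary match that starts at the beginning of the text it is given.

```typescript
// Hypothetical token shape; the real MergedToken presumably carries more fields.
interface SketchToken {
  surface: string;
  headword: string | null; // null when no dictionary entry matched
  reading: string | null;
  start: number; // inclusive offset into the scanned text
  end: number; // exclusive offset into the scanned text
}

// Hypothetical lookup result: how many UTF-16 code units were matched at index 0.
interface SketchMatch {
  matchedLength: number;
  headword: string;
  reading: string;
}

// Stand-in for a termsFind-style lookup (assumption, not the Yomitan API).
type FindLongestMatch = (text: string) => SketchMatch | null;

function scanLeftToRight(text: string, find: FindLongestMatch): SketchToken[] {
  const tokens: SketchToken[] = [];
  let pos = 0;
  let unmatchedStart = -1; // start of the current run of unmatched text, or -1

  // Emit any pending unmatched run as a single token without dictionary data.
  const flushUnmatched = (end: number): void => {
    if (unmatchedStart >= 0) {
      tokens.push({
        surface: text.slice(unmatchedStart, end),
        headword: null,
        reading: null,
        start: unmatchedStart,
        end,
      });
      unmatchedStart = -1;
    }
  };

  while (pos < text.length) {
    const match = find(text.slice(pos));
    if (match !== null && match.matchedLength > 0) {
      flushUnmatched(pos);
      // Clamp defensively so a bad lookup cannot run past the end of the text.
      const end = Math.min(text.length, pos + match.matchedLength);
      tokens.push({
        surface: text.slice(pos, end),
        headword: match.headword,
        reading: match.reading,
        start: pos,
        end,
      });
      pos = end; // greedy: jump past the whole match
    } else {
      if (unmatchedStart < 0) unmatchedStart = pos;
      // Advance by one code point so surrogate pairs are never split.
      const codePoint = text.codePointAt(pos) ?? 0;
      pos += codePoint > 0xffff ? 2 : 1;
    }
  }
  flushUnmatched(text.length);
  return tokens;
}

// Toy dictionary (surface -> reading) used only to exercise the sketch.
const toyDict = new Map<string, string>([
  ["カズマ", "カズマ"],
  ["バニール", "バニール"],
  ["と", "と"],
]);

// Longest-prefix lookup over the toy dictionary.
const toyFind: FindLongestMatch = (text) => {
  for (let len = Math.min(text.length, 8); len > 0; len--) {
    const candidate = text.slice(0, len);
    const reading = toyDict.get(candidate);
    if (reading !== undefined) {
      return { matchedLength: len, headword: candidate, reading };
    }
  }
  return null;
};

const tokens = scanLeftToRight("カズマとバニール!?", toyFind);
```

With the toy dictionary, both names come out as single full tokens and the trailing punctuation is grouped into one unmatched token. Because offsets are contiguous and the surfaces concatenate back to the input, a scan of this shape can in principle keep offset-based downstream stages working, which is the property acceptance criterion #2 asks for.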
