mirror of
https://github.com/ksyasuda/SubMiner.git
synced 2026-02-28 06:22:45 -08:00
| id | title | status | assignee | created_date | labels | dependencies | priority |
|---|---|---|---|---|---|---|---|
| TASK-88 | Remove MeCab fallback tokenizer and simplify Yomitan token flow | To Do | | 2026-02-20 00:00 | | | medium |
Description
Remove the MeCab fallback tokenization path and associated merge-selection complexity in subtitle tokenization. Treat Yomitan parser output as the single source of token boundaries/grouping, and keep only minimal normalization needed for downstream known-word, JLPT, and frequency annotation.
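The "minimal normalization plus annotation" idea can be sketched roughly as below. This is only an illustration under stated assumptions: `ParsedToken`, `AnnotatedToken`, `AnnotationSources`, `normalizeTokens`, and `annotateTokens` are hypothetical names, not the actual SubMiner types or helpers, and the real normalization rules may differ.

```typescript
// Hypothetical token shape; the real parser output in SubMiner may differ.
interface ParsedToken {
  surface: string; // text as it appears in the subtitle
  headword: string; // dictionary form reported by the parser
}

interface AnnotatedToken extends ParsedToken {
  isKnown: boolean;
  jlptLevel: string | null;
  frequencyRank: number | null;
}

// Hypothetical lookup tables used for annotation.
interface AnnotationSources {
  knownWords: Set<string>;
  jlptLevels: Map<string, string>;
  frequencyRanks: Map<string, number>;
}

// Minimal structural normalization (assumed rules): drop tokens with no
// visible text and use the surface form when no headword is provided.
// Token boundaries from the parser are never re-split or merged here.
function normalizeTokens(tokens: ParsedToken[]): ParsedToken[] {
  return tokens
    .filter((t) => t.surface.trim().length > 0)
    .map((t) => ({ surface: t.surface, headword: t.headword || t.surface }));
}

// Annotate parser-derived tokens without changing their grouping.
function annotateTokens(
  tokens: ParsedToken[],
  sources: AnnotationSources,
): AnnotatedToken[] {
  return normalizeTokens(tokens).map((t) => ({
    ...t,
    isKnown: sources.knownWords.has(t.headword),
    jlptLevel: sources.jlptLevels.get(t.headword) ?? null,
    frequencyRank: sources.frequencyRanks.get(t.headword) ?? null,
  }));
}

// Usage with made-up sample data.
const annotated = annotateTokens(
  [
    { surface: "猫", headword: "猫" },
    { surface: " ", headword: "" },
    { surface: "食べた", headword: "食べる" },
  ],
  {
    knownWords: new Set(["猫"]),
    jlptLevels: new Map([["食べる", "N5"]]),
    frequencyRanks: new Map([["猫", 1200]]),
  },
);
```

Because annotation keys off each token's headword and nothing else, the lookups stay independent of how the boundaries were produced, which is what lets the merge-selection layer be removed.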
Action Steps
- Remove MeCab fallback execution from `tokenizeSubtitle` and delete dead fallback-specific branches.
- Remove merge/candidate-selection code that is only needed to reconcile MeCab-vs-Yomitan tokenization strategies.
- Keep Yomitan parsing pipeline with minimal structural token normalization only.
- Update MeCab usage so it is no longer required for tokenization fallback (retain only explicitly needed behavior, if any).
- Update docs/config notes to reflect Yomitan-only tokenization flow.
- Add regression tests for Yomitan-only success/failure paths and token annotation continuity.
Acceptance Criteria
- #1 Subtitle tokenization no longer falls back to MeCab when Yomitan parsing fails.
- #2 Token grouping logic is simplified to rely on Yomitan structure; redundant custom merge-selection logic removed.
- #3 Known-word, JLPT, frequency, and N+1 annotations still work on Yomitan-derived tokens.
- #4 If Yomitan parsing fails, behavior is explicit and tested (for example `tokens: null` without MeCab fallback path).
- #5 Documentation reflects that tokenization flow is Yomitan-first and Yomitan-only.
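One possible shape for the explicit failure behavior in criterion #4 is sketched below. It is an assumption-laden illustration, not the current `tokenizer.ts`: the `Token`, `SubtitleData`, and `SubtitleParser` types and the injected `parseWithYomitan` parameter are hypothetical, the parser is modeled as synchronous to keep the example short, and treating an empty result as a failure is a choice made for this sketch.

```typescript
// Hypothetical types for illustration only.
interface Token {
  surface: string;
}

interface SubtitleData {
  text: string;
  tokens: Token[] | null; // null signals that parsing failed
}

// The parser is injected so that tests can simulate success and failure.
type SubtitleParser = (text: string) => Token[];

function tokenizeSubtitle(
  text: string,
  parseWithYomitan: SubtitleParser,
): SubtitleData {
  try {
    const tokens = parseWithYomitan(text);
    // Assumption: an empty result counts as a failure rather than a cue
    // to try a second tokenizer.
    return { text, tokens: tokens.length > 0 ? tokens : null };
  } catch {
    // No fallback path: the failure is reported explicitly as tokens: null.
    return { text, tokens: null };
  }
}

// Success path: the parser result is passed through unchanged.
const ok = tokenizeSubtitle("猫が好き", (t) => [{ surface: t }]);

// Failure path: a throwing parser yields tokens: null.
const failed = tokenizeSubtitle("猫が好き", () => {
  throw new Error("parser unavailable");
});
```

Injecting the parser keeps both paths easy to cover in regression tests, since a test can pass a stub that throws or returns an empty list.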
Definition of Done
- #1 `src/core/services/tokenizer.ts` no longer contains MeCab fallback tokenization branch.
- #2 Tests cover Yomitan-only pipeline and failure behavior regressions.
- #3 Any removed MeCab-only merge helpers are deleted with no unused exports/imports.
- #4 Build and relevant tokenizer/subtitle tests pass.