perf(tokenizer): optimize mecab POS enrichment lookups

2026-03-02 01:39:44 -08:00
parent 83f13df627
commit fa97472bce
3 changed files with 344 additions and 76 deletions

@@ -4,7 +4,7 @@ title: 'Tokenization performance: disable Yomitan MeCab parser, gate local MeCab
status: Done
assignee: []
created_date: '2026-03-02 07:44'
-updated_date: '2026-03-02 20:34'
+updated_date: '2026-03-02 20:37'
labels: []
dependencies: []
priority: high
@@ -47,6 +47,8 @@ Implemented tokenizer latency optimizations:
- added regression coverage in `src/main/runtime/composers/mpv-runtime-composer.test.ts` for sequential tokenize calls (`warmup` side effects run once);
- post-review critical fix: treat Yomitan default-profile Anki server sync `no-change` as successful check, so `lastSyncedYomitanAnkiServer` is cached and expensive sync checks do not repeat on every subtitle line;
- added regression assertion in `src/core/services/tokenizer/yomitan-parser-runtime.test.ts` for `updated: false` path returning sync success;
+- post-review performance fix: refactored POS enrichment to pre-index MeCab tokens by surface/start position and use sliding overlap window + binary-search cursor fallback, removing repeated full MeCab scans per token (`O(n*m)` hotspot);
+- added regression test in `src/core/services/tokenizer/parser-enrichment-stage.test.ts` that fails on repeated distant-token scan access and passes with indexed lookup;
- validated with targeted tests and `tsc --noEmit`.
<!-- SECTION:FINAL_SUMMARY:END -->
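
The lookup strategy named in the summary (sliding overlap window with a binary-search cursor fallback) can be illustrated with a minimal sketch. This is not the code from this commit: the `Span` and `MecabToken` shapes and the `firstEndingAfter` and `enrichWithPos` helpers are hypothetical, the sketch covers only the positional lookup (not the by-surface index), and it assumes MeCab tokens arrive sorted by start offset without overlapping each other.

```typescript
// Hypothetical shapes; the real tokenizer types are not shown in this commit.
interface Span {
  start: number; // inclusive offset into the subtitle line
  end: number; // exclusive offset
}

interface MecabToken extends Span {
  surface: string;
  pos: string;
}

// Binary search for the index of the first MeCab token that ends after `offset`.
// Assumes `mecab` is sorted by start and its tokens do not overlap each other.
function firstEndingAfter(mecab: MecabToken[], offset: number): number {
  let lo = 0;
  let hi = mecab.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (mecab[mid].end <= offset) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// For each parser token, collect the POS tags of the MeCab tokens it overlaps.
function enrichWithPos(tokens: Span[], mecab: MecabToken[]): string[][] {
  let cursor = 0; // index of the first MeCab token that can still overlap
  return tokens.map((token) => {
    // Fallback: the token starts before the window, so re-seek by binary search.
    if (cursor > 0 && mecab[cursor - 1].end > token.start) {
      cursor = firstEndingAfter(mecab, token.start);
    }
    // Common case: slide forward past MeCab tokens that end before this token.
    while (cursor < mecab.length && mecab[cursor].end <= token.start) {
      cursor++;
    }
    const pos: string[] = [];
    for (let i = cursor; i < mecab.length && mecab[i].start < token.end; i++) {
      pos.push(mecab[i].pos);
    }
    return pos;
  });
}
```

When parser tokens arrive in text order, the cursor only moves forward, so the whole pass touches each MeCab token a bounded number of times instead of rescanning the full list per token. The binary search is needed only when a token starts before the current window.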