SubMiner/backlog/tasks/task-81 - Tokenization-performance-disable-yomitan-mecab-and-persistent-local-mecab.md

id: TASK-81
title: Tokenization performance: disable Yomitan MeCab parser, gate local MeCab init, and add persistent MeCab process
status: Done
assignee:
created_date: 2026-03-02 07:44
updated_date: 2026-03-02 09:20
labels:
dependencies:
priority: high
ordinal: 9001

Description

Reduce subtitle annotation latency by:

  • disabling Yomitan-side MeCab parser requests (useMecabParser=false);
  • initializing local MeCab only when POS-dependent annotations are enabled (N+1 / JLPT / frequency);
  • replacing per-line local MeCab process spawning with a persistent parser process that auto-shuts down after idle time and restarts on demand.
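
The gating rule in the second bullet can be sketched as a small predicate. This is a minimal illustration, not the project's code: the `AnnotationToggles` field names, `needsLocalMecab`, and `warmupMecabIfNeeded` are hypothetical, since the task names only the three annotation kinds (N+1, JLPT, frequency).

```typescript
// Hypothetical toggle shape: the task names the three annotation kinds,
// but not the actual config fields.
interface AnnotationToggles {
  nPlusOne: boolean;
  jlpt: boolean;
  frequency: boolean;
}

// Local MeCab is only needed when at least one POS-dependent annotation is on.
function needsLocalMecab(toggles: AnnotationToggles): boolean {
  return toggles.nPlusOne || toggles.jlpt || toggles.frequency;
}

// Hypothetical warmup step: `initMecab` stands in for whatever starts the
// local tokenizer. Returns true when initialization actually ran.
function warmupMecabIfNeeded(
  toggles: AnnotationToggles,
  initMecab: () => void,
): boolean {
  if (!needsLocalMecab(toggles)) {
    return false; // all toggles off: skip MeCab init entirely
  }
  initMecab();
  return true;
}
```

With all three toggles off, `warmupMecabIfNeeded` returns `false` without calling `initMecab`; enabling any one toggle makes it run.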

Acceptance Criteria

  • #1 Yomitan parse requests disable the MeCab parser path.
  • #2 MeCab warmup/init is skipped when all POS-dependent annotation toggles are off.
  • #3 Local MeCab tokenizer uses a persistent process across subtitle lines.
  • #4 Persistent MeCab process auto-shuts down after idle timeout and restarts on next tokenize activity.
  • #5 Tests cover parser flag, warmup gating, and persistent MeCab lifecycle behavior.
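
Criteria #3 and #4 describe a spawn-on-demand, shut-down-when-idle lifecycle. The sketch below shows one way to structure that, under stated assumptions: `IdleManagedProcess`, `Killable`, and `Scheduler` are hypothetical names, the process is abstracted behind a `kill()` method rather than a real MeCab child process, and request queuing and retry are left out. The timer functions are injectable so the lifecycle can be tested without waiting in real time.

```typescript
// Minimal surface needed from a long-lived parser process (hypothetical).
interface Killable {
  kill(): void;
}

// Injectable timer functions (hypothetical abstraction for testability).
interface Scheduler {
  set(fn: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const realScheduler: Scheduler = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

class IdleManagedProcess<P extends Killable> {
  private proc: P | null = null;
  private idleTimer: unknown = null;
  private readonly spawn: () => P;
  private readonly idleMs: number;
  private readonly scheduler: Scheduler;

  constructor(spawn: () => P, idleMs: number = 30000, scheduler: Scheduler = realScheduler) {
    this.spawn = spawn;
    this.idleMs = idleMs;
    this.scheduler = scheduler;
  }

  // Returns the live process, spawning one if none is running,
  // and restarts the idle countdown.
  acquire(): P {
    let proc = this.proc;
    if (proc === null) {
      proc = this.spawn();
      this.proc = proc;
    }
    this.resetIdleTimer();
    return proc;
  }

  isRunning(): boolean {
    return this.proc !== null;
  }

  // Stops the process and cancels any pending idle timer.
  shutdown(): void {
    if (this.idleTimer !== null) {
      this.scheduler.clear(this.idleTimer);
      this.idleTimer = null;
    }
    if (this.proc !== null) {
      this.proc.kill();
      this.proc = null;
    }
  }

  private resetIdleTimer(): void {
    if (this.idleTimer !== null) {
      this.scheduler.clear(this.idleTimer);
    }
    this.idleTimer = this.scheduler.set(() => this.shutdown(), this.idleMs);
  }
}
```

Each `acquire()` call reuses the running process and pushes the shutdown deadline back. Once the idle timer fires, the next `acquire()` spawns a fresh process.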

Final Summary

Implemented tokenizer latency optimizations:

  • switched Yomitan parse requests to useMecabParser: false;
  • added annotation-aware MeCab initialization gating in runtime warmup flow;
  • added a persistent local MeCab process with queued requests, retry-on-process-end, idle auto-shutdown (default: 30s), and automatic restart on new work;
  • added regression tests for Yomitan parse flag, MeCab warmup gating, and persistent/idle lifecycle behavior;
  • fixed the tokenization warmup gate so that first-use warmup completion is sticky (tokenizationWarmupCompleted) and sequential tokenizeSubtitle calls no longer re-run the Yomitan/dictionary warmup path;
  • added regression coverage in src/main/runtime/composers/mpv-runtime-composer.test.ts for sequential tokenize calls (warmup side effects run once);
  • validated with targeted tests and tsc --noEmit.
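
The sticky warmup gate from the summary can be illustrated roughly as follows. Only the names `tokenizationWarmupCompleted` and `tokenizeSubtitle` come from the task; `createTokenizeSubtitle` and its `warmup` and `tokenize` parameters are hypothetical stand-ins, and the sketch is synchronous for brevity where the real flow may well be asynchronous.

```typescript
// Hypothetical factory: `warmup` stands in for the Yomitan/dictionary warmup
// path and `tokenize` for the per-line tokenizer.
function createTokenizeSubtitle(
  warmup: () => void,
  tokenize: (text: string) => string[],
): (text: string) => string[] {
  // Sticky flag: once set it is never cleared, so warmup runs at most once.
  let tokenizationWarmupCompleted = false;

  return function tokenizeSubtitle(text: string): string[] {
    if (!tokenizationWarmupCompleted) {
      warmup();
      // Set only after warmup returns, so a throwing warmup is retried
      // on the next call instead of being marked as done.
      tokenizationWarmupCompleted = true;
    }
    return tokenize(text);
  };
}
```

Calling the returned `tokenizeSubtitle` several times in a row triggers `warmup` only on the first call, which is the behavior the regression test for sequential tokenize calls is described as covering.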