SubMiner/backlog/tasks/task-189 - Replace-stats-word-counts-with-Yomitan-token-counts.md

id: TASK-189
title: Replace stats word counts with Yomitan token counts
status: Done
assignee: codex
created_date: 2026-03-18 01:35
updated_date: 2026-03-18 05:28
labels: stats, tokenizer, bug
milestone: m-1
dependencies: (none)
references:
  • src/core/services/immersion-tracker-service.ts
  • src/core/services/immersion-tracker/reducer.ts
  • src/core/services/immersion-tracker/storage.ts
  • src/core/services/immersion-tracker/query.ts
  • src/core/services/immersion-tracker/lifetime.ts
  • stats/src/components
  • stats/src/lib/yomitan-lookup.ts
priority: medium
ordinal: 100500

Description

Replace heuristic immersion stats word counting with Yomitan token counts. Session/media/anime stats should use the exact merged Yomitan token stream as the denominator and display metric, with no whitespace/CJK-character fallback and no active wordsSeen concept in the runtime, storage, API, or stats UI.
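The counting rule described above can be sketched as a single pure function. This is only an illustration: the MergedToken shape and the countTokens name are hypothetical and are not taken from the repository.

```typescript
// Hypothetical shape of one merged Yomitan token; the real type in the
// repository may carry more fields.
interface MergedToken {
  surface: string;
}

// Returns the number of tokens in the merged stream for one subtitle line.
// An absent payload counts as zero: there is deliberately no whitespace or
// CJK-character fallback.
function countTokens(tokens: readonly MergedToken[] | null | undefined): number {
  return tokens?.length ?? 0;
}

const line: MergedToken[] = [{ surface: "猫" }, { surface: "が" }, { surface: "好き" }];
console.log(countTokens(line)); // 3
console.log(countTokens(undefined)); // 0
```

Keeping the rule in one place means the denominator and the displayed metric cannot drift apart.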

Acceptance Criteria

  • #1 recordSubtitleLine derives session count deltas from Yomitan token arrays instead of calculateTextMetrics.
  • #2 Active immersion tracking/storage/query code no longer depends on wordsSeen / totalWordsSeen fields for stats behavior.
  • #3 Stats UI labels and lookup-rate copy refer to tokens instead of words where those counts are shown to users.
  • #4 Regression tests cover token-count sourcing, zero-count behavior when tokenization payload is absent, and updated stats copy.
  • #5 A changelog fragment documents the user-visible stats denominator change.
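As a rough illustration of criterion #3, a lookup-rate display helper might look like the sketch below. The function name, the wording of the copy, and the empty-state message are all assumptions made for the example, not the actual stats UI strings.

```typescript
// Hypothetical display helper: formats the lookup rate with tokens as the
// denominator. The exact copy used by the stats UI is not known here.
function formatLookupRate(lookups: number, tokensSeen: number): string {
  if (tokensSeen <= 0) {
    // With no tokenization payload the token count stays zero, so avoid
    // dividing by zero and show a neutral message instead.
    return "No tokens seen yet";
  }
  const ratePercent = (lookups / tokensSeen) * 100;
  return `${lookups} lookups / ${tokensSeen} tokens (${ratePercent.toFixed(1)}%)`;
}

console.log(formatLookupRate(3, 120)); // "3 lookups / 120 tokens (2.5%)"
console.log(formatLookupRate(0, 0)); // "No tokens seen yet"
```

The zero-denominator branch matters because sessions without tokenization data now legitimately report zero tokens.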

Implementation Plan

  1. Add failing tracker tests proving subtitle count metrics come from Yomitan token arrays and stay zero when tokenization is absent.
  2. Add failing stats UI tests for token-based copy and token-count display helpers.
  3. Remove wordsSeen from active tracker/session/query/type paths and use tokensSeen as the single stats count field.
  4. Update stats UI labels and lookup-rate copy from words to tokens.
  5. Run targeted verification, then add the changelog fragment and any needed docs update.
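Steps 1 and 3 could be exercised against a reducer shaped roughly like the one below. This is a minimal sketch under stated assumptions: SessionCounts, SubtitleLineEvent, applySubtitleLine, and the linesSeen field are hypothetical names, and the real reducer in immersion-tracker/reducer.ts may be structured differently.

```typescript
// Hypothetical session state: tokensSeen is the only count field, and there
// is no wordsSeen.
interface SessionCounts {
  linesSeen: number;
  tokensSeen: number;
}

// Hypothetical event: tokens holds the merged Yomitan tokens when
// tokenization ran, and is absent otherwise.
interface SubtitleLineEvent {
  text: string;
  tokens?: readonly unknown[] | null;
}

// Pure reducer step: the token delta comes only from the token array and is
// zero when the payload is missing. The subtitle text is never inspected.
function applySubtitleLine(state: SessionCounts, event: SubtitleLineEvent): SessionCounts {
  const tokenDelta = event.tokens?.length ?? 0;
  return {
    linesSeen: state.linesSeen + 1,
    tokensSeen: state.tokensSeen + tokenDelta,
  };
}

const events: SubtitleLineEvent[] = [
  { text: "猫が好き", tokens: ["猫", "が", "好き"] },
  { text: "line without a tokenization payload" },
];
const totals = events.reduce(applySubtitleLine, { linesSeen: 0, tokensSeen: 0 });
console.log(totals); // { linesSeen: 2, tokensSeen: 3 }
```

A failing test written first would assert that the second event leaves tokensSeen unchanged, which a whitespace-based heuristic would not satisfy.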

Outcome

Completed. Subtitle stats counts now come directly from Yomitan merged-token counts; wordsSeen is removed from the active tracker, storage, query, and UI paths; token-facing copy is updated; and the focused regression tests and bun run typecheck both pass.