SubMiner/backlog/tasks/task-171 - Add-normalized-immersion-word-and-kanji-occurrence-tracking.md at 9530445a9560e76c1d0eb42066f5f314bef92642 - SubMiner

mirror of https://github.com/ksyasuda/SubMiner.git synced 2026-03-20 12:11:28 -07:00


docs: add stats dashboard design docs, plans, and knowledge base

- Stats dashboard redesign design and implementation plans
- Episode detail and Anki card link design
- Internal knowledge base restructure
- Backlog tasks for testing, verification, and occurrence tracking

2026-03-17 20:01:23 -07:00


id: TASK-171
title: Add normalized immersion word and kanji occurrence tracking
status: Done
assignee: codex
created_date: 2026-03-14 11:30
updated_date: 2026-03-14 11:48
labels: immersion, stats, database
dependencies: (none listed)
references:
- /home/sudacode/projects/japanese/SubMiner/docs/plans/2026-03-14-immersion-occurrence-tracking-design.md
- /home/sudacode/projects/japanese/SubMiner/docs/plans/2026-03-14-immersion-occurrence-tracking.md

Description

Add normalized occurrence tables for immersion-tracked words and kanji so stats can map vocabulary back to the exact anime, episode, timestamp, and subtitle line where each item appeared. Preserve repeated tokens within the same line via counted occurrences instead of deduping, while avoiding duplicated token text storage.
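The "counted occurrences instead of deduping" idea can be sketched roughly as follows. This is a minimal illustration, not the tracker's actual code: the `Token` shape, the `CountedOccurrence` type, and the `countLineOccurrences` helper are all hypothetical names, and keying on a (headword, word, reading) triple is an assumption based on the "exact word triples" wording in the notes.

```typescript
// Hypothetical token shape; the real tracker's token fields are not shown in this task.
interface Token {
  headword: string;
  word: string;
  reading: string;
}

// One entry per distinct token in a line, with how many times it appeared.
interface CountedOccurrence {
  token: Token;
  occurrenceCount: number;
}

// Collapse repeated tokens within one subtitle line into counted entries,
// so a token that appears twice yields a single entry with occurrenceCount 2.
function countLineOccurrences(tokens: Token[]): CountedOccurrence[] {
  const counted = new Map<string, CountedOccurrence>();
  for (const token of tokens) {
    // A JSON-encoded triple is an unambiguous key even if a field contains a separator.
    const key = JSON.stringify([token.headword, token.word, token.reading]);
    const existing = counted.get(key);
    if (existing) {
      existing.occurrenceCount += 1;
    } else {
      counted.set(key, { token, occurrenceCount: 1 });
    }
  }
  // Map iteration preserves first-seen order.
  return Array.from(counted.values());
}
```

Each counted entry would then map to one bridge row for the line, which keeps the repeat count without storing the token text again.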

Acceptance Criteria

#1 The immersion schema adds normalized subtitle-line and counted occurrence tables for words and kanji, with additive migration support for existing databases.
#2 Subtitle-line tracking writes one subtitle-line row per seen line plus counted word/kanji occurrences linked back to the line, session, video, and anime context.
#3 Query surfaces can map a word or kanji back to anime/episode/line/timestamp rows without breaking current top-level vocabulary and kanji stats.
#4 Focused regression coverage exists for schema, counted occurrence persistence, and reverse-mapping queries.
#5 Verification covers the SQLite immersion lane and any broader lanes required by touched service/API files.
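As a rough sketch of what "additive migration support" in criterion #1 could look like, the snippet below plans only `CREATE TABLE IF NOT EXISTS` statements for databases below the target version. The table names and the version number 7 come from this task; every column name, the `ADDITIVE_STATEMENTS` constant, and the `planAdditiveMigration` helper are assumptions made for illustration.

```typescript
// Target version, taken from the "schema version bump to 7" implementation note.
const TARGET_SCHEMA_VERSION = 7;

// Hypothetical DDL: table names are from this task, column names are illustrative only.
const ADDITIVE_STATEMENTS: string[] = [
  `CREATE TABLE IF NOT EXISTS imm_subtitle_lines (
     line_id INTEGER PRIMARY KEY,
     session_id INTEGER NOT NULL,
     video_id INTEGER NOT NULL,
     anime_id INTEGER,
     start_ms INTEGER NOT NULL,
     text TEXT NOT NULL
   )`,
  `CREATE TABLE IF NOT EXISTS imm_word_line_occurrences (
     line_id INTEGER NOT NULL,
     word_id INTEGER NOT NULL,
     occurrence_count INTEGER NOT NULL,
     PRIMARY KEY (line_id, word_id)
   )`,
  `CREATE TABLE IF NOT EXISTS imm_kanji_line_occurrences (
     line_id INTEGER NOT NULL,
     kanji_id INTEGER NOT NULL,
     occurrence_count INTEGER NOT NULL,
     PRIMARY KEY (line_id, kanji_id)
   )`,
];

// Return the statements to run for a database currently at `currentVersion`.
// Only new tables are created; nothing existing is altered or dropped.
function planAdditiveMigration(currentVersion: number): string[] {
  return currentVersion < TARGET_SCHEMA_VERSION ? ADDITIVE_STATEMENTS.slice() : [];
}
```

Because the bridge tables reference word and kanji rows by id, the token text lives only in the existing aggregate tables.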

Implementation Plan

Add red tests for new line/occurrence schema and migration shape in the SQLite immersion lane.
Add red tests for service-level subtitle persistence that writes one line row plus counted word/kanji occurrences.
Implement additive schema, write-path plumbing, and counted occurrence upserts with minimal disruption to existing aggregate tables.
Add reverse-mapping query surfaces for word and kanji occurrences, plus focused API/service exposure only where needed.
Run focused SQLite verification first, then broader verification only if touched runtime/API files require it.

Implementation Notes

2026-03-14: Design approved in-thread. Chosen shape: imm_subtitle_lines plus counted bridge tables imm_word_line_occurrences and imm_kanji_line_occurrences, retaining repeated tokens within a line via occurrence_count.

2026-03-14: Implemented additive schema version bump to 7. recordSubtitleLine(...) now queues one normalized subtitle-line write that owns aggregate word/kanji upserts plus counted bridge-row inserts.

2026-03-14: Added reverse-mapping query surfaces for exact word triples and single kanji lookups. No stats API/UI consumer was widened in this change.

2026-03-14: Verification commands run:

bun test src/core/services/immersion-tracker-service.test.ts
bun test src/core/services/immersion-tracker/storage-session.test.ts
bun test src/core/services/immersion-tracker/__tests__/query.test.ts
bun run typecheck
bash .agents/skills/subminer-change-verification/scripts/classify_subminer_diff.sh src/core/services/immersion-tracker/types.ts src/core/services/immersion-tracker/storage.ts src/core/services/immersion-tracker/query.ts src/core/services/immersion-tracker-service.ts src/core/services/immersion-tracker/storage-session.test.ts src/core/services/immersion-tracker-service.test.ts src/core/services/immersion-tracker/__tests__/query.test.ts
bash .agents/skills/subminer-change-verification/scripts/verify_subminer_change.sh --lane core src/core/services/immersion-tracker/types.ts src/core/services/immersion-tracker/storage.ts src/core/services/immersion-tracker/query.ts src/core/services/immersion-tracker-service.ts src/core/services/immersion-tracker/storage-session.test.ts src/core/services/immersion-tracker-service.test.ts src/core/services/immersion-tracker/__tests__/query.test.ts
bun run test:immersion:sqlite:src

2026-03-14: Verification results:
targeted tracker/query tests: passed
verifier lane selection: core
verifier result: passed (typecheck, test:fast)
verifier artifacts: .tmp/skill-verification/subminer-verify-20260314-114630-abO7mb/
maintained immersion SQLite lane: passed

Final Summary

Added normalized subtitle-line occurrence tracking to immersion stats with three additive tables: imm_subtitle_lines, imm_word_line_occurrences, and imm_kanji_line_occurrences.

recordSubtitleLine(...) now preserves repeated allowed tokens and repeated kanji within the same subtitle line via occurrence_count, while still updating canonical imm_words and imm_kanji aggregates.
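For the kanji side, counting repeats within a line might look like the sketch below. It assumes "kanji" means code points in the CJK Unified Ideographs block (U+4E00 to U+9FFF) or Extension A (U+3400 to U+4DBF); the tracker's real definition is not stated here, and `isKanji` and `countKanji` are hypothetical helper names.

```typescript
// Assumption: treat CJK Unified Ideographs and Extension A as kanji.
function isKanji(codePoint: number): boolean {
  return (
    (codePoint >= 0x4e00 && codePoint <= 0x9fff) ||
    (codePoint >= 0x3400 && codePoint <= 0x4dbf)
  );
}

// Count each kanji in a subtitle line, keeping repeats as a count
// rather than collapsing them to a single occurrence.
function countKanji(lineText: string): Map<string, number> {
  const counts = new Map<string, number>();
  // Array.from splits the string by code point, not by UTF-16 code unit.
  for (const ch of Array.from(lineText)) {
    const codePoint = ch.codePointAt(0);
    if (codePoint !== undefined && isKanji(codePoint)) {
      counts.set(ch, (counts.get(ch) ?? 0) + 1);
    }
  }
  return counts;
}
```

For the line 日本の日曜日, this yields a count of 3 for 日 and 1 each for 本 and 曜, with the hiragana の skipped.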

Added reverse-mapping queries for exact word triples and kanji so callers can fetch anime/video/session/line/timestamp context for each occurrence without duplicating token text storage.
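The reverse-mapping idea can be illustrated with an in-memory join, shown below as a sketch rather than the actual query code. The row shapes (`SubtitleLineRow`, `WordOccurrenceRow`), their field names, and `findWordOccurrences` are hypothetical; the real queries run against SQLite and their signatures are not shown in this task.

```typescript
// Hypothetical row shapes mirroring the line table and the word bridge table.
interface SubtitleLineRow {
  lineId: number;
  animeTitle: string;
  episode: number;
  startMs: number;
  text: string;
}

interface WordOccurrenceRow {
  lineId: number;
  wordId: number;
  occurrenceCount: number;
}

interface WordOccurrenceContext extends SubtitleLineRow {
  occurrenceCount: number;
}

// Map a word id back to every line it appeared in, ordered by timestamp.
function findWordOccurrences(
  wordId: number,
  lines: SubtitleLineRow[],
  occurrences: WordOccurrenceRow[],
): WordOccurrenceContext[] {
  const linesById = new Map<number, SubtitleLineRow>();
  for (const line of lines) {
    linesById.set(line.lineId, line);
  }
  const results: WordOccurrenceContext[] = [];
  for (const occurrence of occurrences) {
    if (occurrence.wordId !== wordId) continue;
    const line = linesById.get(occurrence.lineId);
    // Skip bridge rows whose line is missing instead of failing.
    if (line) {
      results.push({ ...line, occurrenceCount: occurrence.occurrenceCount });
    }
  }
  return results.sort((a, b) => a.startMs - b.startMs);
}
```

The line text appears once per line row, and each bridge row carries only ids and a count, which is what lets callers recover context without duplicated token text.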

Verified with targeted tracker/query tests, bun run typecheck, verifier-selected core coverage, and the maintained bun run test:immersion:sqlite:src lane.
