Persist stats exclusions in DB and fix word metrics filtering

- Stats vocabulary exclusions stored in `imm_stats_excluded_words` (schema v18); seeded from localStorage on first load - Session, overview, trends, and library word metrics use filtered persisted occurrences with raw fallback - Session known-word % chart uses filtered persisted totals as denominator for both known and total - JLPT subtitle styling changed to underline-only; no longer overrides text color
2026-05-04 00:41:33 -07:00 · 2026-05-03 19:40:54 -07:00
parent db30c61327
commit 25d0aa47db
32 changed files with 1541 additions and 211 deletions
@@ -0,0 +1,38 @@
+---
+id: TASK-325
+title: Fix session chart known-word percentage denominator
+status: Done
+assignee: []
+created_date: '2026-05-04 01:19'
+updated_date: '2026-05-04 01:23'
+labels:
+  - stats
+dependencies: []
+priority: medium
+---
+
+## Description
+
+<!-- SECTION:DESCRIPTION:BEGIN -->
+Session detail known-word percentages should use the same filtered vocabulary occurrence rows for both known and total word counts. Current chart can divide known persisted word occurrences by raw token totals, causing excluded tokens to depress the known percentage.
+<!-- SECTION:DESCRIPTION:END -->
+
+## Acceptance Criteria
+<!-- AC:BEGIN -->
+- [x] #1 Session known-word timeline API exposes cumulative filtered total word counts alongside known counts.
+- [x] #2 Session detail chart computes known/unknown areas from filtered totals, not raw timeline token counts, when known-word data is available.
+- [x] #3 Session summary known-word rate uses filtered persisted word totals where available and preserves safe fallback behavior when known-word data is unavailable.
+- [x] #4 Regression tests cover filtered denominator behavior for the API and chart data path.
+<!-- AC:END -->
+
+## Implementation Notes
+
+<!-- SECTION:NOTES:BEGIN -->
+Implemented in-place fix using existing persisted word occurrence rows. `/api/stats/sessions/:id/known-words-timeline` now returns cumulative `totalWordsSeen` from filtered persisted occurrences, and session known-word rates divide by the same filtered total. Session detail chart builds known/unknown areas from `totalWordsSeen` instead of raw timeline `tokensSeen`.
+<!-- SECTION:NOTES:END -->
+
+## Final Summary
+
+<!-- SECTION:FINAL_SUMMARY:BEGIN -->
+Known-word percentages on session charts now use filtered persisted word totals for both numerator and denominator. No migration/backfill required; data comes from existing `imm_word_line_occurrences`. Added regression coverage for the API response/rate and chart data builder.
+<!-- SECTION:FINAL_SUMMARY:END -->
@@ -0,0 +1,42 @@
+---
+id: TASK-326
+title: Make stats word metrics honor filtering rules
+status: Done
+assignee: []
+created_date: '2026-05-04 01:35'
+updated_date: '2026-05-04 02:08'
+labels:
+  - stats
+dependencies: []
+priority: high
+---
+
+## Description
+
+<!-- SECTION:DESCRIPTION:BEGIN -->
+Audit stats app metrics that show or derive from word totals and make them use filtered persisted vocabulary occurrences where the UI concept is learned/seen words. Preserve raw telemetry only where it is intentionally playback/token telemetry.
+<!-- SECTION:DESCRIPTION:END -->
+
+## Acceptance Criteria
+<!-- AC:BEGIN -->
+- [x] #1 Stats UI word totals, word rates, lookup-per-word rates, and chart word series use filtered persisted word occurrences where available.
+- [x] #2 Known-word metrics continue to use the same filtered denominator as known counts.
+- [x] #3 Trend, overview, library, session, and episode surfaces are audited with regression coverage for changed data paths.
+- [x] #4 Fallback behavior remains safe for sessions without persisted vocabulary occurrences.
+<!-- AC:END -->
+
+## Implementation Notes
+
+<!-- SECTION:NOTES:BEGIN -->
+Audit finding: raw `tokensSeen` / `totalTokensSeen` still feeds overview hints, dashboard aggregation, trends activity/progress/anime cumulative/library summary, lookup-per-100-word rates, session rows/recent sessions/episode sessions, and library/anime/media headers. Vocabulary and known unique word summaries already use persisted filtered vocabulary rows. Recommended design: query-time filtered word totals from existing `imm_word_line_occurrences`, with raw-token fallback only when a session has no persisted occurrence rows.
+
+Implemented shared query-time filtered word counts. Session summaries, overview hints, daily/monthly rollups, anime/media library/detail rows, anime episode rows, episode/media sessions, trends activity/progress/anime cumulative, library summary, and lookup-per-100-word ratios now use filtered persisted word occurrences. Fallback remains raw token totals only for sessions with no persisted subtitle-line rows.
+
+Follow-up implemented: Vocab frequency tables now apply the same tokenizer vocabulary predicate at read time, because old `imm_words` rows can predate current tokenizer exclusion rules. Vocabulary persistence and cleanup also mirror the broader subtitle-annotation grammar filters. Added common frequency stop terms observed in the stats vocabulary list to the shared tokenizer exclusion set so those rows are filtered consistently across subtitle annotations, persistence, cleanup, stats reads, and SQL word-count aggregates.
+<!-- SECTION:NOTES:END -->
+
+## Final Summary
+
+<!-- SECTION:FINAL_SUMMARY:BEGIN -->
+Stats word metrics now honor filtering rules through the read-model query layer. Existing persisted `imm_word_line_occurrences` provide the filtered denominator; no migration/backfill needed. Vocab tables filter stored rows on read using tokenizer vocabulary rules, so legacy noisy rows stop appearing without a migration. Added regressions for session/overview/rollup fallback behavior, trends/library lookup-rate behavior, vocabulary read filtering, cleanup filtering, and shared stop-term filtering.
+<!-- SECTION:FINAL_SUMMARY:END -->
@@ -0,0 +1,42 @@
+---
+id: TASK-327
+title: Persist stats page exclusion list in database
+status: Done
+assignee: []
+created_date: '2026-05-04 01:39'
+updated_date: '2026-05-04 01:49'
+labels:
+  - feature
+  - stats
+  - database
+dependencies: []
+priority: medium
+---
+
+## Description
+
+<!-- SECTION:DESCRIPTION:BEGIN -->
+Add database-backed persistence for the stats page exclusion list. On first load with the new schema, seed the new table from the existing exclusion list source so existing user choices are preserved. After migration, update database rows whenever the exclusion list is changed or saved so it persists across browser sessions indefinitely.
+<!-- SECTION:DESCRIPTION:END -->
+
+## Acceptance Criteria
+<!-- AC:BEGIN -->
+- [x] #1 A new small database table stores stats page exclusion entries.
+- [x] #2 First load with the new schema seeds the table from the existing exclusion list source.
+- [x] #3 Subsequent exclusion list save/change operations update the database-backed list.
+- [x] #4 Regression coverage verifies migration/seed behavior and persistence updates.
+<!-- AC:END -->
+
+## Implementation Notes
+
+<!-- SECTION:NOTES:BEGIN -->
+Implemented DB-backed stats exclusion list using schema version 18 and new `imm_stats_excluded_words` table. Added read/replace query helpers, service methods, and `/api/stats/excluded-words` GET/PUT routes. Stats frontend now loads exclusions from DB, seeds the empty DB table from legacy `localStorage` on first load, and writes each toggle/restore/clear through the API while keeping localStorage in sync for compatibility. Added focused regression coverage for schema/read-replace, API routes, API client, and frontend bootstrap/update behavior. Verification: `bun run typecheck` passed; `bun test src/core/services/__tests__/stats-server.test.ts stats/src/lib/api-client.test.ts stats/src/hooks/useExcludedWords.test.ts` passed; `bun test src/core/services/immersion-tracker/storage-session.test.ts` passed; `bun run docs:test` passed; `bun run format:check:stats` passed; `bun run changelog:lint` passed. Blocked/unrelated: `bun run typecheck:stats` fails in existing stats files (`AnilistSelector.tsx`, `reading-utils*`, `session-grouping.test.ts`, `yomitan-lookup.test.tsx`); `bun run test:immersion:sqlite:src` fails existing `recordSubtitleLine counts exact Yomitan tokens for session metrics` expected 4 got 3; `bun run docs:build` fails missing `@catppuccin/vitepress/theme/macchiato/mauve.css` import.
+
+Added `src/core/services/__tests__/stats-server.test.ts` and `stats/src/hooks/useExcludedWords.test.ts` to the `test:core:src` allowlist so the new DB exclusion route/client/store regressions run in the maintained fast source lane.
+<!-- SECTION:NOTES:END -->
+
+## Final Summary
+
+<!-- SECTION:FINAL_SUMMARY:BEGIN -->
+Persisted the stats vocabulary exclusion list in SQLite with new schema version 18 table `imm_stats_excluded_words`. Added backend read/replace helpers and `/api/stats/excluded-words` GET/PUT routes, then wired the stats frontend exclusion store to load DB rows, seed an empty DB from legacy browser localStorage on first load, and update the DB on toggle/restore/clear. Updated docs and added changelog fragment. Focused tests and root typecheck pass; broader stats/docs/sqlite gates are blocked by unrelated existing failures recorded in notes.
+<!-- SECTION:FINAL_SUMMARY:END -->
@@ -0,0 +1,43 @@
+---
+id: TASK-329
+title: Keep JLPT subtitle styling underline-only
+status: Done
+assignee: []
+created_date: '2026-05-04 02:13'
+labels:
+  - bug
+  - renderer
+  - jlpt
+dependencies: []
+references:
+  - src/renderer/style.css
+  - src/renderer/subtitle-render.test.ts
+priority: medium
+---
+
+## Description
+
+<!-- SECTION:DESCRIPTION:BEGIN -->
+Fix subtitle token styling so JLPT metadata never changes token text color. JLPT should only render the level marker/underline affordance while known, n+1, name-match, and frequency colors retain priority.
+<!-- SECTION:DESCRIPTION:END -->
+
+## Acceptance Criteria
+<!-- AC:BEGIN -->
+- [x] #1 JLPT-only subtitle tokens do not set token text color.
+- [x] #2 JLPT level marker/underline still uses configured JLPT color.
+- [x] #3 Existing known, n+1, name-match, and frequency text colors remain unchanged.
+<!-- AC:END -->
+
+## Final Summary
+
+<!-- SECTION:FINAL_SUMMARY:BEGIN -->
+Changed subtitle JLPT styling from text color to underline decoration and updated renderer CSS regression coverage.
+
+Verification:
+- `bun test src/renderer/subtitle-render.test.ts`
+- `bunx prettier --check src/renderer/subtitle-render.test.ts src/renderer/style.css`
+- `bun run typecheck`
+
+Blocked:
+- `bun run test:fast` fails in existing dirty stats/session work: `recordSubtitleLine counts exact Yomitan tokens for session metrics` expects `tokensSeen` 4 but gets 3.
+<!-- SECTION:FINAL_SUMMARY:END -->