SubMiner/backlog/tasks/task-338 - Fix-known-word-highlight-on-standalone-subtitle-particles.md

id: TASK-338
title: Fix known-word highlight on standalone subtitle particles
status: Done
assignee: codex
created_date: 2026-05-04 05:52
updated_date: 2026-05-04 05:57
labels: bug, subtitle, tokenizer
dependencies: none
references:
  • src/core/services/tokenizer/annotation-stage.ts
  • src/core/services/tokenizer/subtitle-annotation-filter.ts
  • src/renderer/subtitle-render.ts
priority: medium

Description

Standalone grammar particles such as に should not render as known-word green when they appear in the known-word cache as readings for other words. Keep known-word coloring for lexical tokens, but prevent grammar-excluded subtitle tokens from getting known-green.
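The false positive can be illustrated with a minimal sketch. Everything here is hypothetical (the Token shape, the cache contents, and annotateNaively are not taken from the SubMiner source); it only assumes that the known-word cache stores readings alongside surface forms, so the reading に of a word such as 荷 collides with the particle に.

```typescript
// Hypothetical shapes for illustration; the real SubMiner types are not shown in this task.
interface RawToken {
  surface: string;
  partOfSpeech: "particle" | "noun" | "verb";
}

interface Token extends RawToken {
  isKnown: boolean;
}

// Assumption: the cache mixes surface forms and readings in a single lookup set.
// 荷 is read に, so its reading lands in the same set as the particle's surface.
const knownWordCache = new Set<string>(["荷", "に", "学校", "がっこう"]);

function isKnownWord(text: string): boolean {
  return knownWordCache.has(text);
}

// Naive annotation: any cache hit marks the token as known,
// regardless of whether the token is a lexical word or a grammar particle.
function annotateNaively(tokens: RawToken[]): Token[] {
  return tokens.map((token) => ({ ...token, isKnown: isKnownWord(token.surface) }));
}

const annotated = annotateNaively([
  { surface: "学校", partOfSpeech: "noun" },
  { surface: "に", partOfSpeech: "particle" },
]);
```

Under these assumptions both tokens come back with isKnown set, which is the behavior this task removes for the particle while keeping it for the noun.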

Acceptance Criteria

  • #1 Standalone grammar particles like に do not retain isKnown after subtitle annotation filtering.
  • #2 Lexical known-word tokens still render as known when not grammar-excluded.
  • #3 Focused regression test covers the particle false-positive path.

Implementation Plan

  1. Add a focused regression test in src/core/services/tokenizer/annotation-stage.test.ts showing that a standalone particle is grammar-excluded and does not retain isKnown even when isKnownWord('に') is true.
  2. Run the focused tokenizer annotation test and confirm the new test fails for the current behavior.
  3. Patch src/core/services/tokenizer/annotation-stage.ts so grammar-excluded tokens clear known status while still stripping N+1/frequency/JLPT/name metadata.
  4. Run the focused test file, then inspect diff and update task acceptance criteria.
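Step 3 of the plan above can be sketched roughly as follows. This is not the actual patch to annotation-stage.ts: AnnotatedToken, its field names, the isGrammarExcluded flag, and filterSubtitleAnnotations are all assumed for illustration. The sketch only shows the intended rule, namely that a grammar-excluded token loses its known status together with its N+1, frequency, JLPT, and name metadata.

```typescript
// Hypothetical token shape; field names are assumptions, not the SubMiner API.
interface AnnotatedToken {
  surface: string;
  isGrammarExcluded: boolean;
  isKnown: boolean;
  isNPlusOneTarget?: boolean;
  frequencyRank?: number;
  jlptLevel?: string;
  isNameMatch?: boolean;
}

// Grammar-excluded tokens keep only their surface and exclusion flag;
// isKnown is reset to false and all optional annotation metadata is dropped.
function filterSubtitleAnnotations(tokens: AnnotatedToken[]): AnnotatedToken[] {
  return tokens.map((token) => {
    if (!token.isGrammarExcluded) {
      return token; // lexical tokens pass through unchanged
    }
    return {
      surface: token.surface,
      isGrammarExcluded: true,
      isKnown: false,
    };
  });
}

const filtered = filterSubtitleAnnotations([
  { surface: "学校", isGrammarExcluded: false, isKnown: true, jlptLevel: "N5" },
  { surface: "に", isGrammarExcluded: true, isKnown: true, frequencyRank: 3 },
]);
```

Clearing everything in one place keeps the rule for grammar spans uniform: a renderer that only checks isKnown cannot color a particle green, whatever the cache lookup returned earlier.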

Implementation Notes

Implemented tokenizer annotation filtering so that grammar-excluded subtitle tokens clear their known-word status instead of retaining green known coloring. Added a focused regression test for the known-word-cache particle false positive and updated existing expectations for unified annotation clearing. Verification: bun test src/core/services/tokenizer/annotation-stage.test.ts --test-name-pattern "clears known status from standalone particles" failed before the production patch; after the patch, bun test src/core/services/tokenizer/annotation-stage.test.ts, bun test src/core/services/tokenizer.test.ts, the combined tokenizer tests, bun run typecheck, bun run changelog:lint, and bun run test:fast all passed.

Full handoff gate follow-up: bun run test:env and bun run build passed. bun run test:smoke:dist failed outside this tokenizer change in dist/core/services/overlay-manager.test.js because current dirty overlay-window code calls window.getTitle() on a test mock that does not provide it.
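The smoke-test failure has the general shape sketched below. The names are hypothetical (WindowLike, describeWindow, and both mocks are not from the SubMiner test suite); the sketch only shows why calling a method that a mock does not define throws, and that supplying the method on the mock avoids it.

```typescript
// Hypothetical minimal window interface; not the real overlay-window type.
interface WindowLike {
  isDestroyed(): boolean;
  getTitle(): string;
}

// Stand-in for production code that reads the window title.
function describeWindow(win: WindowLike): string {
  return `overlay:${win.getTitle()}`;
}

// A mock written before getTitle() was used: the method is simply absent.
const incompleteMock = { isDestroyed: () => false } as unknown as WindowLike;

// A mock that provides every method the code under test calls.
const completeMock: WindowLike = {
  isDestroyed: () => false,
  getTitle: () => "test-overlay",
};

// Helper that reports the failure kind instead of crashing the example.
function tryDescribe(win: WindowLike): string {
  try {
    return describeWindow(win);
  } catch (error) {
    return error instanceof TypeError ? "TypeError" : "other error";
  }
}

const withIncomplete = tryDescribe(incompleteMock);
const withComplete = tryDescribe(completeMock);
```

Whether the right fix is to extend the mock or to change the overlay-window code is outside this task; the sketch only explains why the failure is independent of the tokenizer change.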

Final Summary

Summary:

  • Cleared isKnown for grammar-excluded subtitle tokens in the tokenizer annotation stage, preventing standalone particles such as に from rendering as known just because a known-word deck contains a matching reading.
  • Added a focused regression test for the known-word-cache false positive and updated tokenizer expectations so helper/grammar spans consistently clear all subtitle annotations.
  • Added changelog fragment changes/338-known-word-particle-highlights.md.

Verification:

  • bun test src/core/services/tokenizer/annotation-stage.test.ts --test-name-pattern "clears known status from standalone particles" failed before the production patch.
  • bun test src/core/services/tokenizer/annotation-stage.test.ts
  • bun test src/core/services/tokenizer.test.ts
  • bun test src/core/services/tokenizer/annotation-stage.test.ts src/core/services/tokenizer.test.ts
  • bun run typecheck
  • bun run changelog:lint
  • bun run test:fast
  • bun run test:env
  • bun run build

Blocked/External:

  • bun run test:smoke:dist currently fails outside this tokenizer change in dist/core/services/overlay-manager.test.js: dirty overlay-window code calls window.getTitle() on a test mock without that method.