| id | title | status | assignee | created_date | updated_date | labels | dependencies | references | priority |
|---|---|---|---|---|---|---|---|---|---|
| TASK-338 | Fix known-word highlight on standalone subtitle particles | Done | | 2026-05-04 05:52 | 2026-05-04 05:57 | | | | medium |
Description
Standalone grammar particles such as に should not render as known-word green when they appear in the known-word cache as readings for other words. Keep known-word coloring for lexical tokens, but prevent grammar-excluded subtitle tokens from getting known-green.
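The false positive described above can be reproduced in a few lines. This is only a sketch: `KnownWordEntry`, `buildKnownWordCache`, and the sample entries are hypothetical stand-ins, and the only assumption carried over from the task is that the cache can be matched by reading as well as by headword.

```typescript
// Hypothetical shapes for illustration; the project's real cache type is not shown in this task.
type KnownWordEntry = { word: string; reading: string };

// Indexes both the headword and its reading, which is what lets a bare reading match.
function buildKnownWordCache(entries: KnownWordEntry[]): Set<string> {
  const cache = new Set<string>();
  for (const entry of entries) {
    cache.add(entry.word);
    cache.add(entry.reading);
  }
  return cache;
}

const knownWordCache = buildKnownWordCache([
  { word: "荷", reading: "に" }, // a lexical word whose reading happens to be に
  { word: "猫", reading: "ねこ" },
]);

function isKnownWord(text: string): boolean {
  return knownWordCache.has(text);
}

// The particle に matches only because it coincides with the reading of 荷.
const particleLooksKnown = isKnownWord("に");
// A genuine lexical token matches through its headword, which is the behavior to keep.
const lexicalWordIsKnown = isKnownWord("猫");
```

A lookup alone cannot tell the particle from the noun, so the distinction has to come from the token's grammatical role rather than from the cache.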
Acceptance Criteria
- #1 Standalone grammar particles like に do not retain isKnown after subtitle annotation filtering.
- #2 Lexical known-word tokens still render as known when not grammar-excluded.
- #3 Focused regression test covers the particle false-positive path.
Implementation Plan
- Add a focused regression in `src/core/services/tokenizer/annotation-stage.test.ts` showing standalone particle `に` is grammar-excluded and does not retain `isKnown` even when `isKnownWord('に')` is true.
- Run the focused tokenizer annotation test and confirm the new test fails for the current behavior.
- Patch `src/core/services/tokenizer/annotation-stage.ts` so grammar-excluded tokens clear known status while still stripping N+1/frequency/JLPT/name metadata.
- Run the focused test file, then inspect diff and update task acceptance criteria.
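The intended filtering behavior might look roughly like the sketch below. It is not the code in `annotation-stage.ts`: the `AnnotatedToken` shape, every field name other than `isKnown`, and the `isGrammarExcluded` rule are assumptions made for illustration.

```typescript
// Hypothetical token shape; only isKnown is named in the task, the rest is assumed.
interface AnnotatedToken {
  surface: string;
  partOfSpeech: "particle" | "noun" | "verb" | "other";
  isKnown: boolean;
  isNPlusOne: boolean;
  frequencyRank: number | null;
  jlptLevel: string | null;
  isName: boolean;
}

// Assumed exclusion rule: standalone particles are treated as grammar tokens.
function isGrammarExcluded(token: AnnotatedToken): boolean {
  return token.partOfSpeech === "particle";
}

// Grammar-excluded tokens lose every subtitle annotation, including known status;
// all other tokens pass through unchanged.
function filterSubtitleAnnotations(tokens: AnnotatedToken[]): AnnotatedToken[] {
  return tokens.map((token) =>
    isGrammarExcluded(token)
      ? {
          ...token,
          isKnown: false,
          isNPlusOne: false,
          frequencyRank: null,
          jlptLevel: null,
          isName: false,
        }
      : token,
  );
}

// Both tokens arrive marked as known, mirroring the case where isKnownWord('に') is true.
const filtered = filterSubtitleAnnotations([
  { surface: "に", partOfSpeech: "particle", isKnown: true, isNPlusOne: false, frequencyRank: 5, jlptLevel: "N5", isName: false },
  { surface: "猫", partOfSpeech: "noun", isKnown: true, isNPlusOne: false, frequencyRank: 900, jlptLevel: "N5", isName: false },
]);
```

Clearing `isKnown` in the same branch that strips the other metadata keeps grammar-excluded tokens on a single code path, which matches the "unified annotation clearing" wording used in the notes.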
Implementation Notes
Implemented tokenizer annotation filtering so grammar-excluded subtitle tokens clear known-word status instead of retaining green known coloring. Added a focused regression for the known-word-cache particle false positive and updated existing expectations for unified annotation clearing.
Verification: `bun test src/core/services/tokenizer/annotation-stage.test.ts --test-name-pattern "clears known status from standalone particles"` failed before the production patch. After the patch, `bun test src/core/services/tokenizer/annotation-stage.test.ts`, `bun test src/core/services/tokenizer.test.ts`, the combined tokenizer tests, `bun run typecheck`, `bun run changelog:lint`, and `bun run test:fast` passed.
Full handoff gate follow-up: `bun run test:env` and `bun run build` passed. `bun run test:smoke:dist` failed outside this tokenizer change, in `dist/core/services/overlay-manager.test.js`, because the current dirty overlay-window code calls `window.getTitle()` on a test mock that does not provide it.
Final Summary
Summary:
- Cleared `isKnown` for grammar-excluded subtitle tokens in the tokenizer annotation stage, preventing standalone particles such as `に` from rendering as known just because a known-word deck contains a matching reading.
- Added a focused regression test for the known-word-cache false positive and updated tokenizer expectations so helper/grammar spans consistently clear all subtitle annotations.
- Added changelog fragment `changes/338-known-word-particle-highlights.md`.
Verification:
- `bun test src/core/services/tokenizer/annotation-stage.test.ts --test-name-pattern "clears known status from standalone particles"` failed before the production patch.
- `bun test src/core/services/tokenizer/annotation-stage.test.ts`
- `bun test src/core/services/tokenizer.test.ts`
- `bun test src/core/services/tokenizer/annotation-stage.test.ts src/core/services/tokenizer.test.ts`
- `bun run typecheck`
- `bun run changelog:lint`
- `bun run test:fast`
- `bun run test:env`
- `bun run build`
Blocked/External:
- `bun run test:smoke:dist` currently fails outside this tokenizer change in `dist/core/services/overlay-manager.test.js`: dirty overlay-window code calls `window.getTitle()` on a test mock without that method.