| id | title | status | assignee | created_date | updated_date | labels | dependencies | priority |
|---|---|---|---|---|---|---|---|---|
| TASK-333 | Suppress aru subtitle annotations | Done | | 2026-05-04 04:39 | 2026-05-04 05:02 | | | medium |
Description
Add ある / 有る to the subtitle annotation suppression path so aru tokens remain hoverable and never receive N+1, JLPT, frequency, or name-match annotation metadata. Known-word highlighting is special: if a filtered aru token is known and known highlighting is enabled, it should still render as known.
Acceptance Criteria
- #1 `ある` and kanji headword/surface variants such as `有る` are excluded by the subtitle annotation filter.
- #2 Annotation stripping clears N+1, JLPT, frequency, and name metadata for `aru` tokens while preserving token hover data.
- #3 Known-word highlighting still applies to filtered tokens, including `aru`, when known-word lookup marks them known.
- #4 Regression coverage fails before the fix and passes after.
Implementation Plan
- Add `ある/有る/在る` to the shared subtitle annotation hard-exclusion terms.
- Preserve/recompute known-word status for filtered tokens while stripping N+1, JLPT, frequency, and name metadata.
- Add RED/GREEN unit and tokenizer regression coverage, plus a changelog fragment.
- Run targeted tests and full handoff gate.
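The first plan item can be pictured with a minimal sketch of a hard-exclusion check. The `SubtitleToken` shape, the `HARD_EXCLUDED_TERMS` set, and the `isAnnotationExcluded` helper are hypothetical names chosen for illustration; the real shared exclusion list and token type in the project may look different.

```typescript
// Hypothetical token shape; field names are assumptions, not the project's real types.
interface SubtitleToken {
  surface: string; // text as it appears in the subtitle line
  headword: string; // dictionary form reported by the tokenizer
}

// Hypothetical hard-exclusion list holding the three aru spellings named in the plan.
const HARD_EXCLUDED_TERMS: ReadonlySet<string> = new Set(["ある", "有る", "在る"]);

// A token counts as filtered when either its headword or its surface is on the list,
// so an inflected surface still matches through its dictionary form.
function isAnnotationExcluded(token: SubtitleToken): boolean {
  return (
    HARD_EXCLUDED_TERMS.has(token.headword) ||
    HARD_EXCLUDED_TERMS.has(token.surface)
  );
}
```

Checking both fields is one way to cover the "headword/surface variants" wording in the acceptance criteria; whether the actual filter matches on both is an assumption here.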
Implementation Notes
TDD path: added failing annotation-stage coverage first. Initial implementation made targeted tests pass, then broader tokenizer coverage revealed an older fixture expecting ある to remain lexical; updated that integration expectation to the new requested behavior. Follow-up correction: known-word highlighting is the lone annotation exception for filtered tokens, so the strip path now preserves known state and annotateTokens recomputes known status for filtered tokens while still clearing N+1/JLPT/frequency/name metadata.
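The follow-up correction described above can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: the `AnnotatedToken` fields, `stripAnnotations`, `annotateFilteredToken`, and the `isKnownWord` callback are all hypothetical names, and only `isKnown` is taken from the summary below.

```typescript
// Hypothetical annotated-token shape; every field name except isKnown is an assumption.
interface AnnotatedToken {
  surface: string;
  headword: string;
  reading?: string | undefined; // stands in for hover data, which must survive
  isKnown: boolean;
  isNPlusOneTarget: boolean;
  jlptLevel?: string | undefined;
  frequencyRank?: number | undefined;
  isNameMatch: boolean;
}

// Hypothetical copy of the hard-exclusion terms.
const EXCLUDED_TERMS: ReadonlySet<string> = new Set(["ある", "有る", "在る"]);

// Clears N+1, JLPT, frequency, and name metadata; leaves hover data and isKnown untouched.
function stripAnnotations(token: AnnotatedToken): AnnotatedToken {
  return {
    ...token,
    isNPlusOneTarget: false,
    jlptLevel: undefined,
    frequencyRank: undefined,
    isNameMatch: false,
  };
}

// For filtered tokens: strip everything except known state, then let the token
// keep or gain isKnown when known highlighting is enabled (assumed behavior).
function annotateFilteredToken(
  token: AnnotatedToken,
  isKnownWord: (headword: string) => boolean,
  knownHighlightingEnabled: boolean,
): AnnotatedToken {
  const filtered =
    EXCLUDED_TERMS.has(token.headword) || EXCLUDED_TERMS.has(token.surface);
  if (!filtered) return token; // non-filtered tokens follow the normal annotation path

  const stripped = stripAnnotations(token);
  const isKnown = knownHighlightingEnabled
    ? stripped.isKnown || isKnownWord(token.headword)
    : stripped.isKnown;
  return { ...stripped, isKnown };
}
```

Keeping the known-state decision outside `stripAnnotations` mirrors the split described in the notes, where the strip path preserves known state and the annotation step recomputes it.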
Final Summary
Suppressed non-known subtitle annotations for aru existence verbs by adding ある, 有る, and 在る to the shared hard-exclusion list. Corrected the filtered-token path so known-word highlighting still applies whenever known highlighting is enabled; filtered tokens now keep/gain isKnown but still lose N+1, JLPT, frequency, and name metadata.
Added and updated annotation-stage and tokenizer regression coverage for aru, particles, helper fragments, interjections, and other filtered known tokens. Added changes/333-aru-annotation-filter.md.
Validation passed: RED failures were observed before implementation/correction, and the following commands then passed:
- `bun test src/core/services/tokenizer/annotation-stage.test.ts`
- `bun test src/core/services/tokenizer.test.ts`
- `bun run typecheck`
- `bun run format:check:src`
- `bun run changelog:lint`
- `bun run test:fast`
- `bun run test:env`
- `bun run build`
- `bun run test:smoke:dist`