SubMiner/backlog/tasks/task-333 - Suppress-aru-subtitle-annotations.md at 745996c72d067b757c26d555fb0142a13c94ee7e - SubMiner

sudacode/SubMiner

mirror of https://github.com/ksyasuda/SubMiner.git synced 2026-05-04 00:41:33 -07:00

Last commit: sudacode 9bcea2fc5f, "fix: preserve known highlighting for filtered tokens" (2026-05-04 00:06:27 -07:00), 3.1 KiB

id: TASK-333
title: Suppress aru subtitle annotations
status: Done
assignee: (none)
created_date: 2026-05-04 04:39
updated_date: 2026-05-04 05:02
labels: tokenizer, annotations, bug
dependencies: (none)
priority: medium

Description

Add ある / 有る to the subtitle annotation suppression path so aru tokens remain hoverable and never receive N+1, JLPT, frequency, or name-match annotation metadata. Known-word highlighting is the one exception: if a filtered aru token is known and known highlighting is enabled, it should still render as known.

Acceptance Criteria

#1 ある and kanji headword/surface variants such as 有る are excluded by the subtitle annotation filter.
#2 Annotation stripping clears N+1, JLPT, frequency, and name metadata for aru tokens while preserving token hover data.
#3 Known-word highlighting still applies to filtered tokens, including aru, when known-word lookup marks them known.
#4 Regression coverage fails before the fix and passes after.
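A minimal TypeScript sketch of the check that criterion #1 describes, assuming each token carries both a `surface` and a `headword` field. `SubtitleToken`, `ARU_EXCLUSION_TERMS`, and `isExcludedFromAnnotations` are hypothetical names, not SubMiner's API. The set lists only the aru spellings from the implementation plan; the real filter also handles other token classes (the regression coverage mentions particles, helper fragments, and interjections).

```typescript
// Hypothetical token shape for illustration; SubMiner's real token type may differ.
interface SubtitleToken {
  surface: string; // text as it appears in the subtitle line
  headword: string; // dictionary (lemma) form
}

// Only the aru terms from the implementation plan; the real shared list is longer.
const ARU_EXCLUSION_TERMS: ReadonlySet<string> = new Set(["ある", "有る", "在る"]);

// Excluded when either the headword or the surface matches, so inflected
// forms such as あった (headword ある) are caught as well as a bare 有る.
function isExcludedFromAnnotations(token: SubtitleToken): boolean {
  return ARU_EXCLUSION_TERMS.has(token.headword) || ARU_EXCLUSION_TERMS.has(token.surface);
}

console.log(isExcludedFromAnnotations({ surface: "あった", headword: "ある" })); // true
console.log(isExcludedFromAnnotations({ surface: "いる", headword: "いる" })); // false
```

Checking both fields is one way to cover the "headword/surface variants" wording; the task does not say whether SubMiner matches on one field or both.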

Implementation Plan

Add ある/有る/在る to the shared subtitle annotation hard-exclusion terms.
Preserve/recompute known-word status for filtered tokens while stripping N+1, JLPT, frequency, and name metadata.
Add RED/GREEN unit and tokenizer regression coverage, plus a changelog fragment.
Run targeted tests and full handoff gate.
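The second plan step, stripping every annotation except known state, might look like the sketch below. `AnnotatedToken`, its metadata field names (other than `isKnown`, which the final summary names), and `stripFilteredAnnotations` are assumptions for illustration. The point is that hover data and `isKnown` survive while N+1, JLPT, frequency, and name metadata are cleared.

```typescript
// Hypothetical annotated-token shape; only `isKnown` is named in the task itself.
interface AnnotatedToken {
  surface: string;
  headword: string;
  reading?: string; // stands in for hover data, which must be preserved
  isKnown?: boolean; // the one annotation filtered tokens may keep
  isNPlusOneTarget?: boolean;
  jlptLevel?: number;
  frequencyRank?: number;
  nameMatch?: boolean;
}

// Returns a copy with N+1, JLPT, frequency, and name metadata cleared.
// Every other field, including isKnown and hover data, is copied through.
function stripFilteredAnnotations(token: AnnotatedToken): AnnotatedToken {
  return {
    ...token,
    isNPlusOneTarget: undefined,
    jlptLevel: undefined,
    frequencyRank: undefined,
    nameMatch: undefined,
  };
}

const stripped = stripFilteredAnnotations({
  surface: "ある",
  headword: "ある",
  reading: "ある",
  isKnown: true,
  jlptLevel: 5,
  frequencyRank: 40,
});
console.log(stripped.isKnown, stripped.jlptLevel); // true undefined
```

Returning a copy rather than mutating the input is a choice of this sketch, not something the task specifies.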

Implementation Notes

TDD path: added failing annotation-stage coverage first. The initial implementation made the targeted tests pass, but broader tokenizer coverage then revealed an older fixture that expected ある to remain lexical; that integration expectation was updated to the newly requested behavior. Follow-up correction: known-word highlighting is the only annotation exception for filtered tokens, so the strip path now preserves known state, and annotateTokens recomputes known status for filtered tokens while still clearing N+1, JLPT, frequency, and name metadata.
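The follow-up correction, where annotateTokens recomputes known status for filtered tokens, can be sketched as below. `annotateTokensSketch`, the `AnnotateHooks` object, and every hook name are hypothetical stand-ins for SubMiner's real wiring, and looking up known status by headword is an assumption. The sketch only shows the control flow: unfiltered tokens get the full annotation pass, while filtered tokens are stripped and then keep an existing `isKnown` or gain it from the known-word lookup when known highlighting is enabled.

```typescript
// Hypothetical minimal token; only the fields this sketch reads are declared.
interface Token {
  surface: string;
  headword: string;
  isKnown?: boolean;
}

// Assumed hooks; SubMiner's real annotateTokens signature is not shown in this task.
interface AnnotateHooks<T extends Token> {
  knownHighlightingEnabled: boolean;
  isKnownWord: (headword: string) => boolean; // known-word lookup
  isFiltered: (token: T) => boolean; // hard-exclusion check (e.g. ある/有る/在る)
  annotateFully: (token: T) => T; // normal N+1/JLPT/frequency/name pass
  stripNonKnown: (token: T) => T; // clears everything except isKnown and hover data
}

function annotateTokensSketch<T extends Token>(tokens: T[], hooks: AnnotateHooks<T>): T[] {
  return tokens.map((token) => {
    if (!hooks.isFiltered(token)) return hooks.annotateFully(token);
    const stripped = hooks.stripNonKnown(token);
    // Filtered tokens keep an existing known flag, or gain one from the
    // lookup when known-word highlighting is enabled.
    const isKnown =
      stripped.isKnown === true ||
      (hooks.knownHighlightingEnabled && hooks.isKnownWord(token.headword));
    return { ...stripped, isKnown };
  });
}
```

Keeping the known-status decision outside `stripNonKnown` mirrors the note's split between "the strip path preserves known state" and "annotateTokens recomputes known status".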

Final Summary

Suppressed non-known subtitle annotations for aru existence verbs by adding ある, 有る, and 在る to the shared hard-exclusion list. Corrected the filtered-token path so known-word highlighting still applies whenever known highlighting is enabled; filtered tokens now keep or gain isKnown but still lose N+1, JLPT, frequency, and name metadata.

Added and updated annotation-stage and tokenizer regression coverage for aru, particles, helper fragments, interjections, and other filtered known tokens. Added changes/333-aru-annotation-filter.md.

Validation passed: RED failures observed before implementation/correction; bun test src/core/services/tokenizer/annotation-stage.test.ts; bun test src/core/services/tokenizer.test.ts; bun run typecheck; bun run format:check:src; bun run changelog:lint; bun run test:fast; bun run test:env; bun run build; bun run test:smoke:dist.
