SubMiner/backlog/tasks/task-319 - Suppress-annotations-for-expressive-interjection-subtitles.md

id: TASK-319
title: Suppress annotations for expressive interjection subtitles
status: Done
assignee: Codex
created_date: 2026-05-03 03:18
updated_date: 2026-05-03 03:20
labels: bug, subtitle-annotations
dependencies: (none listed)
references:
  • src/core/services/tokenizer/subtitle-annotation-filter.ts
  • src/core/services/tokenizer/annotation-stage.test.ts
priority: medium

Description

Interjection-only subtitle tokens such as ハァ and はっ should remain hoverable as tokens but must not receive known, N+1, frequency, or JLPT annotation styling. Currently, these forms can still be annotated when their dictionary or part-of-speech metadata does not trigger the existing exclusion gate.

Acceptance Criteria

  • #1 Standalone ハァ/はっ-style interjection tokens have annotation metadata cleared even when dictionary metadata exists.
  • #2 Filtering remains scoped so content-bearing non-interjection tokens still receive annotations.
  • #3 Regression coverage exercises the reported subtitle pattern: ハァ… / (ガーフィール)はっ!
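
The first two criteria can be illustrated with a minimal sketch. It assumes a hypothetical token shape (`AnnotatedToken`) and hypothetical helper names (`clearAnnotations`, `filterAnnotations`); the actual types and functions in `subtitle-annotation-filter.ts` are not shown in this task and may differ.

```typescript
// Hypothetical token shape; the real SubMiner token type may differ.
interface AnnotatedToken {
  surface: string;
  isKnown?: boolean;
  isNPlusOneTarget?: boolean;
  frequencyRank?: number;
  jlptLevel?: string;
}

// Returns a copy that keeps only the surface text, so the token can still be
// hovered but carries no known/N+1/frequency/JLPT annotation metadata.
function clearAnnotations(token: AnnotatedToken): AnnotatedToken {
  return { surface: token.surface };
}

// Clears annotations only for tokens flagged by the caller-supplied predicate;
// every other token is passed through unchanged.
function filterAnnotations(
  tokens: AnnotatedToken[],
  isExcluded: (surface: string) => boolean,
): AnnotatedToken[] {
  return tokens.map((t) => (isExcluded(t.surface) ? clearAnnotations(t) : t));
}
```

Passing the exclusion check in as a predicate keeps the clearing step separate from the question of which surfaces count as interjections, which is the part this task changes.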

Implementation Plan

  1. Add failing regression coverage around annotation filtering for the reported interjection forms, including katakana ハァ and small-tsu はっ with surrounding subtitle punctuation/name text.
  2. Tighten the shared subtitle annotation exclusion gate so expressive kana interjections clear annotation metadata without relying only on MeCab pos1=感動詞.
  3. Run the focused tokenizer/annotation tests, then update acceptance criteria and notes.

Implementation Notes

Implemented in the shared subtitle annotation exclusion term normalization: added はぁ to the exclusion terms so that katakana ハァ normalizes to a form the existing term gate matches. The existing small-tsu kana SFX logic already covers はっ. The regression test confirms that both reported forms have their known/N+1/frequency/JLPT metadata cleared, while a normal noun keeps its frequency annotation.
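
The gate described in these notes can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: `EXCLUDED_TERMS`, `toHiragana`, `SMALL_TSU_SFX`, and `isExpressiveInterjection` are hypothetical names, only the term はぁ comes from the notes, and the small-tsu pattern is a guess at the shape of the existing SFX rule.

```typescript
// Hypothetical hiragana-normalized exclusion terms. Only はぁ is taken from the
// task notes; the real term set is not shown in this task.
const EXCLUDED_TERMS = new Set<string>(["はぁ"]);

// Maps katakana U+30A1..U+30F6 to the matching hiragana by subtracting 0x60,
// so ハァ becomes はぁ. Other characters are left untouched.
function toHiragana(text: string): string {
  return text.replace(/[\u30A1-\u30F6]/g, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) - 0x60),
  );
}

// Assumed shape of the small-tsu SFX rule: one or two hiragana followed by っ.
const SMALL_TSU_SFX = /^[\u3041-\u3096]{1,2}っ$/;

// True when the surface matches an excluded term or the small-tsu pattern
// after katakana-to-hiragana normalization.
function isExpressiveInterjection(surface: string): boolean {
  const normalized = toHiragana(surface.trim());
  return EXCLUDED_TERMS.has(normalized) || SMALL_TSU_SFX.test(normalized);
}
```

Because the check works on the surface form alone, it does not depend on MeCab reporting pos1=感動詞 for the token, which matches the intent stated in the implementation plan.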

Final Summary

Summary:

  • Added a regression for the reported subtitle pattern ハァ… / (ガーフィール)はっ!, with annotation metadata present on both interjection tokens.
  • Extended the shared subtitle annotation exclusion term set so ハァ normalizes to はぁ and is stripped of annotation styling. Existing はっ handling remains covered by small-tsu kana SFX filtering.
  • Added a change fragment for the user-visible bug fix.

Verification:

  • bun test src/core/services/tokenizer/annotation-stage.test.ts
  • bun test src/core/services/tokenizer/annotation-stage.test.ts src/core/services/tokenizer.test.ts src/renderer/subtitle-render.test.ts
  • bun run typecheck