mirror of
https://github.com/ksyasuda/SubMiner.git
synced 2026-04-27 16:19:35 -07:00
2.8 KiB
| id | title | status | assignee | created_date | updated_date | labels | dependencies | priority |
|---|---|---|---|---|---|---|---|---|
| TASK-307 | Exclude kana-only words from N+1 subtitle targets | Done | | 2026-04-27 01:52 | 2026-04-27 01:57 | | | medium |
Description
Subtitle N+1 annotation over-targets kana-only (hiragana/katakana) tokens that collapse to dictionary words. Adjust targeting so that kana-only tokens are not selected as N+1 candidates, while preserving tokenization and hover behavior, and keeping other annotation metadata wherever existing filters allow it.
Acceptance Criteria
- #1 Kana-only subtitle tokens are not marked as N+1 targets.
- #2 Kanji or mixed lexical tokens can still be marked as N+1 targets when they are the single unknown candidate in a sentence.
- #3 Regression coverage demonstrates the kana-only N+1 exclusion.
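Criteria #1 and #2 can be read as a single selection rule: a sentence gets an N+1 target only when exactly one eligible token is unknown, and kana-only tokens are never eligible. The sketch below illustrates that reading. `SketchToken`, `KANA_ONLY`, and `pickNPlusOneTarget` are hypothetical names, and the assumption that an unknown kana-only token is simply ignored (rather than disqualifying the sentence) is not confirmed by this task.

```typescript
// Hypothetical token shape; the project's real token type is not shown in this task.
interface SketchToken {
  surface: string;
  isKnown: boolean;
}

// Assumed kana test for illustration: the hiragana and katakana blocks only.
const KANA_ONLY = /^[\u3040-\u30FF]+$/;

// Returns the index of the N+1 target, or -1 when the sentence has none.
// A sentence qualifies only when exactly one eligible token is unknown.
function pickNPlusOneTarget(tokens: SketchToken[]): number {
  const candidates: number[] = [];
  tokens.forEach((token, index) => {
    // Kana-only surfaces are skipped, so they can never become the target.
    const eligible = !token.isKnown && !KANA_ONLY.test(token.surface);
    if (eligible) candidates.push(index);
  });
  const first = candidates[0];
  return candidates.length === 1 && first !== undefined ? first : -1;
}
```

Under these assumptions, a sentence whose only unknown token is `ことば` yields no target, while a sentence whose only unknown token is `食べ物` yields that token's index.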
Implementation Plan
- Add a failing regression in `src/core/services/tokenizer.test.ts` showing a kana-only Yomitan token is not selected as the single N+1 target, while a mixed lexical token in the same style still can be targeted.
- Implement the smallest filter in `src/token-merger.ts`: N+1 candidate selection rejects tokens whose surface is entirely kana; word-count behavior remains governed by existing annotation/POS filters.
- Run the focused tokenizer tests, then update task acceptance criteria/final summary.
Implementation Notes
Implemented a surface-level kana-only guard in N+1 candidate selection. Kept existing word-count/POS filtering behavior intact; updated tokenizer and annotation-stage expectations where old tests intentionally allowed kana-only N+1 targets.
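A surface-level guard of this kind can be as small as one regular expression over the Unicode kana blocks. The following is a minimal sketch, not the code in `src/token-merger.ts`: the helper name `isKanaOnlySurface`, the inclusion of half-width katakana, and the handling of empty surfaces are all assumptions made for illustration.

```typescript
// Hypothetical helper, not the project's actual code.
// Ranges: U+3040-309F hiragana, U+30A0-30FF katakana (this block includes
// the prolonged sound mark U+30FC), U+FF66-FF9F half-width katakana.
const KANA_ONLY_SURFACE = /^[\u3040-\u309F\u30A0-\u30FF\uFF66-\uFF9F]+$/;

function isKanaOnlySurface(surface: string): boolean {
  // Empty or whitespace-only surfaces are treated as "not kana-only", so the
  // guard never rejects a token merely because its surface is missing.
  const trimmed = surface.trim();
  return trimmed.length > 0 && KANA_ONLY_SURFACE.test(trimmed);
}

// Example checks: pure kana surfaces match, anything containing kanji does not.
console.log(isKanaOnlySurface("ありがとう"));
console.log(isKanaOnlySurface("コーヒー"));
console.log(isKanaOnlySurface("食べる"));
```

Testing the surface rather than the dictionary headword keeps the check independent of how a token was deinflected or merged, which fits the stated goal of leaving word-count and POS filtering untouched.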
Final Summary
Summary:
- Added kana-only surface detection to `isNPlusOneCandidateToken` so hiragana/katakana-only subtitle tokens are not selected as N+1 targets.
- Added/updated tokenizer and annotation-stage regressions for kana-only targets while preserving non-kana N+1 behavior.
- Added changelog fragment `changes/307-kana-nplusone-targets.md`.
Verification:
- `bun test src/core/services/tokenizer.test.ts --test-name-pattern "kana-only N\+1"` failed before the fix with `true !== false`.
- `bun test src/core/services/tokenizer/annotation-stage.test.ts src/core/services/tokenizer.test.ts` passed.
- `bun run typecheck` passed.
- `bun run test:fast` passed.
- `bun run changelog:lint` passed.
- `bunx prettier --check src/core/services/tokenizer.test.ts src/core/services/tokenizer/annotation-stage.test.ts src/token-merger.ts changes/307-kana-nplusone-targets.md` passed.