mirror of
https://github.com/ksyasuda/SubMiner.git
synced 2026-04-26 04:19:27 -07:00
- Carry matched headword wordClasses from termsFind into YomitanScanToken - Map recognized Yomitan wordClasses to SubMiner coarse POS before annotation - MeCab enrichment now fills only missing POS fields, preserving existing coarse pos1 - Exclude standalone grammar particles, して helper fragments, and single-kana surfaces from annotations - Respect source-text punctuation gaps when counting N+1 sentence words - Preserve known-word highlight on excluded kanji-containing tokens - Add backlog tasks 304 (N+1 boundary bug) and 305 (wordClasses POS, done)
1.2 KiB
1.2 KiB
id, title, status, assignee, created_date, labels, dependencies, priority
| id | title | status | assignee | created_date | labels | dependencies | priority | |||
|---|---|---|---|---|---|---|---|---|---|---|
| TASK-304 | Fix N+1 sentence boundary counting across Yomitan punctuation gaps | In Progress | 2026-04-26 05:33 |
|
medium |
Description
N+1 target selection should respect sentence-ending punctuation from the original subtitle text even when Yomitan token output omits punctuation tokens. Current behavior can treat multiple subtitle sentences as one token span and incorrectly satisfy the minimum content-token threshold.
Acceptance Criteria
- #1 A subtitle like
てんめ!ふざけんなよ!does not markふざけん/similar single-content-token second sentence as N+1 when the minimum sentence word count is 3. - #2 N+1 sentence segmentation uses original subtitle text offsets or equivalent source-boundary data, not only punctuation tokens returned by Yomitan.
- #3 Existing annotation exclusion behavior for particles/grammar tokens remains unchanged.
- #4 Regression tests cover Yomitan-style token streams where punctuation is absent from the token list.