| id | title | status | assignee | created_date | updated_date | labels | dependencies | priority |
|---|---|---|---|---|---|---|---|---|
| TASK-75 | Tokenizer: configurable POS exclusions for N+1 and frequency annotations | Done | | 2026-03-01 01:23 | 2026-03-01 01:32 | | | medium |
Description
N+1 and frequency highlighting should ignore tokens with no learning value (e.g., particles and auxiliary forms) based on their MeCab POS1 tags, while remaining user-configurable.
Problem example: for a subtitle phrase containing になれば, the highlighted N+1 target should not be an inflection or token fragment whose POS falls in an excluded class.
Implement configurable exclusion defaults with add/remove overrides so users can tune behavior without code changes.
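
The defaults-plus-overrides rule can be read as a small set operation. The following is a minimal Python sketch of that idea, not the project's implementation: `resolve_exclusions`, `DEFAULT_POS1_EXCLUSIONS`, and the tags placed in the default set are hypothetical names and values chosen for illustration, and letting a removal win over an addition is an assumption.

```python
from typing import FrozenSet, Iterable

# Hypothetical defaults for illustration only; the real default list may differ.
DEFAULT_POS1_EXCLUSIONS: FrozenSet[str] = frozenset({"助詞", "助動詞", "記号"})


def resolve_exclusions(
    defaults: Iterable[str],
    add: Iterable[str] = (),
    remove: Iterable[str] = (),
) -> FrozenSet[str]:
    """Return (defaults | add) - remove, so a removal always takes precedence."""
    return (frozenset(defaults) | frozenset(add)) - frozenset(remove)


# Example: also skip interjections, but allow auxiliary verbs to be annotated again.
effective = resolve_exclusions(
    DEFAULT_POS1_EXCLUSIONS,
    add=["感動詞"],
    remove=["助動詞"],
)
```

With no overrides the function returns the defaults unchanged, so users who never touch the config keep the shipped behavior.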
Acceptance Criteria
- #1 Default exclusion set omits non-useful POS1 classes from both N+1 candidate selection and frequency highlighting.
- #2 Users can add extra POS1 exclusions and remove defaults via config.
- #3 Tokenizer/annotation tests cover default behavior and config add/remove overrides.
Final Summary
Implemented configurable annotation POS exclusions, with defaults plus add/remove overrides, for both MeCab POS1 and POS2, and wired them into N+1 candidate selection and frequency highlighting. Added a POS2 default exclusion (非自立), expanded the POS1 defaults for function words, added Yomitan->MeCab enrichment to carry pos2/pos3 metadata, updated config docs/examples, and added regression tests, including the になれば case.
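
To illustrate how POS1 and POS2 exclusions could combine for the になれば case, here is a hedged Python sketch. The `Token` class and `annotation_candidates` function are hypothetical and not taken from the codebase. The tags are hand-written in the style IPADIC-based MeCab typically reports for this phrase, not produced by running MeCab, and the exclusion sets are examples rather than the shipped defaults.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, List


@dataclass(frozen=True)
class Token:
    surface: str
    pos1: str
    pos2: str


def annotation_candidates(
    tokens: Iterable[Token],
    pos1_excluded: AbstractSet[str],
    pos2_excluded: AbstractSet[str],
) -> List[Token]:
    """Keep tokens whose POS1 and POS2 are both outside the exclusion sets."""
    return [
        t
        for t in tokens
        if t.pos1 not in pos1_excluded and t.pos2 not in pos2_excluded
    ]


# になれば split into particle + verb stem + conjunctive particle (assumed tags).
tokens = [
    Token("に", "助詞", "格助詞"),
    Token("なれ", "動詞", "自立"),
    Token("ば", "助詞", "接続助詞"),
]

candidates = annotation_candidates(
    tokens,
    pos1_excluded={"助詞", "助動詞"},  # example POS1 exclusions
    pos2_excluded={"非自立"},  # POS2 default named in the summary
)
# Only the verb token なれ remains eligible for N+1 or frequency highlighting.
```

Under these assumptions the particles に and ば drop out through POS1, while a dependent verb tagged 非自立 would drop out through POS2 even though its POS1 (動詞) is not excluded.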