Files
SubMiner/backlog/tasks/task-75 - Tokenizer-configurable-POS-exclusions-for-N1-and-frequency-annotations.md
2026-02-28 21:15:26 -08:00

42 lines
1.6 KiB
Markdown

---
id: TASK-75
title: 'Tokenizer: configurable POS exclusions for N+1 and frequency annotations'
status: Done
assignee: []
created_date: '2026-03-01 01:23'
updated_date: '2026-03-01 04:14'
labels: []
dependencies: []
priority: medium
ordinal: 6000
---
## Description
<!-- SECTION:DESCRIPTION:BEGIN -->
N+1 and frequency highlighting should ignore non-learning tokens (e.g., particles/auxiliary forms) based on MeCab POS1 tags, while remaining user-configurable.
Problem example: for subtitle phrase containing になれば, the highlighted N+1 target should not be the non-useful inflection/token piece when POS indicates an excluded class.
Implement configurable exclusion defaults with add/remove overrides so users can tune behavior without code changes.
<!-- SECTION:DESCRIPTION:END -->
## Acceptance Criteria
<!-- AC:BEGIN -->
- [x] #1 Default exclusion set omits non-useful POS1 classes from both N+1 candidate selection and frequency highlighting.
- [x] #2 Users can add extra POS1 exclusions and remove defaults via config.
- [x] #3 Tokenizer/annotation tests cover default behavior and config add/remove overrides.
<!-- AC:END -->
## Final Summary
<!-- SECTION:FINAL_SUMMARY:BEGIN -->
Implemented configurable annotation POS exclusions with defaults+add/remove for both MeCab POS1 and POS2, wired to N+1 candidate selection and frequency highlighting. Added POS2 default exclusion (非自立), expanded POS1 defaults for function words, added Yomitan->MeCab enrichment to carry pos2/pos3 metadata, updated config docs/examples, and added regression tests including になれば case.
<!-- SECTION:FINAL_SUMMARY:END -->