SubMiner/backlog/tasks/task-75 - Tokenizer-configurable-POS-exclusions-for-N1-and-frequency-annotations.md
2026-03-01 02:36:51 -08:00


id: TASK-75
title: Tokenizer: configurable POS exclusions for N+1 and frequency annotations
status: Done
assignee:
created_date: 2026-03-01 01:23
updated_date: 2026-03-01 04:14
labels:
dependencies:
priority: medium
ordinal: 6000

Description

N+1 and frequency highlighting should ignore non-learning tokens (e.g., particles/auxiliary forms) based on MeCab POS1 tags, while remaining user-configurable.

Problem example: for a subtitle phrase containing になれば, the highlighted N+1 target should not be a non-useful inflection or token fragment when its POS tag falls in an excluded class.

Implement configurable exclusion defaults with add/remove overrides so users can tune behavior without code changes.
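The defaults-plus-overrides behavior can be sketched as a small set-resolution step. This is a minimal TypeScript sketch, not SubMiner's actual code: the `PosExclusionConfig` shape, its `add`/`remove` field names, the `resolveExclusions` helper, and the two default tags shown are all assumptions made for illustration.

```typescript
// Hypothetical config shape; the field names are illustrative,
// not SubMiner's real config schema.
interface PosExclusionConfig {
  add?: string[];    // extra POS tags the user wants excluded
  remove?: string[]; // default POS tags the user wants re-enabled
}

// Illustrative defaults only: 助詞 (particle) and 助動詞 (auxiliary verb).
const DEFAULT_POS1_EXCLUSIONS: readonly string[] = ["助詞", "助動詞"];

// Start from the defaults, apply additions, then apply removals.
function resolveExclusions(
  defaults: readonly string[],
  config: PosExclusionConfig = {},
): Set<string> {
  const resolved = new Set<string>(defaults);
  for (const tag of config.add ?? []) {
    const trimmed = tag.trim();
    if (trimmed !== "") resolved.add(trimmed); // skip blank entries
  }
  for (const tag of config.remove ?? []) {
    resolved.delete(tag.trim());
  }
  return resolved;
}
```

Because removals are applied after additions, a tag listed in both `add` and `remove` ends up not excluded. That ordering is a choice made in this sketch, not something the task specifies.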

Acceptance Criteria

  • #1 Default exclusion set omits non-useful POS1 classes from both N+1 candidate selection and frequency highlighting.
  • #2 Users can add extra POS1 exclusions and remove defaults via config.
  • #3 Tokenizer/annotation tests cover default behavior and config add/remove overrides.
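Criterion #1 asks that one exclusion set govern both annotators. A rough sketch of that idea follows, assuming that an "N+1" line is one with exactly one unknown eligible token. The `Token` shape, the `known` flag, and all function names here are hypothetical and not taken from the SubMiner codebase.

```typescript
// Hypothetical token shape for illustration only.
interface Token {
  surface: string;
  pos1: string;   // MeCab POS1 tag
  known: boolean; // assumed flag: the learner already knows this word
}

// Shared predicate so both annotators apply the same exclusions.
function isEligible(token: Token, excludedPos1: ReadonlySet<string>): boolean {
  return !excludedPos1.has(token.pos1);
}

// N+1 candidate selection: return the single unknown eligible token,
// or null when there are zero or several unknown eligible tokens.
function findNPlusOneTarget(
  tokens: readonly Token[],
  excludedPos1: ReadonlySet<string>,
): Token | null {
  const unknown = tokens.filter((t) => isEligible(t, excludedPos1) && !t.known);
  return unknown.length === 1 ? (unknown[0] ?? null) : null;
}

// Frequency highlighting: only eligible tokens are considered at all.
function frequencyHighlightTargets(
  tokens: readonly Token[],
  excludedPos1: ReadonlySet<string>,
): Token[] {
  return tokens.filter((t) => isEligible(t, excludedPos1));
}
```

With hand-written tags for 猫になれば (猫 known; に, なれ, ば unknown) and 助詞 excluded, なれ is the only unknown eligible token and becomes the N+1 target. Without the exclusion there are three unknown tokens, so no target is selected.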

Final Summary

Implemented configurable annotation POS exclusions, with defaults plus add/remove overrides, for both MeCab POS1 and POS2, and wired them into N+1 candidate selection and frequency highlighting. Added a POS2 default exclusion (非自立), expanded the POS1 defaults for function words, added Yomitan->MeCab enrichment to carry pos2/pos3 metadata, updated config docs/examples, and added regression tests, including the になれば case.
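A combined POS1/POS2 check along the lines described above might look like the following sketch. The `AnnotatedToken` and `ExclusionSets` types and the `isAnnotatable` helper are hypothetical names, and the sample tags are hand-written for illustration rather than actual MeCab output.

```typescript
// Hypothetical shapes; not SubMiner's actual types.
interface AnnotatedToken {
  surface: string;
  pos1: string;
  pos2?: string; // may be absent if enrichment found no MeCab match
}

interface ExclusionSets {
  pos1: ReadonlySet<string>;
  pos2: ReadonlySet<string>;
}

// A token is annotatable only if neither its POS1 nor its POS2 is excluded.
function isAnnotatable(token: AnnotatedToken, exclusions: ExclusionSets): boolean {
  if (exclusions.pos1.has(token.pos1)) return false;
  if (token.pos2 !== undefined && exclusions.pos2.has(token.pos2)) return false;
  return true;
}

// Illustrative exclusions: function-word POS1 classes plus the 非自立 POS2 class.
const exampleExclusions: ExclusionSets = {
  pos1: new Set(["助詞", "助動詞"]),
  pos2: new Set(["非自立"]),
};

// Hand-tagged tokens for になれば, plus one 非自立 verb for contrast.
const exampleTokens: AnnotatedToken[] = [
  { surface: "に", pos1: "助詞", pos2: "格助詞" },
  { surface: "なれ", pos1: "動詞", pos2: "自立" },
  { surface: "ば", pos1: "助詞", pos2: "接続助詞" },
  { surface: "いる", pos1: "動詞", pos2: "非自立" },
];

const annotatable = exampleTokens
  .filter((t) => isAnnotatable(t, exampleExclusions))
  .map((t) => t.surface);
console.log(annotatable); // only なれ survives both checks
```

In this sketch a token without POS2 metadata is judged on POS1 alone. Whether missing metadata should instead exclude the token is a policy decision the summary does not state.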