SubMiner/backlog/tasks/task-60 - Remove-hard-coded-particle-term-exclusions-from-frequency-lookup.md
2026-02-17 22:54:09 -08:00

id: TASK-60
title: Remove hard-coded particle term exclusions from frequency lookup
status: Done
assignee:
created_date: 2026-02-16 22:20
updated_date: 2026-02-18 04:11
labels:
dependencies:
ordinal: 25000

Description

Update tokenizer frequency filtering to rely on MeCab POS information instead of a hard-coded set of particle surface forms.

Acceptance Criteria

  • #1 FREQUENCY_EXCLUDED_PARTICLES hard-coded term list is removed.
  • #2 Frequency exclusion for particles/auxiliaries is driven by POS metadata.
  • #3 Tokenizer tests cover POS-driven exclusion behavior.

Final Summary

Removed the hard-coded particle surface exclusions (FREQUENCY_EXCLUDED_PARTICLES) from the tokenizer frequency logic. The frequency skip now relies on POS metadata only: partOfSpeech (particle/bound_auxiliary), and for Yomitan tokens the MeCab-enriched pos1 (助詞/助動詞). Added the tokenizer test "tokenizeSubtitleService skips frequency rank when Yomitan token is enriched as particle by mecab pos1" to validate POS-driven exclusion.
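
The POS-driven skip described above could look roughly like the following TypeScript sketch. It is illustrative only: the Token shape, the two set names, and the functions shouldSkipFrequency and applyFrequencyRank are hypothetical and not taken from the SubMiner source. Only the field names partOfSpeech and pos1 and the excluded values come from the summary.

```typescript
// Hypothetical token shape. The field names partOfSpeech and pos1 follow the
// summary; everything else here is an assumption made for illustration.
interface Token {
  surface: string;
  partOfSpeech?: string; // e.g. "noun", "particle", "bound_auxiliary"
  pos1?: string; // MeCab top-level POS tag, e.g. "助詞" (particle)
}

// POS values whose tokens should not receive a frequency rank.
const EXCLUDED_PARTS_OF_SPEECH: ReadonlySet<string> = new Set([
  "particle",
  "bound_auxiliary",
]);
const EXCLUDED_POS1: ReadonlySet<string> = new Set(["助詞", "助動詞"]);

// Decide from POS metadata alone; the surface form is never consulted.
function shouldSkipFrequency(token: Token): boolean {
  if (
    token.partOfSpeech !== undefined &&
    EXCLUDED_PARTS_OF_SPEECH.has(token.partOfSpeech)
  ) {
    return true;
  }
  return token.pos1 !== undefined && EXCLUDED_POS1.has(token.pos1);
}

// Return the rank for a token, or undefined when the token is excluded
// or the lookup has no entry for its surface form.
function applyFrequencyRank(
  token: Token,
  lookup: (term: string) => number | undefined,
): number | undefined {
  return shouldSkipFrequency(token) ? undefined : lookup(token.surface);
}
```

Under these assumptions, a token with surface は and pos1 助詞 gets no rank even when the frequency dictionary has an entry for は, while a token with no excluding POS metadata is looked up normally. Because the surface form plays no part in the decision, a noun that happens to be spelled like a particle keeps its rank.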