mirror of
https://github.com/ksyasuda/SubMiner.git
synced 2026-02-27 18:22:41 -08:00
| id | title | status | assignee | created_date | updated_date | labels | dependencies | ordinal |
|---|---|---|---|---|---|---|---|---|
| TASK-60 | Remove hard-coded particle term exclusions from frequency lookup | Done | | 2026-02-16 22:20 | 2026-02-18 04:11 | | | 25000 |
Description
Update tokenizer frequency filtering to rely on MeCab POS information instead of a hard-coded set of particle surface forms.
Acceptance Criteria
- #1 `FREQUENCY_EXCLUDED_PARTICLES` hard-coded term list is removed.
- #2 Frequency exclusion for particles/auxiliaries is driven by POS metadata.
- #3 Tokenizer tests cover POS-driven exclusion behavior.
Final Summary
Removed the hard-coded particle surface exclusions (`FREQUENCY_EXCLUDED_PARTICLES`) from the tokenizer frequency logic. The frequency skip now relies on POS metadata only: `partOfSpeech` (`particle`/`bound_auxiliary`) and, for Yomitan tokens, the MeCab-enriched `pos1` (`助詞`/`助動詞`). Added the tokenizer test `tokenizeSubtitleService skips frequency rank when Yomitan token is enriched as particle by mecab pos1` to validate POS-driven exclusion.
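For illustration, the POS-driven skip described above could look roughly like the sketch below. This is not the code from the repository: the `Token` shape, the two set constants, and the function name `shouldSkipFrequencyLookup` are hypothetical. Only the field names `partOfSpeech` and `pos1` and their excluded values come from this task.

```typescript
// Hypothetical token shape. Only `partOfSpeech` and `pos1` are named in the task;
// everything else here is an assumption made for the example.
interface Token {
  surface: string;
  partOfSpeech?: string; // e.g. "noun", "particle", "bound_auxiliary"
  pos1?: string;         // MeCab top-level POS, e.g. "助詞" or "助動詞"
}

// POS values that should never receive a frequency rank.
const EXCLUDED_PARTS_OF_SPEECH: ReadonlySet<string> = new Set([
  "particle",
  "bound_auxiliary",
]);
const EXCLUDED_POS1: ReadonlySet<string> = new Set(["助詞", "助動詞"]);

// Decide from POS metadata alone; the surface form is never consulted.
function shouldSkipFrequencyLookup(token: Token): boolean {
  if (token.partOfSpeech !== undefined &&
      EXCLUDED_PARTS_OF_SPEECH.has(token.partOfSpeech)) {
    return true;
  }
  return token.pos1 !== undefined && EXCLUDED_POS1.has(token.pos1);
}

// Example tokens: a Yomitan-style token enriched only with MeCab pos1,
// and a plain noun that should keep its frequency rank.
const enrichedParticle: Token = { surface: "は", pos1: "助詞" };
const noun: Token = { surface: "猫", partOfSpeech: "noun", pos1: "名詞" };
```

Because the decision ignores `surface`, a token whose surface happens to match a common particle but is tagged as another part of speech still gets a frequency lookup, which a surface-form list cannot distinguish.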