- Add `shouldAllowHonorificPrefixNounFrequency` to exempt お/ご/御 + noun merged tokens from frequency exclusion
- Add regression test for `ご機嫌` asserting rank 5484 is preserved after MeCab enrichment and annotation
- Close TASK-341
| id | title | status | assignee | created_date | updated_date | labels | dependencies | documentation | priority |
|---|---|---|---|---|---|---|---|---|---|
| TASK-341 | Fix frequency highlight for honorific prefix noun tokens | Done | | 2026-05-05 02:08 | 2026-05-05 02:10 | | | | high |
Description
A user reported that the subtitle token `ご機嫌` in the line `(フランク)ご機嫌が良くないようだな アンドリュー` shows Yomitan/JPDB rank 5484 in the popup but is not highlighted as frequent. Frequency annotation currently excludes merged tokens that contain default-excluded POS parts such as `接頭詞`; ordinal prefix-noun tokens already have an exception. Desired outcome: honorific prefix + noun lexical tokens like `ご機嫌` keep their valid frequency rank so the renderer can apply frequent-token styling, while standalone prefixes and noisy merged grammar fragments remain excluded.
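The exclusion described above can be illustrated with a minimal sketch. It is not the project's actual code: the `MergedToken` shape, the `frequencyRankBeforeFix` helper, and the contents of the excluded set beyond `接頭詞` are assumptions made for illustration only.

```typescript
// Hypothetical token shape; the real tokenizer types may differ.
interface MergedToken {
  surface: string;
  // MeCab POS of each merged part, joined with "|", e.g. "接頭詞|名詞".
  pos1: string;
}

// Assumed default exclusion set. The task description only confirms 接頭詞;
// the other entries are placeholders for typical grammar/noise categories.
const DEFAULT_EXCLUDED_POS1 = new Set<string>(["接頭詞", "助詞", "助動詞", "記号"]);

// True when any merged part carries a default-excluded POS.
function hasExcludedPosPart(token: MergedToken): boolean {
  return token.pos1.split("|").some((part) => DEFAULT_EXCLUDED_POS1.has(part));
}

// Pre-fix behavior as described: a dictionary rank is discarded
// whenever any part of the merged token is excluded.
function frequencyRankBeforeFix(
  token: MergedToken,
  dictionaryRank: number | undefined,
): number | undefined {
  if (dictionaryRank === undefined || hasExcludedPosPart(token)) {
    return undefined;
  }
  return dictionaryRank;
}

// ご機嫌 merges a 接頭詞 part with a 名詞 part, so its rank is dropped.
const rank = frequencyRankBeforeFix({ surface: "ご機嫌", pos1: "接頭詞|名詞" }, 5484);
console.log(rank); // undefined
```

Under these assumptions the popup can still show rank 5484, because the dictionary lookup succeeds, while the renderer never receives a rank to style.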
Acceptance Criteria
- #1 `ご機嫌`-style honorific prefix + noun tokens retain a finite frequency rank after annotation/tokenization when frequency highlighting is enabled.
- #2 Standalone prefix/noise tokens remain excluded from frequency annotation.
- #3 Regression test covers the reported `ご機嫌` rank 5484 behavior.
- #4 Relevant tokenizer/annotation tests pass.
Implementation Plan
- Add a failing regression around honorific prefix + noun token frequency retention, using `ご機嫌` with rank 5484 and POS `接頭詞|名詞` / `名詞接続|一般`.
- Implement a narrow annotation-stage exception for lexical honorific prefix-noun tokens, adjacent to the existing ordinal prefix-noun allowance.
- Verify standalone prefix/noise exclusion behavior remains covered.
- Run targeted tokenizer/annotation tests and update acceptance criteria/final notes.
Implementation Notes
TDD red verified: `bun test src/core/services/tokenizer.test.ts -t "honorific prefix-noun"` failed with `actual: undefined, expected: 5484` before implementation.
Implemented a narrow honorific prefix-noun frequency allowance for merged お/ご/御 + noun tokens with POS 接頭詞|名詞 and prefix POS2 名詞接続. Existing standalone prefix/noise exclusion tests still pass.
Verification: `bun test src/core/services/tokenizer.test.ts src/core/services/tokenizer/annotation-stage.test.ts` passed (164 tests); `bun run typecheck` passed; `bunx prettier --check src/core/services/tokenizer/annotation-stage.ts src/core/services/tokenizer.test.ts` passed. Repo-wide `bun run format:check:src` still fails on pre-existing `src/core/services/stats-window.ts` formatting.
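The allowance described in these notes could look roughly like the sketch below. Only the function name `shouldAllowHonorificPrefixNounFrequency` and the conditions (お/ご/御 prefix, POS `接頭詞|名詞`, prefix POS2 `名詞接続`) come from this task; the `MergedToken` shape, the field names, and the function body are assumptions, not the code in `annotation-stage.ts`.

```typescript
// Hypothetical token shape; field names are assumptions for illustration.
interface MergedToken {
  surface: string;
  pos1: string; // per-part MeCab POS joined with "|", e.g. "接頭詞|名詞"
  pos2: string; // per-part MeCab POS subtype joined with "|", e.g. "名詞接続|一般"
}

const HONORIFIC_PREFIXES = ["お", "ご", "御"];

// Sketch of a narrow exception: allow a frequency rank only for a merged
// honorific prefix + noun token, never for a standalone prefix.
function shouldAllowHonorificPrefixNounFrequency(token: MergedToken): boolean {
  const pos1Parts = token.pos1.split("|");
  const pos2Parts = token.pos2.split("|");

  // Exactly two merged parts: a prefix followed by a noun.
  if (pos1Parts.length !== 2 || pos1Parts[0] !== "接頭詞" || pos1Parts[1] !== "名詞") {
    return false;
  }
  // The prefix must be the noun-connecting kind (名詞接続).
  if (pos2Parts[0] !== "名詞接続") {
    return false;
  }
  // The surface must start with an honorific prefix and have a noun body after it.
  const prefix = HONORIFIC_PREFIXES.find((p) => token.surface.startsWith(p));
  return prefix !== undefined && token.surface.length > prefix.length;
}

const gokigen: MergedToken = { surface: "ご機嫌", pos1: "接頭詞|名詞", pos2: "名詞接続|一般" };
const standalonePrefix: MergedToken = { surface: "ご", pos1: "接頭詞", pos2: "名詞接続" };

console.log(shouldAllowHonorificPrefixNounFrequency(gokigen)); // true
console.log(shouldAllowHonorificPrefixNounFrequency(standalonePrefix)); // false
```

Checking the POS pattern, the POS2 subtype, and the surface prefix together keeps the exception narrow: a token with a different prefix subtype, or a prefix with nothing after it, still falls through to the default exclusion.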
Final Summary
Fixed frequency annotation for lexical honorific prefix-noun tokens such as ご機嫌. The annotation filter now allows merged お/ご/御 prefix + noun tokens with MeCab POS 接頭詞|名詞 / 名詞接続|... to retain a valid frequency rank, while standalone prefixes and existing noise filters remain excluded.
Added a tokenizer regression for the reported ご機嫌 case asserting rank 5484 is preserved after MeCab enrichment and annotation.
Verification:
- `bun test src/core/services/tokenizer.test.ts -t "honorific prefix-noun"` failed before the fix with `undefined` vs `5484`, then passed after the fix.
- `bun test src/core/services/tokenizer.test.ts src/core/services/tokenizer/annotation-stage.test.ts` passed (164 tests).
- `bun run typecheck` passed.
- `bunx prettier --check src/core/services/tokenizer/annotation-stage.ts src/core/services/tokenizer.test.ts` passed.

Note: repo-wide `bun run format:check:src` currently fails on unrelated existing formatting in `src/core/services/stats-window.ts`.