mirror of
https://github.com/ksyasuda/SubMiner.git
synced 2026-02-27 18:22:41 -08:00
- map segmented Yomitan lines into single logical tokens and improve candidate selection heuristics
- limit frequency lookup to selected token text with POS-based exclusions and add debug logging hook
- add standalone Yomitan parser test script, deterministic utility-script shutdown, and docs/backlog updates
| id | title | status | assignee | created_date | updated_date | labels | dependencies |
|---|---|---|---|---|---|---|---|
| TASK-59 | Restrict Yomitan frequency lookup to selected headword only | Done | | 2026-02-16 22:16 | 2026-02-16 22:18 | | |
Description
Update tokenizer and related scripts/tests so frequency lookup no longer uses Yomitan headword variant lists and instead only uses the selected headword returned by Yomitan.
Acceptance Criteria
- #1 Frequency ranking for Yomitan tokens uses only the token headword (with existing fallback behavior) and not frequencyLookupTerms variants.
- #2 Tokenizer tests reflect the new headword-only lookup behavior.
- #3 Parser testing script output no longer implies variant-based frequency lookup.
Final Summary
Updated frequency lookup to use only the selected token lookup text (headword first, fallback to reading/surface only when headword is absent) and removed Yomitan variant-term usage. Removed frequencyLookupTerms from token mapping/types, updated tokenizer tests for headword-only behavior, and aligned helper scripts (scripts/get_frequency.ts, scripts/test-yomitan-parser.ts) so diagnostics/output no longer imply variant-based lookup.
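
The headword-first selection described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `SketchToken` shape, the `selectLookupText` and `lookupFrequencyRank` names, and the `Map`-based rank table are all hypothetical. The summary only says "reading/surface" for the fallback, so trying reading before surface is also an assumption.

```typescript
// Hypothetical token shape for illustration; not SubMiner's actual types.
interface SketchToken {
  surface: string;
  reading?: string;
  headword?: string;
}

// Choose the single text used for frequency lookup.
// Headword wins when present; otherwise fall back to reading, then surface.
// (The reading-before-surface order is an assumption.)
function selectLookupText(token: SketchToken): string {
  const headword = token.headword?.trim();
  if (headword) return headword;
  const reading = token.reading?.trim();
  if (reading) return reading;
  return token.surface;
}

// Look up a rank using only the selected text. A plain Map stands in for
// the real frequency dictionary; no variant list is consulted.
function lookupFrequencyRank(
  token: SketchToken,
  ranks: Map<string, number>,
): number | null {
  return ranks.get(selectLookupText(token)) ?? null;
}

// Made-up ranks, purely for demonstration.
const sampleRanks = new Map<string, number>([
  ["食べる", 120],
  ["たべる", 4000],
]);

const sampleToken: SketchToken = {
  surface: "食べた",
  reading: "たべた",
  headword: "食べる",
};

const sampleRank = lookupFrequencyRank(sampleToken, sampleRanks);
```

Because only one string is ever looked up, a token's rank cannot be influenced by a more frequent variant spelling, which matches the intent of acceptance criterion #1.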