SubMiner/backlog/tasks/task-174 - Fix-missing-frequency-highlights-for-merged-tokenizer-tokens.md at 46fbea902a38677f930e5d956c273cc65645269c - SubMiner

mirror of https://github.com/ksyasuda/SubMiner.git synced 2026-03-20 12:11:28 -07:00

Harden stats APIs and fix Electron Yomitan debug runtime

- Validate stats session IDs/limits and add AnkiConnect request timeouts
- Stabilize stats window/runtime lifecycle and tighten window security defaults
- Fix Electron CLI debug startup by unsetting `ELECTRON_RUN_AS_NODE` and wiring Yomitan session state
- Expand regression coverage for tracker queries/events ordering and session aggregates
- Update docs for stats dashboard usage and Yomitan lookup troubleshooting

2026-03-17 20:05:07 -07:00

3.6 KiB

- id: TASK-174
- title: Fix missing frequency highlights for merged tokenizer tokens
- status: In Progress
- assignee: codex
- created_date: 2026-03-15 10:18
- updated_date: 2026-03-15 10:40
- labels: bug, tokenizer, frequency-highlighting
- dependencies: (none)
- references:
  - /Users/sudacode/projects/japanese/SubMiner/src/core/services/tokenizer.ts
  - /Users/sudacode/projects/japanese/SubMiner/src/core/services/tokenizer/parser-selection-stage.ts
  - /Users/sudacode/projects/japanese/SubMiner/src/core/services/tokenizer/yomitan-parser-runtime.ts
  - /Users/sudacode/projects/japanese/SubMiner/scripts/get_frequency.ts
  - /Users/sudacode/projects/japanese/SubMiner/scripts/test-yomitan-parser.ts
- priority: high

Description

Frequency highlighting can miss words that should be colored because they fall within the configured top-X limit. This happens when tokenizer candidate selection keeps merged Yomitan units that combine a content word with trailing function text. The annotation stage then conservatively clears frequency for the whole merged token, so visible high-frequency words lose highlighting. The standalone debug CLIs also fail to initialize the shared Yomitan runtime, which blocks reliable repro for this class of bug.
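The failure mode described above can be illustrated with a minimal sketch. The token shape, the `isMergedContentAndFunction` flag, the function names, and the rank value 1200 are all hypothetical and chosen for illustration; they are not SubMiner's actual types or data. The sketch also assumes a token is colorable when its rank is at or below the top-X limit.

```typescript
// Hypothetical token shape for illustration; not SubMiner's actual type.
interface SketchToken {
  surface: string;
  // Assumed flag: true when a content word is fused with trailing function text.
  isMergedContentAndFunction: boolean;
  frequencyRank: number | null;
}

// Assumed annotation behavior: merged tokens have their rank cleared.
function annotateFrequency(token: SketchToken): SketchToken {
  return token.isMergedContentAndFunction
    ? { ...token, frequencyRank: null }
    : token;
}

// Assumed rule: highlight when the annotated rank is within the top-X limit.
function isHighlighted(token: SketchToken, topX: number): boolean {
  const rank = annotateFrequency(token).frequencyRank;
  return rank !== null && rank <= topX;
}

// Made-up example data: the same content word, split versus merged.
const splitToken: SketchToken = {
  surface: "猫",
  isMergedContentAndFunction: false,
  frequencyRank: 1200,
};
const mergedToken: SketchToken = {
  surface: "猫が",
  isMergedContentAndFunction: true,
  frequencyRank: 1200,
};

console.log(isHighlighted(splitToken, 10000)); // true
console.log(isHighlighted(mergedToken, 10000)); // false
```

Under these assumptions the content word loses its highlight only because of how it was grouped, not because of its rank.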

Acceptance Criteria

#1 Tokenizer no longer drops frequency highlighting for content words in merged-token cases where a better scanning parse candidate would preserve highlightable tokens.
#2 A regression test covers the reported sentence shape and fails before the fix.
#3 The standalone frequency/parser debug path can initialize the shared Yomitan runtime well enough to reproduce tokenizer output instead of immediately reporting runtime/session wiring errors.

Implementation Plan

1. Add a regression test for the reported merged-token frequency miss, centered on Yomitan scanning candidate selection and downstream frequency annotation.
2. Update tokenizer candidate selection so merged content+function tokens do not win over candidates that preserve highlightable content tokens.
3. Repair the standalone frequency/parser debug scripts so their Electron/Yomitan runtime wiring matches current shared runtime expectations.
4. Verify with targeted tokenizer/parser tests and the standalone debug repro command.
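One way the candidate-selection step in the plan could work is sketched below. This is not the code in `parser-selection-stage.ts`; the types, the scoring rule (count of highlightable tokens), the tie-breaking (first candidate wins), and the sample ranks are all assumptions made for illustration.

```typescript
// Hypothetical shapes; the real tokenizer types may differ.
interface CandidateToken {
  surface: string;
  hasContentWord: boolean;
  hasTrailingFunctionText: boolean;
  frequencyRank: number | null;
}
type ParseCandidate = CandidateToken[];

// A token counts as highlightable when it is a pure content token with a rank.
function countHighlightable(candidate: ParseCandidate): number {
  return candidate.filter(
    (t) =>
      t.hasContentWord &&
      !t.hasTrailingFunctionText &&
      t.frequencyRank !== null,
  ).length;
}

// Pick the candidate that preserves the most highlightable tokens.
// Ties keep the earlier candidate because the comparison is strict.
function selectCandidate(
  candidates: ParseCandidate[],
): ParseCandidate | undefined {
  let best: ParseCandidate | undefined;
  let bestScore = -1;
  for (const candidate of candidates) {
    const score = countHighlightable(candidate);
    if (score > bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  return best;
}

// Made-up example: one merged parse and one split parse of the same text.
const mergedCandidate: ParseCandidate = [
  { surface: "猫が", hasContentWord: true, hasTrailingFunctionText: true, frequencyRank: 1200 },
];
const splitCandidate: ParseCandidate = [
  { surface: "猫", hasContentWord: true, hasTrailingFunctionText: false, frequencyRank: 1200 },
  { surface: "が", hasContentWord: false, hasTrailingFunctionText: false, frequencyRank: null },
];

const chosen = selectCandidate([mergedCandidate, splitCandidate]);
console.log(chosen === splitCandidate); // true
```

A regression test for step 1 could assert the same property: given both parses, the selected candidate keeps the content word as its own token.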

Implementation Notes

Initial triage: shared frequency class logic looks correct; likely failure is upstream tokenizer candidate selection producing merged content+function tokens that annotation later excludes from frequency. Standalone debug scripts also fail to initialize a usable Electron/Yomitan runtime, blocking reliable repro from the current CLI path.

After fixing the standalone Electron wrapper, the repro does not support the original highlight claim for 誰でもいいからかかってこいよ. The tokenizer reports かかってこい with `frequencyRank` 63098, so it correctly stays uncolored at `--color-top-x 10000` and becomes colorable once the threshold is raised above that rank.

The concrete bug fixed in this pass is the standalone Electron debug path. Package scripts now unset `ELECTRON_RUN_AS_NODE`, and the scripts normalize Electron imports/guards so that `get-frequency:electron` can reach real Electron/Yomitan runtime state instead of immediately falling back to Node-mode diagnostics. `test-yomitan-parser:electron` still shows extension/service-worker issues against the existing profile and was not stabilized in this pass.
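The notes say the fix unsets `ELECTRON_RUN_AS_NODE` in the package scripts. As a rough illustration of the same idea from inside a launcher script, the sketch below builds a child environment without that variable. The helper name `withoutRunAsNode` and the `Env` type are hypothetical and not part of SubMiner.

```typescript
// Environment map in the same general shape as Node's process.env.
type Env = Record<string, string | undefined>;

// Returns a copy of the environment without ELECTRON_RUN_AS_NODE.
// When that variable is set, Electron starts as a plain Node.js process,
// so Electron app APIs are not available to the script.
function withoutRunAsNode(env: Env): Env {
  const copy: Env = { ...env };
  delete copy.ELECTRON_RUN_AS_NODE;
  return copy;
}

// Example input with made-up values.
const originalEnv: Env = { PATH: "/usr/bin", ELECTRON_RUN_AS_NODE: "1" };
const cleanedEnv = withoutRunAsNode(originalEnv);

console.log("ELECTRON_RUN_AS_NODE" in cleanedEnv); // false
console.log(cleanedEnv.PATH); // "/usr/bin"
```

A launcher could pass the result as the `env` option of `child_process.spawn` when starting Electron. The input object is left unmodified, so the parent process environment is not affected.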
