Files
SubMiner/src/anki-integration/known-word-cache.test.ts
T
sudacode 430373f010 feat(tokenizer): use Yomitan word classes for subtitle POS filtering (#57)
* feat(tokenizer): use Yomitan word classes for subtitle POS filtering

- Carry matched headword wordClasses from termsFind into YomitanScanToken
- Map recognized Yomitan wordClasses to SubMiner coarse POS before annotation
- MeCab enrichment now fills only missing POS fields, preserving existing coarse pos1
- Exclude standalone grammar particles, して helper fragments, and single-kana surfaces from annotations
- Respect source-text punctuation gaps when counting N+1 sentence words
- Preserve known-word highlight on excluded kanji-containing tokens
- Add backlog tasks 304 (N+1 boundary bug) and 305 (wordClasses POS, done)

* fix(tokenizer): preserve annotation and enrichment behavior

* fix: restore jlpt subtitle underlines

* fix: exclude kana-only n+1 targets

* fix: refresh overlay on Hyprland fullscreen

* fix: address fullscreen and n-plus-one review notes

* fix: address CodeRabbit review comments

* fix: accept modified digits for multi-line sentence mining

* Cancel pending Linux MPV fullscreen overlay refresh bursts

- return a cancel handle from the Linux refresh burst scheduler
- clear pending refresh bursts when overlays hide or windows close
- tighten the burst test polling to wait for the async refresh

* fix: suppress N+1 for kana-only candidates and fix minSentenceWords coun

- Treat kana-only tokens with surrounding subtitle punctuation (…, ―, etc.) as kana-only so they are not promoted to N+1 targets
- Exclude unknown tokens filtered from N+1 targeting from the minSentenceWords count so filtered kana-only unknowns cannot satisfy sentence length threshold
- Add regression tests for kana-only candidate suppression and filtered-unknown padding cases

* Suppress subtitle annotations for grammar fragments

- Hide annotation metadata for auxiliary inflection and ja-nai endings
- Preserve lexical `くれる` forms and add regression coverage

* Fix kana-only N+1 tokenizer regression test

- Use a pure-kana fixture for the subtitle token N+1 case
- Update task notes for the latest CodeRabbit follow-up

* Fix managed playback exit and tokenizer grammar splits

- Ignore background stats daemons during regular app startup
- Split standalone grammar endings before applying annotations
- Clear helper-span annotations for auxiliary-only tokens

* fix: refresh current subtitle after known-word mining

* fix: suppress sigh interjection annotations

* fix: preserve jlpt underline color after lookup

* Replace grammar-ending permutations with shared matcher; preserve word a

- Extract `grammar-ending.ts` with `isStandaloneGrammarEndingText` / `isSubtitleGrammarEndingText` pattern matchers
- Replace `STANDALONE_GRAMMAR_ENDINGS` set in parser-selection-stage with shared matcher
- Replace generated phrase sets in subtitle-annotation-filter with shared matcher
- Remove stale duplicate subtitle-exclusion constants and helpers from annotation-stage
- Manual clipboard card updates now write only to the sentence audio field, leaving word/expression audio untouched

* fix: CI changelog, annotation options threading, and Jellyfin quit

- Add `type: fixed` / `area:` frontmatter to `changes/319` to pass `changelog:lint`
- Thread `TokenizerAnnotationOptions` through `stripSubtitleAnnotationMetadata` so `sourceText` is honored
- Include `jellyfinPlay` in `shouldQuitOnDisconnectWhenOverlayRuntimeInitialized` predicate
- Make mouse test `elementFromPoint` stubs coordinate-sensitive
- Make Lua test `.tmp` mkdir portable on Windows

* Preserve overlay across macOS flaps and mpv playlist changes

- keep visible overlays alive during transient macOS tracker loss
- reuse the running mpv overlay path on playlist navigation
- update regression coverage and changelog fragments

* fix: restore stats daemon deferral

* fix: keep subtitle prefetch alive after cache hits

* Fix JLPT underline color drift and AniList skipped-threshold sync

- Replace JLPT `text-decoration` underlines with `border-bottom` so Chromium selection/hover cannot repaint them to another annotation's color
- Lock JLPT underline color for combined annotation selectors (known, n+1, frequency) and character hover/selection states
- Trigger AniList post-watch check on every mpv time-position update to catch skipped completion thresholds
- Fall back to filename-parser season/episode when guessit omits them

* fix: address coderabbit feedback

* fix: sync AniList after seeked completion

* fix: preserve ordinal frequency annotations

* fix: preserve known highlighting for filtered tokens

* fix: address PR #57 CodeRabbit feedback

- Acquire AniList post-watch in-flight lock before async gating to prevent duplicate writes
- Isolate manual watched mark result from AniList post-watch callback failures
- Report known-word cache clears as mutations during immediate append when state existed
- Add regression tests for each fix

* fix: stop AniList setup reopening on Linux when keyring token exists

- Gate setup success on token persistence: `saveToken` now returns `boolean`; on failure, keeps the setup window open instead of reporting success
- Config reload passes `allowSetupPrompt: false` so playback reloads don't re-open the setup window
- Add regression test for persistence-failure path

* fix: suppress known highlights for subtitle particles

* fix: retry transient AniList safeStorage failures

* fix: hide overlay focus ring

* fix: align Hyprland fullscreen overlays

* fix: restore subtitle playback keybindings

* fix: align Hyprland overlay windows to mpv and stop pinning them

- Force-apply exact Hyprland move/resize/setprop dispatches when bounds are provided
- Stop pinning overlay windows; toggle pin off when Hyprland reports pinned=true
- Compensate stats overlay outer placement for Electron/Wayland content insets
- Make stats overlay window and page opaque so mpv cannot show through transparent insets
- Constrain stats app to h-screen with internal scroll so content covers mpv from y=0
- Lock overlay/stats window titles against page-title-updated events
- Add regression coverage for placement dispatches, inset compensation, and CSS overlay mode

* fix: retain frequency rank for honorific prefix-noun tokens

- Add `shouldAllowHonorificPrefixNounFrequency` to exempt お/ご/御 + noun merged tokens from frequency exclusion
- Add regression test for `ご機嫌` asserting rank 5484 is preserved after MeCab enrichment and annotation
- Close TASK-341

* fix: map openCharacterDictionary session action to --open-character-dict

- Add missing Lua CLI dispatch entry for openCharacterDictionary
- Add regression test for Alt+Meta+A binding and CLI flag forwarding

* fix: keep macOS overlay interactive while mpv remains active

- Overlay no longer hides or becomes click-through during tracker refreshes when mpv is the focused window
- Preserve already-visible overlay when tracker is temporarily not ready but mpv target signal is active
- Add regression tests for active-mpv tracker refresh and transient tracker-not-ready paths

* fix: address coderabbit subtitle follow-ups

* fix: resolve media detail from sessions when lifetime summary is absent

- Change `getMediaDetail` JOIN to LEFT JOIN on `imm_lifetime_media` and fall back to aggregated session metrics when no lifetime row exists
- Add filter `AND (lm.video_id IS NOT NULL OR s.session_id IS NOT NULL)` to keep results valid
- Add regression test covering the session-visible / media-detail-missing mismatch

* fix: address PR-57 CodeRabbit findings and CI failures

- use filtered word counts in media detail session token aggregation
- cancel fullscreen refresh burst on exit via updateLinuxMpvFullscreenOverlayRefreshBurst
- guard Hyprland JSON.parse in try/catch; exclude windowtitle from geometry events
- narrow focus suppression from :focus to :focus-visible
- apply JLPT lock selectors to word-name-match tokens (N1–N5)

* fix: macOS overlay z-order and Yomitan compound token known highlighting

- Release always-on-top when tracked mpv loses foreground on macOS
- Skip visible overlay blur restacking on macOS to avoid covering unrelated windows
- Prefer Yomitan internal parse tokens over fragmented scanner output for known-word decisions
- Add regression tests for both behaviors

* fix: macOS visible-overlay blur no longer invokes Windows-only blur call

- Split win32/darwin branches in handleOverlayWindowBlurred so darwin visible blur returns early without calling onWindowsVisibleOverlayBlur
- Add regression test asserting Windows callback stays inactive on macOS visible overlay blur
- Close TASK-347
2026-05-12 12:08:09 -07:00

591 lines
14 KiB
TypeScript

import test from 'node:test';
import assert from 'node:assert/strict';
import fs from 'node:fs';
import os from 'node:os';
import path from 'node:path';
import type { AnkiConnectConfig } from '../types/anki';
import { KnownWordCacheManager } from './known-word-cache';
async function waitForCondition(
condition: () => boolean,
timeoutMs = 500,
intervalMs = 10,
): Promise<void> {
const startedAt = Date.now();
while (Date.now() - startedAt < timeoutMs) {
if (condition()) {
return;
}
await new Promise((resolve) => setTimeout(resolve, intervalMs));
}
throw new Error('Timed out waiting for condition');
}
function createKnownWordCacheHarness(config: AnkiConnectConfig): {
manager: KnownWordCacheManager;
calls: {
findNotes: number;
notesInfo: number;
};
statePath: string;
clientState: {
findNotesResult: number[];
notesInfoResult: Array<{ noteId: number; fields: Record<string, { value: string }> }>;
findNotesByQuery: Map<string, number[]>;
};
cleanup: () => void;
} {
const stateDir = fs.mkdtempSync(path.join(os.tmpdir(), 'subminer-known-word-cache-'));
const statePath = path.join(stateDir, 'known-words-cache.json');
const calls = {
findNotes: 0,
notesInfo: 0,
};
const clientState = {
findNotesResult: [] as number[],
notesInfoResult: [] as Array<{ noteId: number; fields: Record<string, { value: string }> }>,
findNotesByQuery: new Map<string, number[]>(),
};
const manager = new KnownWordCacheManager({
client: {
findNotes: async (query) => {
calls.findNotes += 1;
if (clientState.findNotesByQuery.has(query)) {
return clientState.findNotesByQuery.get(query) ?? [];
}
return clientState.findNotesResult;
},
notesInfo: async (noteIds) => {
calls.notesInfo += 1;
return clientState.notesInfoResult.filter((note) => noteIds.includes(note.noteId));
},
},
getConfig: () => config,
knownWordCacheStatePath: statePath,
showStatusNotification: () => undefined,
});
return {
manager,
calls,
statePath,
clientState,
cleanup: () => {
fs.rmSync(stateDir, { recursive: true, force: true });
},
};
}
test('KnownWordCacheManager startLifecycle keeps fresh persisted cache without immediate refresh', async () => {
const config: AnkiConnectConfig = {
knownWords: {
highlightEnabled: true,
refreshMinutes: 60,
},
};
const { manager, calls, statePath, cleanup } = createKnownWordCacheHarness(config);
const originalDateNow = Date.now;
try {
Date.now = () => 120_000;
fs.writeFileSync(
statePath,
JSON.stringify({
version: 2,
refreshedAtMs: 120_000,
scope: '{"refreshMinutes":60,"scope":"is:note","fieldsWord":""}',
words: ['猫'],
notes: {
'1': ['猫'],
},
}),
'utf-8',
);
manager.startLifecycle();
assert.equal(manager.isKnownWord('猫'), true);
assert.equal(calls.findNotes, 0);
assert.equal(calls.notesInfo, 0);
assert.equal(
(
manager as unknown as {
getMsUntilNextRefresh: () => number;
}
).getMsUntilNextRefresh() > 0,
true,
);
} finally {
Date.now = originalDateNow;
manager.stopLifecycle();
cleanup();
}
});
test('KnownWordCacheManager startLifecycle immediately refreshes stale persisted cache', async () => {
const config: AnkiConnectConfig = {
fields: {
word: 'Word',
},
knownWords: {
highlightEnabled: true,
refreshMinutes: 1,
},
};
const { manager, calls, statePath, clientState, cleanup } = createKnownWordCacheHarness(config);
const originalDateNow = Date.now;
try {
Date.now = () => 120_000;
fs.writeFileSync(
statePath,
JSON.stringify({
version: 2,
refreshedAtMs: 59_000,
scope: '{"refreshMinutes":1,"scope":"is:note","fieldsWord":"Word"}',
words: ['猫'],
notes: {
'1': ['猫'],
},
}),
'utf-8',
);
clientState.findNotesResult = [1];
clientState.notesInfoResult = [
{
noteId: 1,
fields: {
Word: { value: '犬' },
},
},
];
manager.startLifecycle();
await waitForCondition(() => calls.findNotes === 1 && calls.notesInfo === 1);
assert.equal(manager.isKnownWord('猫'), false);
assert.equal(manager.isKnownWord('犬'), true);
} finally {
Date.now = originalDateNow;
manager.stopLifecycle();
cleanup();
}
});
test('KnownWordCacheManager invalidates persisted cache when fields.word changes', () => {
const config: AnkiConnectConfig = {
deck: 'Mining',
fields: {
word: 'Word',
},
knownWords: {
highlightEnabled: true,
},
};
const { manager, cleanup } = createKnownWordCacheHarness(config);
try {
manager.appendFromNoteInfo({
noteId: 1,
fields: {
Word: { value: '猫' },
},
});
assert.equal(manager.isKnownWord('猫'), true);
config.fields = {
...config.fields,
word: 'Expression',
};
(
manager as unknown as {
loadKnownWordCacheState: () => void;
}
).loadKnownWordCacheState();
assert.equal(manager.isKnownWord('猫'), false);
} finally {
cleanup();
}
});
test('KnownWordCacheManager refresh incrementally reconciles deleted and edited note words', async () => {
const config: AnkiConnectConfig = {
fields: {
word: 'Word',
},
knownWords: {
highlightEnabled: true,
},
};
const { manager, statePath, clientState, cleanup } = createKnownWordCacheHarness(config);
try {
fs.writeFileSync(
statePath,
JSON.stringify({
version: 2,
refreshedAtMs: 1,
scope: '{"refreshMinutes":1440,"scope":"is:note","fieldsWord":"Word"}',
words: ['猫', '犬'],
notes: {
'1': ['猫'],
'2': ['犬'],
},
}),
'utf-8',
);
(
manager as unknown as {
loadKnownWordCacheState: () => void;
}
).loadKnownWordCacheState();
clientState.findNotesResult = [1];
clientState.notesInfoResult = [
{
noteId: 1,
fields: {
Word: { value: '鳥' },
},
},
];
await manager.refresh(true);
assert.equal(manager.isKnownWord('猫'), false);
assert.equal(manager.isKnownWord('犬'), false);
assert.equal(manager.isKnownWord('鳥'), true);
const persisted = JSON.parse(fs.readFileSync(statePath, 'utf-8')) as {
version: number;
words: string[];
notes?: Record<string, string[]>;
};
assert.equal(persisted.version, 2);
assert.deepEqual(persisted.words.sort(), ['鳥']);
assert.deepEqual(persisted.notes, {
'1': ['鳥'],
});
} finally {
cleanup();
}
});
test('KnownWordCacheManager skips malformed note info without fields', async () => {
const config: AnkiConnectConfig = {
fields: {
word: 'Word',
},
knownWords: {
highlightEnabled: true,
},
};
const { manager, clientState, cleanup } = createKnownWordCacheHarness(config);
try {
clientState.findNotesResult = [1, 2];
clientState.notesInfoResult = [
{
noteId: 1,
fields: undefined as unknown as Record<string, { value: string }>,
},
{
noteId: 2,
fields: {
Word: { value: '猫' },
},
},
];
await manager.refresh(true);
assert.equal(manager.isKnownWord('猫'), true);
assert.equal(manager.isKnownWord('犬'), false);
} finally {
cleanup();
}
});
test('KnownWordCacheManager preserves cache state key captured before refresh work', async () => {
const config: AnkiConnectConfig = {
fields: {
word: 'Word',
},
knownWords: {
highlightEnabled: true,
refreshMinutes: 1,
},
};
const stateDir = fs.mkdtempSync(path.join(os.tmpdir(), 'subminer-known-word-cache-key-'));
const statePath = path.join(stateDir, 'known-words-cache.json');
let notesInfoStarted = false;
let releaseNotesInfo!: () => void;
const notesInfoGate = new Promise<void>((resolve) => {
releaseNotesInfo = resolve;
});
const manager = new KnownWordCacheManager({
client: {
findNotes: async () => [1],
notesInfo: async () => {
notesInfoStarted = true;
await notesInfoGate;
return [
{
noteId: 1,
fields: {
Word: { value: '猫' },
},
},
];
},
},
getConfig: () => config,
knownWordCacheStatePath: statePath,
showStatusNotification: () => undefined,
});
try {
const refreshPromise = manager.refresh(true);
await waitForCondition(() => notesInfoStarted);
config.fields = {
...config.fields,
word: 'Expression',
};
releaseNotesInfo();
await refreshPromise;
const persisted = JSON.parse(fs.readFileSync(statePath, 'utf-8')) as {
scope: string;
words: string[];
};
assert.equal(persisted.scope, '{"refreshMinutes":1,"scope":"is:note","fieldsWord":"Word"}');
assert.deepEqual(persisted.words, ['猫']);
} finally {
fs.rmSync(stateDir, { recursive: true, force: true });
}
});
test('KnownWordCacheManager does not borrow fields from other decks during refresh', async () => {
const config: AnkiConnectConfig = {
knownWords: {
highlightEnabled: true,
decks: {
Mining: [],
Reading: ['AltWord'],
},
},
};
const { manager, clientState, cleanup } = createKnownWordCacheHarness(config);
try {
clientState.findNotesByQuery.set('deck:"Mining"', [1]);
clientState.findNotesByQuery.set('deck:"Reading"', []);
clientState.notesInfoResult = [
{
noteId: 1,
fields: {
AltWord: { value: '猫' },
},
},
];
await manager.refresh(true);
assert.equal(manager.isKnownWord('猫'), false);
} finally {
cleanup();
}
});
test('KnownWordCacheManager invalidates persisted cache when per-deck fields change', () => {
const config: AnkiConnectConfig = {
fields: {
word: 'Word',
},
knownWords: {
highlightEnabled: true,
decks: {
Mining: ['Word'],
},
},
};
const { manager, cleanup } = createKnownWordCacheHarness(config);
try {
manager.appendFromNoteInfo({
noteId: 1,
fields: {
Word: { value: '猫' },
},
});
assert.equal(manager.isKnownWord('猫'), true);
config.knownWords = {
...config.knownWords,
decks: {
Mining: ['Expression'],
},
};
(
manager as unknown as {
loadKnownWordCacheState: () => void;
}
).loadKnownWordCacheState();
assert.equal(manager.isKnownWord('猫'), false);
} finally {
cleanup();
}
});
test('KnownWordCacheManager preserves deck-specific field mappings during refresh', async () => {
const config: AnkiConnectConfig = {
knownWords: {
highlightEnabled: true,
decks: {
Mining: ['Expression'],
Reading: ['Word'],
},
},
};
const { manager, clientState, cleanup } = createKnownWordCacheHarness(config);
try {
clientState.findNotesByQuery.set('deck:"Mining"', [1]);
clientState.findNotesByQuery.set('deck:"Reading"', [2]);
clientState.notesInfoResult = [
{
noteId: 1,
fields: {
Expression: { value: '猫' },
Word: { value: 'should-not-count' },
},
},
{
noteId: 2,
fields: {
Word: { value: '犬' },
Expression: { value: 'also-ignored' },
},
},
];
await manager.refresh(true);
assert.equal(manager.isKnownWord('猫'), true);
assert.equal(manager.isKnownWord('犬'), true);
assert.equal(manager.isKnownWord('should-not-count'), false);
assert.equal(manager.isKnownWord('also-ignored'), false);
} finally {
cleanup();
}
});
test('KnownWordCacheManager uses the current deck fields for immediate append', () => {
const config: AnkiConnectConfig = {
deck: 'Mining',
fields: {
word: 'Word',
},
knownWords: {
highlightEnabled: true,
decks: {
Mining: ['Expression'],
Reading: ['Word'],
},
},
};
const { manager, cleanup } = createKnownWordCacheHarness(config);
try {
manager.appendFromNoteInfo({
noteId: 1,
fields: {
Expression: { value: '猫' },
Word: { value: 'should-not-count' },
},
});
assert.equal(manager.isKnownWord('猫'), true);
assert.equal(manager.isKnownWord('should-not-count'), false);
} finally {
cleanup();
}
});
test('KnownWordCacheManager reports immediate append cache clears as mutations', () => {
const config: AnkiConnectConfig = {
fields: {
word: 'Expression',
},
knownWords: {
highlightEnabled: true,
refreshMinutes: 60,
},
};
const { manager, statePath, cleanup } = createKnownWordCacheHarness(config);
try {
fs.writeFileSync(
statePath,
JSON.stringify({
version: 2,
refreshedAtMs: Date.now(),
scope: '{"refreshMinutes":60,"scope":"is:note","fieldsWord":"Expression"}',
words: ['猫'],
notes: {
'1': ['猫'],
},
}),
'utf-8',
);
manager.startLifecycle();
assert.equal(manager.isKnownWord('猫'), true);
config.fields = { word: 'Word' };
const changed = manager.appendFromNoteInfo({
noteId: 2,
fields: {
Word: { value: '' },
},
});
assert.equal(changed, true);
assert.equal(manager.isKnownWord('猫'), false);
} finally {
manager.stopLifecycle();
cleanup();
}
});
test('KnownWordCacheManager skips immediate append when addMinedWordsImmediately is disabled', () => {
const config: AnkiConnectConfig = {
knownWords: {
highlightEnabled: true,
addMinedWordsImmediately: false,
},
};
const { manager, statePath, cleanup } = createKnownWordCacheHarness(config);
try {
manager.appendFromNoteInfo({
noteId: 1,
fields: {
Expression: { value: '猫' },
},
});
assert.equal(manager.isKnownWord('猫'), false);
assert.equal(fs.existsSync(statePath), false);
} finally {
cleanup();
}
});