mirror of
https://github.com/ksyasuda/SubMiner.git
synced 2026-03-06 19:57:26 -08:00
fix: improve yomitan subtitle name lookup
This commit is contained in:
@@ -0,0 +1,34 @@
|
||||
---
|
||||
id: TASK-93
|
||||
title: Replace subtitle tokenizer with left-to-right Yomitan scanning parser
|
||||
status: Done
|
||||
assignee: []
|
||||
created_date: '2026-03-06 09:02'
|
||||
updated_date: '2026-03-06 09:14'
|
||||
labels:
|
||||
- tokenizer
|
||||
- yomitan
|
||||
- refactor
|
||||
dependencies: []
|
||||
priority: high
|
||||
---
|
||||
|
||||
## Description
|
||||
|
||||
<!-- SECTION:DESCRIPTION:BEGIN -->
|
||||
Replace the current parseText candidate-selection tokenizer with a GSM-style left-to-right Yomitan scanning tokenizer for all subtitles. Preserve downstream token contracts for rendering, JLPT/frequency/N+1 annotation, and MeCab enrichment while improving full-term matching for names and katakana compounds.
|
||||
<!-- SECTION:DESCRIPTION:END -->
|
||||
|
||||
## Acceptance Criteria
|
||||
<!-- AC:BEGIN -->
|
||||
- [x] #1 Subtitle tokenization uses a left-to-right Yomitan scanning strategy instead of parseText candidate selection.
|
||||
- [x] #2 Token surfaces, readings, headwords, and offsets remain compatible with existing renderer and annotation stages.
|
||||
- [x] #3 Known problematic name cases such as カズマ and バニール resolve to full-token dictionary matches when Yomitan can match them.
|
||||
- [x] #4 Regression tests cover left-to-right exact-match scanning, unmatched text handling, and downstream tokenizeSubtitle integration.
|
||||
<!-- AC:END -->
|
||||
|
||||
## Final Summary
|
||||
|
||||
<!-- SECTION:FINAL_SUMMARY:BEGIN -->
|
||||
Replaced the live subtitle tokenization path with a left-to-right Yomitan `termsFind` scanner that greedily advances through the normalized subtitle text, preserving downstream `MergedToken` contracts for renderer, MeCab enrichment, JLPT, frequency, and N+1 annotation. Added runtime and integration coverage for exact-match scanning plus name cases like カズマ and kept compatibility fallback handling for older mocked parseText-style test payloads.
|
||||
<!-- SECTION:FINAL_SUMMARY:END -->
|
||||
@@ -0,0 +1,40 @@
|
||||
---
|
||||
id: TASK-94
|
||||
title: Add kana aliases for AniList character dictionary entries
|
||||
status: Done
|
||||
assignee: []
|
||||
created_date: '2026-03-06 09:20'
|
||||
updated_date: '2026-03-06 09:23'
|
||||
labels:
|
||||
- dictionary
|
||||
- tokenizer
|
||||
- anilist
|
||||
dependencies: []
|
||||
references:
|
||||
- >-
|
||||
/home/sudacode/projects/japanese/SubMiner/src/main/character-dictionary-runtime.ts
|
||||
- >-
|
||||
/home/sudacode/projects/japanese/SubMiner/src/main/character-dictionary-runtime.test.ts
|
||||
priority: high
|
||||
---
|
||||
|
||||
## Description
|
||||
|
||||
<!-- SECTION:DESCRIPTION:BEGIN -->
|
||||
Generate katakana/hiragana-friendly aliases from AniList romanized character names so subtitle katakana names like カズマ match character dictionary entries even when AniList native name is kanji.
|
||||
<!-- SECTION:DESCRIPTION:END -->
|
||||
|
||||
## Acceptance Criteria
|
||||
<!-- AC:BEGIN -->
|
||||
- [x] #1 AniList character dictionary generation adds kana aliases for romanized names when native name is not already kana-only
|
||||
- [x] #2 Generated dictionary entries allow katakana subtitle names like カズマ to resolve against a kanji-native AniList character entry
|
||||
- [x] #3 Regression tests cover alias generation and resulting term bank output
|
||||
<!-- AC:END -->
|
||||
|
||||
## Final Summary
|
||||
|
||||
<!-- SECTION:FINAL_SUMMARY:BEGIN -->
|
||||
Added katakana aliases synthesized from AniList romanized character names during character dictionary generation, so kanji-native entries such as 佐藤和真 / Satou Kazuma now also emit terms like カズマ and サトウカズマ with hiragana readings. Added regression coverage verifying generated term-bank output for the Konosuba case.
|
||||
|
||||
Verified with `bun test src/main/character-dictionary-runtime.test.ts` and `bun run tsc --noEmit`.
|
||||
<!-- SECTION:FINAL_SUMMARY:END -->
|
||||
@@ -0,0 +1,39 @@
|
||||
---
|
||||
id: TASK-95
|
||||
title: Invalidate old character dictionary snapshots after kana alias schema change
|
||||
status: Done
|
||||
assignee: []
|
||||
created_date: '2026-03-06 09:25'
|
||||
updated_date: '2026-03-06 09:28'
|
||||
labels:
|
||||
- dictionary
|
||||
- cache
|
||||
dependencies: []
|
||||
references:
|
||||
- >-
|
||||
/home/sudacode/projects/japanese/SubMiner/src/main/character-dictionary-runtime.ts
|
||||
- >-
|
||||
/home/sudacode/projects/japanese/SubMiner/src/main/character-dictionary-runtime.test.ts
|
||||
priority: high
|
||||
---
|
||||
|
||||
## Description
|
||||
|
||||
<!-- SECTION:DESCRIPTION:BEGIN -->
|
||||
Bump character dictionary snapshot format/version so cached AniList snapshots created before kana alias generation are rebuilt automatically on next auto-sync or generation run.
|
||||
<!-- SECTION:DESCRIPTION:END -->
|
||||
|
||||
## Acceptance Criteria
|
||||
<!-- AC:BEGIN -->
|
||||
- [x] #1 Old cached character dictionary snapshots are treated as invalid after the schema/version bump
|
||||
- [x] #2 Current snapshot generation tests cover rebuild behavior across version mismatch
|
||||
- [x] #3 No manual cache deletion is required for users to pick up kana alias term generation
|
||||
<!-- AC:END -->
|
||||
|
||||
## Final Summary
|
||||
|
||||
<!-- SECTION:FINAL_SUMMARY:BEGIN -->
|
||||
Bumped the character dictionary snapshot format version so cached AniList snapshots created before kana alias generation are automatically treated as stale and rebuilt. Added regression coverage that seeds an older-format snapshot and verifies `getOrCreateCurrentSnapshot` fetches fresh data and overwrites the stale cache.
|
||||
|
||||
Verified with `bun test src/main/character-dictionary-runtime.test.ts` and `bun run tsc --noEmit`.
|
||||
<!-- SECTION:FINAL_SUMMARY:END -->
|
||||
Reference in New Issue
Block a user