mirror of
https://github.com/ksyasuda/SubMiner.git
synced 2026-05-04 00:41:33 -07:00
Replace grammar-ending permutations with shared matcher; preserve word a
- Extract `grammar-ending.ts` with `isStandaloneGrammarEndingText` / `isSubtitleGrammarEndingText` pattern matchers
- Replace `STANDALONE_GRAMMAR_ENDINGS` set in parser-selection-stage with shared matcher
- Replace generated phrase sets in subtitle-annotation-filter with shared matcher
- Remove stale duplicate subtitle-exclusion constants and helpers from annotation-stage
- Manual clipboard card updates now write only to the sentence audio field, leaving word/expression audio untouched
+13
-4
@@ -5,7 +5,7 @@ status: Done
assignee:
- codex
created_date: '2026-05-03 00:02'
updated_date: '2026-05-03 00:31'
updated_date: '2026-05-03 06:05'
labels:
- bug
- tokenizer
@@ -25,6 +25,7 @@ Standalone `じゃない` grammar ending tokens should not display or persist su
- [x] #2 Common polite/question variants such as `じゃないですか` and `ですよ` remain excluded when tokenized as a single ending token.
- [x] #3 Regression coverage proves same-line Yomitan segments split content from trailing grammar endings so the content word can be annotated without coloring the ending.
- [x] #4 Auxiliary-only helper spans such as `てく` + `れた` in `ベアトリスがいてくれたから` have known-word, N+1, frequency, and JLPT annotation metadata cleared.
- [x] #5 Hard-coded grammar-ending phrase permutations are replaced by shared pattern matching, with parser selection and subtitle annotation filtering using the same grammar-ending classifier.
<!-- AC:END -->
## Implementation Plan
@@ -35,6 +36,10 @@ Standalone `じゃない` grammar ending tokens should not display or persist su
3. Patch the shared subtitle annotation filter so kana-only auxiliary helper spans made only of grammar POS components are excluded while preserving lexical content tokens.
4. Re-run targeted tokenizer/annotation tests, then run SubMiner change verification classifier/verifier for the touched files.
5. Update TASK-315 acceptance criteria, notes, and final summary with commands and outcomes.
Replace explicit standalone grammar-ending permutations with a compact shared matcher used by parser selection and annotation filtering.
Add regression tests first for non-enumerated polite copula / ja-nai variants so the matcher behavior is proven, then refactor implementation and verify targeted lanes.
<!-- SECTION:PLAN:END -->
## Implementation Notes
@@ -45,14 +50,18 @@ Implemented as one focused tokenizer fix. Parser selection now splits dictionary
2026-05-03: Reopened for approved add-on covering auxiliary-only `てく` + `れた` helper highlighting report.
2026-05-03: Added regression coverage for `ベアトリスがいてくれたから` where Yomitan emits `てく` + `れた` and MeCab enrichment tags `てく` as `助詞|動詞` / `接続助詞|非自立`. The regression initially failed because `てく` kept `isKnown: true` and `jlptLevel: N4`. Added a shared-filter helper for kana-only particle+non-independent-verb helper spans, preserving lexical `自立` verbs. Verification: `bun test src/core/services/tokenizer/annotation-stage.test.ts`, `bun test src/core/services/tokenizer.test.ts`, `bun test src/core/services/tokenizer/parser-selection-stage.test.ts`, `bun x prettier --check ...`, and `bun run typecheck` passed. SubMiner verifier core lane passed typecheck but `bun run test:fast` failed on unrelated existing cross-suite issues: `window.electronAPI` undefined in `src/renderer/handlers/keyboard.ts` during `src/core/services/subsync.test.ts`, followed by Bun `node:test` nested-test cascade.
2026-05-03: Reopened for follow-up requested by user: remove hard-coded standalone grammar-ending permutation list and lean on pattern/POS filtering where possible.
2026-05-03: Added shared `grammar-ending.ts` matcher for polite copula, negative copula, and explanatory endings. Parser selection now uses the standalone-ending matcher instead of `STANDALONE_GRAMMAR_ENDINGS`. Shared subtitle filter now uses the same grammar classifier instead of generated phrase sets. Removed stale duplicate subtitle-exclusion helpers from `annotation-stage.ts`; annotation-stage continues to delegate subtitle exclusion to the shared filter. Verification passed: targeted tokenizer/parser/annotation tests, Prettier check, `bun run typecheck`, `bun run test:fast`, `bun run test:env`, `bun run build`, and `bun run test:smoke:dist`. `bun run changelog:lint` remains blocked by pre-existing malformed fragment `changes/319-interjection-annotation-filter.md`.
<!-- SECTION:NOTES:END -->
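
The shared-matcher idea in the notes above can be illustrated with a minimal sketch. This is not the contents of `grammar-ending.ts`: the function name `looksLikeGrammarEnding`, the particle list, and the exact patterns are assumptions chosen only to show how a few composable patterns can cover polite copula, negative copula, and explanatory endings without enumerating every permutation.

```typescript
// Hypothetical sketch; the real grammar-ending.ts may use different patterns.

// Sentence-final particles that may trail an ending (assumed list).
const FINAL_PARTICLES = "(?:よ|ね|か|な)*";

// Polite copula: です, でした, でしょう, plus optional particles.
const POLITE_COPULA = `(?:です|でした|でしょう)${FINAL_PARTICLES}`;

// Negative copula: じゃない / ではない, optional past form, optional です.
const NEGATIVE_COPULA = `(?:じゃ|では)な(?:い|かった)(?:です)?${FINAL_PARTICLES}`;

// Explanatory endings: んだ, のだ, んです, のです.
const EXPLANATORY = `[んの](?:だ|です)${FINAL_PARTICLES}`;

// Anchored on both sides so only a token that is entirely an ending matches.
const GRAMMAR_ENDING = new RegExp(
  `^(?:${POLITE_COPULA}|${NEGATIVE_COPULA}|${EXPLANATORY})$`,
  "u",
);

// Hypothetical name; stands in for the role of the shared classifier.
function looksLikeGrammarEnding(text: string): boolean {
  return GRAMMAR_ENDING.test(text.trim());
}

for (const sample of ["じゃない", "じゃないですか", "ですよ", "ではないですか", "食べる"]) {
  console.log(sample, looksLikeGrammarEnding(sample));
}
```

Because the pattern is anchored at both ends, a content word followed by an ending in a single token does not match; splitting such tokens is a separate step in parser selection.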
## Final Summary
<!-- SECTION:FINAL_SUMMARY:BEGIN -->
Split dictionary-backed trailing grammar ending segments (`です`, `じゃない*`) from preceding Yomitan same-line content before annotation, and added bare `です` to the explicit polite copula exclusion set.
Replaced grammar-ending phrase permutations with shared pattern matching. `parser-selection-stage.ts` now splits standalone grammar endings through `grammar-ending.ts` instead of `STANDALONE_GRAMMAR_ENDINGS`; `subtitle-annotation-filter.ts` uses the same classifier for polite copula, negative copula, and explanatory endings instead of generated exact phrase sets.
Added the approved auxiliary-helper fix for `ベアトリスがいてくれたから`: kana-only `てく` + `れた` helper spans now clear known-word, N+1, frequency, and JLPT annotation metadata when POS enrichment shows a particle + non-independent verb helper, while lexical `自立` verb forms like `くれ`/`くれる` remain eligible.
Kept exclusion ownership cleaner: subtitle annotation exclusion remains in the shared filter, while `annotation-stage.ts` no longer carries stale duplicate subtitle-exclusion constants/helpers. Added regressions for pattern coverage including `ではないですか` splitting and no-POS grammar-ending annotation clearing.
Verification passed for targeted tokenizer/annotation/parser tests, Prettier check on touched files, and `bun run typecheck`. The SubMiner core verifier's `test:fast` step remains blocked by unrelated pre-existing cross-suite failures in `subsync`/renderer keyboard globals plus Bun `node:test` cascade; artifact: `.tmp/skill-verification/subminer-verify-20260502-173004-CMu3ai/`.
Verification passed: targeted tokenizer/parser/annotation tests, Prettier check, `bun run typecheck`, `bun run test:fast`, `bun run test:env`, `bun run build`, and `bun run test:smoke:dist`. `bun run changelog:lint` is blocked by pre-existing malformed `changes/319-interjection-annotation-filter.md`; new fragment `changes/321-grammar-ending-pattern-filter.md` uses the current metadata format.
<!-- SECTION:FINAL_SUMMARY:END -->
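
The auxiliary-helper rule summarized above (kana-only spans whose POS components are a particle plus a non-independent verb lose their annotation metadata, while `自立` verbs stay eligible) might be sketched as follows. The token shape and the names `AnnotatedToken`, `isAuxiliaryHelperSpan`, and `clearHelperAnnotations` are hypothetical; only `isKnown`, `jlptLevel`, and the `|`-joined MeCab tags come from the notes, and the real shared filter may differ.

```typescript
// Hypothetical token shape; only isKnown and jlptLevel are names taken from the notes.
interface AnnotatedToken {
  surface: string;
  pos1: string | null; // top-level POS per component, joined with "|", e.g. "助詞|動詞"
  pos2: string | null; // POS subtype per component, e.g. "接続助詞|非自立"
  isKnown: boolean;
  isNPlusOneTarget: boolean;
  frequencyRank: number | null;
  jlptLevel: string | null;
}

// Hiragana and katakana blocks (the katakana block includes the long-vowel mark).
const KANA_ONLY = /^[\u3040-\u309f\u30a0-\u30ff]+$/;

function isAuxiliaryHelperSpan(token: AnnotatedToken): boolean {
  if (!KANA_ONLY.test(token.surface) || token.pos1 === null) return false;
  const subtypes = (token.pos2 ?? "").split("|");
  const components = token.pos1.split("|").map((pos, i) => ({ pos, sub: subtypes[i] }));
  const isHelperVerb = (c: { pos: string; sub?: string }) =>
    c.pos === "動詞" && c.sub === "非自立";
  const isParticleLike = (c: { pos: string }) => c.pos === "助詞" || c.pos === "助動詞";
  // Every component must be grammatical, and at least one must be a helper verb.
  return (
    components.every((c) => isParticleLike(c) || isHelperVerb(c)) &&
    components.some(isHelperVerb)
  );
}

// Returns tokens with annotation metadata cleared on helper spans only.
function clearHelperAnnotations(tokens: AnnotatedToken[]): AnnotatedToken[] {
  return tokens.map((t) =>
    isAuxiliaryHelperSpan(t)
      ? { ...t, isKnown: false, isNPlusOneTarget: false, frequencyRank: null, jlptLevel: null }
      : t,
  );
}

const teku: AnnotatedToken = {
  surface: "てく", pos1: "助詞|動詞", pos2: "接続助詞|非自立",
  isKnown: true, isNPlusOneTarget: false, frequencyRank: 120, jlptLevel: "N4",
};
const kureru: AnnotatedToken = {
  surface: "くれる", pos1: "動詞", pos2: "自立",
  isKnown: true, isNPlusOneTarget: false, frequencyRank: 300, jlptLevel: "N4",
};
console.log(clearHelperAnnotations([teku, kureru]));
```

The sample values for `frequencyRank` are made up for the demonstration; the `てく` tags mirror the MeCab enrichment quoted in the implementation notes.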
@@ -0,0 +1,63 @@
---
id: TASK-321
title: Preserve word audio during manual clipboard card updates
status: Done
assignee:
- '@Codex'
created_date: '2026-05-03 06:22'
updated_date: '2026-05-03 06:23'
labels:
- anki
- mining
dependencies: []
priority: medium
---
## Description
<!-- SECTION:DESCRIPTION:BEGIN -->
Manual Ctrl+Shift+C/Ctrl+V card updates on already-mined cards should refresh the sentence content and generated sentence media without removing or replacing the existing word/expression audio. The word is unchanged in this flow, so the configured word audio field must be left untouched while sentence audio remains forced-overwrite behavior from TASK-299.
<!-- SECTION:DESCRIPTION:END -->
## Acceptance Criteria
<!-- AC:BEGIN -->
- [x] #1 Manual clipboard subtitle update replaces the resolved sentence audio field with newly generated sentence audio.
- [x] #2 Manual clipboard subtitle update does not include the configured word/expression audio field in Anki field updates.
- [x] #3 Animated image generation still uses the existing word audio duration for lead-in sync when configured.
- [x] #4 A regression test covers preserving word/expression audio during manual clipboard update.
<!-- AC:END -->
## Implementation Plan
<!-- SECTION:PLAN:BEGIN -->
1. Update the focused manual clipboard card update regression so generated audio is written only to the resolved sentence audio field and the configured word/expression audio field is absent from updateNoteFields payloads.
2. Run the focused test and confirm it fails for the existing TASK-299 behavior.
3. Change CardCreationService.updateLastAddedFromClipboard to stop merging/updating expression audio while preserving forced overwrite for sentence audio.
4. Run the focused test; then run adjacent Anki card-creation tests if the focused gate passes.
5. Update task acceptance criteria/final notes with verification results.
<!-- SECTION:PLAN:END -->
## Implementation Notes
<!-- SECTION:NOTES:BEGIN -->
Implemented narrow manual clipboard update change in CardCreationService.updateLastAddedFromClipboard: generated audio now force-overwrites only the resolved sentence audio field and no longer writes the configured word/expression audio field. Animated AVIF lead-in still runs from the original note info before image generation, preserving existing word-audio sync behavior.
<!-- SECTION:NOTES:END -->
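
The sentence-only update described in the notes above can be pictured with a small sketch of the field payload. It does not reproduce `CardCreationService.updateLastAddedFromClipboard`: the `AudioFieldConfig` shape, the `buildManualUpdateFields` helper, and the `Sentence` field name are assumptions for illustration. The `[sound:...]` wrapper is the standard Anki syntax for audio references in a field.

```typescript
// Hypothetical config shape; real SubMiner configuration keys may differ.
interface AudioFieldConfig {
  sentenceField: string;
  sentenceAudioField: string;
  wordAudioField: string; // deliberately never written by the manual update
}

// Builds the field payload for a manual clipboard update (hypothetical helper).
// Only sentence text and sentence audio are included; the word/expression audio
// field is left out so the existing value on the note is preserved.
function buildManualUpdateFields(
  config: AudioFieldConfig,
  sentence: string,
  sentenceAudioFilename: string,
): Record<string, string> {
  return {
    [config.sentenceField]: sentence,
    // Forced overwrite: the new value replaces whatever the note held before.
    [config.sentenceAudioField]: `[sound:${sentenceAudioFilename}]`,
  };
}

const config: AudioFieldConfig = {
  sentenceField: "Sentence",
  sentenceAudioField: "SentenceAudio",
  wordAudioField: "ExpressionAudio",
};
const fields = buildManualUpdateFields(config, "ベアトリスがいてくれたから", "sentence_new.mp3");
console.log(fields);
```

Omitting a field from the payload, rather than sending its old value back, keeps the update from touching word audio at all, which is the behavior the regression test in this task asserts.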
## Final Summary
<!-- SECTION:FINAL_SUMMARY:BEGIN -->
Summary:
- Manual Ctrl+Shift+C/Ctrl+V card updates now leave the configured word/expression audio field untouched while force-replacing the resolved sentence audio field.
- Updated the regression test to assert the Anki update payload omits ExpressionAudio and only merges SentenceAudio with forced overwrite.
- Updated docs-site behavior notes and added a changelog fragment for the sentence-only manual audio replacement behavior.
Verification:
- bun test src/anki-integration/card-creation-manual-update.test.ts src/anki-integration/card-creation.test.ts src/anki-integration/animated-image-sync.test.ts
- bun run typecheck
- bun run docs:test
- bun run docs:build
- git diff --check -- src/anki-integration/card-creation.ts src/anki-integration/card-creation-manual-update.test.ts docs-site/mining-workflow.md docs-site/anki-integration.md docs-site/configuration.md changes/322-preserve-word-audio-manual-update.md
Blocked gate:
- bun run changelog:lint is blocked by pre-existing malformed changes/319-interjection-annotation-filter.md, which is outside this task's files.
<!-- SECTION:FINAL_SUMMARY:END -->