mirror of
https://github.com/ksyasuda/SubMiner.git
synced 2026-05-04 00:41:33 -07:00
Replace grammar-ending permutations with shared matcher; preserve word audio
- Extract `grammar-ending.ts` with `isStandaloneGrammarEndingText` / `isSubtitleGrammarEndingText` pattern matchers
- Replace `STANDALONE_GRAMMAR_ENDINGS` set in parser-selection-stage with shared matcher
- Replace generated phrase sets in subtitle-annotation-filter with shared matcher
- Remove stale duplicate subtitle-exclusion constants and helpers from annotation-stage
- Manual clipboard card updates now write only to the sentence audio field, leaving word/expression audio untouched
@@ -5,7 +5,7 @@ status: Done
assignee:
- codex
created_date: '2026-05-03 00:02'
updated_date: '2026-05-03 00:31'
updated_date: '2026-05-03 06:05'
labels:
- bug
- tokenizer
@@ -25,6 +25,7 @@ Standalone `じゃない` grammar ending tokens should not display or persist su
- [x] #2 Common polite/question variants such as `じゃないですか` and `ですよ` remain excluded when tokenized as a single ending token.
- [x] #3 Regression coverage proves same-line Yomitan segments split content from trailing grammar endings so the content word can be annotated without coloring the ending.
- [x] #4 Auxiliary-only helper spans such as `てく` + `れた` in `ベアトリスがいてくれたから` have known-word, N+1, frequency, and JLPT annotation metadata cleared.
- [x] #5 Hard-coded grammar-ending phrase permutations are replaced by shared pattern matching, with parser selection and subtitle annotation filtering using the same grammar-ending classifier.
<!-- AC:END -->

## Implementation Plan
@@ -35,6 +36,10 @@ Standalone `じゃない` grammar ending tokens should not display or persist su
3. Patch the shared subtitle annotation filter so kana-only auxiliary helper spans made only of grammar POS components are excluded while preserving lexical content tokens.
4. Re-run targeted tokenizer/annotation tests, then run SubMiner change verification classifier/verifier for the touched files.
5. Update TASK-315 acceptance criteria, notes, and final summary with commands and outcomes.

Replace explicit standalone grammar-ending permutations with a compact shared matcher used by parser selection and annotation filtering.

Add regression tests first for non-enumerated polite copula / ja-nai variants so the matcher behavior is proven, then refactor implementation and verify targeted lanes.
<!-- SECTION:PLAN:END -->

## Implementation Notes
@@ -45,14 +50,18 @@ Implemented as one focused tokenizer fix. Parser selection now splits dictionary
2026-05-03: Reopened for approved add-on covering auxiliary-only `てく` + `れた` helper highlighting report.

2026-05-03: Added regression coverage for `ベアトリスがいてくれたから` where Yomitan emits `てく` + `れた` and MeCab enrichment tags `てく` as `助詞|動詞` / `接続助詞|非自立`. The regression initially failed because `てく` kept `isKnown: true` and `jlptLevel: N4`. Added a shared-filter helper for kana-only particle+non-independent-verb helper spans, preserving lexical `自立` verbs. Verification: `bun test src/core/services/tokenizer/annotation-stage.test.ts`, `bun test src/core/services/tokenizer.test.ts`, `bun test src/core/services/tokenizer/parser-selection-stage.test.ts`, `bun x prettier --check ...`, and `bun run typecheck` passed. SubMiner verifier core lane passed typecheck but `bun run test:fast` failed on unrelated existing cross-suite issues: `window.electronAPI` undefined in `src/renderer/handlers/keyboard.ts` during `src/core/services/subsync.test.ts`, followed by Bun `node:test` nested-test cascade.

2026-05-03: Reopened for follow-up requested by user: remove hard-coded standalone grammar-ending permutation list and lean on pattern/POS filtering where possible.

2026-05-03: Added shared `grammar-ending.ts` matcher for polite copula, negative copula, and explanatory endings. Parser selection now uses the standalone-ending matcher instead of `STANDALONE_GRAMMAR_ENDINGS`. Shared subtitle filter now uses the same grammar classifier instead of generated phrase sets. Removed stale duplicate subtitle-exclusion helpers from `annotation-stage.ts`; annotation-stage continues to delegate subtitle exclusion to the shared filter. Verification passed: targeted tokenizer/parser/annotation tests, Prettier check, `bun run typecheck`, `bun run test:fast`, `bun run test:env`, `bun run build`, and `bun run test:smoke:dist`. `bun run changelog:lint` remains blocked by pre-existing malformed fragment `changes/319-interjection-annotation-filter.md`.
<!-- SECTION:NOTES:END -->

## Final Summary

<!-- SECTION:FINAL_SUMMARY:BEGIN -->
Split dictionary-backed trailing grammar ending segments (`です`, `じゃない*`) from preceding Yomitan same-line content before annotation, and added bare `です` to the explicit polite copula exclusion set.
Replaced grammar-ending phrase permutations with shared pattern matching. `parser-selection-stage.ts` now splits standalone grammar endings through `grammar-ending.ts` instead of `STANDALONE_GRAMMAR_ENDINGS`; `subtitle-annotation-filter.ts` uses the same classifier for polite copula, negative copula, and explanatory endings instead of generated exact phrase sets.

Added the approved auxiliary-helper fix for `ベアトリスがいてくれたから`: kana-only `てく` + `れた` helper spans now clear known-word, N+1, frequency, and JLPT annotation metadata when POS enrichment shows a particle + non-independent verb helper, while lexical `自立` verb forms like `くれ`/`くれる` remain eligible.
Kept exclusion ownership cleaner: subtitle annotation exclusion remains in the shared filter, while `annotation-stage.ts` no longer carries stale duplicate subtitle-exclusion constants/helpers. Added regressions for pattern coverage including `ではないですか` splitting and no-POS grammar-ending annotation clearing.

Verification passed for targeted tokenizer/annotation/parser tests, Prettier check on touched files, and `bun run typecheck`. The SubMiner core verifier's `test:fast` step remains blocked by unrelated pre-existing cross-suite failures in `subsync`/renderer keyboard globals plus Bun `node:test` cascade; artifact: `.tmp/skill-verification/subminer-verify-20260502-173004-CMu3ai/`.
Verification passed: targeted tokenizer/parser/annotation tests, Prettier check, `bun run typecheck`, `bun run test:fast`, `bun run test:env`, `bun run build`, and `bun run test:smoke:dist`. `bun run changelog:lint` is blocked by pre-existing malformed `changes/319-interjection-annotation-filter.md`; new fragment `changes/321-grammar-ending-pattern-filter.md` uses the current metadata format.
<!-- SECTION:FINAL_SUMMARY:END -->

@@ -0,0 +1,63 @@
---
id: TASK-321
title: Preserve word audio during manual clipboard card updates
status: Done
assignee:
- '@Codex'
created_date: '2026-05-03 06:22'
updated_date: '2026-05-03 06:23'
labels:
- anki
- mining
dependencies: []
priority: medium
---

## Description

<!-- SECTION:DESCRIPTION:BEGIN -->
Manual Ctrl+Shift+C/Ctrl+V card updates on already-mined cards should refresh the sentence content and generated sentence media without removing or replacing the existing word/expression audio. The word is unchanged in this flow, so the configured word audio field must be left untouched while sentence audio remains forced-overwrite behavior from TASK-299.
<!-- SECTION:DESCRIPTION:END -->
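The intended field behavior can be sketched as follows. This is an illustrative, self-contained fragment, not the service's actual field-resolution logic; the field names `SentenceAudio` and `ExpressionAudio` are the example names used in the repo's tests, not guaranteed configuration values.

```typescript
// Hypothetical sketch: build an Anki field-update payload for a manual
// clipboard update that force-overwrites sentence audio only.
type FieldUpdates = Record<string, string>;

function buildManualUpdatePayload(
  sentenceAudioField: string | null,
  audioFilename: string,
): FieldUpdates {
  const updatedFields: FieldUpdates = {};
  if (sentenceAudioField) {
    // The word is unchanged in this flow, so only sentence audio is written;
    // the configured word/expression audio field is never added to the payload.
    updatedFields[sentenceAudioField] = `[sound:${audioFilename}]`;
  }
  return updatedFields;
}

const payload = buildManualUpdatePayload('SentenceAudio', 'clip_0001.opus');
console.log(payload);
```

The key point is that the word/expression audio key is simply absent from the payload, so AnkiConnect leaves that field as it was.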

## Acceptance Criteria
<!-- AC:BEGIN -->
- [x] #1 Manual clipboard subtitle update replaces the resolved sentence audio field with newly generated sentence audio.
- [x] #2 Manual clipboard subtitle update does not include the configured word/expression audio field in Anki field updates.
- [x] #3 Animated image generation still uses the existing word audio duration for lead-in sync when configured.
- [x] #4 A regression test covers preserving word/expression audio during manual clipboard update.
<!-- AC:END -->

## Implementation Plan

<!-- SECTION:PLAN:BEGIN -->
1. Update the focused manual clipboard card update regression so generated audio is written only to the resolved sentence audio field and the configured word/expression audio field is absent from updateNoteFields payloads.
2. Run the focused test and confirm it fails for the existing TASK-299 behavior.
3. Change CardCreationService.updateLastAddedFromClipboard to stop merging/updating expression audio while preserving forced overwrite for sentence audio.
4. Run the focused test; then run adjacent Anki card-creation tests if the focused gate passes.
5. Update task acceptance criteria/final notes with verification results.
<!-- SECTION:PLAN:END -->

## Implementation Notes

<!-- SECTION:NOTES:BEGIN -->
Implemented narrow manual clipboard update change in CardCreationService.updateLastAddedFromClipboard: generated audio now force-overwrites only the resolved sentence audio field and no longer writes the configured word/expression audio field. Animated AVIF lead-in still runs from the original note info before image generation, preserving existing word-audio sync behavior.
<!-- SECTION:NOTES:END -->

## Final Summary

<!-- SECTION:FINAL_SUMMARY:BEGIN -->
Summary:
- Manual Ctrl+Shift+C/Ctrl+V card updates now leave the configured word/expression audio field untouched while force-replacing the resolved sentence audio field.
- Updated the regression test to assert the Anki update payload omits ExpressionAudio and only merges SentenceAudio with forced overwrite.
- Updated docs-site behavior notes and added a changelog fragment for the sentence-only manual audio replacement behavior.

Verification:
- bun test src/anki-integration/card-creation-manual-update.test.ts src/anki-integration/card-creation.test.ts src/anki-integration/animated-image-sync.test.ts
- bun run typecheck
- bun run docs:test
- bun run docs:build
- git diff --check -- src/anki-integration/card-creation.ts src/anki-integration/card-creation-manual-update.test.ts docs-site/mining-workflow.md docs-site/anki-integration.md docs-site/configuration.md changes/322-preserve-word-audio-manual-update.md

Blocked gate:
- bun run changelog:lint is blocked by pre-existing malformed changes/319-interjection-annotation-filter.md, which is outside this task's files.
<!-- SECTION:FINAL_SUMMARY:END -->
@@ -0,0 +1,5 @@
type: fixed
area: tokenizer

- Tokenizer: Replaced hard-coded standalone grammar-ending permutations with shared pattern matching for polite copula, negative copula, and explanatory subtitle endings.
- Tokenizer: Kept grammar annotation exclusion logic in the shared subtitle filter and removed stale duplicate exclusion helpers from the annotation stage.
@@ -0,0 +1,4 @@
type: fixed
area: anki

- Anki: Manual clipboard subtitle updates now preserve existing word audio while replacing sentence audio and animated-image media.
@@ -213,7 +213,7 @@ Animated AVIF requires an AV1 encoder (`libaom-av1`, `libsvtav1`, or `librav1e`)
}
```

`overwriteAudio` applies to automatic card updates and duplicate-card enrichment. Manual clipboard subtitle updates (`Ctrl/Cmd+C`, then `Ctrl/Cmd+V`) always replace generated audio in both the expression audio field and sentence audio field.
`overwriteAudio` applies to automatic card updates and duplicate-card enrichment. Manual clipboard subtitle updates (`Ctrl/Cmd+C`, then `Ctrl/Cmd+V`) always replace generated sentence audio, while leaving the word audio field unchanged.

## AI Translation


@@ -893,7 +893,7 @@ This example is intentionally compact. The option table below documents availabl
| `media.audioPadding` | number (seconds) | Padding around audio clip timing (default: `0.5`) |
| `media.fallbackDuration` | number (seconds) | Default duration if timing unavailable (default: `3.0`) |
| `media.maxMediaDuration` | number (seconds) | Max duration for generated media from multi-line copy (default: `30`, `0` to disable) |
| `behavior.overwriteAudio` | `true`, `false` | Replace existing audio on updates; when `false`, new audio is appended/prepended per `behavior.mediaInsertMode`; manual clipboard updates always replace generated audio (default: `true`) |
| `behavior.overwriteAudio` | `true`, `false` | Replace existing audio on updates; when `false`, new audio is appended/prepended per `behavior.mediaInsertMode`; manual clipboard updates always replace generated sentence audio (default: `true`) |
| `behavior.overwriteImage` | `true`, `false` | Replace existing images on updates; when `false`, new images are appended/prepended per `behavior.mediaInsertMode` (default: `true`) |
| `behavior.mediaInsertMode` | `"append"`, `"prepend"` | Where to insert new media when overwrite is off (default: `"append"`) |
| `behavior.highlightWord` | `true`, `false` | Highlight the word in sentence context (default: `true`) |

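The options in the table above can be sketched as a config fragment. This is illustrative only: the values are the table's stated defaults, and the `ankiConnect` wrapper key is assumed from the `ankiConnect.behavior.overwriteAudio` path mentioned elsewhere in these docs.

```json
{
  "ankiConnect": {
    "media": {
      "audioPadding": 0.5,
      "fallbackDuration": 3.0,
      "maxMediaDuration": 30
    },
    "behavior": {
      "overwriteAudio": true,
      "overwriteImage": true,
      "mediaInsertMode": "append",
      "highlightWord": true
    }
  }
}
```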
@@ -100,7 +100,7 @@ If you prefer a hands-on approach (animecards-style), you can copy the current s
- For multiple lines: press `Ctrl/Cmd+Shift+C`, then a digit `1`–`9` to select how many recent subtitle lines to combine. The combined text is copied to the clipboard.
3. Press `Ctrl/Cmd+V` to update the last-added card with the clipboard contents plus audio, image, and translation — the same fields auto-update would fill.

Manual clipboard updates always replace generated audio in both the expression audio field and sentence audio field, even when `ankiConnect.behavior.overwriteAudio` is disabled. The manual flow assumes you are intentionally replacing the proxy-generated clip on the newest card.
Manual clipboard updates always replace generated sentence audio, even when `ankiConnect.behavior.overwriteAudio` is disabled. The word audio field is left unchanged because the word itself does not change in this flow.

This is useful when auto-update is disabled or when you want explicit control over which subtitle line gets attached to the card.

@@ -126,7 +126,7 @@ function createManualUpdateService(overrides: Partial<CardCreationDeps> = {}): {
  };
}

test('manual clipboard subtitle update replaces expression and sentence audio even when overwriteAudio is disabled', async () => {
test('manual clipboard subtitle update replaces sentence audio without touching expression audio', async () => {
  const { service, updatedFields, mergeCalls, storedMedia } = createManualUpdateService();

  await service.updateLastAddedFromClipboard('字幕');
@@ -134,10 +134,10 @@ test('manual clipboard subtitle update replaces expression and sentence audio ev
  assert.equal(updatedFields.length, 1);
  assert.equal(storedMedia.length, 1);
  const audioValue = `[sound:${storedMedia[0]}]`;
  assert.equal(updatedFields[0]?.ExpressionAudio, audioValue);
  assert.equal(updatedFields[0]?.SentenceAudio, audioValue);
  assert.equal('ExpressionAudio' in updatedFields[0]!, false);
  assert.deepEqual(
    mergeCalls.map((call) => call.overwrite),
    [true, true],
    [true],
  );
});

@@ -219,10 +219,6 @@ export class CardCreationService {
      this.deps.getConfig(),
    );
    const sentenceAudioField = this.getResolvedSentenceAudioFieldName(noteInfo);
    const expressionAudioField = this.deps.resolveConfiguredFieldName(
      noteInfo,
      this.deps.getConfig().fields?.audio || 'ExpressionAudio',
    );
    const sentenceField = this.deps.getEffectiveSentenceCardConfig().sentenceField;

    const sentence = blocks.join(' ');
@@ -252,22 +248,15 @@ export class CardCreationService {

    if (audioBuffer) {
      await this.deps.client.storeMediaFile(audioFilename, audioBuffer);
      if (sentenceAudioField || expressionAudioField) {
      if (sentenceAudioField) {
        const audioValue = `[sound:${audioFilename}]`;
        const audioFields = new Set(
          [sentenceAudioField, expressionAudioField].filter(
            (fieldName): fieldName is string => Boolean(fieldName),
          ),
        const existingAudio = noteInfo.fields[sentenceAudioField]?.value || '';
        // Manual clipboard updates intentionally replace old captured sentence audio.
        updatedFields[sentenceAudioField] = this.deps.mergeFieldValue(
          existingAudio,
          audioValue,
          true,
        );
        for (const audioField of audioFields) {
          const existingAudio = noteInfo.fields[audioField]?.value || '';
          // Manual clipboard updates intentionally replace old captured audio.
          updatedFields[audioField] = this.deps.mergeFieldValue(
            existingAudio,
            audioValue,
            true,
          );
        }
      }
      miscInfoFilename = audioFilename;
      updatePerformed = true;

@@ -301,6 +301,31 @@ test('shouldExcludeTokenFromSubtitleAnnotations excludes standalone polite copul
  }
});

test('shouldExcludeTokenFromSubtitleAnnotations excludes grammar-ending patterns without enumerating variants', () => {
  const tokens = [
    makeToken({
      surface: 'ですわ',
      headword: 'です',
      reading: 'デスワ',
      partOfSpeech: PartOfSpeech.other,
      pos1: '',
      pos2: '',
    }),
    makeToken({
      surface: 'ではないですか',
      headword: 'ない',
      reading: 'デハナイデスカ',
      partOfSpeech: PartOfSpeech.other,
      pos1: '',
      pos2: '',
    }),
  ];

  for (const token of tokens) {
    assert.equal(shouldExcludeTokenFromSubtitleAnnotations(token), true, token.surface);
  }
});

test('shouldExcludeTokenFromSubtitleAnnotations excludes auxiliary-stem そうだ grammar tails', () => {
  const token = makeToken({
    surface: 'そうだ',

@@ -18,57 +18,6 @@ const KATAKANA_TO_HIRAGANA_OFFSET = 0x60;
const KATAKANA_CODEPOINT_START = 0x30a1;
const KATAKANA_CODEPOINT_END = 0x30f6;
const JLPT_LEVEL_LOOKUP_CACHE_LIMIT = 2048;
const SUBTITLE_ANNOTATION_EXCLUDED_TERMS = new Set([
  'ああ',
  'ええ',
  'うう',
  'おお',
  'はあ',
  'はは',
  'へえ',
  'ふう',
  'ほう',
]);
const SUBTITLE_ANNOTATION_EXCLUDED_EXPLANATORY_ENDING_PREFIXES = ['ん', 'の', 'なん', 'なの'];
const SUBTITLE_ANNOTATION_EXCLUDED_EXPLANATORY_ENDING_CORES = [
  'だ',
  'です',
  'でした',
  'だった',
  'では',
  'じゃ',
  'でしょう',
  'だろう',
] as const;
const SUBTITLE_ANNOTATION_EXCLUDED_EXPLANATORY_ENDING_TRAILING_PARTICLES = [
  '',
  'か',
  'ね',
  'よ',
  'な',
  'けど',
  'よね',
  'かな',
  'かね',
] as const;
const SUBTITLE_ANNOTATION_EXCLUDED_EXPLANATORY_ENDINGS = new Set(
  SUBTITLE_ANNOTATION_EXCLUDED_EXPLANATORY_ENDING_PREFIXES.flatMap((prefix) =>
    SUBTITLE_ANNOTATION_EXCLUDED_EXPLANATORY_ENDING_CORES.flatMap((core) =>
      SUBTITLE_ANNOTATION_EXCLUDED_EXPLANATORY_ENDING_TRAILING_PARTICLES.map(
        (particle) => `${prefix}${core}${particle}`,
      ),
    ),
  ),
);
const SUBTITLE_ANNOTATION_EXCLUDED_TRAILING_PARTICLE_SUFFIXES = new Set([
  'って',
  'ってよ',
  'ってね',
  'ってな',
  'ってさ',
  'ってか',
  'ってば',
]);

const jlptLevelLookupCaches = new WeakMap<
  (text: string) => JlptLevel | null,
@@ -104,10 +53,6 @@ function normalizePos1Tag(pos1: string | undefined): string {
  return typeof pos1 === 'string' ? pos1.trim() : '';
}

const SUBTITLE_ANNOTATION_EXCLUDED_POS1 = new Set(['感動詞']);
const SUBTITLE_ANNOTATION_GRAMMAR_ONLY_POS1 = new Set(['助詞', '助動詞', '連体詞']);
const AUXILIARY_STEM_GRAMMAR_TAIL_POS1 = new Set(['名詞', '助動詞', '助詞']);

function splitNormalizedTagParts(normalizedTag: string): string[] {
  if (!normalizedTag) {
    return [];
@@ -129,57 +74,6 @@ function isExcludedByTagSet(normalizedTag: string, exclusions: ReadonlySet<strin
  return parts.some((part) => exclusions.has(part));
}

function isExcludedFromSubtitleAnnotationsByPos1(normalizedPos1: string): boolean {
  const parts = splitNormalizedTagParts(normalizedPos1);
  if (parts.some((part) => SUBTITLE_ANNOTATION_EXCLUDED_POS1.has(part))) {
    return true;
  }

  return parts.length > 0 && parts.every((part) => SUBTITLE_ANNOTATION_GRAMMAR_ONLY_POS1.has(part));
}

function isExcludedTrailingParticleMergedToken(token: MergedToken): boolean {
  const normalizedSurface = normalizeJlptTextForExclusion(token.surface);
  const normalizedHeadword = normalizeJlptTextForExclusion(token.headword);
  if (
    !normalizedSurface ||
    !normalizedHeadword ||
    !normalizedSurface.startsWith(normalizedHeadword)
  ) {
    return false;
  }

  const suffix = normalizedSurface.slice(normalizedHeadword.length);
  if (!SUBTITLE_ANNOTATION_EXCLUDED_TRAILING_PARTICLE_SUFFIXES.has(suffix)) {
    return false;
  }

  const pos1Parts = splitNormalizedTagParts(normalizePos1Tag(token.pos1));
  if (pos1Parts.length < 2) {
    return false;
  }

  const [leadingPos1, ...trailingPos1] = pos1Parts;
  if (!leadingPos1 || SUBTITLE_ANNOTATION_GRAMMAR_ONLY_POS1.has(leadingPos1)) {
    return false;
  }

  return trailingPos1.length > 0 && trailingPos1.every((part) => part === '助詞');
}

function isAuxiliaryStemGrammarTailToken(token: MergedToken): boolean {
  const pos1Parts = splitNormalizedTagParts(normalizePos1Tag(token.pos1));
  if (
    pos1Parts.length === 0 ||
    !pos1Parts.every((part) => AUXILIARY_STEM_GRAMMAR_TAIL_POS1.has(part))
  ) {
    return false;
  }

  const pos3Parts = splitNormalizedTagParts(normalizePos2Tag(token.pos3));
  return pos3Parts.includes('助動詞語幹');
}

function resolvePos1Exclusions(options: AnnotationStageOptions): ReadonlySet<string> {
  if (options.pos1Exclusions) {
    return options.pos1Exclusions;
@@ -609,44 +503,6 @@ function isJlptEligibleToken(token: MergedToken): boolean {
  return true;
}

function isExcludedFromSubtitleAnnotationsByTerm(token: MergedToken): boolean {
  const candidates = [token.surface, token.reading, resolveJlptLookupText(token)].filter(
    (candidate): candidate is string => typeof candidate === 'string' && candidate.length > 0,
  );

  for (const candidate of candidates) {
    const trimmedCandidate = candidate.trim();
    if (!trimmedCandidate) {
      continue;
    }

    const normalizedCandidate = normalizeJlptTextForExclusion(trimmedCandidate);
    if (!normalizedCandidate) {
      continue;
    }

    if (
      SUBTITLE_ANNOTATION_EXCLUDED_TERMS.has(trimmedCandidate) ||
      SUBTITLE_ANNOTATION_EXCLUDED_TERMS.has(normalizedCandidate) ||
      SUBTITLE_ANNOTATION_EXCLUDED_EXPLANATORY_ENDINGS.has(trimmedCandidate) ||
      SUBTITLE_ANNOTATION_EXCLUDED_EXPLANATORY_ENDINGS.has(normalizedCandidate)
    ) {
      return true;
    }

    if (
      isTrailingSmallTsuKanaSfx(trimmedCandidate) ||
      isTrailingSmallTsuKanaSfx(normalizedCandidate) ||
      isReduplicatedKanaSfxWithOptionalTrailingTo(trimmedCandidate) ||
      isReduplicatedKanaSfxWithOptionalTrailingTo(normalizedCandidate)
    ) {
      return true;
    }
  }

  return false;
}

export function shouldExcludeTokenFromSubtitleAnnotations(token: MergedToken): boolean {
  return sharedShouldExcludeTokenFromSubtitleAnnotations(token);
}
@@ -771,9 +627,7 @@ export function annotateTokens(
  });
  return {
    ...strippedToken,
    isKnown:
      nPlusOneEnabled &&
      computeExcludedTokenKnownStatus(token, deps.isKnownWord),
    isKnown: nPlusOneEnabled && computeExcludedTokenKnownStatus(token, deps.isKnownWord),
  };
}

@@ -0,0 +1,124 @@
const KATAKANA_TO_HIRAGANA_OFFSET = 0x60;
const KATAKANA_CODEPOINT_START = 0x30a1;
const KATAKANA_CODEPOINT_END = 0x30f6;

const SENTENCE_FINAL_PARTICLE_SUFFIXES = ['', 'か', 'ね', 'よ', 'な', 'わ'] as const;
const EXPLANATORY_ENDING_PREFIXES = ['ん', 'の', 'なん', 'なの'] as const;
const EXPLANATORY_ENDING_CORES = [
  'だ',
  'です',
  'でした',
  'だった',
  'では',
  'じゃ',
  'でしょう',
  'だろう',
] as const;
const EXPLANATORY_ENDING_TRAILING_PARTICLES = [
  '',
  'か',
  'ね',
  'よ',
  'な',
  'けど',
  'よね',
  'かな',
  'かね',
] as const;
const EXPLANATORY_ENDING_THOUGHT_SUFFIXES = ['か', 'かな', 'かね'] as const;
const NEGATIVE_COPULA_PREFIXES = ['じゃ', 'では'] as const;

export function normalizeGrammarEndingText(text: string): string {
  const raw = text.trim();
  if (!raw) {
    return '';
  }

  let normalized = '';
  for (const char of raw) {
    const code = char.codePointAt(0);
    if (code === undefined) {
      continue;
    }

    if (code >= KATAKANA_CODEPOINT_START && code <= KATAKANA_CODEPOINT_END) {
      normalized += String.fromCodePoint(code - KATAKANA_TO_HIRAGANA_OFFSET);
      continue;
    }

    normalized += char;
  }

  return normalized;
}
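The normalization step above can be illustrated with a self-contained restatement (not an import of the actual module): katakana in the covered block folds to hiragana by a fixed codepoint offset, so matchers only ever compare hiragana.

```typescript
// Sketch of the katakana-to-hiragana fold used by the grammar-ending matcher:
// codepoints U+30A1..U+30F6 are shifted down by 0x60 into the hiragana block.
const OFFSET = 0x60;
const KATA_START = 0x30a1;
const KATA_END = 0x30f6;

function toHiragana(text: string): string {
  let out = '';
  for (const ch of text) {
    const code = ch.codePointAt(0)!;
    out +=
      code >= KATA_START && code <= KATA_END
        ? String.fromCodePoint(code - OFFSET)
        : ch; // kanji, hiragana, and punctuation pass through unchanged
  }
  return out;
}

console.log(toHiragana('デスワ')); // → ですわ
```

This is why a token like `reading: 'デスワ'` can be matched by the same patterns as its hiragana surface form.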

function matchesSuffix(text: string, suffixes: readonly string[]): boolean {
  return suffixes.some((suffix) => text === suffix);
}

function matchesPoliteCopulaEnding(text: string): boolean {
  if (!text.startsWith('です')) {
    return false;
  }

  return matchesSuffix(text.slice('です'.length), SENTENCE_FINAL_PARTICLE_SUFFIXES);
}

function matchesNegativeCopulaEnding(text: string): boolean {
  for (const prefix of NEGATIVE_COPULA_PREFIXES) {
    const negativeStem = `${prefix}ない`;
    if (!text.startsWith(negativeStem)) {
      continue;
    }

    const suffix = text.slice(negativeStem.length);
    return (
      matchesSuffix(suffix, SENTENCE_FINAL_PARTICLE_SUFFIXES) || matchesPoliteCopulaEnding(suffix)
    );
  }

  return false;
}

function matchesExplanatoryEnding(text: string): boolean {
  for (const prefix of EXPLANATORY_ENDING_PREFIXES) {
    if (EXPLANATORY_ENDING_THOUGHT_SUFFIXES.some((suffix) => text === `${prefix}${suffix}`)) {
      return true;
    }

    if (!text.startsWith(prefix)) {
      continue;
    }

    const suffix = text.slice(prefix.length);
    for (const core of EXPLANATORY_ENDING_CORES) {
      if (!suffix.startsWith(core)) {
        continue;
      }

      if (matchesSuffix(suffix.slice(core.length), EXPLANATORY_ENDING_TRAILING_PARTICLES)) {
        return true;
      }
    }
  }

  return false;
}

export function isStandaloneGrammarEndingText(text: string): boolean {
  const normalized = normalizeGrammarEndingText(text);
  if (!normalized) {
    return false;
  }

  return matchesPoliteCopulaEnding(normalized) || matchesNegativeCopulaEnding(normalized);
}

export function isSubtitleGrammarEndingText(text: string): boolean {
  const normalized = normalizeGrammarEndingText(text);
  if (!normalized) {
    return false;
  }

  return isStandaloneGrammarEndingText(normalized) || matchesExplanatoryEnding(normalized);
}
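The composition idea behind these matchers can be condensed into a self-contained sketch (simplified from the module above; hiragana input is assumed, so the normalization step is skipped). A stem is combined with a closed set of sentence-final particles, so phrases like `ではないですか` match without any explicit entry for the full string.

```typescript
// Grammar endings are matched by composing a stem with a closed set of
// sentence-final particles instead of enumerating every full phrase.
const FINAL_PARTICLES = ['', 'か', 'ね', 'よ', 'な', 'わ'];

function isPoliteCopulaEnding(text: string): boolean {
  return text.startsWith('です') && FINAL_PARTICLES.includes(text.slice('です'.length));
}

function isNegativeCopulaEnding(text: string): boolean {
  for (const prefix of ['じゃ', 'では']) {
    const stem = `${prefix}ない`;
    if (!text.startsWith(stem)) continue;
    const rest = text.slice(stem.length);
    // A bare particle or a polite-copula tail may follow the negative stem.
    return FINAL_PARTICLES.includes(rest) || isPoliteCopulaEnding(rest);
  }
  return false;
}

console.log(isNegativeCopulaEnding('ではないですか')); // → true
console.log(isPoliteCopulaEnding('問題')); // → false
```

Because the matcher recurses into the polite-copula tail, adding a new final particle extends every combination at once, which is what replaces the old hand-enumerated permutation sets.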
@@ -219,6 +219,38 @@ test('splits trailing ja-nai grammar endings from preceding content', () => {
  );
});

test('splits trailing negative-copula grammar endings by pattern', () => {
  const parseResults = [
    makeParseItem('scanning-parser', [
      [
        { text: '問題', reading: 'もんだい', headword: '問題' },
        { text: 'ではないですか', reading: 'ではないですか', headword: 'ない' },
      ],
    ]),
  ];

  const tokens = selectYomitanParseTokens(parseResults, () => false, 'headword');
  assert.deepEqual(
    tokens?.map((token) => ({
      surface: token.surface,
      reading: token.reading,
      headword: token.headword,
    })),
    [
      {
        surface: '問題',
        reading: 'もんだい',
        headword: '問題',
      },
      {
        surface: 'ではないですか',
        reading: 'ではないですか',
        headword: 'ない',
      },
    ],
  );
});

test('merges trailing katakana continuation without headword into previous token', () => {
  const parseResults = [
    makeParseItem('scanning-parser', [
@@ -1,4 +1,5 @@
import { MergedToken, NPlusOneMatchMode, PartOfSpeech } from '../../../types';
import { isStandaloneGrammarEndingText } from './grammar-ending';

interface YomitanParseHeadword {
  term?: unknown;
@@ -24,24 +25,6 @@ export interface YomitanParseCandidate {
  tokens: MergedToken[];
}

const STANDALONE_GRAMMAR_ENDINGS = new Set([
  'です',
  'ですか',
  'ですね',
  'ですよ',
  'ですな',
  'じゃない',
  'じゃないか',
  'じゃないね',
  'じゃないよ',
  'じゃないな',
  'じゃないです',
  'じゃないですか',
  'じゃないですね',
  'じゃないですよ',
  'じゃないですな',
]);

function isObject(value: unknown): value is Record<string, unknown> {
  return Boolean(value && typeof value === 'object');
}
@@ -164,7 +147,7 @@ function isStandaloneGrammarEndingSegment(segment: YomitanParseSegment): boolean
  const headword = extractYomitanHeadword(segment).trim();
  return (
    headword.length > 0 &&
    (STANDALONE_GRAMMAR_ENDINGS.has(surface) || STANDALONE_GRAMMAR_ENDINGS.has(headword))
    (isStandaloneGrammarEndingText(surface) || isStandaloneGrammarEndingText(headword))
  );
}

@@ -8,6 +8,7 @@ import {
} from '../../../token-pos2-exclusions';
import { MergedToken, PartOfSpeech } from '../../../types';
import { shouldIgnoreJlptByTerm } from '../jlpt-token-filter';
import { isSubtitleGrammarEndingText } from './grammar-ending';

const KATAKANA_TO_HIRAGANA_OFFSET = 0x60;
const KATAKANA_CODEPOINT_START = 0x30a1;
@@ -58,61 +59,6 @@ export const SUBTITLE_ANNOTATION_EXCLUDED_TERMS = new Set([
  'ものか',
  ...STANDALONE_GRAMMAR_PARTICLE_PHRASES,
]);
const SUBTITLE_ANNOTATION_EXCLUDED_EXPLANATORY_ENDING_PREFIXES = ['ん', 'の', 'なん', 'なの'];
const SUBTITLE_ANNOTATION_EXCLUDED_EXPLANATORY_ENDING_CORES = [
  'だ',
  'です',
  'でした',
  'だった',
  'では',
  'じゃ',
  'でしょう',
  'だろう',
] as const;
const SUBTITLE_ANNOTATION_EXCLUDED_EXPLANATORY_ENDING_TRAILING_PARTICLES = [
  '',
  'か',
  'ね',
  'よ',
  'な',
  'けど',
  'よね',
  'かな',
  'かね',
] as const;
const SUBTITLE_ANNOTATION_EXCLUDED_EXPLANATORY_ENDING_THOUGHT_SUFFIXES = [
  'か',
  'かな',
  'かね',
] as const;
const SUBTITLE_ANNOTATION_EXCLUDED_POLITE_COPULA_SUFFIXES = ['', 'か', 'ね', 'よ', 'な'] as const;
const SUBTITLE_ANNOTATION_EXCLUDED_JA_NAI_SUFFIXES = [
  '',
  'か',
  'ね',
  'よ',
  'な',
  'です',
  'ですか',
  'ですよ',
  'ですね',
  'ですな',
] as const;
const SUBTITLE_ANNOTATION_EXCLUDED_EXPLANATORY_ENDINGS = new Set(
  SUBTITLE_ANNOTATION_EXCLUDED_EXPLANATORY_ENDING_PREFIXES.flatMap((prefix) =>
    SUBTITLE_ANNOTATION_EXCLUDED_EXPLANATORY_ENDING_CORES.flatMap((core) =>
      SUBTITLE_ANNOTATION_EXCLUDED_EXPLANATORY_ENDING_TRAILING_PARTICLES.map(
        (particle) => `${prefix}${core}${particle}`,
      ),
    ),
  ),
);
const SUBTITLE_ANNOTATION_EXCLUDED_POLITE_COPULA_ENDINGS = new Set(
  SUBTITLE_ANNOTATION_EXCLUDED_POLITE_COPULA_SUFFIXES.map((suffix) => `です${suffix}`),
);
const SUBTITLE_ANNOTATION_EXCLUDED_JA_NAI_ENDINGS = new Set(
  SUBTITLE_ANNOTATION_EXCLUDED_JA_NAI_SUFFIXES.map((suffix) => `じゃない${suffix}`),
);
const SUBTITLE_ANNOTATION_EXCLUDED_TRAILING_PARTICLE_SUFFIXES = new Set([
  'って',
  'ってよ',
@@ -460,25 +406,11 @@ function isExcludedByTerm(token: MergedToken): boolean {
      continue;
    }

    if (
      SUBTITLE_ANNOTATION_EXCLUDED_EXPLANATORY_ENDING_PREFIXES.some((prefix) =>
        SUBTITLE_ANNOTATION_EXCLUDED_EXPLANATORY_ENDING_THOUGHT_SUFFIXES.some(
          (suffix) => normalized === `${prefix}${suffix}`,
        ),
      )
    ) {
      return true;
    }

    if (
      SUBTITLE_ANNOTATION_EXCLUDED_TERMS.has(trimmed) ||
      SUBTITLE_ANNOTATION_EXCLUDED_TERMS.has(normalized) ||
      SUBTITLE_ANNOTATION_EXCLUDED_EXPLANATORY_ENDINGS.has(trimmed) ||
      SUBTITLE_ANNOTATION_EXCLUDED_EXPLANATORY_ENDINGS.has(normalized) ||
      SUBTITLE_ANNOTATION_EXCLUDED_POLITE_COPULA_ENDINGS.has(trimmed) ||
      SUBTITLE_ANNOTATION_EXCLUDED_POLITE_COPULA_ENDINGS.has(normalized) ||
      SUBTITLE_ANNOTATION_EXCLUDED_JA_NAI_ENDINGS.has(trimmed) ||
      SUBTITLE_ANNOTATION_EXCLUDED_JA_NAI_ENDINGS.has(normalized) ||
      isSubtitleGrammarEndingText(trimmed) ||
      isSubtitleGrammarEndingText(normalized) ||
      shouldIgnoreJlptByTerm(trimmed) ||
      shouldIgnoreJlptByTerm(normalized)
    ) {
