mirror of
https://github.com/ksyasuda/SubMiner.git
synced 2026-03-24 00:11:27 -07:00
* fix: harden preload argv parsing for popup windows * fix: align youtube playback with shared overlay startup * fix: unwrap mpv youtube streams for anki media mining * docs: update docs for youtube subtitle and mining flow * refactor: unify cli and runtime wiring for startup and youtube flow * feat: update subtitle sidebar overlay behavior * chore: add shared log-file source for diagnostics * fix(ci): add changelog fragment for immersion changes * fix: address CodeRabbit review feedback * fix: persist canonical title from youtube metadata * style: format stats library tab * fix: address latest review feedback * style: format stats library files * test: stub launcher youtube deps in CI * test: isolate launcher youtube flow deps * test: stub launcher youtube deps in failing case * test: force x11 backend in launcher ci harness * test: address latest review feedback * fix(launcher): preserve user YouTube ytdl raw options * docs(backlog): update task tracking notes * fix(immersion): special-case youtube media paths in runtime and tracking * feat(stats): improve YouTube media metadata and picker key handling * fix(ci): format stats media library hook * fix: address latest CodeRabbit review items * docs: update youtube release notes and docs * feat: auto-load youtube subtitles before manual picker * fix: restore app-owned youtube subtitle flow * docs: update youtube playback docs and config copy * refactor: remove legacy youtube launcher mode plumbing * fix: refine youtube subtitle startup binding * docs: clarify youtube subtitle startup behavior * fix: address PR #31 latest review follow-ups * fix: address PR #31 follow-up review comments * test: harden youtube picker test harness * udpate backlog * fix: add timeout to youtube metadata probe * docs: refresh youtube and stats docs * update backlog * update backlog * chore: release v0.9.0
61 lines
3.8 KiB
Markdown
61 lines
3.8 KiB
Markdown
---
|
|
id: TASK-209
|
|
title: Exclude grammar-tail そうだ from subtitle annotations
|
|
status: Done
|
|
assignee:
|
|
- codex
|
|
created_date: '2026-03-20 04:06'
|
|
updated_date: '2026-03-23 03:22'
|
|
labels:
|
|
- bug
|
|
- tokenizer
|
|
dependencies: []
|
|
references:
|
|
- >-
|
|
/Users/sudacode/projects/japanese/SubMiner/src/core/services/tokenizer/annotation-stage.ts
|
|
- >-
|
|
/Users/sudacode/projects/japanese/SubMiner/src/core/services/tokenizer/annotation-stage.test.ts
|
|
- >-
|
|
/Users/sudacode/projects/japanese/SubMiner/src/core/services/tokenizer.test.ts
|
|
priority: high
|
|
ordinal: 120500
|
|
---
|
|
|
|
## Description
|
|
|
|
<!-- SECTION:DESCRIPTION:BEGIN -->
|
|
Sentence-final grammar-tail `そうだ` tokens can still receive subtitle annotation styling, including frequency highlighting, when Yomitan returns a standalone `そうだ` token and MeCab enriches it as an auxiliary-stem/coupla pattern (`名詞|助動詞`, `助動詞語幹`). Keep the subtitle text visible, but treat this grammar tail like other grammar-only endings so it renders without annotation metadata.
|
|
<!-- SECTION:DESCRIPTION:END -->
|
|
|
|
## Acceptance Criteria
|
|
<!-- AC:BEGIN -->
|
|
- [x] #1 Sentence-final grammar-tail `そうだ` tokens enriched as auxiliary-stem/copula patterns do not receive frequency highlighting or other subtitle annotation metadata.
|
|
- [x] #2 The preceding lexical token in cases like `与えるそうだ` keeps its existing annotation behavior.
|
|
- [x] #3 Regression tests cover the annotation-stage exclusion and end-to-end subtitle tokenization for the `そうだ` grammar-tail case.
|
|
<!-- AC:END -->
|
|
|
|
## Implementation Plan
|
|
|
|
<!-- SECTION:PLAN:BEGIN -->
|
|
1. Add focused regression coverage for the reported `与えるそうだ` case at both annotation-stage and tokenizeSubtitle levels.
|
|
2. Reproduce failure by modeling the MeCab-enriched grammar-tail shape (`名詞|助動詞`, `特殊`, `助動詞語幹`) that currently keeps frequency metadata.
|
|
3. Update subtitle-annotation exclusion logic to recognize auxiliary-stem/copula grammar tails via POS metadata plus normalized tail text, not a raw sentence-specific string match.
|
|
4. Re-run targeted tokenizer and annotation-stage tests, then record the verification commands and outcome in the task notes.
|
|
<!-- SECTION:PLAN:END -->
|
|
|
|
## Implementation Notes
|
|
|
|
<!-- SECTION:NOTES:BEGIN -->
|
|
Investigated reported `与えるそうだ` case. MeCab tags `そう` as `名詞,特殊,助動詞語幹` and `だ` as `助動詞`; after overlap enrichment the Yomitan token becomes `pos1=名詞|助動詞`, `pos2=特殊`, `pos3=助動詞語幹`, which currently escapes subtitle-annotation exclusion and can keep a frequency rank.
|
|
|
|
Implemented a POS-shape subtitle-annotation exclusion for MeCab-enriched auxiliary-stem grammar tails. The new predicate keys off merged tokens whose POS tags stay within `名詞/助動詞/助詞` and whose POS3 includes `助動詞語幹`, which clears annotation metadata for `そうだ`-style tails without hard-coding the full subtitle text.
|
|
|
|
Verification: `bun test src/core/services/tokenizer/annotation-stage.test.ts`, `bun test src/core/services/tokenizer.test.ts --test-name-pattern 'explanatory ending|interjection|single-kana merged tokens from frequency highlighting|auxiliary-stem そうだ grammar tails|composite function/content token from frequency highlighting|keeps frequency for content-led merged token with trailing colloquial suffixes'`
|
|
<!-- SECTION:NOTES:END -->
|
|
|
|
## Final Summary
|
|
|
|
<!-- SECTION:FINAL_SUMMARY:BEGIN -->
|
|
Added regression coverage for `与えるそうだ` and updated subtitle annotation exclusion logic to drop annotation metadata for MeCab-enriched auxiliary-stem grammar tails. The fix is POS-driven rather than sentence-specific, so `そうだ`-style grammar endings stay visible/hoverable as plain text while neighboring lexical tokens keep their existing frequency/JLPT behavior.
|
|
<!-- SECTION:FINAL_SUMMARY:END -->
|