mirror of
https://github.com/ksyasuda/SubMiner.git
synced 2026-02-27 18:22:41 -08:00
feat(subtitles): add line-break display toggle and narrow-space normalization
@@ -0,0 +1,36 @@
---
id: TASK-90
title: Normalize narrow Unicode whitespace in tokenizer input
status: Done
assignee: []
created_date: '2026-02-20 06:17'
updated_date: '2026-02-20 06:20'
labels: []
dependencies: []
priority: medium
---

## Description

<!-- SECTION:DESCRIPTION:BEGIN -->
Fix tokenizer behavior where subtitle lines that contain narrow or invisible Unicode spacing between Japanese segments are split or grouped differently from lines that use regular spaces.
<!-- SECTION:DESCRIPTION:END -->

## Acceptance Criteria
<!-- AC:BEGIN -->
- [x] #1 A regression test reproduces the subtitle sample containing narrow/invisible Unicode spacing and fails before the fix.
- [x] #2 Tokenizer normalization treats narrow/invisible spacing variants the same as regular spaces for grouping and highlighting.
- [x] #3 Existing tokenizer tests still pass.
<!-- AC:END -->

## Implementation Notes

<!-- SECTION:NOTES:BEGIN -->
Linked from subagent session `codex-narrow-space-tokenizer-20260220T061716Z-p97s`.

Added a regression test in `src/subtitle/stages/normalize.test.ts` for a `\u200B` separator in the subtitle sample, and updated `normalizeTokenizerInput` to map `U+200B` (zero width space), `U+2060` (word joiner), and `U+FEFF` (zero width no-break space) to regular spaces before whitespace collapsing.

Validation:
- `bun run build && node --test dist/subtitle/stages/normalize.test.js`
- `node --test dist/core/services/tokenizer.test.js`
<!-- SECTION:NOTES:END -->
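
The normalization described in the notes above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: only the function name and the three code points come from the task. The collapsing regex and the final trim are assumptions about what "whitespace collapsing" does.

```typescript
// Hypothetical sketch; the real `normalizeTokenizerInput` body is not shown in this task.
// U+200B ZERO WIDTH SPACE, U+2060 WORD JOINER, U+FEFF ZERO WIDTH NO-BREAK SPACE.
const NARROW_OR_INVISIBLE_SPACING = /[\u200B\u2060\uFEFF]/g;

function normalizeTokenizerInput(text: string): string {
  return text
    // Step 1: map each invisible separator to a regular space.
    .replace(NARROW_OR_INVISIBLE_SPACING, " ")
    // Step 2 (assumed): collapse every whitespace run into a single space.
    .replace(/\s+/g, " ")
    // Step 3 (assumed): drop leading and trailing whitespace.
    .trim();
}
```

With this sketch, `normalizeTokenizerInput("今日は\u200Bいい天気")` returns `"今日は いい天気"`, the same string that a regular space between the two segments would produce. Mapping before collapsing matters because JavaScript's `\s` does not match `U+200B` or `U+2060`, so collapsing alone would leave them in place.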
@@ -0,0 +1,36 @@
---
id: TASK-91
title: Add config toggle to preserve visible overlay subtitle line breaks
status: Done
assignee: []
created_date: '2026-02-20 06:35'
updated_date: '2026-02-20 06:42'
labels: []
dependencies: []
priority: medium
---

## Description

<!-- SECTION:DESCRIPTION:BEGIN -->
Add a `subtitleStyle` config option that keeps visible-overlay subtitle line breaks (newlines and carriage returns, normalized to line breaks) instead of flattening them to spaces. The default should preserve the current flattening behavior for consistency with texthooker.
<!-- SECTION:DESCRIPTION:END -->
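
As a rough illustration of the toggle described above, the sketch below models the option and the two rendering paths. Only `subtitleStyle.preserveLineBreaks` and its default of `false` come from this task. The interface, the helper names, the validation rule, and the exact flattening regex are assumptions made for the example.

```typescript
// Hypothetical shapes; only `preserveLineBreaks` and its `false` default come from the task.
interface SubtitleStyleConfig {
  preserveLineBreaks: boolean;
}

const DEFAULT_SUBTITLE_STYLE: SubtitleStyleConfig = { preserveLineBreaks: false };

// Assumed validation rule: accept only real booleans, otherwise fall back to the default.
function resolvePreserveLineBreaks(raw: unknown): boolean {
  return typeof raw === "boolean" ? raw : DEFAULT_SUBTITLE_STYLE.preserveLineBreaks;
}

// Normalize CRLF and lone CR to "\n", then either keep the breaks
// or flatten each run of breaks to a single space (assumed flattening rule).
function prepareOverlayText(text: string, style: SubtitleStyleConfig): string {
  const normalized = text.replace(/\r\n?/g, "\n");
  return style.preserveLineBreaks ? normalized : normalized.replace(/\n+/g, " ");
}
```

With the option enabled, `prepareOverlayText("a\r\nb\rc", { preserveLineBreaks: true })` returns `"a\nb\nc"`; with the default config it returns `"a b c"`.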

## Acceptance Criteria
<!-- AC:BEGIN -->
- [x] #1 New config option exists with default disabled and validation/docs coverage.
- [x] #2 When enabled, visible overlay preserves subtitle line breaks while rendering tokenized subtitles.
- [x] #3 When disabled, current rendering behavior remains unchanged.
- [x] #4 Relevant config + renderer tests pass.
<!-- AC:END -->

## Implementation Notes

<!-- SECTION:NOTES:BEGIN -->
Added `subtitleStyle.preserveLineBreaks` (default `false`) to the types, default config, registry, config validation, docs, and example configs.

The renderer now supports line-break-preserving token output via `alignTokensToSourceText` in `src/renderer/subtitle-render.ts`, which inserts source-text separators (including `\n`) between token spans when the option is enabled.

Validation:
- `bun run build && node --test dist/config/config.test.js dist/renderer/subtitle-render.test.js`
<!-- SECTION:NOTES:END -->
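
One way to picture the alignment step mentioned in the notes above is the sketch below. It borrows the name `alignTokensToSourceText`, but the signature, the `Segment` type, and the matching strategy are assumptions for illustration and may differ from the code in `src/renderer/subtitle-render.ts`. The idea is to walk the source text, locate each token in order, and emit whatever lies between two matches (spaces, `\n`, punctuation) as a separator segment.

```typescript
// Hypothetical segment type; the real renderer's data model is not shown in this task.
type Segment =
  | { kind: "token"; text: string }
  | { kind: "separator"; text: string };

function alignTokensToSourceText(tokens: string[], sourceText: string): Segment[] {
  const segments: Segment[] = [];
  let cursor = 0; // index in sourceText just past the last matched token

  for (const token of tokens) {
    const start = sourceText.indexOf(token, cursor);
    if (start === -1) {
      // Token not found in the remaining source: emit it without a separator.
      segments.push({ kind: "token", text: token });
      continue;
    }
    if (start > cursor) {
      // Source text skipped before this token (for example "\n") becomes a separator.
      segments.push({ kind: "separator", text: sourceText.slice(cursor, start) });
    }
    segments.push({ kind: "token", text: token });
    cursor = start + token.length;
  }

  if (cursor < sourceText.length) {
    // Keep any source text left after the last matched token.
    segments.push({ kind: "separator", text: sourceText.slice(cursor) });
  }
  return segments;
}
```

For the tokens `["今日は", "いい", "天気"]` and the source text `"今日は\nいい天気"`, this sketch yields four segments: the token `今日は`, a `"\n"` separator, then the tokens `いい` and `天気`. A renderer could then turn `"\n"` separators into line breaks when `preserveLineBreaks` is enabled.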