feat(subtitles): add line-break display toggle and narrow-space normalization

2026-02-19 22:50:27 -08:00
parent bc75a0cfbd
commit 8c2d82e361
18 changed files with 344 additions and 6 deletions


@@ -0,0 +1,36 @@
---
id: TASK-90
title: Normalize narrow Unicode whitespace in tokenizer input
status: Done
assignee: []
created_date: '2026-02-20 06:17'
updated_date: '2026-02-20 06:20'
labels: []
dependencies: []
priority: medium
---
## Description
<!-- SECTION:DESCRIPTION:BEGIN -->
Fix tokenizer behavior in which subtitle lines that contain narrow or invisible Unicode spacing between Japanese segments are split or grouped differently than the same lines would be with a regular space.
<!-- SECTION:DESCRIPTION:END -->
## Acceptance Criteria
<!-- AC:BEGIN -->
- [x] #1 A regression test reproduces the subtitle sample containing narrow/invisible Unicode spacing and fails before the fix.
- [x] #2 Tokenizer normalization treats narrow/invisible spacing variants the same as regular spaces for grouping and highlighting.
- [x] #3 Existing tokenizer tests still pass.
<!-- AC:END -->
## Implementation Notes
<!-- SECTION:NOTES:BEGIN -->
Linked from subagent session `codex-narrow-space-tokenizer-20260220T061716Z-p97s`.
Added a regression test in `src/subtitle/stages/normalize.test.ts` for the `\u200B` separator in the subtitle sample, and updated `normalizeTokenizerInput` to map `U+200B`/`U+2060`/`U+FEFF` to regular spaces before whitespace is collapsed.
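The normalization step can be sketched roughly as below. This is a minimal sketch, not the actual `normalizeTokenizerInput`. It assumes the function takes and returns a plain string, and that "whitespace collapsing" means replacing runs of `\s` with one space. The explicit mapping matters because JavaScript's `\s` class already matches U+FEFF but matches neither U+200B nor U+2060. Without the mapping, those two characters would survive collapsing and could reach the tokenizer as part of the text.

```typescript
// Hypothetical sketch of the normalization step; the real
// normalizeTokenizerInput may differ in signature and detail.

// Invisible separators named in the task notes:
// U+200B ZERO WIDTH SPACE, U+2060 WORD JOINER,
// U+FEFF ZERO WIDTH NO-BREAK SPACE (BOM).
const INVISIBLE_SEPARATORS = /[\u200B\u2060\uFEFF]/g;

function normalizeTokenizerInput(text: string): string {
  return (
    text
      // 1. Map invisible separators to a regular space first...
      .replace(INVISIBLE_SEPARATORS, " ")
      // 2. ...so the collapse step treats them like any other space.
      .replace(/\s+/g, " ")
  );
}

// A zero-width space between two Japanese segments now acts like a space.
console.log(JSON.stringify(normalizeTokenizerInput("今日は\u200B晴れです")));
// → "今日は 晴れです"
```

Mapping first and collapsing second means a run such as `" \u200B "` ends up as a single space rather than several.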
Validation:
- `bun run build && node --test dist/subtitle/stages/normalize.test.js`
- `node --test dist/core/services/tokenizer.test.js`
<!-- SECTION:NOTES:END -->


@@ -0,0 +1,36 @@
---
id: TASK-91
title: Add config toggle to preserve visible overlay subtitle line breaks
status: Done
assignee: []
created_date: '2026-02-20 06:35'
updated_date: '2026-02-20 06:42'
labels: []
dependencies: []
priority: medium
---
## Description
<!-- SECTION:DESCRIPTION:BEGIN -->
Add a `subtitleStyle` config option that keeps subtitle line breaks in the visible overlay (with newline and carriage-return characters normalized to line breaks) instead of flattening them to spaces. The default should preserve the current flattening behavior, for consistency with the texthooker.
<!-- SECTION:DESCRIPTION:END -->
## Acceptance Criteria
<!-- AC:BEGIN -->
- [x] #1 The new config option exists, is disabled by default, and has validation and docs coverage.
- [x] #2 When enabled, visible overlay preserves subtitle line breaks while rendering tokenized subtitles.
- [x] #3 When disabled, current rendering behavior remains unchanged.
- [x] #4 Relevant config + renderer tests pass.
<!-- AC:END -->
## Implementation Notes
<!-- SECTION:NOTES:BEGIN -->
Added `subtitleStyle.preserveLineBreaks` (default `false`) to the types, default config, registry, and config validation, and to the docs and example configs.
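A minimal sketch of the option's shape and validation follows, assuming a plain-object config. `SubtitleStyleConfig`, `DEFAULT_SUBTITLE_STYLE`, and `resolveSubtitleStyle` are hypothetical names. Only `preserveLineBreaks` and its `false` default come from the notes, and the real `subtitleStyle` section has other fields not modeled here.

```typescript
// Hypothetical sketch: only preserveLineBreaks and its false default
// come from the task notes; the other names are illustrative.

interface SubtitleStyleConfig {
  // true: the visible overlay keeps subtitle line breaks.
  // false (default): line breaks are flattened to spaces, as before.
  preserveLineBreaks: boolean;
}

const DEFAULT_SUBTITLE_STYLE: SubtitleStyleConfig = {
  preserveLineBreaks: false,
};

// Resolve a user-supplied subtitleStyle value, falling back to the default
// (and recording a warning) when the option has the wrong type.
function resolveSubtitleStyle(
  raw: unknown,
  warnings: string[] = [],
): SubtitleStyleConfig {
  const input = (
    typeof raw === "object" && raw !== null ? raw : {}
  ) as Record<string, unknown>;
  const value = input["preserveLineBreaks"];

  if (value === undefined) {
    return { ...DEFAULT_SUBTITLE_STYLE };
  }
  if (typeof value !== "boolean") {
    warnings.push(
      `subtitleStyle.preserveLineBreaks must be a boolean, got ${typeof value}`,
    );
    return { ...DEFAULT_SUBTITLE_STYLE };
  }
  return { preserveLineBreaks: value };
}

const warnings: string[] = [];
console.log(JSON.stringify(resolveSubtitleStyle({ preserveLineBreaks: true })));
// → {"preserveLineBreaks":true}
console.log(JSON.stringify(resolveSubtitleStyle({ preserveLineBreaks: "yes" }, warnings)));
// → {"preserveLineBreaks":false}, with one entry added to warnings
```

Falling back to the default on a bad value keeps existing configs rendering exactly as before, which matches the "disabled means unchanged" acceptance criterion.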
The renderer now supports line-break-preserving token output via `alignTokensToSourceText` in `src/renderer/subtitle-render.ts`, which inserts source-text separators (including `\n`) between token spans when enabled.
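One way to picture source-text alignment is sketched below. The sketch walks the normalized subtitle text with a cursor and locates each token's surface in order. Whatever lies between two consecutive tokens is emitted as a separator part.

This is a hypothetical sketch, not the code in `src/renderer/subtitle-render.ts`. The following are all assumptions:

- the `SubtitleToken` shape (a `surface` string)
- the `RenderPart` type
- the `normalizeLineBreaks` and `renderParts` helpers
- the not-found fallback

The `false` branch of `renderParts` only approximates "flattening to spaces"; the real renderer may keep its old code path entirely when the option is off.

```typescript
// Hypothetical sketch: token shape, return type, and fallback behavior are
// assumptions, not the actual renderer implementation.

interface SubtitleToken {
  surface: string; // token text as it appears in the subtitle
}

type RenderPart =
  | { kind: "token"; token: SubtitleToken }
  | { kind: "separator"; text: string };

// Normalize CRLF and lone CR to LF so only "\n" needs handling downstream.
function normalizeLineBreaks(text: string): string {
  return text.replace(/\r\n?/g, "\n");
}

function alignTokensToSourceText(
  sourceText: string,
  tokens: SubtitleToken[],
): RenderPart[] {
  const source = normalizeLineBreaks(sourceText);
  const parts: RenderPart[] = [];
  let cursor = 0;

  for (const token of tokens) {
    // Search from the cursor so repeated surfaces match successive occurrences.
    const index = token.surface ? source.indexOf(token.surface, cursor) : -1;
    if (index === -1) {
      // Token not found in the source: emit it without a separator.
      parts.push({ kind: "token", token });
      continue;
    }
    // Source text between the previous matched token and this one
    // (spaces, "\n", etc.) becomes a separator; leading text is skipped.
    if (index > cursor && parts.length > 0) {
      parts.push({ kind: "separator", text: source.slice(cursor, index) });
    }
    parts.push({ kind: "token", token });
    cursor = index + token.surface.length;
  }
  return parts;
}

// Illustrative renderer: brackets mark token spans.
function renderParts(parts: RenderPart[], preserveLineBreaks: boolean): string {
  return parts
    .map((part) => {
      if (part.kind === "token") return `[${part.token.surface}]`;
      // Disabled: flatten line breaks to spaces (approximates old behavior).
      return preserveLineBreaks ? part.text : part.text.replace(/\n/g, " ");
    })
    .join("");
}

const tokens = ["今日", "は", "晴れ", "です"].map((surface) => ({ surface }));
const parts = alignTokensToSourceText("今日は\r\n晴れです", tokens);
console.log(JSON.stringify(renderParts(parts, true)));  // → "[今日][は]\n[晴れ][です]"
console.log(JSON.stringify(renderParts(parts, false))); // → "[今日][は] [晴れ][です]"
```

Taking separators from the source text, rather than inventing them, means the overlay shows exactly the breaks the subtitle file contained once CR/CRLF are normalized.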
Validation:
- `bun run build && node --test dist/config/config.test.js dist/renderer/subtitle-render.test.js`
<!-- SECTION:NOTES:END -->