Mirror of https://github.com/ksyasuda/SubMiner.git, synced 2026-02-27 18:22:41 -08:00.
# Agent: `codex-narrow-space-tokenizer-20260220T061716Z-p97s`

- alias: `codex-narrow-space-tokenizer`
- mission: `Fix narrow/invisible subtitle spacing causing incorrect tokenizer boundaries.`
- status: `done`
- branch: `main`
- started_at: `2026-02-20T06:17:31Z`
- heartbeat_minutes: `5`

## Current Work (newest first)

- [2026-02-20T06:20:07Z] handoff: normalized invisible separators in tokenizer input; added regression test; targeted tests green.
- [2026-02-20T06:19:20Z] test: `bun run build && node --test dist/subtitle/stages/normalize.test.js` (pass, 1/1); `node --test dist/core/services/tokenizer.test.js` (pass, 43/43).
- [2026-02-20T06:18:38Z] edit: updated `normalizeTokenizerInput` to map `U+200B/U+2060/U+FEFF` to regular spaces before whitespace collapsing.
- [2026-02-20T06:18:02Z] test: added failing regression for subtitle sample with `\u200B` separator.
- [2026-02-20T06:17:31Z] intent: create TASK-90; TDD-first regression for narrow Unicode spacing in subtitle line `キリキリと かかってこい`.
- [2026-02-20T06:17:31Z] progress: coordination started; index row added; scanning tokenizer normalization points next.
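The 06:18:38Z edit entry can be sketched in TypeScript as follows. This is a minimal sketch that assumes the input is a plain string and that "whitespace collapsing" means replacing `\s+` runs with a single space and trimming the ends. The name `normalizeTokenizerInputSketch` is hypothetical. The real `normalizeTokenizerInput` in `src/subtitle/stages/normalize.ts` may use a different signature and may include other steps.

```typescript
// Hypothetical sketch of the normalization described in the log entry;
// the real normalizeTokenizerInput in src/subtitle/stages/normalize.ts may differ.

// The three code points named in the log:
// U+200B ZERO WIDTH SPACE, U+2060 WORD JOINER, U+FEFF ZERO WIDTH NO-BREAK SPACE.
const INVISIBLE_SEPARATORS = /[\u200B\u2060\uFEFF]/g;

function normalizeTokenizerInputSketch(text: string): string {
  return (
    text
      // Step 1: map each invisible separator to an ordinary ASCII space.
      .replace(INVISIBLE_SEPARATORS, " ")
      // Step 2 (assumed): collapse whitespace runs to one space, then trim.
      .replace(/\s+/g, " ")
      .trim()
  );
}

// Usage with the regression sample from the log (U+200B as the separator).
const sample = "キリキリと\u200Bかかってこい";
console.log(normalizeTokenizerInputSketch(sample)); // prints: キリキリと かかってこい
```

The mapping runs before the collapse, so a regular space next to one of these separators becomes a single space. Without that ordering, the text would contain a space followed by an invisible character.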
## Files Touched

- `docs/subagents/INDEX.md`
- `docs/subagents/agents/codex-narrow-space-tokenizer-20260220T061716Z-p97s.md`
- `docs/subagents/collaboration.md`
- `backlog/tasks/task-90 - Normalize-narrow-Unicode-whitespace-in-tokenizer-input.md`
- `src/subtitle/stages/normalize.ts`
- `src/subtitle/stages/normalize.test.ts`

## Assumptions

- issue is likely a Unicode spacing code point being treated as a token boundary.
- target behavior: normalize narrow spacing to standard spaces, collapsing runs, before tokens are grouped for lookup.
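The target behavior requires an explicit mapping step because a whitespace collapse based on JavaScript's `\s` class does not cover every code point involved. The snippet below checks standard ECMAScript `\s` membership for the three code points named in the log. It then shows that `collapseOnly`, a hypothetical normalizer with no mapping step (not SubMiner code), leaves `U+200B` in the text.

```typescript
// Illustration only: standard ECMAScript \s behavior, not SubMiner code.
const separators: Record<string, string> = {
  "U+200B": "\u200B", // ZERO WIDTH SPACE (general category Cf)
  "U+2060": "\u2060", // WORD JOINER (general category Cf)
  "U+FEFF": "\uFEFF", // ZERO WIDTH NO-BREAK SPACE (listed explicitly in \s)
};

for (const [name, ch] of Object.entries(separators)) {
  console.log(`${name} matches \\s: ${/\s/.test(ch)}`);
}
// U+200B matches \s: false
// U+2060 matches \s: false
// U+FEFF matches \s: true

// Hypothetical collapse-only normalizer with no explicit mapping step.
const collapseOnly = (text: string): string => text.replace(/\s+/g, " ").trim();

const raw = "キリキリと\u200Bかかってこい";
console.log(collapseOnly(raw).includes("\u200B")); // true: the separator survives
console.log(collapseOnly(raw.replace(/[\u200B\u2060\uFEFF]/g, " "))); // prints: キリキリと かかってこい
```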
## Open Questions / Blockers

- possible overlap with the TASK-85 refactor, which touches tokenizer paths.

## Next Step

- done.