SubMiner/backlog/tasks/task-41 - Add-real-time-sentence-difficulty-scoring-with-auto-pause-on-hard-lines.md
2026-02-17 22:54:09 -08:00

id: TASK-41
title: Add real-time sentence difficulty scoring with auto-pause on hard lines
status: To Do
assignee: (none)
created_date: 2026-02-14 02:09
labels: feature, nlp, immersion, subtitle
dependencies: TASK-23, TASK-25
priority: medium

Description

Compute a real-time difficulty score for each subtitle line using JLPT level data (TASK-23) and frequency dictionary data (TASK-25), and use this score to drive smart playback features.

Motivation

Learners at different levels have different needs: an N4 learner may want playback to pause on N2-or-harder lines, while an advanced learner may want to skip easy content. A per-line difficulty score enables playback that adapts to the learner's level.

Features

  1. Per-line difficulty score: Combine the JLPT levels and frequency ranks of a line's tokens into a composite difficulty score (e.g., a 1-5 scale or a JLPT-equivalent label).
  2. Visual difficulty indicator: A subtle color or icon on each subtitle line indicating its difficulty.
  3. Auto-pause on difficult lines: Pause playback when a line's score exceeds a user-configurable difficulty threshold.
  4. Per-episode difficulty rating: The average difficulty across all lines, shown in the episode browser (TASK-34).
  5. Difficulty trend within a video: Show whether difficulty increases or decreases over the episode (useful for spotting climax scenes with complex dialogue).
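Features 4 and 5 reduce to simple statistics over the per-line scores. The sketch below assumes each episode's line scores are available as an array in subtitle order; the function names are hypothetical and not part of SubMiner. The trend is taken as the least-squares slope of score against line index, and a trailing rolling mean is one possible way to surface local spikes such as climax scenes.

```typescript
// Hypothetical helpers for features 4 and 5; not existing SubMiner functions.
// Assumes `scores` holds per-line difficulty scores in subtitle order.

/** Average difficulty across an episode, or null if no lines were scored. */
function episodeAverage(scores: readonly number[]): number | null {
  if (scores.length === 0) return null;
  return scores.reduce((acc, s) => acc + s, 0) / scores.length;
}

/**
 * Least-squares slope of score against line index.
 * > 0: later lines tend to be harder; < 0: later lines tend to be easier.
 */
function difficultyTrend(scores: readonly number[]): number {
  const n = scores.length;
  if (n < 2) return 0; // a slope needs at least two points
  const meanX = (n - 1) / 2; // mean of indices 0..n-1
  const meanY = scores.reduce((acc, s) => acc + s, 0) / n;
  let num = 0;
  let den = 0;
  scores.forEach((y, x) => {
    num += (x - meanX) * (y - meanY);
    den += (x - meanX) ** 2;
  });
  return num / den;
}

/** Trailing rolling mean over `windowSize` lines; local peaks hint at dense dialogue. */
function rollingAverage(scores: readonly number[], windowSize: number): number[] {
  if (!Number.isInteger(windowSize) || windowSize < 1) {
    throw new RangeError("windowSize must be a positive integer");
  }
  const out: number[] = [];
  let sum = 0;
  scores.forEach((s, i) => {
    sum += s;
    if (i >= windowSize) sum -= scores[i - windowSize]; // drop the line that left the window
    out.push(sum / Math.min(i + 1, windowSize));
  });
  return out;
}
```

Indexing by line order ignores timing; weighting by each line's start time would be an alternative if long silent gaps should count.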

Scoring algorithm (suggested)

  • For each token in a line, look up its JLPT level (N5=1, N1=5) and frequency rank
  • Weight unknown words (those not in the Anki known-word cache from TASK-24) more heavily
  • Composite score = weighted average of token difficulties, plus a bonus for line length and grammar complexity
  • Configurable weights so users can tune sensitivity
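The suggested algorithm could be sketched as follows. This is one possible reading, not a settled design: the interfaces, the placeholder default weights, the log-scaled mapping of frequency rank onto the JLPT 1-5 scale, the clamping of the final score to 1-5, and the choice to skip tokens with no dictionary data are all assumptions. Grammar complexity is taken as a caller-supplied 0-1 value because the task does not say how to measure it.

```typescript
// Sketch of the suggested scoring algorithm. All names, default weights, and the
// log-scaled frequency mapping are illustrative assumptions, not SubMiner APIs.

interface ScoredToken {
  jlptLevel: number | null;     // 1 (N5) .. 5 (N1); null if not in the JLPT data (TASK-23)
  frequencyRank: number | null; // 1 = most common; null if not in the frequency dictionary (TASK-25)
  known: boolean;               // true if present in the Anki known-word cache (TASK-24)
}

interface DifficultyWeights {
  jlpt: number;              // relative weight of the JLPT signal
  frequency: number;         // relative weight of the frequency signal
  unknownMultiplier: number; // how much more an unknown word counts in the average
  maxRank: number;           // frequency rank mapped to difficulty 5
  lengthBonus: number;       // max points added for long lines
  longLineTokens: number;    // token count at which the full length bonus applies
  grammarBonus: number;      // max points added for grammar complexity
}

// Placeholder values; real defaults would need tuning.
const DEFAULT_WEIGHTS: DifficultyWeights = {
  jlpt: 0.6, frequency: 0.4, unknownMultiplier: 2,
  maxRank: 50000, lengthBonus: 0.5, longLineTokens: 20, grammarBonus: 0.5,
};

const clamp = (x: number, lo: number, hi: number): number => Math.min(hi, Math.max(lo, x));

/** Map a frequency rank onto the same 1..5 scale as JLPT levels (log-scaled). */
function rankToDifficulty(rank: number, maxRank: number): number {
  const r = clamp(rank, 1, maxRank);
  return 1 + (4 * Math.log(r)) / Math.log(maxRank);
}

/** Per-token difficulty on a 1..5 scale, or null when no data is available. */
function tokenDifficulty(t: ScoredToken, w: DifficultyWeights): number | null {
  const parts: Array<[number, number]> = []; // [difficulty, weight]
  if (t.jlptLevel !== null) parts.push([t.jlptLevel, w.jlpt]);
  if (t.frequencyRank !== null) parts.push([rankToDifficulty(t.frequencyRank, w.maxRank), w.frequency]);
  const totalWeight = parts.reduce((acc, [, wt]) => acc + wt, 0);
  if (totalWeight === 0) return null;
  return parts.reduce((acc, [d, wt]) => acc + d * wt, 0) / totalWeight;
}

/** Composite line score on a 1..5 scale; grammarComplexity (0..1) comes from the caller. */
function scoreLine(
  tokens: readonly ScoredToken[],
  grammarComplexity = 0,
  w: DifficultyWeights = DEFAULT_WEIGHTS,
): number {
  let sum = 0;
  let weightSum = 0;
  for (const t of tokens) {
    const d = tokenDifficulty(t, w);
    if (d === null) continue; // skip punctuation, names, etc. with no data
    const tokenWeight = t.known ? 1 : w.unknownMultiplier; // unknown words count more
    sum += d * tokenWeight;
    weightSum += tokenWeight;
  }
  if (weightSum === 0) return 1; // nothing scorable: treat as easiest
  const base = sum / weightSum;
  const lengthPart = w.lengthBonus * Math.min(1, tokens.length / w.longLineTokens);
  const grammarPart = w.grammarBonus * clamp(grammarComplexity, 0, 1);
  return clamp(base + lengthPart + grammarPart, 1, 5);
}

/** JLPT-equivalent label for display: 1 -> N5 ... 5 -> N1. */
function toJlptLabel(score: number): string {
  return `N${6 - Math.round(clamp(score, 1, 5))}`;
}
```

Because the weights live in a plain object, they could be persisted in settings and passed to scoreLine directly, which would let users tune sensitivity without touching the scoring code.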

Design constraints

  • Scoring must run synchronously during subtitle rendering without perceptible latency
  • Score computation should be cached per subtitle line (lines repeat on seeks/replays)
  • Auto-pause should be debounced to avoid rapid pause/unpause cycles on consecutive hard lines
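The caching and debouncing constraints might look like the sketch below; LineScoreCache and AutoPauseController are hypothetical names. The cache keys on the raw line text and evicts least-recently-used entries by relying on Map insertion order; it would need clearing whenever the weights or the known-word cache change, since both affect scores. The auto-pause logic is a leading-edge debounce: the first hard line pauses, and further hard lines are ignored until none has started for quietMs. Time is passed in (for example from performance.now()) rather than read internally, which keeps the logic testable and independent of seeks in the media timeline.

```typescript
// Hypothetical sketch of the caching and debounce constraints; names are illustrative only.

/** LRU cache of line scores keyed by subtitle text (lines repeat on seeks/replays). */
class LineScoreCache {
  private readonly entries = new Map<string, number>();
  private readonly maxEntries: number;
  private readonly scoreFn: (line: string) => number; // e.g. tokenize + score

  constructor(maxEntries: number, scoreFn: (line: string) => number) {
    this.maxEntries = maxEntries;
    this.scoreFn = scoreFn;
  }

  lookup(line: string): number {
    const hit = this.entries.get(line);
    if (hit !== undefined) {
      // Re-insert so this key moves to the end of Map iteration order (most recent).
      this.entries.delete(line);
      this.entries.set(line, hit);
      return hit;
    }
    const value = this.scoreFn(line);
    this.entries.set(line, value);
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used one.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    return value;
  }

  /** Call when weights or the known-word cache change, since scores depend on both. */
  clear(): void {
    this.entries.clear();
  }

  get size(): number {
    return this.entries.size;
  }
}

/** Leading-edge debounce: pause on the first hard line, then stay quiet until hard lines stop for quietMs. */
class AutoPauseController {
  private lastHardLineAt = Number.NEGATIVE_INFINITY;
  private readonly threshold: number;
  private readonly quietMs: number;

  constructor(threshold: number, quietMs: number) {
    this.threshold = threshold;
    this.quietMs = quietMs;
  }

  /** Call when a line becomes active; nowMs is a monotonic clock such as performance.now(). */
  onLineStart(score: number, nowMs: number): boolean {
    if (score <= this.threshold) return false; // only lines above the threshold count
    const shouldPause = nowMs - this.lastHardLineAt >= this.quietMs;
    this.lastHardLineAt = nowMs; // every hard line restarts the quiet window
    return shouldPause;
  }
}
```

A simple cooldown (pause again once quietMs has elapsed since the last pause, regardless of intervening hard lines) would be an equally valid alternative if long runs of hard dialogue should still pause periodically.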

Acceptance Criteria

  • #1 Each subtitle line receives a difficulty score based on JLPT and frequency data.
  • #2 A visual indicator shows per-line difficulty in the overlay.
  • #3 Auto-pause triggers when a line exceeds the user's configured difficulty threshold.
  • #4 Difficulty scoring does not add perceptible latency to subtitle rendering.
  • #5 Per-episode average difficulty is available for the episode browser.
  • #6 Scoring weights are configurable in settings.