SubMiner/backlog/tasks/task-41 - Add-real-time-sentence-difficulty-scoring-with-auto-pause-on-hard-lines.md at 07cedabfe317a8be5a8aadbc480887c8df830f96 - SubMiner

mirror of https://github.com/ksyasuda/SubMiner.git synced 2026-02-28 06:22:45 -08:00

sudacode f20d019c11 pretty

2026-02-17 22:54:09 -08:00

id: TASK-41
title: Add real-time sentence difficulty scoring with auto-pause on hard lines
status: To Do
assignee:
created_date: 2026-02-14 02:09
labels: feature, nlp, immersion, subtitle
dependencies: TASK-23, TASK-25
priority: medium

Description

Compute a real-time difficulty score for each subtitle line using JLPT level data (TASK-23) and frequency dictionary data (TASK-25), and use this score to drive smart playback features.

Motivation

Learners at different levels have different needs. An N4 learner may want to pause on lines at N2 or above, while an advanced learner may want to skip easy content. A per-line difficulty score enables intelligent playback that adapts to the learner's level.

Features

Per-line difficulty score: Combine JLPT levels and frequency ranks of tokens to produce a composite difficulty score (e.g., 1-5 scale or JLPT-equivalent label)
Visual difficulty indicator: Subtle color/icon on each subtitle line indicating difficulty
Auto-pause on difficult lines: Configurable threshold — pause playback when a line exceeds the user's set difficulty level
Per-episode difficulty rating: Average difficulty across all lines, shown in the episode browser (TASK-34)
Difficulty trend within a video: Show whether difficulty increases/decreases over the episode (useful for detecting climax scenes with complex dialogue)
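
The per-episode rating and the trend can both be derived from the list of per-line scores. The following is a minimal sketch, not a settled design: `summarizeEpisode`, `EpisodeDifficulty`, and `flatTolerance` are hypothetical names, and using a least-squares slope over the line index is only one possible way to express "difficulty increases/decreases over the episode".

```typescript
// Hypothetical summary shape; none of these names come from the SubMiner codebase.
interface EpisodeDifficulty {
  average: number; // mean difficulty across all lines
  slope: number;   // least-squares slope of score vs. line index
  trend: "rising" | "falling" | "flat";
}

// scores: one difficulty score per subtitle line, in playback order.
// flatTolerance: assumed cutoff below which the slope is reported as "flat".
function summarizeEpisode(scores: number[], flatTolerance = 0.001): EpisodeDifficulty {
  const n = scores.length;
  if (n === 0) {
    return { average: 0, slope: 0, trend: "flat" };
  }

  const average = scores.reduce((sum, s) => sum + s, 0) / n;
  const meanIndex = (n - 1) / 2;

  // Ordinary least-squares slope: cov(index, score) / var(index).
  let covariance = 0;
  let indexVariance = 0;
  scores.forEach((score, i) => {
    const dx = i - meanIndex;
    covariance += dx * (score - average);
    indexVariance += dx * dx;
  });
  const slope = indexVariance === 0 ? 0 : covariance / indexVariance;

  const trend: EpisodeDifficulty["trend"] =
    slope > flatTolerance ? "rising" : slope < -flatTolerance ? "falling" : "flat";

  return { average, slope, trend };
}
```

A single slope hides local spikes, so a windowed average may be a better fit if the goal is to locate individual dense scenes rather than describe the episode as a whole.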

Scoring algorithm (suggested)

For each token in a line, look up JLPT level (N5=1, N1=5) and frequency rank
Weight unknown words (not in Anki known-word cache from TASK-24) more heavily
Composite score = weighted average of token difficulties, plus a bonus for line length and grammar complexity
Configurable weights so users can tune sensitivity
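
The suggested steps above could be sketched roughly as follows. Everything here is an assumption made for illustration: the `Token` and `ScoringWeights` shapes, the default weight values, the log-scaled mapping from frequency rank onto the 1-5 range, and the choice to treat tokens missing from the JLPT or frequency data as maximally difficult. The grammar-complexity bonus is left out because the task does not say how it would be measured.

```typescript
// Hypothetical token shape; real data would come from TASK-23 / TASK-25 lookups.
interface Token {
  jlptLevel?: number;     // 1 (N5) .. 5 (N1); undefined if not in the JLPT data
  frequencyRank?: number; // 1 = most frequent; undefined if not in the dictionary
  known: boolean;         // true if present in the known-word cache
}

// Hypothetical user-tunable weights.
interface ScoringWeights {
  jlpt: number;
  frequency: number;
  unknownMultiplier: number;   // how much more an unknown token counts
  lengthBonusPerToken: number; // added per token beyond lengthBonusStartsAt
  lengthBonusStartsAt: number;
}

const defaultWeights: ScoringWeights = {
  jlpt: 0.6,
  frequency: 0.4,
  unknownMultiplier: 2,
  lengthBonusPerToken: 0.05,
  lengthBonusStartsAt: 10,
};

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Maps a frequency rank onto 1..5 on a log scale (rank 1 -> 1, rank >= 100000 -> 5).
function frequencyDifficulty(rank: number | undefined): number {
  if (rank === undefined) return 5; // assumption: unlisted words are rare
  return clamp(1 + (Math.log10(Math.max(rank, 1)) * 4) / 5, 1, 5);
}

function tokenDifficulty(token: Token, w: ScoringWeights): number {
  const jlpt = token.jlptLevel ?? 5; // assumption: outside the JLPT lists = hardest
  const freq = frequencyDifficulty(token.frequencyRank);
  return (w.jlpt * jlpt + w.frequency * freq) / (w.jlpt + w.frequency);
}

// Returns a composite score on a 1-5 scale for one subtitle line.
function scoreLine(tokens: Token[], w: ScoringWeights = defaultWeights): number {
  if (tokens.length === 0) return 1;

  let weightedSum = 0;
  let totalWeight = 0;
  for (const token of tokens) {
    const weight = token.known ? 1 : w.unknownMultiplier;
    weightedSum += weight * tokenDifficulty(token, w);
    totalWeight += weight;
  }

  const extraTokens = Math.max(0, tokens.length - w.lengthBonusStartsAt);
  const lengthBonus = w.lengthBonusPerToken * extraTokens;
  return clamp(weightedSum / totalWeight + lengthBonus, 1, 5);
}
```

Applying the unknown-word multiplier as an averaging weight, rather than adding it to the token's difficulty, keeps the result inside the 1-5 range without extra normalization. Particles and other function words may need separate handling if the JLPT data does not cover them.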

Design constraints

Scoring must run synchronously during subtitle rendering without perceptible latency
Score computation should be cached per subtitle line (lines repeat on seeks/replays)
Auto-pause should be debounced to avoid rapid pause/unpause on sequential hard lines
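
The caching and debouncing constraints might look something like the sketch below. `DifficultyCache` and `AutoPauseGate` are hypothetical names, keying the cache on the raw line text is an assumption, and the gate is a simple cooldown (one pause, then no further pauses until `cooldownMs` has elapsed) rather than a classic trailing-edge debounce. The current time is passed in so the gate can be tested without real timers.

```typescript
// Memoizes scores per subtitle line so seeks and replays do not recompute them.
class DifficultyCache {
  private readonly scores = new Map<string, number>();
  private readonly compute: (line: string) => number;

  constructor(compute: (line: string) => number) {
    this.compute = compute;
  }

  get(line: string): number {
    const cached = this.scores.get(line);
    if (cached !== undefined) return cached;
    const score = this.compute(line);
    this.scores.set(line, score);
    return score;
  }
}

// Decides whether a line should trigger auto-pause, suppressing repeat pauses
// that would otherwise fire on consecutive hard lines.
class AutoPauseGate {
  private lastPauseAt = Number.NEGATIVE_INFINITY;
  private readonly threshold: number;
  private readonly cooldownMs: number;

  constructor(threshold: number, cooldownMs: number) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  // nowMs: current time in milliseconds, supplied by the caller.
  shouldPause(score: number, nowMs: number): boolean {
    if (score <= this.threshold) return false; // line does not exceed the threshold
    if (nowMs - this.lastPauseAt < this.cooldownMs) return false; // still cooling down
    this.lastPauseAt = nowMs;
    return true;
  }
}
```

In use, the overlay would look up each incoming line through the cache and pass the resulting score, together with something like `Date.now()`, to `shouldPause` before asking the player to pause. An unbounded `Map` is acceptable for a single episode, but a long-running session may want eviction.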

Acceptance Criteria

#1 Each subtitle line receives a difficulty score based on JLPT and frequency data.
#2 A visual indicator shows per-line difficulty in the overlay.
#3 Auto-pause triggers when a line exceeds the user's configured difficulty threshold.
#4 Difficulty scoring does not add perceptible latency to subtitle rendering.
#5 Per-episode average difficulty is available for the episode browser.
#6 Scoring weights are configurable in settings.
