sudacode 430373f010 feat(tokenizer): use Yomitan word classes for subtitle POS filtering (#57)
* feat(tokenizer): use Yomitan word classes for subtitle POS filtering

- Carry matched headword wordClasses from termsFind into YomitanScanToken
- Map recognized Yomitan wordClasses to SubMiner coarse POS before annotation
- MeCab enrichment now fills only missing POS fields, preserving existing coarse pos1
- Exclude standalone grammar particles, して helper fragments, and single-kana surfaces from annotations
- Respect source-text punctuation gaps when counting N+1 sentence words
- Preserve known-word highlight on excluded kanji-containing tokens
- Add backlog tasks 304 (N+1 boundary bug) and 305 (wordClasses POS, done)

* fix(tokenizer): preserve annotation and enrichment behavior

* fix: restore jlpt subtitle underlines

* fix: exclude kana-only n+1 targets

* fix: refresh overlay on Hyprland fullscreen

* fix: address fullscreen and n-plus-one review notes

* fix: address CodeRabbit review comments

* fix: accept modified digits for multi-line sentence mining

* Cancel pending Linux MPV fullscreen overlay refresh bursts

- return a cancel handle from the Linux refresh burst scheduler
- clear pending refresh bursts when overlays hide or windows close
- tighten the burst test polling to wait for the async refresh

* fix: suppress N+1 for kana-only candidates and fix minSentenceWords coun

- Treat kana-only tokens with surrounding subtitle punctuation (…, ―, etc.) as kana-only so they are not promoted to N+1 targets
- Exclude unknown tokens filtered from N+1 targeting from the minSentenceWords count so filtered kana-only unknowns cannot satisfy sentence length threshold
- Add regression tests for kana-only candidate suppression and filtered-unknown padding cases

* Suppress subtitle annotations for grammar fragments

- Hide annotation metadata for auxiliary inflection and ja-nai endings
- Preserve lexical `くれる` forms and add regression coverage

* Fix kana-only N+1 tokenizer regression test

- Use a pure-kana fixture for the subtitle token N+1 case
- Update task notes for the latest CodeRabbit follow-up

* Fix managed playback exit and tokenizer grammar splits

- Ignore background stats daemons during regular app startup
- Split standalone grammar endings before applying annotations
- Clear helper-span annotations for auxiliary-only tokens

* fix: refresh current subtitle after known-word mining

* fix: suppress sigh interjection annotations

* fix: preserve jlpt underline color after lookup

* Replace grammar-ending permutations with shared matcher; preserve word a

- Extract `grammar-ending.ts` with `isStandaloneGrammarEndingText` / `isSubtitleGrammarEndingText` pattern matchers
- Replace `STANDALONE_GRAMMAR_ENDINGS` set in parser-selection-stage with shared matcher
- Replace generated phrase sets in subtitle-annotation-filter with shared matcher
- Remove stale duplicate subtitle-exclusion constants and helpers from annotation-stage
- Manual clipboard card updates now write only to the sentence audio field, leaving word/expression audio untouched

* fix: CI changelog, annotation options threading, and Jellyfin quit

- Add `type: fixed` / `area:` frontmatter to `changes/319` to pass `changelog:lint`
- Thread `TokenizerAnnotationOptions` through `stripSubtitleAnnotationMetadata` so `sourceText` is honored
- Include `jellyfinPlay` in `shouldQuitOnDisconnectWhenOverlayRuntimeInitialized` predicate
- Make mouse test `elementFromPoint` stubs coordinate-sensitive
- Make Lua test `.tmp` mkdir portable on Windows

* Preserve overlay across macOS flaps and mpv playlist changes

- keep visible overlays alive during transient macOS tracker loss
- reuse the running mpv overlay path on playlist navigation
- update regression coverage and changelog fragments

* fix: restore stats daemon deferral

* fix: keep subtitle prefetch alive after cache hits

* Fix JLPT underline color drift and AniList skipped-threshold sync

- Replace JLPT `text-decoration` underlines with `border-bottom` so Chromium selection/hover cannot repaint them to another annotation's color
- Lock JLPT underline color for combined annotation selectors (known, n+1, frequency) and character hover/selection states
- Trigger AniList post-watch check on every mpv time-position update to catch skipped completion thresholds
- Fall back to filename-parser season/episode when guessit omits them

* fix: address coderabbit feedback

* fix: sync AniList after seeked completion

* fix: preserve ordinal frequency annotations

* fix: preserve known highlighting for filtered tokens

* fix: address PR #57 CodeRabbit feedback

- Acquire AniList post-watch in-flight lock before async gating to prevent duplicate writes
- Isolate manual watched mark result from AniList post-watch callback failures
- Report known-word cache clears as mutations during immediate append when state existed
- Add regression tests for each fix

* fix: stop AniList setup reopening on Linux when keyring token exists

- Gate setup success on token persistence: `saveToken` now returns `boolean`; on failure, keeps the setup window open instead of reporting success
- Config reload passes `allowSetupPrompt: false` so playback reloads don't re-open the setup window
- Add regression test for persistence-failure path

* fix: suppress known highlights for subtitle particles

* fix: retry transient AniList safeStorage failures

* fix: hide overlay focus ring

* fix: align Hyprland fullscreen overlays

* fix: restore subtitle playback keybindings

* fix: align Hyprland overlay windows to mpv and stop pinning them

- Force-apply exact Hyprland move/resize/setprop dispatches when bounds are provided
- Stop pinning overlay windows; toggle pin off when Hyprland reports pinned=true
- Compensate stats overlay outer placement for Electron/Wayland content insets
- Make stats overlay window and page opaque so mpv cannot show through transparent insets
- Constrain stats app to h-screen with internal scroll so content covers mpv from y=0
- Lock overlay/stats window titles against page-title-updated events
- Add regression coverage for placement dispatches, inset compensation, and CSS overlay mode

* fix: retain frequency rank for honorific prefix-noun tokens

- Add `shouldAllowHonorificPrefixNounFrequency` to exempt お/ご/御 + noun merged tokens from frequency exclusion
- Add regression test for `ご機嫌` asserting rank 5484 is preserved after MeCab enrichment and annotation
- Close TASK-341

* fix: map openCharacterDictionary session action to --open-character-dict

- Add missing Lua CLI dispatch entry for openCharacterDictionary
- Add regression test for Alt+Meta+A binding and CLI flag forwarding

* fix: keep macOS overlay interactive while mpv remains active

- Overlay no longer hides or becomes click-through during tracker refreshes when mpv is the focused window
- Preserve already-visible overlay when tracker is temporarily not ready but mpv target signal is active
- Add regression tests for active-mpv tracker refresh and transient tracker-not-ready paths

* fix: address coderabbit subtitle follow-ups

* fix: resolve media detail from sessions when lifetime summary is absent

- Change `getMediaDetail` JOIN to LEFT JOIN on `imm_lifetime_media` and fall back to aggregated session metrics when no lifetime row exists
- Add filter `AND (lm.video_id IS NOT NULL OR s.session_id IS NOT NULL)` to keep results valid
- Add regression test covering the session-visible / media-detail-missing mismatch

* fix: address PR-57 CodeRabbit findings and CI failures

- use filtered word counts in media detail session token aggregation
- cancel fullscreen refresh burst on exit via updateLinuxMpvFullscreenOverlayRefreshBurst
- guard Hyprland JSON.parse in try/catch; exclude windowtitle from geometry events
- narrow focus suppression from :focus to :focus-visible
- apply JLPT lock selectors to word-name-match tokens (N1–N5)

* fix: macOS overlay z-order and Yomitan compound token known highlighting

- Release always-on-top when tracked mpv loses foreground on macOS
- Skip visible overlay blur restacking on macOS to avoid covering unrelated windows
- Prefer Yomitan internal parse tokens over fragmented scanner output for known-word decisions
- Add regression tests for both behaviors

* fix: macOS visible-overlay blur no longer invokes Windows-only blur call

- Split win32/darwin branches in handleOverlayWindowBlurred so darwin visible blur returns early without calling onWindowsVisibleOverlayBlur
- Add regression test asserting Windows callback stays inactive on macOS visible overlay blur
- Close TASK-347
2026-05-12 12:08:09 -07:00
2026-04-11 21:45:52 -07:00
2026-04-11 21:54:00 -07:00

SubMiner logo

SubMiner

Look up words with Yomitan, export to Anki in one key, track your immersion — all without leaving mpv.

Installation · Requirements · Usage · Documentation

Downloads Release AUR Platform License TypeScript

SubMiner demo

Features

Dictionary Lookups

Yomitan runs inside the overlay. Trigger a lookup on any word for full dictionary popups — definitions, pitch accent, frequency data — without ever leaving mpv.

Yomitan dictionary popup over annotated subtitles in mpv

Instant Anki Mining

Create an Anki card with the sentence, audio clip, screenshot, and machine translation from the exact playback moment with one key press, click, or controller input.

Anki card created from SubMiner with sentence, audio, and screenshot

Reading Annotations

Real-time subtitle annotations with frequency highlighting, JLPT tags, N+1 targeting, and a character name dictionary. Known words fade back; new words stand out. Grammar-only tokens render as plain text so you focus on what matters.

Annotated subtitles with frequency coloring, JLPT underlines, and N+1 targets

Immersion Dashboard

Local stats dashboard — watch time, anime library, vocabulary growth, mining throughput, session history, and trends. All stored locally, no third-party tracking.

Stats dashboard showing watch time, cards mined, streaks, and tracking data

Playlist Browser

Browse sibling episode files and the active mpv queue in one overlay modal. Open it with Ctrl+Alt+P to append episodes from the current directory, jump to queued items, remove entries, or reorder the playlist without leaving playback.

Stats dashboard showing watch time, cards mined, streaks, and tracking data

Integrations

YouTube Auto-loaded yt-dlp subtitle tracks at startup with config-driven primary/secondary language priorities and a manual overlay picker on demand (Ctrl+Alt+C)
AniList Automatic episode tracking and progress sync
Jellyfin Browse, launch, and cast media from your Jellyfin server with setup and discovery controls in the app tray
Jimaku Search and download Japanese subtitles
alass / ffsubsync Automatic subtitle retiming — requires alass or ffsubsync on your PATH (optional; subtitle syncing is disabled without them)
WebSocket Annotated subtitle feed for external clients (texthooker pages, custom tools)
Texthooker page receiving annotated subtitle lines via WebSocket


Requirements

Required Recommended Optional
Player mpv with IPC socket
Processing ffmpeg (audio clips & screenshots) mecab + mecab-ipadic (annotation POS filtering), guessit (AniSkip), alass / ffsubsync (subtitle sync)
Media yt-dlp, chafa, ffmpegthumbnailer
Selection fzf / rofi

Tip

ffmpeg is not strictly required to run SubMiner, but without it audio clips and screenshots will not be attached to Anki cards. Most users will want it installed.

Note

bun is required if building from source or using the CLI wrapper: subminer. Pre-built releases (AppImage, DMG, installer) do not require it.

Platform-specific:

Linux macOS Windows
Hyprland (hyprctl) · X11/Xwayland (xdotool + xwininfo) Accessibility permission No extra deps

Note

Wayland support is compositor-specific. Wayland has no universal API for window positioning and each compositor exposes its own IPC, so SubMiner needs a dedicated backend per compositor. Hyprland is the only native Wayland backend supported currenlty. All other Linux compositors require both mpv and SubMiner to run under X11 or Xwayland. The launcher detects your compositor and configures this automatically.

Arch Linux
paru -S --needed mpv ffmpeg
# Optional
paru -S --needed mecab-git mecab-ipadic yt-dlp fzf rofi chafa ffmpegthumbnailer xdotool xorg-xwininfo
# Optional: subtitle sync (install at least one for subtitle syncing to work)
paru -S --needed alass python-ffsubsync
# X11 / Xwayland (required for non-Hyprland compositors)
paru -S --needed xdotool xorg-xwininfo
macOS
brew install mpv ffmpeg
# Optional
brew install mecab mecab-ipadic yt-dlp fzf rofi chafa ffmpegthumbnailer
# Optional: subtitle sync (install at least one for subtitle syncing to work)
brew install alass
pip install ffsubsync

Grant Accessibility permission to SubMiner in System Settings > Privacy & Security > Accessibility.

Windows

Install mpv and ffmpeg and ensure both are on your PATH.

Optionally install MeCab for Windows with the UTF-8 dictionary for additional metadata enrichment.


Quick Start

1. Install

Arch Linux (AUR)
paru -S subminer-bin

Or manually:

git clone https://aur.archlinux.org/subminer-bin.git && cd subminer-bin && makepkg -si
Linux (AppImage)
mkdir -p ~/.local/bin
wget https://github.com/ksyasuda/SubMiner/releases/latest/download/SubMiner.AppImage -O ~/.local/bin/SubMiner.AppImage \
	&& chmod +x ~/.local/bin/SubMiner.AppImage
wget https://github.com/ksyasuda/SubMiner/releases/latest/download/subminer -O ~/.local/bin/subminer \
	&& chmod +x ~/.local/bin/subminer

Note

The subminer wrapper uses a Bun shebang. Make sure bun is on your PATH.

macOS

Download the latest DMG or ZIP from GitHub Releases and drag SubMiner.app into /Applications.

Also download the subminer launcher (recommended):

sudo curl -fSL https://github.com/ksyasuda/SubMiner/releases/latest/download/subminer -o /usr/local/bin/subminer \
	&& sudo chmod +x /usr/local/bin/subminer

Note

The subminer launcher uses a Bun shebang. Make sure bun is on your PATH.

Windows

Download the latest installer or portable .zip from GitHub Releases. Make sure mpv is on your PATH.

Windows support is experimental. Core features such as mining, annotations, and dictionary lookups work, but some functionality may be missing or unstable. Bug reports welcome.

Note: On Windows the subminer launcher requires bun and must be invoked with bun run subminer instead of running the script directly. The recommended alternative is the SubMiner mpv shortcut created during first-run setup — double-click it, drag files onto it, or run SubMiner.exe --launch-mpv from a terminal. See the Windows mpv Shortcut section for details.

From source

See the build-from-source guide.

2. First Launch

subminer app --setup            # launch the first-run setup wizard

SubMiner creates a default config, starts in the system tray, and opens a setup popup that walks you through installing the mpv plugin and configuring Yomitan dictionaries. Follow the on-screen steps to complete setup.

Jellyfin setup is available from the tray or subminer jellyfin; once Jellyfin is enabled with a server URL, the tray can toggle Jellyfin Discovery for the current app session.

Note

On Windows, run SubMiner.exe directly — it opens the setup wizard automatically on first launch.

3. Verify Setup

subminer doctor                 # verify mpv, ffmpeg, config, and socket

Note

On Windows, use bun run subminer doctor or run SubMiner.exe directly — first-run setup will guide you through dependency checks.

4. Mine

subminer video.mkv          # play video with overlay
subminer --start video.mkv  # explicit overlay start
subminer stats              # open immersion dashboard
subminer stats -b           # stats daemon in background
subminer stats -s           # stop background stats daemon

On Windows, the subminer script must be run with bun run subminer (e.g. bun run subminer video.mkv). The recommended alternative is the SubMiner mpv shortcut (created during setup) or SubMiner.exe --launch-mpv. Drag a video file onto the shortcut to play it, or double-click it to open mpv with SubMiner's defaults.

Documentation

Full guides on configuration, Anki setup, Jellyfin, immersion tracking, and more: docs.subminer.moe


Acknowledgments

SubMiner builds on the work of these open-source projects:

Project Role
Anacreon-Script Inspiration for the mining workflow
asbplayer Inspiration for subtitle sidebar and logic for YouTube subtitle parsing
Bee's Character Dictionary Character name recognition in subtitles
GameSentenceMiner Inspiration for Electron overlay with Yomitan integration
jellyfin-mpv-shim Jellyfin integration
Jimaku.cc Japanese subtitle search and downloads
Renji's Texthooker Page Base for the WebSocket texthooker integration
Yomitan Dictionary engine powering all lookups and the morphological parser
yomitan-jlpt-vocab JLPT level tags for vocabulary

License

GNU General Public License v3.0

Languages
TypeScript 95.3%
Lua 2.3%
CSS 1%
Shell 0.7%
HTML 0.2%
Other 0.4%