docs: refresh immersion and stats documentation

This commit is contained in:
2026-03-17 00:48:57 -07:00
parent 390ae1b2f2
commit 078792e0b2
7 changed files with 132 additions and 36 deletions

View File

@@ -27,7 +27,7 @@ SubMiner is an Electron overlay that sits on top of mpv. It turns your video pla
- **Look up words as you watch** — Yomitan dictionary popups on hover or keyboard-driven token-by-token navigation
- **One-key Anki mining** — Creates cards with sentence, audio, screenshot, and translation; optional local AnkiConnect proxy auto-enriches Yomitan cards instantly
- **Reading annotations** — N+1 targeting, frequency-dictionary highlighting, JLPT underlining, and character name dictionary for anime/manga proper nouns
- **Immersion stats** — Optional local dashboard and overlay for watch time, anime progress, session drill-down, vocabulary growth, mining throughput, and card mining directly from example sentences
- **Immersion stats** — Optional local dashboard and overlay for watch time, anime progress, session drill-down, vocabulary growth, mining throughput, and card mining directly from example sentences; exact lifetime totals are kept locally in SQLite by default
- **Subtitle tools** — Download from Jimaku, sync with alass/ffsubsync
- **Jellyfin & AniList integration** — Remote playback, cast device mode, and automatic episode progress tracking
- **Texthooker & API** — Built-in texthooker page and annotated websocket feed for external clients

View File

@@ -0,0 +1,4 @@
type: fixed
area: anki
- Fixed repeated character-dictionary startup work by scheduling auto-sync only from mpv media-path changes instead of also re-triggering it from connection and media-title events for the same title.

View File

@@ -0,0 +1,7 @@
type: fixed
area: overlay
- Fixed macOS fullscreen overlay stability by keeping the passive visible overlay from stealing focus, re-raising the overlay window when reasserting its macOS topmost level, and tolerating one transient macOS tracker/helper miss before hiding the overlay.
- Kept subtitle tokenization warmup one-shot for the lifetime of the app so later fullscreen/media churn on macOS does not replay the startup warmup gate after the first file is ready.
- Added a bounded macOS tracker loss-grace window so fullscreen enter/leave transitions do not immediately hide and reload the overlay when the helper briefly loses the mpv window.
- Skipped subtitle/tokenization refresh invalidation on character-dictionary auto-sync completion when the dictionary was already current, preventing startup flash/reload loops on unchanged media.

View File

@@ -0,0 +1,8 @@
type: changed
area: immersion
- Kept immersion tracking history by default while preserving daily/monthly rollup maintenance.
- Added exact lifetime summary reads for overview/anime/media stats so dashboard totals no longer depend on rescanning raw telemetry.
- Reduced tracker storage overhead by removing duplicated subtitle text from subtitle-line event payloads.
- Deduplicated episode cover-art blobs through a shared blob store and updated cover-art reads/writes to resolve shared images correctly.
- Added indexes for large-history session, telemetry, vocabulary, kanji, and cover-art queries to keep dashboard reads fast as the SQLite database grows.

View File

@@ -1180,12 +1180,20 @@ Enable or disable local immersion analytics stored in SQLite for mined subtitles
"queueCap": 1000,
"payloadCapBytes": 256,
"maintenanceIntervalMs": 86400000,
"retentionMode": "preset",
"retentionPreset": "balanced",
"retention": {
"eventsDays": 7,
"telemetryDays": 30,
"dailyRollupsDays": 365,
"monthlyRollupsDays": 1825,
"vacuumIntervalDays": 7
"eventsDays": 0,
"telemetryDays": 0,
"sessionsDays": 0,
"dailyRollupsDays": 0,
"monthlyRollupsDays": 0,
"vacuumIntervalDays": 0
},
"lifetimeSummaries": {
"global": true,
"anime": true,
"media": true
}
}
}
@@ -1200,11 +1208,16 @@ Enable or disable local immersion analytics stored in SQLite for mined subtitles
| `queueCap` | integer (`100`-`100000`) | In-memory queue cap. Overflow drops oldest writes. Default `1000`. |
| `payloadCapBytes` | integer (`64`-`8192`) | Event payload byte cap before truncation marker. Default `256`. |
| `maintenanceIntervalMs` | integer (`60000`-`604800000`) | Prune + rollup maintenance cadence. Default `86400000` (24h). |
| `retention.eventsDays` | integer (`1`-`3650`) | Raw event retention window. Default `7` days. |
| `retention.telemetryDays` | integer (`1`-`3650`) | Telemetry retention window. Default `30` days. |
| `retention.dailyRollupsDays` | integer (`1`-`36500`) | Daily rollup retention window. Default `365` days. |
| `retention.monthlyRollupsDays` | integer (`1`-`36500`) | Monthly rollup retention window. Default `1825` days (~5 years). |
| `retention.vacuumIntervalDays` | integer (`1`-`3650`) | Minimum spacing between `VACUUM` passes. Default `7` days. |
| `retentionMode` | `preset`,`advanced` | Retention mode. `preset` applies `retentionPreset`, `advanced` uses explicit values only. Default `preset`. |
| `retentionPreset` | `minimal`,`balanced`,`deep-history` | Retention preset used when `retentionMode = "preset"`. Default `balanced`. |
| `retention.eventsDays` | integer (`0`-`3650`) | Raw event retention window in days. Default `0` (keep all). |
| `retention.telemetryDays` | integer (`0`-`3650`) | Telemetry retention window in days. Default `0` (keep all). |
| `retention.sessionsDays` | integer (`0`-`3650`) | Session retention window in days. Default `0` (keep all). |
| `retention.dailyRollupsDays` | integer (`0`-`36500`) | Daily rollup retention window. Default `0` (keep all). |
| `retention.monthlyRollupsDays` | integer (`0`-`36500`) | Monthly rollup retention window. Default `0` (keep all). |
| `retention.vacuumIntervalDays` | integer (`0`-`3650`) | Minimum spacing between `VACUUM` passes. `0` disables vacuum. Default `0` (disabled). |
Default behavior keeps raw events, telemetry, sessions, and rollups forever while still maintaining lifetime summary tables and daily/monthly rollups for faster reads. If you later want bounded retention, switch `retentionMode` or set explicit `retention.*` values.
When `dbPath` is blank or omitted, SubMiner writes telemetry and session summaries to the default app-data location:

View File

@@ -2,7 +2,7 @@
SubMiner can log your watching and mining activity to a local SQLite database, then surface it in the built-in stats dashboard. Tracking is enabled by default and can be turned off if you do not want local analytics.
When enabled, SubMiner records per-session statistics (watch time, subtitle lines seen, words encountered, cards mined) and maintains daily and monthly rollups. You can view that data in SubMiner's stats UI or query the database directly with any SQLite tool.
When enabled, SubMiner records per-session statistics (watch time, subtitle lines seen, words encountered, cards mined) and maintains exact lifetime summary tables plus daily/monthly rollups. You can view that data in SubMiner's stats UI or query the database directly with any SQLite tool.
## Enabling
@@ -74,16 +74,35 @@ The Vocabulary tab toolbar includes an **Exclusions** button for hiding words fr
## Retention Defaults
Data is kept for the following durations before automatic cleanup:
By default, SubMiner keeps all retention tables and raw data (`0` means keep all) while continuing daily/monthly rollup maintenance:
| Data type | Retention |
| -------------- | --------- |
| Raw events | 7 days |
| Telemetry | 30 days |
| Daily rollups | 1 year |
| Monthly rollups | 5 years |
| Raw events | 0 (keep all) |
| Telemetry | 0 (keep all) |
| Sessions | 0 (keep all) |
| Daily rollups | 0 (keep all) |
| Monthly rollups | 0 (keep all) |
Maintenance runs on startup and every 24 hours. Vacuum runs weekly.
Maintenance runs on startup and every 24 hours. Vacuum runs only when `retention.vacuumIntervalDays` is non-zero.
In practice:
- Overview totals read from lifetime summary tables, so all-time watch time/cards/words stay exact even if raw query paths evolve.
- Anime and episode pages keep lifetime totals from summary tables while session drill-down still reads retained sessions directly. With the current defaults, both are kept forever.
- Trends can read the full available history because daily/monthly rollups are also kept forever by default.
- Vocabulary and kanji totals are cumulative and not bounded by the raw session retention knobs.
## Storage / Performance Model
The tracker is optimized for "keep everything" defaults:
- Exact all-time totals live in dedicated lifetime summary tables (`imm_lifetime_global`, `imm_lifetime_anime`, `imm_lifetime_media`).
- Ended-session totals are persisted onto `imm_sessions`, so most dashboard reads do not need to rescan raw telemetry.
- Daily and monthly rollups remain available for chart queries and coarse trend views.
- Subtitle text is stored once in `imm_subtitle_lines`; subtitle-line event payloads keep compact metadata only.
- Cover-art binaries are deduplicated through a shared blob store so episodes in the same series do not each carry duplicate image bytes.
- Hot tables have dedicated indexes for session time ranges, telemetry sample windows, frequency-ranked vocabulary, and cover-art lookup keys.
## Configurable Knobs
@@ -98,9 +117,15 @@ All policy options live under `immersionTracking` in your config:
| `maintenanceIntervalMs` | How often maintenance runs |
| `retention.eventsDays` | Raw event retention |
| `retention.telemetryDays` | Telemetry retention |
| `retention.sessionsDays` | Session retention |
| `retention.dailyRollupsDays` | Daily rollup retention |
| `retention.monthlyRollupsDays` | Monthly rollup retention |
| `retention.vacuumIntervalDays` | Minimum spacing between vacuums |
| `retentionMode` | `preset` or `advanced` |
| `retentionPreset` | `minimal`, `balanced`, or `deep-history` (used by `retentionMode`) |
| `lifetimeSummaries.global` | Maintain global lifetime totals |
| `lifetimeSummaries.anime` | Maintain per-anime lifetime totals |
| `lifetimeSummaries.media` | Maintain per-media lifetime totals |
## Query Templates
@@ -129,26 +154,43 @@ SELECT
s.video_id,
s.started_at_ms,
s.ended_at_ms,
COALESCE(SUM(t.active_watched_ms), 0) AS active_watched_ms,
COALESCE(SUM(t.words_seen), 0) AS words_seen,
COALESCE(SUM(t.cards_mined), 0) AS cards_mined,
COALESCE(s.active_watched_ms, 0) AS active_watched_ms,
COALESCE(s.words_seen, 0) AS words_seen,
COALESCE(s.cards_mined, 0) AS cards_mined,
CASE
WHEN COALESCE(SUM(t.active_watched_ms), 0) > 0
THEN COALESCE(SUM(t.words_seen), 0) / (COALESCE(SUM(t.active_watched_ms), 0) / 60000.0)
WHEN COALESCE(s.active_watched_ms, 0) > 0
THEN COALESCE(s.words_seen, 0) / (COALESCE(s.active_watched_ms, 0) / 60000.0)
ELSE NULL
END AS words_per_min,
CASE
WHEN COALESCE(SUM(t.active_watched_ms), 0) > 0
THEN (COALESCE(SUM(t.cards_mined), 0) * 60.0) / (COALESCE(SUM(t.active_watched_ms), 0) / 60000.0)
WHEN COALESCE(s.active_watched_ms, 0) > 0
THEN (COALESCE(s.cards_mined, 0) * 60.0) / (COALESCE(s.active_watched_ms, 0) / 60000.0)
ELSE NULL
END AS cards_per_hour
FROM imm_sessions s
LEFT JOIN imm_session_telemetry t ON t.session_id = s.session_id
GROUP BY s.session_id
ORDER BY s.started_at_ms DESC
LIMIT ?;
```
### Lifetime anime totals
```sql
SELECT
a.anime_id,
a.canonical_title,
la.total_sessions,
la.total_active_ms,
la.total_cards,
la.total_words_seen,
la.total_lines_seen,
la.first_watched_ms,
la.last_watched_ms
FROM imm_lifetime_anime la
JOIN imm_anime a ON a.anime_id = la.anime_id
ORDER BY la.last_watched_ms DESC
LIMIT ?;
```
### Daily rollups
```sql
@@ -192,16 +234,25 @@ LIMIT ?;
- Queue overflow policy: drop oldest queued writes, keep newest.
- SQLite pragmas: `journal_mode=WAL`, `synchronous=NORMAL`, `foreign_keys=ON`, `busy_timeout=2500`.
- Rollups run incrementally from the last processed telemetry sample; startup performs a one-time bootstrap pass.
- If retention pruning removes telemetry/session rows, maintenance triggers a full rollup rebuild to resync historical aggregates.
- Cover-art blobs are deduplicated into `imm_cover_art_blobs` and referenced from `imm_media_art`.
- Large-table reads are index-backed for `sample_ms`, session time windows, frequency-ranked words/kanji, and cover-art identity lookups.
### Schema (v3)
### Schema (v12)
Core tables:
- `imm_videos` — video key/title/source metadata
- `imm_sessions` — session UUID, video reference, timing/status
- `imm_sessions` — session UUID, video reference, timing/status, final denormalized totals
- `imm_session_telemetry` — high-frequency session aggregates over time
- `imm_session_events` — event stream with compact numeric event types
- `imm_subtitle_lines` — persisted subtitle text and timing per session/video
Lifetime summary tables:
- `imm_lifetime_global`
- `imm_lifetime_anime`
- `imm_lifetime_media`
- `imm_lifetime_applied_sessions`
Rollup tables:
@@ -212,3 +263,8 @@ Vocabulary tables:
- `imm_words(id, headword, word, reading, first_seen, last_seen, frequency)`
- `imm_kanji(id, kanji, first_seen, last_seen, frequency)`
Media-art tables:
- `imm_media_art` — per-video cover metadata plus shared blob reference
- `imm_cover_art_blobs` — deduplicated image bytes keyed by blob hash

View File

@@ -498,13 +498,21 @@
"queueCap": 1000, // In-memory write queue cap before overflow policy applies.
"payloadCapBytes": 256, // Max JSON payload size per event before truncation.
"maintenanceIntervalMs": 86400000, // Maintenance cadence (prune + rollup + vacuum checks).
"retentionMode": "preset", // Retention mode to use for defaults. Values: preset | advanced
"retentionPreset": "balanced", // Named preset when retentionMode is preset.
"retention": {
"eventsDays": 7, // Raw event retention window in days.
"telemetryDays": 30, // Telemetry retention window in days.
"dailyRollupsDays": 365, // Daily rollup retention window in days.
"monthlyRollupsDays": 1825, // Monthly rollup retention window in days.
"vacuumIntervalDays": 7 // Minimum days between VACUUM runs.
} // Retention setting.
"eventsDays": 0, // Raw event retention window in days. Use 0 to keep all.
"telemetryDays": 0, // Telemetry retention window in days. Use 0 to keep all.
"sessionsDays": 0, // Session retention window in days. Use 0 to keep all.
"dailyRollupsDays": 0, // Daily rollup retention window in days. Use 0 to keep all.
"monthlyRollupsDays": 0, // Monthly rollup retention window in days. Use 0 to keep all.
"vacuumIntervalDays": 0 // Minimum days between VACUUM runs. Use 0 to disable.
}, // Retention setting.
"lifetimeSummaries": {
"global": true, // Keep lifetime global totals.
"anime": true, // Keep lifetime per-anime totals.
"media": true // Keep lifetime per-media totals.
} // Lifetime summary setting.
}, // Enable/disable immersion tracking.
// ==========================================