# OwOCR

OwOCR is a command-line text recognition tool that continuously scans for images and performs OCR (Optical Character Recognition) on them. Its main focus is Japanese, but it works for many other languages.

## Demo

### Visual novel:

https://github.com/user-attachments/assets/f2196b23-f2e7-4521-820a-88c8bedb9d8e

### Manga:

https://github.com/user-attachments/assets/f061854d-d20f-43e8-8c96-af5a0ea26f43

## Installation

OwOCR has been tested on Python 3.11, 3.12 and 3.13. It can be installed with `pip install owocr` after you install Python.

You also need one or more OCR engines; check the list below for instructions. I recommend installing at least Google Lens on any operating system, and OneOCR if you are on Windows. Bing comes pre-installed; Apple Vision and Live Text come pre-installed on macOS.

## Basic usage

```
owocr
```

By default, owocr monitors the clipboard for images and outputs recognized text back to the clipboard.

## Main features

- Multiple input sources: clipboard, folders, websockets, a unix domain socket, and screen capture
- Multiple output destinations: clipboard, text files, and websockets
- Pause/unpause with `p` or terminate with `t`/`q` in the terminal; switch between engines with `s` or the engine-specific keys (from the engine list below)
- Capture from specific screen areas, windows, or areas within windows (window capture is only supported on Windows/macOS). This also tries to capture entire sentences and filter out repetitions. If you use an online engine like Lens, I recommend setting a secondary local engine with the `-es` option: `-es=oneocr` on Windows and `-es=alivetext` on macOS. With this "two pass" system only the changed areas are sent to the online service, allowing for both speed and accuracy
- Multiple configurable keyboard combinations to control owocr from anywhere, including pausing, switching engines, taking a screenshot of the selected screen/window, and running the automatic tool to re-select an area of the screen/window via drag and drop
- Reading from a unix domain socket at `/tmp/owocr.sock` on macOS/Linux
- A furigana filter, enabled by default for Japanese text (both vertical and horizontal)

## Common option examples

- Write text to a file: `owocr -w=`
- Read images from a folder: `owocr -r=`
- Write text to a websocket: `owocr -w=websocket` (the default websocket port is 7331)
- Read from the screen or a portion of the screen (opens the automatic drag-and-drop selector): `owocr -r=screencapture`
- Read from a window with "Notepad" in the title: `owocr -r=screencapture -sa=Notepad`
- Read from a portion of a window with "Notepad" in the title (opens the automatic drag-and-drop selector): `owocr -r=screencapture -sa=Notepad -swa`

## Configuration

There are many more options and customization features. For complete documentation of all available settings:

- View all command-line options and their descriptions: `owocr -h`
- Check the automatically generated config file at `~/.config/owocr_config.ini` on Linux/macOS, or `C:\Users\yourusername\.config\owocr_config.ini` on Windows
- See a sample config file: [owocr_config.ini](https://raw.githubusercontent.com/AuroraWright/owocr/master/owocr_config.ini)

The command-line options and config file let you configure OCR providers, hotkeys, screen capture settings, notifications, and much more.
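As a sketch of consuming the websocket output (`owocr -w=websocket`, default port 7331): the snippet below assumes the third-party `websockets` package and that each websocket message carries one chunk of recognized text. The function and variable names here are illustrative, not part of owocr.

```python
# Illustrative client for owocr's websocket output (`owocr -w=websocket`).
# Assumes the third-party `websockets` package (pip install websockets) and
# that each message is one chunk of recognized text.
import asyncio

DEFAULT_PORT = 7331  # owocr's documented default websocket port

def owocr_ws_url(host: str = "localhost", port: int = DEFAULT_PORT) -> str:
    """Build the URL for a local owocr websocket output."""
    return f"ws://{host}:{port}"

async def print_ocr_text(url: str = owocr_ws_url()) -> None:
    """Print each recognized text block as owocr emits it."""
    import websockets  # imported lazily so the module loads without it
    async with websockets.connect(url) as ws:
        async for message in ws:
            print(message)

# To run against a live `owocr -w=websocket` instance:
# asyncio.run(print_ocr_text())
```

This pairs naturally with texthookers or dictionary tools that already speak websockets; point them at the same port instead of writing your own client.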
# Supported engines

## Local

- [Manga OCR](https://github.com/kha-white/manga-ocr) (with optional [comic-text-detector](https://github.com/dmMaze/comic-text-detector) as segmenter) - install: `pip install "owocr[mangaocr]"` → keys: `m` (regular, ideal for small text areas), `n` (segmented, ideal for manga panels/larger images with multiple text areas)
- [EasyOCR](https://github.com/JaidedAI/EasyOCR) - install: `pip install "owocr[easyocr]"` → key: `e`
- [RapidOCR](https://github.com/RapidAI/RapidOCR) - install: `pip install "owocr[rapidocr]"` → key: `r`
- Apple Vision framework - probably the best local engine to date. **macOS only - Recommended (pre-installed)** → key: `a`
- Apple Live Text (VisionKit framework) - the same as Vision, except that in Sonoma Apple added vertical text reading. **macOS only - Recommended (pre-installed)** → key: `d`
- WinRT OCR - install: `pip install "owocr[winocr]"`. It can also be used by installing winocr on a Windows virtual machine, running the server there (`winocr_serve`), and specifying the IP address of the Windows VM/machine in the config file. **Windows 10/11 only** → key: `w`
- OneOCR - install: `pip install "owocr[oneocr]"`. A close second best among local engines, after the Apple ones. You need to copy 3 system files from Windows 11 to use it; refer to the readme [here](https://github.com/AuroraWright/oneocr). It can also be used by installing oneocr on a Windows virtual machine, running the server there (`oneocr_serve`), and specifying the IP address of the Windows VM/machine in the config file. **Windows 10/11 only - Recommended** → key: `z`
- [meikiocr](https://github.com/rtr46/meikiocr) - install: `pip install "owocr[meikiocr]"`. Comparable to OneOCR in accuracy and CPU latency. Can run on Nvidia GPUs via `pip uninstall onnxruntime && pip install onnxruntime-gpu`, making it the fastest OCR available. Probably the best option for Linux users. Can't process vertical text and is limited to 64 text lines and 48 characters per line. → key: `k`

## Cloud

- Google Lens - install: `pip install "owocr[lens]"`. Arguably the best OCR engine to date. **Recommended** → key: `l`
- Bing - a close second best. **Recommended (pre-installed)** → key: `b`
- Google Vision - install: `pip install "owocr[gvision]"`; you also need a service account .json file named google_vision.json in `user directory/.config/` → key: `g`
- Azure Image Analysis - install: `pip install "owocr[azure]"`; you also need to specify an API key and an endpoint in the config file → key: `v`
- OCRSpace - you need to specify an API key in the config file → key: `o`

# Acknowledgments

This uses code from/references these people/projects:

- Viola for working on the Google Lens implementation (twice!) and helping with the pyobjc VisionKit code!
- @rtr46 for contributing a big overhaul allowing for coordinate support and JSON output
- @bpwhelan for contributing code for other-language support and for his ideas (like two pass processing) originally implemented in the Game Sentence Miner fork of owocr
- @bropines for the Bing code ([Github issue](https://github.com/AuroraWright/owocr/issues/10))
- @ronaldoussoren for helping with the pyobjc VisionKit code
- [Manga OCR](https://github.com/kha-white/manga-ocr) for inspiring and being the project owocr was originally derived from
- [Mokuro](https://github.com/kha-white/mokuro) for the comic text detector integration code
- [ocrmac](https://github.com/straussmaximilian/ocrmac) for the Apple Vision framework API
- [ccylin2000_lipboard_monitor](https://github.com/vaimalaviya1233/ccylin2000_lipboard_monitor) for the Windows clipboard polling code
- vicky for the demo videos in this readme!
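For completeness, a sketch of feeding an image to the unix domain socket input described under Main features (`/tmp/owocr.sock` on macOS/Linux). The wire format is an assumption here: this sends the raw image bytes and closes the connection to signal the end of the image; check the owocr source if the framing differs.

```python
# Hedged sketch: push one image into owocr's unix domain socket input.
# Assumes owocr accepts raw image bytes terminated by connection close —
# verify against the owocr source before relying on this.
import socket

OWOCR_SOCKET = "/tmp/owocr.sock"  # default socket path from the README

def send_image_to_owocr(image_path: str, sock_path: str = OWOCR_SOCKET) -> None:
    """Send one image file's raw bytes to a running owocr instance."""
    with open(image_path, "rb") as f:
        data = f.read()
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(data)  # closing the socket marks the end of the image
```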