New readme/misc stuff

This commit is contained in:

93 README.md
@@ -1,56 +1,73 @@
 # OwOCR

-Command line client for several OCR providers derived from [Manga OCR](https://github.com/kha-white/manga-ocr), with a focus on Japanese text.
+OwOCR is a command-line text recognition tool that continuously scans for images and performs OCR (Optical Character Recognition) on them. Its main focus is Japanese, but it works for many other languages.

-# Installation
+## Installation

-This has been tested with Python 3.11, 3.12 and 3.13. It can be installed with `pip install owocr`.
+OwOCR has been tested on Python 3.11, 3.12 and 3.13. Once Python is installed, it can be installed with `pip install owocr`. You also need one or more OCR engines; check the list below for instructions. I recommend installing at least Google Lens on any operating system, and OneOCR if you are on Windows. Bing comes pre-installed on every OS, while Apple Vision and Live Text come pre-installed on macOS.

-# Usage
+## Basic usage

-Basic usage is comparable to Manga OCR as in, `owocr` keeps scanning for images and performing text recognition on them. Similarly, by default it will read images from the clipboard and write text back to the clipboard (or optionally, read images from a folder and/or write text to a .txt file if you specify `-r=<folder path>` or `-w=<txt file path>`).
+```
+owocr
+```
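The default mode described here is a poll → OCR → output loop. A minimal sketch of that shape, assuming nothing about owocr's internals (`get_clipboard_image` and `run_ocr` are hypothetical stand-ins, not owocr's actual API):

```python
def get_clipboard_image():
    """Hypothetical stand-in for a clipboard read; returns image bytes or None."""
    return None

def run_ocr(image):
    """Hypothetical stand-in for an OCR engine call."""
    return '認識されたテキスト'

def poll_once(last_image):
    """One iteration of the polling loop: OCR the image only if it changed."""
    image = get_clipboard_image()
    if image is None or image == last_image:
        return last_image, None  # nothing new to process
    return image, run_ocr(image)

# A real loop would call poll_once repeatedly with a short sleep between calls.
last, text = poll_once(None)
print(text)  # None here, since the stand-in clipboard is empty
```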
-Additionally:
+This default behavior monitors the clipboard for images and outputs recognized text back to the clipboard.

-- Scanning the clipboard takes basically zero system resources on macOS and Windows
-- Supports reading images and/or writing text to a websocket with the `-r=websocket` and/or `-w=websocket` parameters (the port is 7331 by default, and is configurable in the config file)
-- On macOS and Linux, supports reading images from a Unix domain socket (`/tmp/owocr.sock`) with `-r=unixsocket`
-- On Windows and macOS, supports capturing from the screen directly or from a specific window with `-r=screencapture`. By default it will open a coordinate picker so you can select an area of the screen and then read from it without delays, but you can change it to screenshot the whole screen, a manual set of coordinates `x,y,width,height` or just a specific window (with the window title) or a specific area of the window. You can also change the delay between screenshots or specify a keyboard combo if you don't want screenshots to be taken periodically. Refer to the config file or to `owocr --help` for more details about the screen capture settings
-- You can read images from another source at the same time with `-rs=`, the arguments are the same as `-r`
-- You can pause/unpause the image processing by pressing "p" or terminate the script with "t" or "q" inside the terminal window
-- You can switch between OCR providers pressing their corresponding keyboard key inside the terminal window (refer to the list of keys in the providers list below)
-- You can start the script paused with the `-p` option or with a specific provider with the `-e` option (refer to `owocr -h` for the list)
-- You can specify keyboard combos in the config file to pause/unpause and switch the OCR provider from anywhere (refer to the config file or `owocr -h`)
-- You can auto pause the script after a successful text recognition with the `-a=seconds` option. 0 (the default) disables it.
-- You can enable notifications in the config file or with `-n` to show the text with a native OS notification if you're not using screen capture with automatic screenshots. **Important for macOS users:** if you use Python from brew, you need to enter this command in your terminal before the first notification: `codesign -f -s - $(brew --cellar python)/3.*/Frameworks/Python.framework` (works on Ventura/Sonoma). Older macOS versions might require Python to be installed from the [official website](https://www.python.org/downloads/). Nothing can be done about this unfortunately.
-- Optionally, you can speed up the online providers by installing fpng-py: `pip install owocr[faster-png]` (requires setting up a developer environment on most operating systems/Python versions)
-- A config file (which will be automatically created in `user directory/.config/owocr_config.ini`, on Windows `user directory` is the `C:\Users\yourusername` folder) can be used to configure the script, as an example to limit providers (to reduce clutter/memory usage) as well as specifying provider settings such as api keys etc. A sample config file is also provided [here](https://raw.githubusercontent.com/AuroraWright/owocr/master/owocr_config.ini)
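The config file removed from this section (and documented later in the new README) is a standard INI file, so it can be read with Python's `configparser`. A sketch under the assumption that general settings live in a `[general]` section; the section and key names here are illustrative, not confirmed against the real sample file:

```python
import configparser

# Hypothetical minimal config; section/key names are assumed from the code's
# config.get_general(...) calls, not taken from the actual sample file.
sample = """
[general]
engines = glens,oneocr
websocket_port = 7331
"""

config = configparser.ConfigParser()
config.read_string(sample)

# Comma-separated engine list, normalized to lowercase
engines = [e.strip().lower() for e in config['general']['engines'].split(',')]
port = config['general'].getint('websocket_port')
print(engines, port)  # ['glens', 'oneocr'] 7331
```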
-# Supported providers
+## Main features

-## Local providers
-- [Manga OCR](https://github.com/kha-white/manga-ocr): install with `pip install owocr[mangaocr]` ("m" key)
-- [EasyOCR](https://github.com/JaidedAI/EasyOCR): install with `pip install owocr[easyocr]` ("e" key)
-- [RapidOCR](https://github.com/RapidAI/RapidOCR): install with `pip install owocr[rapidocr]` ("r" key)
-- Apple Vision framework: this will work on macOS Ventura or later. In my experience, the best of the local providers for horizontal text ("a" key)
-- Apple Live Text (VisionKit framework): this will work on macOS Ventura or later. It should be the same as Vision except that in Sonoma Apple added vertical text reading ("d" key)
-- WinRT OCR: install with `pip install owocr[winocr]` on Windows 10 and later. It can also be used by installing winocr on a Windows virtual machine and running the server there (`winocr_serve`) and specifying the IP address of the Windows VM/machine in the config file ("w" key)
-- OneOCR: install with `pip install owocr[oneocr]` on Windows 10 and later. In my experience it's pretty good, though not as much as the Apple one. You need to copy 3 system files from Windows 11 to use it, refer to the readme [here](https://github.com/AuroraWright/oneocr). It can also be used by installing oneocr on a Windows virtual machine and running the server there (`oneocr_serve`) and specifying the IP address of the Windows VM/machine in the config file ("z" key)
+- Multiple input sources: clipboard, folders, websockets, unix domain socket, and screen capture
+- Multiple output destinations: clipboard, text files, and websockets
+- Pause/unpause with `p` or terminate with `t`/`q` in the terminal; switch between engines with `s` or the engine-specific keys (from the engine list below)
+- Capture from specific screen areas, windows, or areas within windows (window capture is only supported on Windows/macOS). This also tries to capture entire sentences and filter out repetitions. If you use an online engine like Lens, I recommend setting a secondary local engine with the `-es` option: `-es=oneocr` on Windows and `-es=alivetext` on macOS. With this "two pass" system only the changed areas are sent to the online service, allowing for both speed and accuracy
+- Multiple configurable keyboard combinations to control owocr from anywhere, including pausing, switching engines, taking a screenshot of the selected screen/window and running the automatic tool to re-select an area of the screen/window via drag and drop
+- Read from a unix domain socket `/tmp/owocr.sock` on macOS/Linux
+- Furigana filter, works by default with Japanese text (both vertical and horizontal)

-## Cloud providers
-- Google Lens: Google Vision in disguise (no need for API keys!), install with `pip install owocr[lens]` ("l" key)
-- Bing: Azure in disguise (no need for API keys!) ("b" key)
-- Google Vision: install with `pip install owocr[gvision]`, you also need a service account .json file named google_vision.json in `user directory/.config/` ("g" key)
-- Azure Image Analysis: install with `pip install owocr[azure]`, you also need to specify an api key and an endpoint in the config file ("v" key)
-- OCRSpace: you need to specify an api key in the config file ("o" key)
+## Common option examples
+
+- Write text to a file: `owocr -w=<txt file path>`
+- Read images from a folder: `owocr -r=<folder path>`
+- Write text to a websocket: `owocr -w=websocket`
+- Read from the screen or a portion of the screen (opens the automatic drag and drop selector): `owocr -r=screencapture`
+- Read from a window having "Notepad" in the title: `owocr -r=screencapture -sa=Notepad`
+- Read from a portion of a window having "Notepad" in the title (opens the automatic drag and drop selector): `owocr -r=screencapture -sa=Notepad -swa`
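Besides the drag and drop selector, the README describes a manual capture area of the form `x,y,width,height`. A hypothetical parser for such a string (not owocr's actual code) might look like:

```python
def parse_area(value):
    """Parse an 'x,y,width,height' string into a tuple of ints.

    Hypothetical helper for illustration; owocr's real validation may differ.
    """
    parts = [p.strip() for p in value.split(',')]
    if len(parts) != 4:
        raise ValueError('expected "x,y,width,height"')
    x, y, width, height = (int(p) for p in parts)
    if width <= 0 or height <= 0:
        raise ValueError('width and height must be positive')
    return x, y, width, height

print(parse_area('100, 200, 640, 480'))  # (100, 200, 640, 480)
```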
+## Configuration
+
+There are many more options and customization features. For complete documentation of all available settings:
+
+- View all command-line options and their descriptions: `owocr -h`
+- Check the automatically generated config file at `~/.config/owocr_config.ini` on Linux/macOS, or `C:\Users\yourusername\.config\owocr_config.ini` on Windows
+- See a sample config file: [owocr_config.ini](https://raw.githubusercontent.com/AuroraWright/owocr/master/owocr_config.ini)
+
+The command-line options/config file allow you to configure OCR providers, hotkeys, screen capture settings, notifications, and much more.
+
+# Supported engines
+
+## Local
+
+- [Manga OCR](https://github.com/kha-white/manga-ocr) - install: `pip install owocr[mangaocr]` → key: `m`
+- [EasyOCR](https://github.com/JaidedAI/EasyOCR) - install: `pip install owocr[easyocr]` → key: `e`
+- [RapidOCR](https://github.com/RapidAI/RapidOCR) - install: `pip install owocr[rapidocr]` → key: `r`
+- Apple Vision framework - probably the best local engine to date. **macOS only - Recommended (pre-installed)** → key: `a`
+- Apple Live Text (VisionKit framework) - should be the same as Vision, except that in Sonoma Apple added vertical text reading. **macOS only - Recommended (pre-installed)** → key: `d`
+- WinRT OCR - install: `pip install owocr[winocr]`. It can also be used by installing winocr on a Windows virtual machine and running the server there (`winocr_serve`) and specifying the IP address of the Windows VM/machine in the config file. **Windows 10/11 only** → key: `w`
+- OneOCR - install: `pip install owocr[oneocr]`. A close second to the Apple engine among local options. You need to copy 3 system files from Windows 11 to use it, refer to the readme [here](https://github.com/AuroraWright/oneocr). It can also be used by installing oneocr on a Windows virtual machine and running the server there (`oneocr_serve`) and specifying the IP address of the Windows VM/machine in the config file. **Windows 10/11 only - Recommended** → key: `z`
+
+## Cloud
+
+- Google Lens - install: `pip install owocr[lens]`. Arguably the best OCR engine to date. **Recommended** → key: `l`
+- Bing - a close second best. **Recommended (pre-installed)** → key: `b`
+- Google Vision - install: `pip install owocr[gvision]`. You also need a service account .json file named google_vision.json in `user directory/.config/` → key: `g`
+- Azure Image Analysis - install: `pip install owocr[azure]`. You also need to specify an API key and an endpoint in the config file → key: `v`
+- OCRSpace - you need to specify an API key in the config file → key: `o`
 # Acknowledgments

-This uses code from/references these projects:
+This uses code from/references these people/projects:

 - Viola for working on the Google Lens implementation (twice!) and helping with the pyobjc VisionKit code!
-- [google-lens-ocr](https://github.com/dimdenGD/chrome-lens-ocr) for additional Lens reverse engineering and the headers/URL parameters I currently use
 - @ronaldoussoren for helping with the pyobjc VisionKit code
 - @bropines for the Bing code ([Github issue](https://github.com/AuroraWright/owocr/issues/10))
-- [Manga OCR](https://github.com/kha-white/manga-ocr)
+- [Manga OCR](https://github.com/kha-white/manga-ocr) for being the project owocr was originally derived from
 - [ocrmac](https://github.com/straussmaximilian/ocrmac) for the Apple Vision framework API
 - [NadeOCR](https://github.com/Natsume-197/NadeOCR) for the Google Vision API
 - [ccylin2000_lipboard_monitor](https://github.com/vaimalaviya1233/ccylin2000_lipboard_monitor) for the Windows clipboard polling code
@@ -27,7 +27,7 @@ parser.add_argument('-w', '--write_to', type=str, default=argparse.SUPPRESS,
 parser.add_argument('-e', '--engine', type=str, default=argparse.SUPPRESS,
                     help='OCR engine to use. Available: "mangaocr", "glens", "bing", "gvision", "avision", "alivetext", "azure", "winrtocr", "oneocr", "easyocr", "rapidocr", "ocrspace".')
 parser.add_argument('-es', '--engine_secondary', type=str, default=argparse.SUPPRESS,
-                    help='OCR engine to use for two-pass processing.')
+                    help='Local OCR engine to use for two-pass screen capture processing.')
 parser.add_argument('-p', '--pause_at_startup', type=str2bool, nargs='?', const=True, default=argparse.SUPPRESS,
                     help='Pause at startup.')
 parser.add_argument('-d', '--delete_images', type=str2bool, nargs='?', const=True, default=argparse.SUPPRESS,
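These flags pass `type=str2bool` with `nargs='?'` and `const=True`, so the option can appear bare (`-p`) or with an explicit value (`-p no`). The `str2bool` converter itself is not shown in the hunk; a common implementation, offered here only as an assumption about what it might look like, is:

```python
import argparse

def str2bool(value):
    # Accept typical truthy/falsy spellings for boolean CLI flags.
    if isinstance(value, bool):
        return value
    if value.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    if value.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    raise argparse.ArgumentTypeError('boolean value expected')

parser = argparse.ArgumentParser()
# nargs='?' + const=True means a bare -p is treated as True
parser.add_argument('-p', '--pause_at_startup', type=str2bool, nargs='?', const=True, default=False)
print(parser.parse_args(['-p']).pause_at_startup)        # True
print(parser.parse_args(['-p', 'no']).pause_at_startup)  # False
```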
56 owocr/run.py

@@ -254,16 +254,24 @@ class WebsocketServerThread(threading.Thread):
         return asyncio.run_coroutine_threadsafe(self.send_text_coroutine(text), self.loop)

     def stop_server(self):
-        self.loop.call_soon_threadsafe(self._stop_event.set)
+        try:
+            self.loop.call_soon_threadsafe(self._stop_event.set)
+        except RuntimeError:
+            pass
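The `try`/`except RuntimeError` added around `call_soon_threadsafe` guards against the event loop already having shut down; scheduling a callback onto a closed loop raises `RuntimeError`, as this small standalone sketch shows:

```python
import asyncio

# Demonstrates why the RuntimeError guard is needed: scheduling a callback
# on a closed loop raises instead of silently doing nothing.
loop = asyncio.new_event_loop()
loop.close()
try:
    loop.call_soon_threadsafe(lambda: None)
    outcome = 'scheduled'
except RuntimeError:
    outcome = 'RuntimeError'
print(outcome)  # RuntimeError
```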
     def run(self):
         async def main():
             self._loop = asyncio.get_running_loop()
             self._stop_event = stop_event = asyncio.Event()
             self._event.set()
-            self.server = start_server = websockets.serve(self.server_handler, '0.0.0.0', config.get_general('websocket_port'), max_size=1000000000)
-            async with start_server:
-                await stop_event.wait()
+            websocket_port = config.get_general('websocket_port')
+            self.server = start_server = websockets.serve(self.server_handler, '0.0.0.0', websocket_port, max_size=1000000000)
+            try:
+                async with start_server:
+                    await stop_event.wait()
+            except OSError:
+                logger.error(f"Couldn't start websocket server. Make sure port {websocket_port} is not already in use")
+                terminate_handler()
         asyncio.run(main())
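The new `except OSError` branch handles the case where the websocket port is already taken: binding an in-use port fails with `OSError` (`EADDRINUSE`), which this standalone sketch demonstrates:

```python
import socket

# Demonstrates the OSError the new code catches: binding a port that is
# already in use by a listening socket raises EADDRINUSE.
first = socket.socket()
first.bind(('127.0.0.1', 0))  # let the OS pick a free port
first.listen()
port = first.getsockname()[1]

second = socket.socket()
try:
    second.bind(('127.0.0.1', port))
    outcome = 'bound'
except OSError:
    outcome = 'OSError'
finally:
    first.close()
    second.close()
print(outcome)  # OSError
```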
@@ -387,7 +395,7 @@ class TextFiltering:
         if changed_lines_count and not self.json_output:
             changed_regions_image = self._create_changed_regions_image(pil_image, changed_lines, None, None)
             if not changed_regions_image:
-                logger.warning('Error occurred while creating the differential image.')
+                logger.warning('Error occurred while creating the differential image')
                 return 0, 0, None
             return changed_lines_count, 0, changed_regions_image
         else:
@@ -424,7 +432,7 @@ class TextFiltering:
         changed_regions_image = self._create_changed_regions_image(pil_image, changed_lines, None, None)

         if not changed_regions_image:
-            logger.warning('Error occurred while creating the differential image.')
+            logger.warning('Error occurred while creating the differential image')
             return 0, 0, None
         return changed_lines_count, recovered_lines_count, changed_regions_image
         else:
@@ -876,7 +884,6 @@ class ScreenshotThread(threading.Thread):
         else:
             self.screen_capture_only_active_windows = config.get_general('screen_capture_only_active_windows')
         self.window_area_coordinates = None
-        area_invalid_error = '"screen_capture_area" must be empty, "screen_N" where N is a screen number starting from 1, a valid set of coordinates, or a valid window name'

         if sys.platform == 'darwin':
             if config.get_general('screen_capture_old_macos_api') or int(platform.mac_ver()[0].split('.')[0]) < 14:
@@ -906,7 +913,7 @@ class ScreenshotThread(threading.Thread):
                     break

             if not window_index:
-                logger.error(area_invalid_error)
+                logger.error('"screen_capture_area" must be empty, "screen_N" where N is a screen number starting from 1, a valid set of coordinates, or a valid window name')
                 sys.exit(1)

             self.window_id = window_ids[window_index]
@@ -920,7 +927,7 @@ class ScreenshotThread(threading.Thread):
             self.window_handle, window_title = self.get_windows_window_handle(screen_capture_area)

             if not self.window_handle:
-                logger.error(area_invalid_error)
+                logger.error('"screen_capture_area" must be empty, "screen_N" where N is a screen number starting from 1, a valid set of coordinates, or a valid window name')
                 sys.exit(1)

             ctypes.windll.shcore.SetProcessDpiAwareness(2)
@@ -1248,7 +1255,7 @@ class ScreenshotThread(threading.Thread):
             self.write_result(img, is_combo)

             if img == False:
-                logger.info('The window was closed or an error occurred.')
+                logger.info('The window was closed or an error occurred')
                 terminate_handler()
                 break
@@ -1597,53 +1604,63 @@ def run():
     if config.has_config:
         logger.info('Parsed config file')
     else:
-        logger.warning('No config file, defaults will be used.')
+        logger.warning('No config file, defaults will be used')
         if config.downloaded_config:
             logger.info(f'A default config file has been downloaded to {config.config_path}')

     global engine_instances
     global engine_keys
     output_format = config.get_general('output_format')
+    engines_setting = config.get_general('engines')
+    default_engine_setting = config.get_general('engine')
+    secondary_engine_setting = config.get_general('engine_secondary')
+    language = config.get_general('language')
     engine_instances = []
     config_engines = []
     engine_keys = []
     default_engine = ''
     engine_secondary = ''

-    if len(config.get_general('engines')) > 0:
-        for config_engine in config.get_general('engines').split(','):
+    if len(engines_setting) > 0:
+        for config_engine in engines_setting.split(','):
             config_engines.append(config_engine.strip().lower())

     for _,engine_class in sorted(inspect.getmembers(sys.modules[__name__], lambda x: hasattr(x, '__module__') and x.__module__ and __package__ + '.ocr' in x.__module__ and inspect.isclass(x) and hasattr(x, 'name'))):
         if len(config_engines) == 0 or engine_class.name in config_engines:

             if output_format == 'json' and not engine_class.coordinate_support:
-                logger.warning(f"Skipping {engine_class.readable_name} as it does not support JSON output.")
+                logger.warning(f"Skipping {engine_class.readable_name} as it does not support JSON output")
                 continue

             if config.get_engine(engine_class.name) == None:
                 if engine_class.manual_language:
-                    engine_instance = engine_class(language=config.get_general('language'))
+                    engine_instance = engine_class(language=language)
                 else:
                     engine_instance = engine_class()
             else:
                 if engine_class.manual_language:
-                    engine_instance = engine_class(config=config.get_engine(engine_class.name), language=config.get_general('language'))
+                    engine_instance = engine_class(config=config.get_engine(engine_class.name), language=language)
                 else:
                     engine_instance = engine_class(config=config.get_engine(engine_class.name))

             if engine_instance.available:
                 engine_instances.append(engine_instance)
                 engine_keys.append(engine_class.key)
-                if config.get_general('engine') == engine_class.name:
+                if default_engine_setting == engine_class.name:
                     default_engine = engine_class.key
-                if config.get_general('engine_secondary') == engine_class.name and engine_class.local and engine_class.coordinate_support:
+                if secondary_engine_setting == engine_class.name and engine_class.local and engine_class.coordinate_support:
                     engine_secondary = engine_class.key

     if len(engine_keys) == 0:
         logger.error('No engines available!')
         sys.exit(1)

+    if default_engine_setting and not default_engine:
+        logger.warning("Couldn't find selected engine, using the first one in the list")
+
+    if secondary_engine_setting and not engine_secondary:
+        logger.warning("Couldn't find selected secondary engine, make sure it's enabled, local and has JSON format support. Disabling two pass processing")
+
     global engine_index
     global engine_index_2
     global terminated
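The refactor above caches repeated `config.get_general(...)` lookups, such as the `engines` setting, in local variables before use. The parsing logic itself can be exercised standalone; the setting string below is a made-up example, not a real config value:

```python
# Mirrors the engines-setting parsing from the hunk: a comma-separated
# string becomes a lowercase list used to filter which engines are loaded.
engines_setting = ' GLens, OneOCR ,bing'  # made-up example value

config_engines = []
if len(engines_setting) > 0:
    for config_engine in engines_setting.split(','):
        config_engines.append(config_engine.strip().lower())

print(config_engines)  # ['glens', 'oneocr', 'bing']
```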
@@ -1765,7 +1782,8 @@ def run():
     user_input_thread = threading.Thread(target=user_input_thread_run, daemon=True)
     user_input_thread.start()

-    logger.opt(colors=True).info(f"Reading from {' and '.join(read_from_readable)}, writing to {write_to_readable} using <{engine_color}>{engine_instances[engine_index].readable_name}</{engine_color}>{' (paused)' if paused.is_set() else ''}")
+    if not terminated.is_set():
+        logger.opt(colors=True).info(f"Reading from {' and '.join(read_from_readable)}, writing to {write_to_readable} using <{engine_color}>{engine_instances[engine_index].readable_name}</{engine_color}>{' (paused)' if paused.is_set() else ''}")

     while not terminated.is_set():
         img = None
@@ -28,7 +28,7 @@
 ;Delete image files after processing when reading from a directory.
 ;delete_images = False

-;Available:
+;Restricts engines to load. Available:
 ;avision,alivetext,bing,glens,gvision,azure,mangaocr,winrtocr,oneocr,easyocr,rapidocr,ocrspace
 ;engines = avision,alivetext,bing,glens,gvision,azure,mangaocr,winrtocr,oneocr,easyocr,rapidocr,ocrspace