Use manga_ocr as library, separate the projects, add some documentation

AuroraWright committed on 2023-12-16 04:34:19 +01:00
parent c0826b1837
commit 2b208b9288
9 changed files with 75 additions and 192 deletions

README.md

@@ -1,140 +1,37 @@
-# Manga OCR
-Optical character recognition for Japanese text, with the main focus being Japanese manga.
-It uses a custom end-to-end model built with Transformers' [Vision Encoder Decoder](https://huggingface.co/docs/transformers/model_doc/vision-encoder-decoder) framework.
-Manga OCR can be used as a general purpose printed Japanese OCR, but its main goal was to provide a high quality
-text recognition, robust against various scenarios specific to manga:
-- both vertical and horizontal text
-- text with furigana
-- text overlaid on images
-- wide variety of fonts and font styles
-- low quality images
-Unlike many OCR models, Manga OCR supports recognizing multi-line text in a single forward pass,
-so that text bubbles found in manga can be processed at once, without splitting them into lines.
-See also:
-- [Poricom](https://github.com/bluaxees/Poricom), a GUI reader, which uses manga-ocr
-- [mokuro](https://github.com/kha-white/mokuro), a tool, which uses manga-ocr to generate an HTML overlay for manga
-- [Xelieu's guide](https://rentry.co/lazyXel), a comprehensive guide on setting up a reading and mining workflow with manga-ocr/mokuro (and many other useful tips)
-- Development code, including code for training and synthetic data generation: [link](manga_ocr_dev)
-- Description of synthetic data generation pipeline + examples of generated images: [link](manga_ocr_dev/synthetic_data_generator)
+# OwOCR
+Command line client for several Japanese OCR providers, derived from [Manga OCR](https://github.com/kha-white/manga-ocr).
 # Installation
-You need Python 3.8, 3.9, 3.10 or 3.11.
-If you want to run with GPU, install PyTorch as described [here](https://pytorch.org/get-started/locally/#start-locally),
-otherwise this step can be skipped.
-Run in command line:
-```commandline
-pip3 install manga-ocr
-```
+This has been tested with Python 3.11; newer/older versions might work. For now, it can be installed with `pip install https://github.com/AuroraWright/owocr/archive/master.zip`.
+
+# Supported providers
+
+## Local providers
+- [Manga OCR](https://github.com/kha-white/manga-ocr): refer to the readme for installation ("m" key)
+- [EasyOCR](https://github.com/JaidedAI/EasyOCR): refer to the readme for installation ("e" key)
+- [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR): refer to the [wiki](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/doc/doc_en/quickstart_en.md) for installation ("o" key)
+- Apple Vision framework: works on macOS Ventura or later if pyobjc is installed (`pip install pyobjc`). In my experience, the best of the local providers for horizontal text ("a" key)
+- WinRT OCR: works on Windows 10 or later if winocr is installed (`pip install winocr`). It can also be used remotely: install winocr on a Windows machine or virtual machine, run the server (`winocr_serve`), install requests (`pip install requests`), and specify the IP address of the Windows machine/VM in the config file (see below) ("w" key)
+
+## Cloud providers
+- Google Vision: you need a service account .json file named google_vision.json in `user directory/.config/`, and to install google-cloud-vision (`pip install google-cloud-vision`)
+- Azure Computer Vision: you need to specify an API key and an endpoint in the config file (see below), and to install azure-cognitiveservices-vision-computervision (`pip install azure-cognitiveservices-vision-computervision`)
-## Troubleshooting
-- `ImportError: DLL load failed while importing fugashi: The specified module could not be found.` - might be because of Python installed from Microsoft Store, try installing Python from the [official site](https://www.python.org/downloads)
-- problem with installing `mecab-python3` on ARM architecture - try [this workaround](https://github.com/kha-white/manga-ocr/issues/16)
 # Usage
-## Python API
-
-```python
-from manga_ocr import MangaOcr
-
-mocr = MangaOcr()
-text = mocr('/path/to/img')
-```
+It mostly functions like Manga OCR: https://github.com/kha-white/manga-ocr?tab=readme-ov-file#running-in-the-background
+However:
+- you can pause/unpause clipboard image processing by pressing "p", or terminate the script with "t" or "q"
+- you can switch OCR provider by pressing its keyboard key (refer to the list above). You can also start the script paused with the -p option, or with a specific provider with the -e option (refer to `owocr -h` for the list)
+- holding ctrl or cmd at any time pauses clipboard image processing temporarily
+- on systems where text can be copied to the clipboard at the same time as an image, the image will be ignored if `*ocr_ignore*` is copied with it
+- a config file (located at `user directory/.config/owocr_config.ini`) can be used to limit the enabled providers (to reduce clutter/memory usage), as well as to specify provider settings such as API keys (a sample config file is provided)
-or
-
-```python
-import PIL.Image
-
-from manga_ocr import MangaOcr
-
-mocr = MangaOcr()
-img = PIL.Image.open('/path/to/img')
-text = mocr(img)
-```
-
-## Running in the background
-Manga OCR can run in the background and process new images as they appear.
-You might use a tool like [ShareX](https://getsharex.com/) or [Flameshot](https://flameshot.org/) to manually capture a region of the screen and let the
-OCR read it either from the system clipboard, or a specified directory. By default, Manga OCR will write recognized text to clipboard,
-from which it can be read by a dictionary like [Yomichan](https://github.com/FooSoft/yomichan).
-Clipboard mode on Linux requires `wl-copy` for Wayland sessions or `xclip` for X11 sessions. You can find out which one your system needs by running `echo $XDG_SESSION_TYPE` in the terminal.
-Your full setup for reading manga in Japanese with a dictionary might look like this:
-capture region with ShareX -> write image to clipboard -> Manga OCR -> write text to clipboard -> Yomichan
-
-https://user-images.githubusercontent.com/22717958/150238361-052b95d1-0152-485f-a441-48a957536239.mp4
-
-- To read images from clipboard and write recognized texts to clipboard, run in command line:
-```commandline
-manga_ocr
-```
-- To read images from ShareX's screenshot folder, run in command line:
-```commandline
-manga_ocr "/path/to/sharex/screenshot/folder"
-```
-Note that when running in the clipboard scanning mode, any image that you copy to clipboard will be processed by OCR and replaced
-by recognized text. If you want to be able to copy and paste images as usual, you should use the folder scanning mode instead
-and define a separate task in ShareX just for OCR, which saves screenshots to some folder without copying them to clipboard.
-When running for the first time, downloading the model (~400 MB) might take a few minutes.
-The OCR is ready to use after `OCR ready` message appears in the logs.
-- To see other options, run in command line:
-```commandline
-manga_ocr --help
-```
-If `manga_ocr` doesn't work, you might also try replacing it with `python -m manga_ocr`.
-
-## Usage tips
-- OCR supports multi-line text, but the longer the text, the more likely some errors are to occur.
-If the recognition failed for some part of a longer text, you might try to run it on a smaller portion of the image.
-- The model was trained specifically to handle manga well, but should do a decent job on other types of printed text,
-such as novels or video games. It probably won't be able to handle handwritten text though.
-- The model always attempts to recognize some text on the image, even if there is none.
-Because it uses a transformer decoder (and therefore has some understanding of the Japanese language),
-it might even "dream up" some realistically looking sentences! This shouldn't be a problem for most use cases,
-but it might get improved in the next version.
-
-# Examples
-Here are some cherry-picked examples showing the capability of the model.
-
-| image | Manga OCR result |
-|----------------------|------------------|
-| ![](assets/examples/00.jpg) | 素直にあやまるしか |
-| ![](assets/examples/01.jpg) | 立川で見た〝穴〟の下の巨大な眼は: |
-| ![](assets/examples/02.jpg) | 実戦剣術も一流です |
-| ![](assets/examples/03.jpg) | 第30話重苦しい闇の奥で静かに呼吸づきながら |
-| ![](assets/examples/04.jpg) | よかったじゃないわよ!何逃げてるのよ!!早くあいつを退治してよ! |
-| ![](assets/examples/05.jpg) | ぎゃっ |
-| ![](assets/examples/06.jpg) | ピンポーーン |
-| ![](assets/examples/07.jpg) | LINK!私達7人の力でガノンの塔の結界をやぶります |
-| ![](assets/examples/08.jpg) | ファイアパンチ |
-| ![](assets/examples/09.jpg) | 少し黙っている |
-| ![](assets/examples/10.jpg) | わかるかな〜? |
-| ![](assets/examples/11.jpg) | 警察にも先生にも町中の人達に!! |
-
-# Contact
-For any inquiries, please feel free to contact me at kha-white@mail.com
 # Acknowledgments
-This project was done with the usage of:
-- [Manga109-s](http://www.manga109.org/en/download_s.html) dataset
-- [CC-100](https://data.statmt.org/cc-100/) dataset
+This uses code from, or references, these projects:
+- [Manga OCR](https://github.com/kha-white/manga-ocr)
+- [ocrmac](https://github.com/straussmaximilian/ocrmac) for the Apple Vision framework API
+- [NadeOCR](https://github.com/Natsume-197/NadeOCR) for the Google Vision API
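
The WinRT OCR bullet in the new README describes a remote setup via `winocr_serve`. As a rough illustration of what such a client call might look like, here is a minimal sketch assuming winocr's documented HTTP interface (port 8000, a `lang` query parameter, and a JSON response with a `text` field); the IP address and file name are placeholders:

```python
import requests

# Hypothetical client for a remote winocr server started with `winocr_serve`.
# The port, the `lang` query parameter and the response shape follow winocr's
# documentation; the IP address and the image file are placeholders.
with open('screenshot.png', 'rb') as f:
    response = requests.post('http://192.168.1.50:8000/?lang=ja', data=f.read())
print(response.json()['text'])
```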

manga_ocr/__init__.py (deleted)

@@ -1,9 +0,0 @@
-__version__ = '0.1.10'
-from manga_ocr.ocr import MangaOcr
-from manga_ocr.ocr import GoogleVision
-from manga_ocr.ocr import AppleVision
-from manga_ocr.ocr import WinRTOCR
-from manga_ocr.ocr import AzureComputerVision
-from manga_ocr.ocr import EasyOCR
-from manga_ocr.ocr import PaddleOCR

owocr/__init__.py (new file)

@@ -0,0 +1,9 @@
+__version__ = '0.1.10'
+from owocr.ocr import MangaOcr
+from owocr.ocr import GoogleVision
+from owocr.ocr import AppleVision
+from owocr.ocr import WinRTOCR
+from owocr.ocr import AzureComputerVision
+from owocr.ocr import EasyOCR
+from owocr.ocr import PaddleOCR
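
Since the new `owocr/__init__.py` re-exports every provider class, the package can also be used as a library. A minimal sketch based on the `__call__` signature visible in the ocr.py diff below (the image path is a placeholder):

```python
from PIL import Image

from owocr import MangaOcr

mocr = MangaOcr()  # wraps the upstream manga-ocr model, if that package is installed
text = mocr(Image.open('example.jpg'))  # __call__ accepts a PIL.Image or a file path
print(text)
```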

manga_ocr/__main__.py → owocr/__main__.py

@@ -1,6 +1,6 @@
 import fire
-from manga_ocr.run import run
+from owocr.run import run

 def main():

manga_ocr/ocr.py → owocr/ocr.py

@@ -8,12 +8,15 @@ import sys
 import platform
 import jaconv
-import torch
 import numpy as np
 import json
 from PIL import Image
 from loguru import logger
-from transformers import ViTImageProcessor, AutoTokenizer, VisionEncoderDecoderModel
+try:
+    from manga_ocr import MangaOcr as MOCR
+except ImportError:
+    pass

 try:
     import Vision
@@ -68,29 +71,21 @@ class MangaOcr:
name = "mangaocr" name = "mangaocr"
readable_name = "Manga OCR" readable_name = "Manga OCR"
key = "m" key = "m"
available = True available = False
def __init__(self, config={'pretrained_model_name_or_path':'kha-white/manga-ocr-base','force_cpu':'False'}, pretrained_model_name_or_path='', force_cpu=False): def __init__(self, config={'pretrained_model_name_or_path':'kha-white/manga-ocr-base','force_cpu':'False'}, pretrained_model_name_or_path='', force_cpu=False):
if 'manga_ocr' not in sys.modules:
logger.warning('manga-ocr not available, Manga OCR will not work!')
else:
if pretrained_model_name_or_path == '': if pretrained_model_name_or_path == '':
pretrained_model_name_or_path = config['pretrained_model_name_or_path'] pretrained_model_name_or_path = config['pretrained_model_name_or_path']
if config['force_cpu'] == 'True': if config['force_cpu'] == 'True':
force_cpu = True force_cpu = True
logger.info(f'Loading Manga OCR model from {pretrained_model_name_or_path}') logger.disable("manga_ocr")
self.processor = ViTImageProcessor.from_pretrained(pretrained_model_name_or_path) logger.info(f'Loading Manga OCR model')
self.tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path) self.model = MOCR(pretrained_model_name_or_path, force_cpu)
self.model = VisionEncoderDecoderModel.from_pretrained(pretrained_model_name_or_path) self.available = True
if not force_cpu and torch.cuda.is_available():
logger.info('Using CUDA')
self.model.cuda()
elif not force_cpu and torch.backends.mps.is_available():
logger.info('Using MPS')
warnings.filterwarnings("ignore", message=".*MPS: no support.*")
self.model.to('mps')
else:
logger.info('Using CPU')
logger.info('Manga OCR ready') logger.info('Manga OCR ready')
def __call__(self, img_or_path): def __call__(self, img_or_path):
@@ -101,18 +96,9 @@ class MangaOcr:
         else:
             raise ValueError(f'img_or_path must be a path or PIL.Image, instead got: {img_or_path}')

-        img = img.convert('L').convert('RGB')
-        x = self._preprocess(img)
-        x = self.model.generate(x[None].to(self.model.device), max_length=300)[0].cpu()
-        x = self.tokenizer.decode(x, skip_special_tokens=True)
-        x = post_process(x)
+        x = self.model(img)
         return x

-    def _preprocess(self, img):
-        pixel_values = self.processor(img, return_tensors="pt").pixel_values
-        return pixel_values.squeeze()

 class GoogleVision:
     name = "gvision"
     readable_name = "Google Vision"

manga_ocr/run.py → owocr/run.py

@@ -14,7 +14,7 @@ from loguru import logger
 from pynput import keyboard
 import inspect

-from manga_ocr import *
+from owocr import *

 def are_images_identical(img1, img2):
@@ -131,7 +131,7 @@ def run(read_from='clipboard',
     default_engine = ''

     logger.info(f'Parsing config file')
-    config_file = os.path.join(os.path.expanduser('~'),'.config','ocr_config.ini')
+    config_file = os.path.join(os.path.expanduser('~'),'.config','owocr_config.ini')
     config = configparser.ConfigParser()
     res = config.read(config_file)
@@ -139,7 +139,7 @@ def run(read_from='clipboard',
         logger.warning('No config file, defaults will be used')
     else:
         try:
-            for config_engine in config['common']['engines'].split(','):
+            for config_engine in config['general']['engines'].split(','):
                 config_engines.append(config_engine.strip())
         except KeyError:
             pass
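
The renamed config file and section can be exercised in isolation; a minimal sketch of the lookup the new code performs (the section and key names are taken from the diff, the rest is illustrative):

```python
import configparser
import os

config = configparser.ConfigParser()
config_file = os.path.join(os.path.expanduser('~'), '.config', 'owocr_config.ini')

if not config.read(config_file):  # read() returns the list of files it parsed
    print('No config file, defaults will be used')
else:
    try:
        engines = [e.strip() for e in config['general']['engines'].split(',')]
    except KeyError:
        engines = []  # missing [general]/engines key: no provider filtering
    print(engines)
```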

owocr_config.ini (new file)

@@ -0,0 +1,10 @@
+[general]
+; engines = avision,mangaocr
+[winrtocr]
+; url = http://aaa.xxx.yyy.zzz:8000
+[azure]
+; api_key = api_key_here
+; endpoint = https://YOURPROJECT.cognitiveservices.azure.com/
+[mangaocr]
+pretrained_model_name_or_path = kha-white/manga-ocr-base
+force_cpu = False
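
Most settings in the sample ship commented out; to use one, uncomment it and substitute a real value. For example, a hypothetical setup that limits owocr to Apple Vision plus a remote WinRT OCR server might look like this (the "winrtocr" engine name is inferred from the section name, and the IP address is a placeholder):

```ini
[general]
engines = avision,winrtocr

[winrtocr]
url = http://192.168.1.50:8000
```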

requirements.txt

@@ -1,15 +1,8 @@
 fire
-fugashi
 jaconv
 loguru
 numpy
 Pillow>=10.0.0
 pyperclip
-torch>=1.0
-transformers>=4.25.0
 unidic_lite
-google-cloud-vision
-azure-cognitiveservices-vision-computervision
-pynput
-easyocr
-paddleocr
+pynput

setup.py

@@ -4,35 +4,32 @@ from setuptools import setup
 long_description = (Path(__file__).parent / "README.md").read_text('utf-8').split('# Installation')[0]

 setup(
-    name="manga-ocr",
-    version='0.1.11',
-    description="OCR for Japanese manga",
+    name="owocr",
+    version='0.1',
+    description="Japanese OCR",
     long_description=long_description,
     long_description_content_type="text/markdown",
-    url="https://github.com/kha-white/manga-ocr",
-    author="Maciej Budyś",
-    author_email="kha-white@mail.com",
+    url="https://github.com/AuroraWright/owocr",
+    author="AuroraWright",
+    author_email="fallingluma@gmail.com",
     license="Apache License 2.0",
     classifiers=[
         "Programming Language :: Python :: 3",
     ],
-    packages=['manga_ocr'],
+    packages=['owocr'],
     include_package_data=True,
     install_requires=[
         "fire",
-        "fugashi",
         "jaconv",
         "loguru",
         "numpy",
         "Pillow>=10.0.0",
         "pyperclip",
-        "torch>=1.0",
-        "transformers>=4.25.0",
-        "unidic_lite",
+        "unidic_lite"
     ],
     entry_points={
         "console_scripts": [
-            "manga_ocr=manga_ocr.__main__:main",
+            "owocr=owocr.__main__:main",
         ]
     },
 )
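
With these entry points, installing the archive registers an `owocr` console script; mirroring the install line from the new README:

```commandline
pip install https://github.com/AuroraWright/owocr/archive/master.zip
owocr -h
```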