Use manga_ocr as library, separate the projects, add some documentation
README.md (151 changed lines)
@@ -1,140 +1,37 @@
-# Manga OCR
+# OwOCR

-Optical character recognition for Japanese text, with the main focus being Japanese manga.
-It uses a custom end-to-end model built with Transformers' [Vision Encoder Decoder](https://huggingface.co/docs/transformers/model_doc/vision-encoder-decoder) framework.
-
-Manga OCR can be used as a general purpose printed Japanese OCR, but its main goal was to provide high-quality
-text recognition, robust against various scenarios specific to manga:
-- both vertical and horizontal text
-- text with furigana
-- text overlaid on images
-- a wide variety of fonts and font styles
-- low quality images
-
-Unlike many OCR models, Manga OCR supports recognizing multi-line text in a single forward pass,
-so that text bubbles found in manga can be processed at once, without splitting them into lines.
-
-See also:
-- [Poricom](https://github.com/bluaxees/Poricom), a GUI reader that uses manga-ocr
-- [mokuro](https://github.com/kha-white/mokuro), a tool that uses manga-ocr to generate an HTML overlay for manga
-- [Xelieu's guide](https://rentry.co/lazyXel), a comprehensive guide on setting up a reading and mining workflow with manga-ocr/mokuro (and many other useful tips)
-- Development code, including code for training and synthetic data generation: [link](manga_ocr_dev)
-- Description of the synthetic data generation pipeline + examples of generated images: [link](manga_ocr_dev/synthetic_data_generator)
+A command line client for several Japanese OCR providers, derived from [Manga OCR](https://github.com/kha-white/manga-ocr).

# Installation

-You need Python 3.8, 3.9, 3.10 or 3.11.
+This has been tested with Python 3.11; newer or older versions might work. For now, it can be installed with `pip install https://github.com/AuroraWright/owocr/archive/master.zip`

If you want to run on GPU, install PyTorch as described [here](https://pytorch.org/get-started/locally/#start-locally);
otherwise this step can be skipped.
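For reference, the GPU/CPU fallback that the Manga OCR loader performs (visible later in the `ocr.py` hunk: CUDA, then Apple MPS, then CPU) can be sketched as below. `choose_device` is an illustrative name, not part of the codebase, and PyTorch is treated as optional:

```python
# Minimal sketch of the device fallback used when loading the model:
# CUDA if available, then Apple MPS, else CPU. torch is optional here.
def choose_device(force_cpu=False):
    if not force_cpu:
        try:
            import torch
            if torch.cuda.is_available():
                return 'cuda'
            if hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
                return 'mps'
        except ImportError:
            pass  # CPU-only install without PyTorch
    return 'cpu'

print(choose_device(force_cpu=True))  # 'cpu'
```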
+# Supported providers

-Run in command line:
+## Local providers
+- [Manga OCR](https://github.com/kha-white/manga-ocr): refer to the readme for installation ("m" key)
+- [EasyOCR](https://github.com/JaidedAI/EasyOCR): refer to the readme for installation ("e" key)
+- [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR): refer to the [wiki](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/doc/doc_en/quickstart_en.md) for installation ("o" key)
+- Apple Vision framework: works on macOS Ventura or later if pyobjc (`pip install pyobjc`) is installed. In my experience, the best of the local providers for horizontal text ("a" key)
+- WinRT OCR: works on Windows 10 or later if winocr (`pip install winocr`) is installed. It can also be used remotely by installing winocr on a Windows virtual machine, running the server (`winocr_serve`), installing requests (`pip install requests`) and specifying the IP address of the Windows VM/machine in the config file (see below) ("w" key)
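Each provider is toggled with a single keyboard key, as listed above. A sketch of such a key-to-provider mapping (the dict and function names are illustrative, not owocr's internal structure):

```python
# Hotkeys listed in the README: "m", "e", "o", "a", "w".
PROVIDER_KEYS = {
    'm': 'Manga OCR',
    'e': 'EasyOCR',
    'o': 'PaddleOCR',
    'a': 'Apple Vision',
    'w': 'WinRT OCR',
}

def provider_for_key(key):
    # Returns None for keys with no provider attached.
    return PROVIDER_KEYS.get(key)

print(provider_for_key('a'))  # Apple Vision
```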
-```commandline
-pip3 install manga-ocr
-```
-
-## Troubleshooting
-
-- `ImportError: DLL load failed while importing fugashi: The specified module could not be found.` - might be caused by Python installed from the Microsoft Store; try installing Python from the [official site](https://www.python.org/downloads)
-- problems installing `mecab-python3` on ARM architecture - try [this workaround](https://github.com/kha-white/manga-ocr/issues/16)
+## Cloud providers
+- Google Vision: you need a service account .json file named google_vision.json in `user directory/.config/` and to install google-cloud-vision (`pip install google-cloud-vision`)
+- Azure Computer Vision: you need to specify an api key and an endpoint in the config file (see below) and to install azure-cognitiveservices-vision-computervision (`pip install azure-cognitiveservices-vision-computervision`)

# Usage

-## Python API
-
-```python
-from manga_ocr import MangaOcr
-
-mocr = MangaOcr()
-text = mocr('/path/to/img')
-```
-
-or
-
-```python
-import PIL.Image
-
-from manga_ocr import MangaOcr
-
-mocr = MangaOcr()
-img = PIL.Image.open('/path/to/img')
-text = mocr(img)
-```
-## Running in the background
-
-Manga OCR can run in the background and process new images as they appear.
-
-You might use a tool like [ShareX](https://getsharex.com/) or [Flameshot](https://flameshot.org/) to manually capture a region of the screen and let the
-OCR read it either from the system clipboard or a specified directory. By default, Manga OCR will write recognized text to the clipboard,
-from which it can be read by a dictionary like [Yomichan](https://github.com/FooSoft/yomichan).
-
-Clipboard mode on Linux requires `wl-copy` for Wayland sessions or `xclip` for X11 sessions. You can find out which one your system needs by running `echo $XDG_SESSION_TYPE` in the terminal.
-
-Your full setup for reading manga in Japanese with a dictionary might look like this:
-
-capture region with ShareX -> write image to clipboard -> Manga OCR -> write text to clipboard -> Yomichan
-
-https://user-images.githubusercontent.com/22717958/150238361-052b95d1-0152-485f-a441-48a957536239.mp4
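The Wayland/X11 note above can be condensed into a small helper; `clipboard_tool_for` is a hypothetical name used only for illustration:

```python
import os

def clipboard_tool_for(session_type):
    # wl-copy serves Wayland sessions, xclip serves X11 sessions.
    return 'wl-copy' if session_type == 'wayland' else 'xclip'

# On Linux, $XDG_SESSION_TYPE reports the current session type.
print(clipboard_tool_for(os.environ.get('XDG_SESSION_TYPE', 'x11')))
```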
-- To read images from the clipboard and write recognized text to the clipboard, run in command line:
-```commandline
-manga_ocr
-```
-- To read images from ShareX's screenshot folder, run in command line:
-```commandline
-manga_ocr "/path/to/sharex/screenshot/folder"
-```
-
-Note that when running in clipboard scanning mode, any image you copy to the clipboard will be processed by OCR and replaced
-by the recognized text. If you want to be able to copy and paste images as usual, you should use folder scanning mode instead
-and define a separate ShareX task just for OCR, which saves screenshots to a folder without copying them to the clipboard.
-
-When running for the first time, downloading the model (~400 MB) might take a few minutes.
-The OCR is ready to use once the `OCR ready` message appears in the logs.
-- To see other options, run in command line:
-```commandline
-manga_ocr --help
-```
-
-If `manga_ocr` doesn't work, you might also try replacing it with `python -m manga_ocr`.
-
-## Usage tips
-
-- OCR supports multi-line text, but the longer the text, the more likely errors are to occur.
-If recognition fails for some part of a longer text, try running it on a smaller portion of the image.
-- The model was trained specifically to handle manga well, but it should do a decent job on other types of printed text,
-such as novels or video games. It probably won't be able to handle handwritten text, though.
-- The model always attempts to recognize some text in the image, even if there is none.
-Because it uses a transformer decoder (and therefore has some understanding of the Japanese language),
-it might even "dream up" realistic-looking sentences! This shouldn't be a problem for most use cases,
-but it might be improved in a future version.
-# Examples
-
-Here are some cherry-picked examples showing the capability of the model.
-
-| image | Manga OCR result |
-|----------------------|------------------|
-|  | 素直にあやまるしか |
-|  | 立川で見た〝穴〟の下の巨大な眼は: |
-|  | 実戦剣術も一流です |
-|  | 第30話重苦しい闇の奥で静かに呼吸づきながら |
-|  | よかったじゃないわよ!何逃げてるのよ!!早くあいつを退治してよ! |
-|  | ぎゃっ |
-|  | ピンポーーン |
-|  | LINK!私達7人の力でガノンの塔の結界をやぶります |
-|  | ファイアパンチ |
-|  | 少し黙っている |
-|  | わかるかな〜? |
-|  | 警察にも先生にも町中の人達に!! |
-# Contact
-For any inquiries, please feel free to contact me at kha-white@mail.com
+It mostly functions like Manga OCR: https://github.com/kha-white/manga-ocr?tab=readme-ov-file#running-in-the-background
+However:
+- you can pause/unpause the clipboard image processing by pressing "p", or terminate the script with "t" or "q"
+- you can switch the OCR provider with its corresponding keyboard key (refer to the list above). You can also start the script paused with the -p option, or with a specific provider with the -e option (refer to `owocr -h` for the list)
+- holding ctrl or cmd at any time will pause the clipboard image processing temporarily
+- on systems where text can be copied to the clipboard at the same time as images, if `*ocr_ignore*` is copied with an image, the image will be ignored
+- a config file (located in `user directory/.config/owocr_config.ini`) can be used to limit providers (to reduce clutter/memory usage) as well as to specify provider settings such as api keys (a sample config file is provided)
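The `*ocr_ignore*` behavior described above can be sketched as a small predicate; the names are illustrative and owocr's actual implementation may differ:

```python
IGNORE_MARKER = '*ocr_ignore*'

def should_process(clipboard_text):
    # Skip the image when the marker was copied alongside it;
    # no accompanying text means the image is processed normally.
    return clipboard_text is None or IGNORE_MARKER not in clipboard_text

print(should_process('*ocr_ignore*'))  # False
print(should_process('a caption'))     # True
```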

# Acknowledgments

-This project was made using:
-- [Manga109-s](http://www.manga109.org/en/download_s.html) dataset
-- [CC-100](https://data.statmt.org/cc-100/) dataset
+This uses code from/references these projects:
+- [Manga OCR](https://github.com/kha-white/manga-ocr)
+- [ocrmac](https://github.com/straussmaximilian/ocrmac) for the Apple Vision framework API
+- [NadeOCR](https://github.com/Natsume-197/NadeOCR) for the Google Vision API
manga_ocr/__init__.py (deleted)
@@ -1,9 +0,0 @@
-__version__ = '0.1.10'
-
-from manga_ocr.ocr import MangaOcr
-from manga_ocr.ocr import GoogleVision
-from manga_ocr.ocr import AppleVision
-from manga_ocr.ocr import WinRTOCR
-from manga_ocr.ocr import AzureComputerVision
-from manga_ocr.ocr import EasyOCR
-from manga_ocr.ocr import PaddleOCR
owocr/__init__.py (new file, 9 lines)
@@ -0,0 +1,9 @@
+__version__ = '0.1.10'
+
+from owocr.ocr import MangaOcr
+from owocr.ocr import GoogleVision
+from owocr.ocr import AppleVision
+from owocr.ocr import WinRTOCR
+from owocr.ocr import AzureComputerVision
+from owocr.ocr import EasyOCR
+from owocr.ocr import PaddleOCR
owocr/__main__.py
@@ -1,6 +1,6 @@
import fire

-from manga_ocr.run import run
+from owocr.run import run


def main():
owocr/ocr.py
@@ -8,12 +8,15 @@ import sys
import platform

import jaconv
-import torch
import numpy as np
+import json
from PIL import Image
from loguru import logger
-from transformers import ViTImageProcessor, AutoTokenizer, VisionEncoderDecoderModel

+try:
+    from manga_ocr import MangaOcr as MOCR
+except ImportError:
+    pass

try:
    import Vision
@@ -68,29 +71,21 @@ class MangaOcr:
    name = "mangaocr"
    readable_name = "Manga OCR"
    key = "m"
-    available = True
+    available = False

    def __init__(self, config={'pretrained_model_name_or_path': 'kha-white/manga-ocr-base', 'force_cpu': 'False'}, pretrained_model_name_or_path='', force_cpu=False):
+        if 'manga_ocr' not in sys.modules:
+            logger.warning('manga-ocr not available, Manga OCR will not work!')
+        else:
+            if pretrained_model_name_or_path == '':
+                pretrained_model_name_or_path = config['pretrained_model_name_or_path']
+            if config['force_cpu'] == 'True':
+                force_cpu = True
-        logger.info(f'Loading Manga OCR model from {pretrained_model_name_or_path}')
-        self.processor = ViTImageProcessor.from_pretrained(pretrained_model_name_or_path)
-        self.tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path)
-        self.model = VisionEncoderDecoderModel.from_pretrained(pretrained_model_name_or_path)
-
-        if not force_cpu and torch.cuda.is_available():
-            logger.info('Using CUDA')
-            self.model.cuda()
-        elif not force_cpu and torch.backends.mps.is_available():
-            logger.info('Using MPS')
-            warnings.filterwarnings("ignore", message=".*MPS: no support.*")
-            self.model.to('mps')
-        else:
-            logger.info('Using CPU')
+            logger.disable("manga_ocr")
+            logger.info('Loading Manga OCR model')
+            self.model = MOCR(pretrained_model_name_or_path, force_cpu)
+            self.available = True
+            logger.info('Manga OCR ready')

    def __call__(self, img_or_path):
@@ -101,18 +96,9 @@ class MangaOcr:
        else:
            raise ValueError(f'img_or_path must be a path or PIL.Image, instead got: {img_or_path}')

-        img = img.convert('L').convert('RGB')
-
-        x = self._preprocess(img)
-        x = self.model.generate(x[None].to(self.model.device), max_length=300)[0].cpu()
-        x = self.tokenizer.decode(x, skip_special_tokens=True)
-        x = post_process(x)
+        x = self.model(img)
        return x

-    def _preprocess(self, img):
-        pixel_values = self.processor(img, return_tensors="pt").pixel_values
-        return pixel_values.squeeze()
-

class GoogleVision:
    name = "gvision"
    readable_name = "Google Vision"
owocr/run.py
@@ -14,7 +14,7 @@ from loguru import logger
from pynput import keyboard

import inspect
-from manga_ocr import *
+from owocr import *


def are_images_identical(img1, img2):
@@ -131,7 +131,7 @@ def run(read_from='clipboard',
    default_engine = ''

    logger.info('Parsing config file')
-    config_file = os.path.join(os.path.expanduser('~'), '.config', 'ocr_config.ini')
+    config_file = os.path.join(os.path.expanduser('~'), '.config', 'owocr_config.ini')
    config = configparser.ConfigParser()
    res = config.read(config_file)

@@ -139,7 +139,7 @@ def run(read_from='clipboard',
        logger.warning('No config file, defaults will be used')
    else:
        try:
-            for config_engine in config['common']['engines'].split(','):
+            for config_engine in config['general']['engines'].split(','):
                config_engines.append(config_engine.strip())
        except KeyError:
            pass
owocr_config.ini (new file, 10 lines)
@@ -0,0 +1,10 @@
+[general]
+; engines = avision,mangaocr
+[winrtocr]
+; url = http://aaa.xxx.yyy.zzz:8000
+[azure]
+; api_key = api_key_here
+; endpoint = https://YOURPROJECT.cognitiveservices.azure.com/
+[mangaocr]
+pretrained_model_name_or_path = kha-white/manga-ocr-base
+force_cpu = False
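The sample file above is plain INI, readable with Python's stdlib configparser the same way `run.py` does; this demo uses `read_string` and uncomments the `engines` line for illustration:

```python
import configparser

sample = """
[general]
engines = avision,mangaocr

[mangaocr]
pretrained_model_name_or_path = kha-white/manga-ocr-base
force_cpu = False
"""

config = configparser.ConfigParser()
config.read_string(sample)

# Same parsing run.py applies to the engines list:
engines = [e.strip() for e in config['general']['engines'].split(',')]
print(engines)                          # ['avision', 'mangaocr']
print(config['mangaocr']['force_cpu'])  # False (configparser values are strings)
```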
requirements.txt
@@ -1,15 +1,8 @@
fire
-fugashi
jaconv
loguru
numpy
Pillow>=10.0.0
pyperclip
-torch>=1.0
-transformers>=4.25.0
-unidic_lite
-google-cloud-vision
-azure-cognitiveservices-vision-computervision
pynput
-easyocr
-paddleocr
+pyinput
setup.py (21 changed lines)
@@ -4,35 +4,32 @@ from setuptools import setup
long_description = (Path(__file__).parent / "README.md").read_text('utf-8').split('# Installation')[0]

setup(
-    name="manga-ocr",
-    version='0.1.11',
-    description="OCR for Japanese manga",
+    name="owocr",
+    version='0.1',
+    description="Japanese OCR",
    long_description=long_description,
    long_description_content_type="text/markdown",
-    url="https://github.com/kha-white/manga-ocr",
-    author="Maciej Budyś",
-    author_email="kha-white@mail.com",
+    url="https://github.com/AuroraWright/owocr",
+    author="AuroraWright",
+    author_email="fallingluma@gmail.com",
    license="Apache License 2.0",
    classifiers=[
        "Programming Language :: Python :: 3",
    ],
-    packages=['manga_ocr'],
+    packages=['owocr'],
    include_package_data=True,
    install_requires=[
        "fire",
-        "fugashi",
        "jaconv",
        "loguru",
        "numpy",
        "Pillow>=10.0.0",
        "pyperclip",
-        "torch>=1.0",
-        "transformers>=4.25.0",
-        "unidic_lite",
+        "unidic_lite"
    ],
    entry_points={
        "console_scripts": [
-            "manga_ocr=manga_ocr.__main__:main",
+            "owocr=owocr.__main__:main",
        ]
    },
)