diff --git a/README.md b/README.md index e6beaae..725d9e9 100644 --- a/README.md +++ b/README.md @@ -1,140 +1,37 @@ -# Manga OCR +# OwOCR -Optical character recognition for Japanese text, with the main focus being Japanese manga. -It uses a custom end-to-end model built with Transformers' [Vision Encoder Decoder](https://huggingface.co/docs/transformers/model_doc/vision-encoder-decoder) framework. - -Manga OCR can be used as a general purpose printed Japanese OCR, but its main goal was to provide a high quality -text recognition, robust against various scenarios specific to manga: -- both vertical and horizontal text -- text with furigana -- text overlaid on images -- wide variety of fonts and font styles -- low quality images - -Unlike many OCR models, Manga OCR supports recognizing multi-line text in a single forward pass, -so that text bubbles found in manga can be processed at once, without splitting them into lines. - -See also: -- [Poricom](https://github.com/bluaxees/Poricom), a GUI reader, which uses manga-ocr -- [mokuro](https://github.com/kha-white/mokuro), a tool, which uses manga-ocr to generate an HTML overlay for manga -- [Xelieu's guide](https://rentry.co/lazyXel), a comprehensive guide on setting up a reading and mining workflow with manga-ocr/mokuro (and many other useful tips) -- Development code, including code for training and synthetic data generation: [link](manga_ocr_dev) -- Description of synthetic data generation pipeline + examples of generated images: [link](manga_ocr_dev/synthetic_data_generator) +A command-line client for several Japanese OCR providers, derived from [Manga OCR](https://github.com/kha-white/manga-ocr). # Installation -You need Python 3.8, 3.9, 3.10 or 3.11. +This has been tested with Python 3.11. Newer or older versions might work.
For now, it can be installed with `pip install https://github.com/AuroraWright/owocr/archive/master.zip` -If you want to run with GPU, install PyTorch as described [here](https://pytorch.org/get-started/locally/#start-locally), -otherwise this step can be skipped. +# Supported providers -Run in command line: +## Local providers +- [Manga OCR](https://github.com/kha-white/manga-ocr): refer to its readme for installation ("m" key) +- [EasyOCR](https://github.com/JaidedAI/EasyOCR): refer to its readme for installation ("e" key) +- [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR): refer to the [wiki](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/doc/doc_en/quickstart_en.md) for installation ("o" key) +- Apple Vision framework: this will work on macOS Ventura or later if pyobjc (`pip install pyobjc`) is installed. In my experience, it is the best of the local providers for horizontal text ("a" key) +- WinRT OCR: this will work on Windows 10 or later if winocr (`pip install winocr`) is installed.
It can also be used remotely by installing winocr on a Windows machine or virtual machine, running the server (`winocr_serve`), installing requests (`pip install requests`), and specifying the IP address of the Windows machine/VM in the config file (see below) ("w" key) -```commandline -pip3 install manga-ocr -``` - -## Troubleshooting - -- `ImportError: DLL load failed while importing fugashi: The specified module could not be found.` - might be because of Python installed from Microsoft Store, try installing Python from the [official site](https://www.python.org/downloads) - -problem with installing `mecab-python3` on ARM architecture - try [this workaround](https://github.com/kha-white/manga-ocr/issues/16) +## Cloud providers +- Google Vision: you need a service account .json file named google_vision.json in `user directory/.config/`, and to install google-cloud-vision (`pip install google-cloud-vision`) +- Azure Computer Vision: you need to specify an API key and an endpoint in the config file (see below), and to install azure-cognitiveservices-vision-computervision (`pip install azure-cognitiveservices-vision-computervision`) # Usage -## Python API - -```python -from manga_ocr import MangaOcr - -mocr = MangaOcr() -text = mocr('/path/to/img') -``` - -or - -```python -import PIL.Image - -from manga_ocr import MangaOcr - -mocr = MangaOcr() -img = PIL.Image.open('/path/to/img') -text = mocr(img) -``` - -## Running in the background - -Manga OCR can run in the background and process new images as they appear. - -You might use a tool like [ShareX](https://getsharex.com/) or [Flameshot](https://flameshot.org/) to manually capture a region of the screen and let the -OCR read it either from the system clipboard, or a specified directory. By default, Manga OCR will write recognized text to clipboard, -from which it can be read by a dictionary like [Yomichan](https://github.com/FooSoft/yomichan). - -Clipboard mode on Linux requires `wl-copy` for Wayland sessions or `xclip` for X11 sessions.
You can find out which one your system needs by running `echo $XDG_SESSION_TYPE` in the terminal. - -Your full setup for reading manga in Japanese with a dictionary might look like this: - -capture region with ShareX -> write image to clipboard -> Manga OCR -> write text to clipboard -> Yomichan - -https://user-images.githubusercontent.com/22717958/150238361-052b95d1-0152-485f-a441-48a957536239.mp4 - -- To read images from clipboard and write recognized texts to clipboard, run in command line: - ```commandline - manga_ocr - ``` -- To read images from ShareX's screenshot folder, run in command line: - ```commandline - manga_ocr "/path/to/sharex/screenshot/folder" - ``` -Note that when running in the clipboard scanning mode, any image that you copy to clipboard will be processed by OCR and replaced -by recognized text. If you want to be able to copy and paste images as usual, you should use the folder scanning mode instead -and define a separate task in ShareX just for OCR, which saves screenshots to some folder without copying them to clipboard. - -When running for the first time, downloading the model (~400 MB) might take a few minutes. -The OCR is ready to use after `OCR ready` message appears in the logs. - -- To see other options, run in command line: - ```commandline - manga_ocr --help - ``` - -If `manga_ocr` doesn't work, you might also try replacing it with `python -m manga_ocr`. - -## Usage tips - -- OCR supports multi-line text, but the longer the text, the more likely some errors are to occur. - If the recognition failed for some part of a longer text, you might try to run it on a smaller portion of the image. -- The model was trained specifically to handle manga well, but should do a decent job on other types of printed text, - such as novels or video games. It probably won't be able to handle handwritten text though. -- The model always attempts to recognize some text on the image, even if there is none. 
- Because it uses a transformer decoder (and therefore has some understanding of the Japanese language), - it might even "dream up" some realistically looking sentences! This shouldn't be a problem for most use cases, - but it might get improved in the next version. - -# Examples - -Here are some cherry-picked examples showing the capability of the model. - -| image | Manga OCR result | -|----------------------|------------------| -| ![](assets/examples/00.jpg) | 素直にあやまるしか | -| ![](assets/examples/01.jpg) | 立川で見た〝穴〟の下の巨大な眼は: | -| ![](assets/examples/02.jpg) | 実戦剣術も一流です | -| ![](assets/examples/03.jpg) | 第30話重苦しい闇の奥で静かに呼吸づきながら | -| ![](assets/examples/04.jpg) | よかったじゃないわよ!何逃げてるのよ!!早くあいつを退治してよ! | -| ![](assets/examples/05.jpg) | ぎゃっ | -| ![](assets/examples/06.jpg) | ピンポーーン | -| ![](assets/examples/07.jpg) | LINK!私達7人の力でガノンの塔の結界をやぶります | -| ![](assets/examples/08.jpg) | ファイアパンチ | -| ![](assets/examples/09.jpg) | 少し黙っている | -| ![](assets/examples/10.jpg) | わかるかな〜? | -| ![](assets/examples/11.jpg) | 警察にも先生にも町中の人達に!! | - -# Contact -For any inquiries, please feel free to contact me at kha-white@mail.com +It mostly functions like Manga OCR: https://github.com/kha-white/manga-ocr?tab=readme-ov-file#running-in-the-background +However: +- you can pause/unpause the clipboard image processing by pressing "p" or terminate the script with "t" or "q" +- you can switch OCR provider with its corresponding keyboard key (refer to the list above). 
You can also start the script paused with the -p option, or with a specific provider preselected with the -e option (refer to `owocr -h` for the list) +- holding ctrl or cmd at any time will pause the clipboard image processing temporarily +- for systems where text can be copied to the clipboard at the same time as images, if `*ocr_ignore*` is copied with an image, the image will be ignored +- a config file (located in `user directory/.config/owocr_config.ini`) can be used to limit the enabled providers (to reduce clutter/memory usage), as well as to specify provider settings such as API keys (a sample config file is provided) # Acknowledgments -This project was done with the usage of: -- [Manga109-s](http://www.manga109.org/en/download_s.html) dataset -- [CC-100](https://data.statmt.org/cc-100/) dataset +This project uses code from, or references, the following projects: +- [Manga OCR](https://github.com/kha-white/manga-ocr) +- [ocrmac](https://github.com/straussmaximilian/ocrmac) for the Apple Vision framework API +- [NadeOCR](https://github.com/Natsume-197/NadeOCR) for the Google Vision API diff --git a/manga_ocr/__init__.py b/manga_ocr/__init__.py deleted file mode 100644 index 2cc257a..0000000 --- a/manga_ocr/__init__.py +++ /dev/null @@ -1,9 +0,0 @@ -__version__ = '0.1.10' - -from manga_ocr.ocr import MangaOcr -from manga_ocr.ocr import GoogleVision -from manga_ocr.ocr import AppleVision -from manga_ocr.ocr import WinRTOCR -from manga_ocr.ocr import AzureComputerVision -from manga_ocr.ocr import EasyOCR -from manga_ocr.ocr import PaddleOCR diff --git a/owocr/__init__.py b/owocr/__init__.py new file mode 100644 index 0000000..00e7eb9 --- /dev/null +++ b/owocr/__init__.py @@ -0,0 +1,9 @@ +__version__ = '0.1.10' + +from owocr.ocr import MangaOcr +from owocr.ocr import GoogleVision +from owocr.ocr import AppleVision +from owocr.ocr import WinRTOCR +from owocr.ocr import AzureComputerVision +from owocr.ocr import EasyOCR +from owocr.ocr import PaddleOCR diff --git a/manga_ocr/__main__.py
b/owocr/__main__.py similarity index 67% rename from manga_ocr/__main__.py rename to owocr/__main__.py index 18fb138..a7e8e5d 100644 --- a/manga_ocr/__main__.py +++ b/owocr/__main__.py @@ -1,6 +1,6 @@ import fire -from manga_ocr.run import run +from owocr.run import run def main(): diff --git a/manga_ocr/ocr.py b/owocr/ocr.py similarity index 86% rename from manga_ocr/ocr.py rename to owocr/ocr.py index 3c914d3..06c69e5 100644 --- a/manga_ocr/ocr.py +++ b/owocr/ocr.py @@ -8,12 +8,15 @@ import sys import platform import jaconv -import torch import numpy as np import json from PIL import Image from loguru import logger -from transformers import ViTImageProcessor, AutoTokenizer, VisionEncoderDecoderModel + +try: + from manga_ocr import MangaOcr as MOCR +except ImportError: + pass try: import Vision @@ -68,30 +71,22 @@ class MangaOcr: name = "mangaocr" readable_name = "Manga OCR" key = "m" - available = True + available = False def __init__(self, config={'pretrained_model_name_or_path':'kha-white/manga-ocr-base','force_cpu':'False'}, pretrained_model_name_or_path='', force_cpu=False): - if pretrained_model_name_or_path == '': - pretrained_model_name_or_path = config['pretrained_model_name_or_path'] - if config['force_cpu'] == 'True': - force_cpu = True - - logger.info(f'Loading Manga OCR model from {pretrained_model_name_or_path}') - self.processor = ViTImageProcessor.from_pretrained(pretrained_model_name_or_path) - self.tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path) - self.model = VisionEncoderDecoderModel.from_pretrained(pretrained_model_name_or_path) - - if not force_cpu and torch.cuda.is_available(): - logger.info('Using CUDA') - self.model.cuda() - elif not force_cpu and torch.backends.mps.is_available(): - logger.info('Using MPS') - warnings.filterwarnings("ignore", message=".*MPS: no support.*") - self.model.to('mps') + if 'manga_ocr' not in sys.modules: + logger.warning('manga-ocr not available, Manga OCR will not work!') else: - 
logger.info('Using CPU') + if pretrained_model_name_or_path == '': + pretrained_model_name_or_path = config['pretrained_model_name_or_path'] + if config['force_cpu'] == 'True': + force_cpu = True - logger.info('Manga OCR ready') + logger.disable("manga_ocr") + logger.info(f'Loading Manga OCR model') + self.model = MOCR(pretrained_model_name_or_path, force_cpu) + self.available = True + logger.info('Manga OCR ready') def __call__(self, img_or_path): if isinstance(img_or_path, str) or isinstance(img_or_path, Path): @@ -101,18 +96,9 @@ class MangaOcr: else: raise ValueError(f'img_or_path must be a path or PIL.Image, instead got: {img_or_path}') - img = img.convert('L').convert('RGB') - - x = self._preprocess(img) - x = self.model.generate(x[None].to(self.model.device), max_length=300)[0].cpu() - x = self.tokenizer.decode(x, skip_special_tokens=True) - x = post_process(x) + x = self.model(img) return x - def _preprocess(self, img): - pixel_values = self.processor(img, return_tensors="pt").pixel_values - return pixel_values.squeeze() - class GoogleVision: name = "gvision" readable_name = "Google Vision" diff --git a/manga_ocr/run.py b/owocr/run.py similarity index 96% rename from manga_ocr/run.py rename to owocr/run.py index df22c66..083fb81 100644 --- a/manga_ocr/run.py +++ b/owocr/run.py @@ -14,7 +14,7 @@ from loguru import logger from pynput import keyboard import inspect -from manga_ocr import * +from owocr import * def are_images_identical(img1, img2): @@ -131,7 +131,7 @@ def run(read_from='clipboard', default_engine = '' logger.info(f'Parsing config file') - config_file = os.path.join(os.path.expanduser('~'),'.config','ocr_config.ini') + config_file = os.path.join(os.path.expanduser('~'),'.config','owocr_config.ini') config = configparser.ConfigParser() res = config.read(config_file) @@ -139,7 +139,7 @@ def run(read_from='clipboard', logger.warning('No config file, defaults will be used') else: try: - for config_engine in config['common']['engines'].split(','): + 
for config_engine in config['general']['engines'].split(','): config_engines.append(config_engine.strip()) except KeyError: pass diff --git a/owocr_config.ini b/owocr_config.ini new file mode 100644 index 0000000..48940c3 --- /dev/null +++ b/owocr_config.ini @@ -0,0 +1,10 @@ +[general] +; engines = avision,mangaocr +[winrtocr] +; url = http://aaa.xxx.yyy.zzz:8000 +[azure] +; api_key = api_key_here +; endpoint = https://YOURPROJECT.cognitiveservices.azure.com/ +[mangaocr] +pretrained_model_name_or_path = kha-white/manga-ocr-base +force_cpu = False \ No newline at end of file diff --git a/requirements.txt b/requirements.txt index 7960522..b18c115 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,15 +1,8 @@ fire -fugashi jaconv loguru numpy Pillow>=10.0.0 pyperclip -torch>=1.0 -transformers>=4.25.0 unidic_lite -google-cloud-vision -azure-cognitiveservices-vision-computervision -pynput -easyocr -paddleocr \ No newline at end of file +pynput \ No newline at end of file diff --git a/setup.py b/setup.py index fcae620..813bc6d 100644 --- a/setup.py +++ b/setup.py @@ -4,35 +4,32 @@ from setuptools import setup long_description = (Path(__file__).parent / "README.md").read_text('utf-8').split('# Installation')[0] setup( - name="manga-ocr", - version='0.1.11', - description="OCR for Japanese manga", + name="owocr", + version='0.1', + description="Japanese OCR", long_description=long_description, long_description_content_type="text/markdown", - url="https://github.com/kha-white/manga-ocr", - author="Maciej Budyś", - author_email="kha-white@mail.com", + url="https://github.com/AuroraWright/owocr", + author="AuroraWright", + author_email="fallingluma@gmail.com", license="Apache License 2.0", classifiers=[ "Programming Language :: Python :: 3", ], - packages=['manga_ocr'], + packages=['owocr'], include_package_data=True, install_requires=[ "fire", - "fugashi", "jaconv", "loguru", "numpy", "Pillow>=10.0.0", "pyperclip", - "torch>=1.0", - "transformers>=4.25.0", -
"unidic_lite", + "unidic_lite" ], entry_points={ "console_scripts": [ - "manga_ocr=manga_ocr.__main__:main", + "owocr=owocr.__main__:main", ] }, )