Use manga_ocr as library, separate the projects, add some documentation
README.md
@@ -1,140 +1,37 @@
-# Manga OCR
+# OwOCR
 
-Optical character recognition for Japanese text, with the main focus being Japanese manga.
-
-It uses a custom end-to-end model built with Transformers' [Vision Encoder Decoder](https://huggingface.co/docs/transformers/model_doc/vision-encoder-decoder) framework.
-
-Manga OCR can be used as a general purpose printed Japanese OCR, but its main goal was to provide a high quality
-text recognition, robust against various scenarios specific to manga:
-- both vertical and horizontal text
-- text with furigana
-- text overlaid on images
-- wide variety of fonts and font styles
-- low quality images
-
-Unlike many OCR models, Manga OCR supports recognizing multi-line text in a single forward pass,
-so that text bubbles found in manga can be processed at once, without splitting them into lines.
-
-See also:
-- [Poricom](https://github.com/bluaxees/Poricom), a GUI reader, which uses manga-ocr
-- [mokuro](https://github.com/kha-white/mokuro), a tool, which uses manga-ocr to generate an HTML overlay for manga
-- [Xelieu's guide](https://rentry.co/lazyXel), a comprehensive guide on setting up a reading and mining workflow with manga-ocr/mokuro (and many other useful tips)
-- Development code, including code for training and synthetic data generation: [link](manga_ocr_dev)
-- Description of synthetic data generation pipeline + examples of generated images: [link](manga_ocr_dev/synthetic_data_generator)
+Command line client for several Japanese OCR providers, derived from [Manga OCR](https://github.com/kha-white/manga-ocr).
 # Installation
 
-You need Python 3.8, 3.9, 3.10 or 3.11.
+This has been tested with Python 3.11; newer or older versions might work. For now it can be installed with `pip install https://github.com/AuroraWright/owocr/archive/master.zip`
 
-If you want to run with GPU, install PyTorch as described [here](https://pytorch.org/get-started/locally/#start-locally),
-otherwise this step can be skipped.
-
-Run in command line:
-
-```commandline
-pip3 install manga-ocr
-```
+# Supported providers
+
+## Local providers
+- [Manga OCR](https://github.com/kha-white/manga-ocr): refer to its readme for installation ("m" key)
+- [EasyOCR](https://github.com/JaidedAI/EasyOCR): refer to its readme for installation ("e" key)
+- [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR): refer to the [wiki](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/doc/doc_en/quickstart_en.md) for installation ("o" key)
+- Apple Vision framework: works on macOS Ventura or later if pyobjc (`pip install pyobjc`) is installed. In my experience, the best of the local providers for horizontal text ("a" key)
+- WinRT OCR: works on Windows 10 or later if winocr (`pip install winocr`) is installed. It can also be used remotely by installing winocr on a Windows virtual machine, running the server (`winocr_serve`), installing requests (`pip install requests`) and specifying the IP address of the Windows VM/machine in the config file (see below) ("w" key)
+
+## Cloud providers
+- Google Vision: you need a service account .json file named google_vision.json in `user directory/.config/` and to install google-cloud-vision (`pip install google-cloud-vision`)
+- Azure Computer Vision: you need to specify an api key and an endpoint in the config file (see below) and to install azure-cognitiveservices-vision-computervision (`pip install azure-cognitiveservices-vision-computervision`)
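The local providers above are OS-specific (Apple Vision on macOS, WinRT OCR on Windows), so a client has to filter candidates by platform before loading anything heavy. A stdlib-only sketch of that pre-filtering; the function name and provider ids here are illustrative, not owocr's actual API:

```python
import platform

def candidate_providers():
    """Return provider ids that could plausibly load on this machine.

    Illustrative only: mirrors the README's provider list, not owocr's code.
    """
    providers = ['mangaocr', 'easyocr', 'paddleocr']  # cross-platform locals
    system = platform.system()
    if system == 'Darwin':
        providers.append('avision')    # Apple Vision framework (needs pyobjc)
    elif system == 'Windows':
        providers.append('winrtocr')   # WinRT OCR (needs winocr)
    providers += ['gvision', 'azure']  # cloud providers work anywhere
    return providers

print(candidate_providers())
```

Whether a provider actually loads still depends on its optional dependency being installed; this only rules out the impossible cases early.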
-## Troubleshooting
-
-- `ImportError: DLL load failed while importing fugashi: The specified module could not be found.` - might be because of Python installed from Microsoft Store, try installing Python from the [official site](https://www.python.org/downloads)
-- problem with installing `mecab-python3` on ARM architecture - try [this workaround](https://github.com/kha-white/manga-ocr/issues/16)
 # Usage
 
-## Python API
-
-```python
-from manga_ocr import MangaOcr
-
-mocr = MangaOcr()
-text = mocr('/path/to/img')
-```
+It mostly functions like Manga OCR: https://github.com/kha-white/manga-ocr?tab=readme-ov-file#running-in-the-background
+
+However:
+- you can pause/unpause the clipboard image processing by pressing "p", or terminate the script with "t" or "q"
+- you can switch OCR provider with its corresponding keyboard key (refer to the list above). You can also start the script paused with the -p option, or with a specific provider with the -e option (refer to `owocr -h` for the list)
+- holding ctrl or cmd at any time pauses the clipboard image processing temporarily
+- on systems where text can be copied to the clipboard at the same time as images, if `*ocr_ignore*` is copied with an image, the image will be ignored
+- a config file (located in `user directory/.config/owocr_config.ini`) can be used to limit the available providers (to reduce clutter/memory usage) and to specify provider settings such as api keys (a sample config file is provided)
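The single-key provider switching described above maps naturally onto the `key` class attribute that each engine carries in this diff. A hedged, self-contained sketch of that dispatch, with class bodies reduced to stubs (real owocr engines have full implementations):

```python
# Minimal sketch of key-to-engine dispatch; these stubs only carry the
# attributes visible in this diff (name, readable_name, key).
class MangaOcr:
    name, readable_name, key = 'mangaocr', 'Manga OCR', 'm'

class EasyOCR:
    name, readable_name, key = 'easyocr', 'EasyOCR', 'e'

class AppleVision:
    name, readable_name, key = 'avision', 'Apple Vision', 'a'

# Build a hotkey table once, then switch on each keypress.
ENGINES = {cls.key: cls for cls in (MangaOcr, EasyOCR, AppleVision)}

def switch_engine(pressed_key, current):
    """Return the engine bound to pressed_key, or keep the current one."""
    return ENGINES.get(pressed_key, current)

engine = switch_engine('a', MangaOcr)
print(engine.readable_name)  # Apple Vision
```

Keeping the key on the class itself means the hotkey table stays correct automatically as engines are added or removed from the list.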
-
-or
-
-```python
-import PIL.Image
-
-from manga_ocr import MangaOcr
-
-mocr = MangaOcr()
-img = PIL.Image.open('/path/to/img')
-text = mocr(img)
-```
-
-## Running in the background
-
-Manga OCR can run in the background and process new images as they appear.
-
-You might use a tool like [ShareX](https://getsharex.com/) or [Flameshot](https://flameshot.org/) to manually capture a region of the screen and let the
-OCR read it either from the system clipboard, or a specified directory. By default, Manga OCR will write recognized text to clipboard,
-from which it can be read by a dictionary like [Yomichan](https://github.com/FooSoft/yomichan).
-
-Clipboard mode on Linux requires `wl-copy` for Wayland sessions or `xclip` for X11 sessions. You can find out which one your system needs by running `echo $XDG_SESSION_TYPE` in the terminal.
-
-Your full setup for reading manga in Japanese with a dictionary might look like this:
-
-capture region with ShareX -> write image to clipboard -> Manga OCR -> write text to clipboard -> Yomichan
-
-https://user-images.githubusercontent.com/22717958/150238361-052b95d1-0152-485f-a441-48a957536239.mp4
-
-- To read images from clipboard and write recognized texts to clipboard, run in command line:
-```commandline
-manga_ocr
-```
-- To read images from ShareX's screenshot folder, run in command line:
-```commandline
-manga_ocr "/path/to/sharex/screenshot/folder"
-```
-
-Note that when running in the clipboard scanning mode, any image that you copy to clipboard will be processed by OCR and replaced
-by recognized text. If you want to be able to copy and paste images as usual, you should use the folder scanning mode instead
-and define a separate task in ShareX just for OCR, which saves screenshots to some folder without copying them to clipboard.
-
-When running for the first time, downloading the model (~400 MB) might take a few minutes.
-The OCR is ready to use after `OCR ready` message appears in the logs.
-
-- To see other options, run in command line:
-```commandline
-manga_ocr --help
-```
-
-If `manga_ocr` doesn't work, you might also try replacing it with `python -m manga_ocr`.
-
-## Usage tips
-
-- OCR supports multi-line text, but the longer the text, the more likely some errors are to occur.
-  If the recognition failed for some part of a longer text, you might try to run it on a smaller portion of the image.
-- The model was trained specifically to handle manga well, but should do a decent job on other types of printed text,
-  such as novels or video games. It probably won't be able to handle handwritten text though.
-- The model always attempts to recognize some text on the image, even if there is none.
-  Because it uses a transformer decoder (and therefore has some understanding of the Japanese language),
-  it might even "dream up" some realistically looking sentences! This shouldn't be a problem for most use cases,
-  but it might get improved in the next version.
-
-# Examples
-
-Here are some cherry-picked examples showing the capability of the model.
-
-| image | Manga OCR result |
-|----------------------|------------------|
-|  | 素直にあやまるしか |
-|  | 立川で見た〝穴〟の下の巨大な眼は: |
-|  | 実戦剣術も一流です |
-|  | 第30話重苦しい闇の奥で静かに呼吸づきながら |
-|  | よかったじゃないわよ!何逃げてるのよ!!早くあいつを退治してよ! |
-|  | ぎゃっ |
-|  | ピンポーーン |
-|  | LINK!私達7人の力でガノンの塔の結界をやぶります |
-|  | ファイアパンチ |
-|  | 少し黙っている |
-|  | わかるかな〜? |
-|  | 警察にも先生にも町中の人達に!! |
-# Contact
-
-For any inquiries, please feel free to contact me at kha-white@mail.com
-
 # Acknowledgments
 
-This project was done with the usage of:
-- [Manga109-s](http://www.manga109.org/en/download_s.html) dataset
-- [CC-100](https://data.statmt.org/cc-100/) dataset
+This uses code from/references these projects:
+- [Manga OCR](https://github.com/kha-white/manga-ocr)
+- [ocrmac](https://github.com/straussmaximilian/ocrmac) for the Apple Vision framework API
+- [NadeOCR](https://github.com/Natsume-197/NadeOCR) for the Google Vision API
manga_ocr/__init__.py (deleted)
@@ -1,9 +0,0 @@
-__version__ = '0.1.10'
-
-from manga_ocr.ocr import MangaOcr
-from manga_ocr.ocr import GoogleVision
-from manga_ocr.ocr import AppleVision
-from manga_ocr.ocr import WinRTOCR
-from manga_ocr.ocr import AzureComputerVision
-from manga_ocr.ocr import EasyOCR
-from manga_ocr.ocr import PaddleOCR
owocr/__init__.py (new file)
@@ -0,0 +1,9 @@
+__version__ = '0.1.10'
+
+from owocr.ocr import MangaOcr
+from owocr.ocr import GoogleVision
+from owocr.ocr import AppleVision
+from owocr.ocr import WinRTOCR
+from owocr.ocr import AzureComputerVision
+from owocr.ocr import EasyOCR
+from owocr.ocr import PaddleOCR
owocr/__main__.py
@@ -1,6 +1,6 @@
 import fire
 
-from manga_ocr.run import run
+from owocr.run import run
 
 
 def main():
owocr/ocr.py
@@ -8,12 +8,15 @@ import sys
 import platform
 
 import jaconv
-import torch
 import numpy as np
 import json
 from PIL import Image
 from loguru import logger
-from transformers import ViTImageProcessor, AutoTokenizer, VisionEncoderDecoderModel
+
+try:
+    from manga_ocr import MangaOcr as MOCR
+except ImportError:
+    pass
 
 try:
     import Vision
@@ -68,29 +71,21 @@ class MangaOcr:
     name = "mangaocr"
     readable_name = "Manga OCR"
     key = "m"
-    available = True
+    available = False
 
     def __init__(self, config={'pretrained_model_name_or_path':'kha-white/manga-ocr-base','force_cpu':'False'}, pretrained_model_name_or_path='', force_cpu=False):
-        if pretrained_model_name_or_path == '':
-            pretrained_model_name_or_path = config['pretrained_model_name_or_path']
-        if config['force_cpu'] == 'True':
-            force_cpu = True
-
-        logger.info(f'Loading Manga OCR model from {pretrained_model_name_or_path}')
-        self.processor = ViTImageProcessor.from_pretrained(pretrained_model_name_or_path)
-        self.tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path)
-        self.model = VisionEncoderDecoderModel.from_pretrained(pretrained_model_name_or_path)
-
-        if not force_cpu and torch.cuda.is_available():
-            logger.info('Using CUDA')
-            self.model.cuda()
-        elif not force_cpu and torch.backends.mps.is_available():
-            logger.info('Using MPS')
-            warnings.filterwarnings("ignore", message=".*MPS: no support.*")
-            self.model.to('mps')
-        else:
-            logger.info('Using CPU')
-
-        logger.info('Manga OCR ready')
+        if 'manga_ocr' not in sys.modules:
+            logger.warning('manga-ocr not available, Manga OCR will not work!')
+        else:
+            if pretrained_model_name_or_path == '':
+                pretrained_model_name_or_path = config['pretrained_model_name_or_path']
+            if config['force_cpu'] == 'True':
+                force_cpu = True
+
+            logger.disable("manga_ocr")
+            logger.info(f'Loading Manga OCR model')
+            self.model = MOCR(pretrained_model_name_or_path, force_cpu)
+            self.available = True
+
+            logger.info('Manga OCR ready')
 
     def __call__(self, img_or_path):
@@ -101,18 +96,9 @@ class MangaOcr:
         else:
             raise ValueError(f'img_or_path must be a path or PIL.Image, instead got: {img_or_path}')
 
-        img = img.convert('L').convert('RGB')
-
-        x = self._preprocess(img)
-        x = self.model.generate(x[None].to(self.model.device), max_length=300)[0].cpu()
-        x = self.tokenizer.decode(x, skip_special_tokens=True)
-        x = post_process(x)
+        x = self.model(img)
         return x
 
-    def _preprocess(self, img):
-        pixel_values = self.processor(img, return_tensors="pt").pixel_values
-        return pixel_values.squeeze()
-
 
 class GoogleVision:
     name = "gvision"
     readable_name = "Google Vision"
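The two hunks above turn `MangaOcr` into a thin wrapper: the model code moves behind an optional `manga_ocr` import, `available` flips to `True` only when construction succeeds, and `__call__` just delegates. A generic, stdlib-only sketch of that optional-dependency pattern, using `sqlite3` as a stand-in for the heavy backend:

```python
import sys

try:
    import sqlite3  # stand-in for the optional heavy dependency (manga_ocr)
except ImportError:
    pass

class Engine:
    available = False  # pessimistic default, as in the diff

    def __init__(self):
        # Mirror the diff's `'manga_ocr' not in sys.modules` guard.
        if 'sqlite3' not in sys.modules:
            print('backend not available, engine disabled')
        else:
            self.backend = sqlite3  # in owocr: self.model = MOCR(...)
            self.available = True

    def __call__(self, payload):
        if not self.available:
            raise RuntimeError('engine unavailable')
        # Delegate to the backend, as `x = self.model(img)` does in the diff.
        return self.backend.sqlite_version

engine = Engine()
print(engine.available)
```

The benefit of the flag over raising at import time is that the rest of the client can enumerate all engine classes and simply skip the unavailable ones.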
owocr/run.py
@@ -14,7 +14,7 @@ from loguru import logger
 from pynput import keyboard
 
 import inspect
-from manga_ocr import *
+from owocr import *
 
 
 def are_images_identical(img1, img2):
@@ -131,7 +131,7 @@ def run(read_from='clipboard',
     default_engine = ''
 
     logger.info(f'Parsing config file')
-    config_file = os.path.join(os.path.expanduser('~'),'.config','ocr_config.ini')
+    config_file = os.path.join(os.path.expanduser('~'),'.config','owocr_config.ini')
     config = configparser.ConfigParser()
     res = config.read(config_file)
 
@@ -139,7 +139,7 @@ def run(read_from='clipboard',
         logger.warning('No config file, defaults will be used')
     else:
         try:
-            for config_engine in config['common']['engines'].split(','):
+            for config_engine in config['general']['engines'].split(','):
                 config_engines.append(config_engine.strip())
         except KeyError:
             pass
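The `[general] engines` parsing in the hunk above can be exercised standalone with `configparser`; the sample string here is hypothetical, shaped like the shipped owocr_config.ini:

```python
import configparser

# Hypothetical config contents, matching the sample owocr_config.ini format.
sample = """
[general]
engines = avision, mangaocr
"""

config = configparser.ConfigParser()
config.read_string(sample)

config_engines = []
try:
    # Same loop as in run(): split the comma list and strip whitespace.
    for config_engine in config['general']['engines'].split(','):
        config_engines.append(config_engine.strip())
except KeyError:
    pass  # missing section or option: fall back to all engines

print(config_engines)  # ['avision', 'mangaocr']
```

The `KeyError` fallback is why the `engines` line can stay commented out in the shipped sample file without breaking startup.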
owocr_config.ini (new file)
@@ -0,0 +1,10 @@
+[general]
+; engines = avision,mangaocr
+[winrtocr]
+; url = http://aaa.xxx.yyy.zzz:8000
+[azure]
+; api_key = api_key_here
+; endpoint = https://YOURPROJECT.cognitiveservices.azure.com/
+[mangaocr]
+pretrained_model_name_or_path = kha-white/manga-ocr-base
+force_cpu = False
requirements.txt
@@ -1,15 +1,8 @@
 fire
-fugashi
 jaconv
 loguru
 numpy
 Pillow>=10.0.0
 pyperclip
-torch>=1.0
-transformers>=4.25.0
 unidic_lite
-google-cloud-vision
-azure-cognitiveservices-vision-computervision
-pynput
-easyocr
-paddleocr
+pynput
setup.py
@@ -4,35 +4,32 @@ from setuptools import setup
 long_description = (Path(__file__).parent / "README.md").read_text('utf-8').split('# Installation')[0]
 
 setup(
-    name="manga-ocr",
-    version='0.1.11',
-    description="OCR for Japanese manga",
+    name="owocr",
+    version='0.1',
+    description="Japanese OCR",
     long_description=long_description,
     long_description_content_type="text/markdown",
-    url="https://github.com/kha-white/manga-ocr",
-    author="Maciej Budyś",
-    author_email="kha-white@mail.com",
+    url="https://github.com/AuroraWright/owocr",
+    author="AuroraWright",
+    author_email="fallingluma@gmail.com",
     license="Apache License 2.0",
     classifiers=[
         "Programming Language :: Python :: 3",
     ],
-    packages=['manga_ocr'],
+    packages=['owocr'],
     include_package_data=True,
     install_requires=[
         "fire",
-        "fugashi",
         "jaconv",
         "loguru",
         "numpy",
         "Pillow>=10.0.0",
         "pyperclip",
-        "torch>=1.0",
-        "transformers>=4.25.0",
-        "unidic_lite",
+        "unidic_lite"
     ],
     entry_points={
         "console_scripts": [
-            "manga_ocr=manga_ocr.__main__:main",
+            "owocr=owocr.__main__:main",
         ]
     },
 )
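The `console_scripts` entry above means installing the package generates an `owocr` executable that imports `owocr.__main__` and calls `main()`. The resolution step behind that can be sketched generically; it is demonstrated against a stdlib module here, since importing owocr itself would be an assumption about the reader's environment:

```python
from importlib import import_module

def resolve_entry_point(spec):
    """Resolve a setuptools-style 'module:function' spec to a callable."""
    module_name, func_name = spec.split(':')
    return getattr(import_module(module_name), func_name)

# The installed script effectively runs:
#   resolve_entry_point('owocr.__main__:main')()
# Demonstrate the mechanism with a harmless stdlib target instead:
fn = resolve_entry_point('platform:python_version')
print(fn())
```

This is also why the rename in `__main__.py` and `packages=['owocr']` had to land in the same commit: the entry point string references both the package and the module path.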