diff --git a/README.md b/README.md
index 7e2a76a..ded9636 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,18 @@
-OCR for Japanese manga
+# Manga OCR
+
+Optical character recognition for Japanese text, with a primary focus on Japanese manga.
+It uses a custom end-to-end model built with Transformers' [Vision Encoder Decoder](https://huggingface.co/docs/transformers/model_doc/visionencoderdecoder) framework.
+
+Manga OCR can be used as a general-purpose OCR for printed Japanese, but its main goal is to provide high-quality
+text recognition that is robust to scenarios specific to manga:
+- both vertical and horizontal text
+- text with furigana
+- text overlaid on images
+- wide variety of fonts and font styles
+- low-quality images
+
+Unlike many OCR models, Manga OCR supports recognizing multi-line text in a single forward pass,
+so that text bubbles found in manga can be processed at once, without splitting them into lines.
 
 # Installation
 
@@ -9,12 +23,38 @@ otherwise this step can be skipped.
 
 Run:
 
-```
+```commandline
 pip install manga-ocr
 ```
 
 # Usage
 
+## Running in the background
+
+Manga OCR can run in the background, processing new images as they appear.
+
+You might then use a tool like [ShareX](https://getsharex.com/) to manually capture a region of the screen and let the
+OCR read it from either the system clipboard or a specified directory.
+
+For example:
+
+- To read images from the clipboard and write recognized text to the clipboard, run:
+  ```commandline
+  manga_ocr
+  ```
+- To read images from ShareX's screenshot folder, run:
+  ```commandline
+  manga_ocr "/path/to/sharex/screenshot/folder"
+  ```
+- To see other options, run:
+  ```commandline
+  manga_ocr --help
+  ```
+
+If `manga_ocr` doesn't work, you might also try replacing it with `python -m manga_ocr`.
+
+## Python API
+
 ```python
 from manga_ocr import MangaOcr
 
@@ -33,3 +73,39 @@ mocr = MangaOcr()
 img = PIL.Image.open('/path/to/img')
 text = mocr(img)
 ```
+
+## Usage tips
+- OCR supports multi-line text, but the longer the text, the more likely some errors are to occur.
+  If recognition fails for part of a longer text, you might try running it on a smaller portion of the image.
+- The model was trained specifically to handle manga well, but should do a decent job on other types of printed text,
+  such as novels or video games. It probably won't be able to handle handwritten text, though.
+- The model always attempts to recognize some text in the image, even if there is none.
+  Because it uses a transformer decoder (and therefore has some understanding of the Japanese language),
+  it might even "dream up" some realistic-looking sentences! This shouldn't be a problem for most use cases,
+  but it might be improved in the next version.
+
+# Examples
+
+Here are some cherry-picked examples showing the capabilities of the model.
+
+| image                | Manga OCR result |
+|----------------------|------------------|
+| ![](examples/00.jpg) | 素直にあやまるしか |
+| ![](examples/01.jpg) | 立川で見た〝穴〟の下の巨大な眼は: |
+| ![](examples/02.jpg) | 実戦剣術も一流です |
+| ![](examples/03.jpg) | 第30話重苦しい闇の奥で静かに呼吸づきながら |
+| ![](examples/04.jpg) | よかったじゃないわよ!何逃げてるのよ!!早くあいつを退治してよ! |
+| ![](examples/05.jpg) | ぎゃっ |
+| ![](examples/06.jpg) | ピンポーーン |
+| ![](examples/07.jpg) | LINK!私達7人の力でガノンの塔の結界をやぶります |
+| ![](examples/08.jpg) | ファイアパンチ |
+| ![](examples/09.jpg) | 少し黙っている |
+| ![](examples/10.jpg) | わかるかな〜? |
+| ![](examples/11.jpg) | 警察にも先生にも町中の人達に!! |
+
+
+
+
+# Acknowledgments
+
+This project was done using the [Manga109-s](http://www.manga109.org/en/download_s.html) dataset.
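The usage tip about running recognition on a smaller portion of the image can be sketched roughly as follows. This is only an illustration, not part of the change: `column_boxes` and `ocr_regions` are hypothetical helper names, the equal-width split is a naive assumption that may cut through characters (hand-picked boxes will usually work better), and `ocr` stands for any callable that maps an image to a string, such as the `mocr` object from the Python API section.

```python
from typing import Callable, List, Tuple

# (left, upper, right, lower), the box format that PIL's Image.crop expects.
Box = Tuple[int, int, int, int]


def column_boxes(width: int, height: int, n: int, right_to_left: bool = True) -> List[Box]:
    """Split a width x height image into n equal-width, full-height strips.

    Hypothetical helper: an even split is an assumption and may slice
    through characters, so treat the result as a starting point only.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    edges = [round(i * width / n) for i in range(n + 1)]
    boxes = [(edges[i], 0, edges[i + 1], height) for i in range(n)]
    # Vertical Japanese text is read from the rightmost column to the leftmost.
    return boxes[::-1] if right_to_left else boxes


def ocr_regions(img, boxes: List[Box], ocr: Callable) -> List[str]:
    """Crop each box out of img and run the OCR callable on the crop.

    img only needs a PIL-style crop(box) method; ocr is any callable
    that takes the cropped image and returns the recognized text.
    """
    return [ocr(img.crop(box)) for box in boxes]
```

With the README's `mocr = MangaOcr()` and a PIL image `img`, a call might look like `ocr_regions(img, column_boxes(*img.size, n=2), mocr)`, which returns one recognized string per strip in right-to-left order.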
diff --git a/examples/00.jpg b/examples/00.jpg
new file mode 100644
index 0000000..faef4b4
Binary files /dev/null and b/examples/00.jpg differ
diff --git a/examples/01.jpg b/examples/01.jpg
new file mode 100644
index 0000000..0bd3c27
Binary files /dev/null and b/examples/01.jpg differ
diff --git a/examples/02.jpg b/examples/02.jpg
new file mode 100644
index 0000000..9ed906a
Binary files /dev/null and b/examples/02.jpg differ
diff --git a/examples/03.jpg b/examples/03.jpg
new file mode 100644
index 0000000..65f4c1a
Binary files /dev/null and b/examples/03.jpg differ
diff --git a/examples/04.jpg b/examples/04.jpg
new file mode 100644
index 0000000..e7439d8
Binary files /dev/null and b/examples/04.jpg differ
diff --git a/examples/05.jpg b/examples/05.jpg
new file mode 100644
index 0000000..c202c7e
Binary files /dev/null and b/examples/05.jpg differ
diff --git a/examples/06.jpg b/examples/06.jpg
new file mode 100644
index 0000000..34cd7b8
Binary files /dev/null and b/examples/06.jpg differ
diff --git a/examples/07.jpg b/examples/07.jpg
new file mode 100644
index 0000000..91048e0
Binary files /dev/null and b/examples/07.jpg differ
diff --git a/examples/08.jpg b/examples/08.jpg
new file mode 100644
index 0000000..95ce304
Binary files /dev/null and b/examples/08.jpg differ
diff --git a/examples/09.jpg b/examples/09.jpg
new file mode 100644
index 0000000..91537a2
Binary files /dev/null and b/examples/09.jpg differ
diff --git a/examples/10.jpg b/examples/10.jpg
new file mode 100644
index 0000000..2ed92cb
Binary files /dev/null and b/examples/10.jpg differ
diff --git a/examples/11.jpg b/examples/11.jpg
new file mode 100644
index 0000000..e51e5e0
Binary files /dev/null and b/examples/11.jpg differ