Use manga_ocr as library, separate the projects, add some documentation
# Manga OCR

# OwOCR

Optical character recognition for Japanese text, with the main focus being Japanese manga.
It uses a custom end-to-end model built with Transformers' [Vision Encoder Decoder](https://huggingface.co/docs/transformers/model_doc/vision-encoder-decoder) framework.

Manga OCR can be used as a general-purpose printed Japanese OCR, but its main goal was to provide high-quality
text recognition, robust against various scenarios specific to manga:
- both vertical and horizontal text
- text with furigana
- text overlaid on images
- wide variety of fonts and font styles
- low quality images

Unlike many OCR models, Manga OCR supports recognizing multi-line text in a single forward pass,
so that text bubbles found in manga can be processed at once, without splitting them into lines.

See also:
- [Poricom](https://github.com/bluaxees/Poricom), a GUI reader which uses manga-ocr
- [mokuro](https://github.com/kha-white/mokuro), a tool which uses manga-ocr to generate an HTML overlay for manga
- [Xelieu's guide](https://rentry.co/lazyXel), a comprehensive guide on setting up a reading and mining workflow with manga-ocr/mokuro (and many other useful tips)
- Development code, including code for training and synthetic data generation: [link](manga_ocr_dev)
- Description of the synthetic data generation pipeline + examples of generated images: [link](manga_ocr_dev/synthetic_data_generator)

A command line client for several Japanese OCR providers, derived from [Manga OCR](https://github.com/kha-white/manga-ocr).

# Installation

You need Python 3.8, 3.9, 3.10 or 3.11.
This has been tested with Python 3.11; newer or older versions might work. For now it can be installed with `pip install https://github.com/AuroraWright/owocr/archive/master.zip`.

If you want to run on GPU, install PyTorch as described [here](https://pytorch.org/get-started/locally/#start-locally);
otherwise, this step can be skipped.
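
If you installed the GPU build, a quick sanity check that PyTorch actually sees a GPU (this assumes a CUDA setup and is not part of owocr itself):

```python
import torch

# True means PyTorch was installed with CUDA support and a GPU is visible
print(torch.cuda.is_available())
```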

# Supported providers

## Local providers
- [Manga OCR](https://github.com/kha-white/manga-ocr): refer to the readme for installation ("m" key)
- [EasyOCR](https://github.com/JaidedAI/EasyOCR): refer to the readme for installation ("e" key)
- [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR): refer to the [wiki](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/doc/doc_en/quickstart_en.md) for installation ("o" key)
- Apple Vision framework: works on macOS Ventura or later if pyobjc (`pip install pyobjc`) is installed. In my experience, it is the best of the local providers for horizontal text ("a" key)
- WinRT OCR: works on Windows 10 or later if winocr (`pip install winocr`) is installed. It can also be used remotely: install winocr on a Windows virtual machine, run the server (`winocr_serve`), install requests (`pip install requests`), and specify the IP address of the Windows VM/machine in the config file (see below) ("w" key)

Run in command line:
```commandline
pip3 install manga-ocr
```

## Troubleshooting

- `ImportError: DLL load failed while importing fugashi: The specified module could not be found.` - might be caused by Python installed from the Microsoft Store; try installing Python from the [official site](https://www.python.org/downloads)
- problems installing `mecab-python3` on ARM architecture - try [this workaround](https://github.com/kha-white/manga-ocr/issues/16)

## Cloud providers
- Google Vision: you need a service account .json file named google_vision.json in `user directory/.config/`, and google-cloud-vision installed (`pip install google-cloud-vision`); see the sketch after this list
- Azure Computer Vision: you need to specify an API key and an endpoint in the config file (see below), and to install azure-cognitiveservices-vision-computervision (`pip install azure-cognitiveservices-vision-computervision`)
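
For reference, a minimal sketch of calling the google-cloud-vision client directly with such a service account file. This only illustrates the underlying API, not owocr's internal code, and the image path is hypothetical:

```python
from pathlib import Path

from google.cloud import vision

# Build the client from the service account file described above
client = vision.ImageAnnotatorClient.from_service_account_json(
    str(Path.home() / '.config' / 'google_vision.json')
)

# Send the image bytes for text detection
with open('/path/to/img', 'rb') as f:
    image = vision.Image(content=f.read())
response = client.text_detection(image=image)

# The first annotation holds the full recognized text
if response.text_annotations:
    print(response.text_annotations[0].description)
```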

# Usage

## Python API

```python
from manga_ocr import MangaOcr

mocr = MangaOcr()
text = mocr('/path/to/img')
```

or

```python
import PIL.Image

from manga_ocr import MangaOcr

mocr = MangaOcr()
img = PIL.Image.open('/path/to/img')
text = mocr(img)
```
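
Since `MangaOcr` accepts both file paths and `PIL.Image` objects, batching over a folder is straightforward. A minimal sketch (the folder path and extension are just examples):

```python
from pathlib import Path

from manga_ocr import MangaOcr

mocr = MangaOcr()

# Run OCR on every PNG in a folder (hypothetical path)
for img_path in sorted(Path('/path/to/images').glob('*.png')):
    print(img_path.name, mocr(str(img_path)))
```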

## Running in the background

Manga OCR can run in the background and process new images as they appear.

You might use a tool like [ShareX](https://getsharex.com/) or [Flameshot](https://flameshot.org/) to manually capture a region of the screen and let the
OCR read it either from the system clipboard or from a specified directory. By default, Manga OCR will write recognized text to clipboard,
from which it can be read by a dictionary like [Yomichan](https://github.com/FooSoft/yomichan).

Clipboard mode on Linux requires `wl-copy` for Wayland sessions or `xclip` for X11 sessions. You can find out which one your system needs by running `echo $XDG_SESSION_TYPE` in the terminal.

Your full setup for reading manga in Japanese with a dictionary might look like this:

capture region with ShareX -> write image to clipboard -> Manga OCR -> write text to clipboard -> Yomichan

https://user-images.githubusercontent.com/22717958/150238361-052b95d1-0152-485f-a441-48a957536239.mp4

- To read images from clipboard and write recognized texts to clipboard, run in command line:
```commandline
manga_ocr
```
- To read images from ShareX's screenshot folder, run in command line:
```commandline
manga_ocr "/path/to/sharex/screenshot/folder"
```

Note that when running in the clipboard scanning mode, any image that you copy to clipboard will be processed by OCR and replaced
by recognized text. If you want to be able to copy and paste images as usual, you should use the folder scanning mode instead
and define a separate task in ShareX just for OCR, which saves screenshots to some folder without copying them to clipboard.

When running for the first time, downloading the model (~400 MB) might take a few minutes.
The OCR is ready to use after the `OCR ready` message appears in the logs.
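
Conceptually, the folder scanning mode boils down to polling a directory and OCR-ing files that appear. A minimal sketch using the Python API; the polling approach, interval and path are illustrative assumptions, not how manga_ocr is actually implemented:

```python
import time
from pathlib import Path

from manga_ocr import MangaOcr

folder = Path('/path/to/sharex/screenshot/folder')  # hypothetical path
mocr = MangaOcr()
seen = set(folder.iterdir())  # skip files that already exist

while True:
    for img_path in folder.glob('*.png'):
        if img_path not in seen:
            seen.add(img_path)
            print(mocr(str(img_path)))  # a real setup would copy this to clipboard
    time.sleep(0.5)
```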

- To see other options, run in command line:
```commandline
manga_ocr --help
```

If `manga_ocr` doesn't work, you might also try replacing it with `python -m manga_ocr`.

## Usage tips

- OCR supports multi-line text, but the longer the text, the more likely some errors are to occur.
If recognition fails for some part of a longer text, you might try running it on a smaller portion of the image; see the sketch after this list.
- The model was trained specifically to handle manga well, but it should do a decent job on other types of printed text,
such as novels or video games. It probably won't be able to handle handwritten text, though.
- The model always attempts to recognize some text on the image, even if there is none.
Because it uses a transformer decoder (and therefore has some understanding of the Japanese language),
it might even "dream up" some realistic-looking sentences! This shouldn't be a problem for most use cases,
but it might get improved in the next version.
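
A minimal sketch of the first tip above: splitting a tall image in half with Pillow and OCR-ing the parts separately. The split point is an arbitrary assumption; real bubbles may need smarter cropping:

```python
import PIL.Image

from manga_ocr import MangaOcr

mocr = MangaOcr()
img = PIL.Image.open('/path/to/img')

# Crop boxes are (left, upper, right, lower); here we just cut the image in half
w, h = img.size
top_half = img.crop((0, 0, w, h // 2))
bottom_half = img.crop((0, h // 2, w, h))

print(mocr(top_half) + mocr(bottom_half))
```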

# Examples

Here are some cherry-picked examples showing the capability of the model.

| image | Manga OCR result |
|----------------------|------------------|
|  | 素直にあやまるしか |
|  | 立川で見た〝穴〟の下の巨大な眼は: |
|  | 実戦剣術も一流です |
|  | 第30話重苦しい闇の奥で静かに呼吸づきながら |
|  | よかったじゃないわよ!何逃げてるのよ!!早くあいつを退治してよ! |
|  | ぎゃっ |
|  | ピンポーーン |
|  | LINK!私達7人の力でガノンの塔の結界をやぶります |
|  | ファイアパンチ |
|  | 少し黙っている |
|  | わかるかな〜? |
|  | 警察にも先生にも町中の人達に!! |

# Contact

For any inquiries, please feel free to contact me at kha-white@mail.com

OwOCR mostly functions like Manga OCR: https://github.com/kha-white/manga-ocr?tab=readme-ov-file#running-in-the-background

However:
- you can pause/unpause the clipboard image processing by pressing "p", or terminate the script with "t" or "q"
- you can switch OCR provider with its corresponding keyboard key (refer to the list above). You can also start the script paused with the -p option, or with a specific provider with the -e option (refer to `owocr -h` for the list)
- holding ctrl or cmd at any time will pause the clipboard image processing temporarily
- on systems where text can be copied to the clipboard at the same time as images, if `*ocr_ignore*` is copied with an image, the image will be ignored
- a config file (located in `user directory/.config/owocr_config.ini`) can be used to limit providers (to reduce clutter/memory usage) as well as to specify provider settings such as API keys (a sample config file is provided); see the sketch after this list
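
As an illustration of how such a file can be consumed, a minimal configparser sketch. The section and key names below are hypothetical; the authoritative ones are in the sample config shipped with owocr:

```python
import configparser
from pathlib import Path

config = configparser.ConfigParser()
config.read(Path.home() / '.config' / 'owocr_config.ini')

# 'azure', 'api_key' and 'endpoint' are hypothetical names for this sketch
if config.has_section('azure'):
    api_key = config.get('azure', 'api_key')
    endpoint = config.get('azure', 'endpoint')
    print(f'Azure endpoint: {endpoint}')
```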

# Acknowledgments

This project was built using:
- the [Manga109-s](http://www.manga109.org/en/download_s.html) dataset
- the [CC-100](https://data.statmt.org/cc-100/) dataset

This uses code from/references these projects:
- [Manga OCR](https://github.com/kha-white/manga-ocr)
- [ocrmac](https://github.com/straussmaximilian/ocrmac) for the Apple Vision framework API
- [NadeOCR](https://github.com/Natsume-197/NadeOCR) for the Google Vision API