# Manga Translator
Downloads Japanese manga from a URL and translates the manga images using SickZil (a TensorFlow model), OCR (pytesseract and nhocr), and googletrans.
This project is still under development in the Colab environment. To use it, run the Colab notebook at the URL below.
https://colab.research.google.com/drive/1XbR7fNXtT4TGlLI1FBcCQv7Gj5mlDvwb?usp=sharing
# Result
![result](doc/result1.png)
![result](doc/result2.png)
![result](doc/result3.png)
![result](doc/result4.png)
![result](doc/result5.png)
![result](doc/result6.png)
![result](doc/result7.png)
# Workflow
- Use gallery-dl to download the manga from a URL
- Segment the text regions in each manga image using SickZil
- Use OpenCV to crop the text regions based on the segmentation results
- Extract the text from the cropped images using pytesseract and nhocr
- Translate the text using googletrans
- Use PIL to draw the translated text onto the image
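
The per-page steps above (everything after the download) could be wired together roughly as follows. This is only a structural sketch, not the project's actual code: `TextRegion`, `translate_page`, and the four stage callables are hypothetical names, and the stages are passed in as plain functions so that SickZil, OpenCV, pytesseract/nhocr, googletrans, and PIL can each be plugged in behind them.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) of one text region


@dataclass
class TextRegion:
    box: Box
    source_text: str = ""
    translated_text: str = ""


def translate_page(
    image: object,
    segment: Callable[[object], List[Box]],
    ocr: Callable[[object, Box], str],
    translate: Callable[[str], str],
    render: Callable[[object, List[TextRegion]], object],
) -> object:
    """Run one page through segmentation, OCR, translation, and rendering."""
    regions = [TextRegion(box=box) for box in segment(image)]
    for region in regions:
        region.source_text = ocr(image, region.box)
        # Skip blank OCR results so no translation call is made for them.
        if region.source_text.strip():
            region.translated_text = translate(region.source_text)
    return render(image, regions)


if __name__ == "__main__":
    # Stub stages for illustration only; they do no real image processing.
    result = translate_page(
        image="page-001.png",
        segment=lambda img: [(10, 20, 100, 40)],
        ocr=lambda img, box: "こんにちは",
        translate=lambda text: "Hello",
        render=lambda img, regions: [(r.box, r.translated_text) for r in regions],
    )
    print(result)
```

Keeping each stage behind a callable makes it possible to swap one OCR engine for the other, or to test the flow without loading the segmentation model.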
# Supported Languages (for destination translation)
Translation uses googletrans, which supports the following destination languages:
afrikaans, albanian, amharic, arabic, armenian, azerbaijani, basque, belarusian, bengali, bosnian, bulgarian, catalan, chichewa, chinese (simplified), chinese (traditional), corsican, croatian, czech, danish, dutch, english, esperanto, estonian, filipino, finnish, french, frisian, galician, georgian, german, greek, gujarati, haitian creole, hausa, hawaiian, hindi, hungarian, icelandic, igbo, indonesian, irish, italian, japanese, kazakh, khmer, korean, kurdish (kurmanji), kyrgyz, lao, latin, latvian, lithuanian, luxembourgish, macedonian, malagasy, malay, malayalam, maltese, maori, marathi, mongolian, myanmar (burmese), nepali, norwegian, pashto, persian, polish, portuguese, punjabi, romanian, russian, samoan, scots gaelic, serbian, sesotho, shona, sindhi, sinhala, slovak, slovenian, somali, spanish, sundanese, swahili, swedish, tajik, tamil, thai, turkish, ukrainian, urdu, uzbek, vietnamese, welsh, xhosa, yiddish, yoruba, zulu
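
googletrans identifies languages by short codes rather than by the names listed above, so a name usually has to be mapped to its code before it can be used as a destination. The sketch below shows one way to do that lookup. `resolve_dest` and `SAMPLE_LANGUAGES` are hypothetical names, and the table is a small hand-copied subset; with googletrans installed, its `LANGUAGES` dictionary (code to name) can be passed in instead.

```python
from typing import Mapping

# Small illustrative subset. `googletrans.LANGUAGES` has the same
# code -> name shape and can be passed as `languages` instead.
SAMPLE_LANGUAGES = {
    "en": "english",
    "fr": "french",
    "ja": "japanese",
    "ko": "korean",
    "zh-cn": "chinese (simplified)",
}


def resolve_dest(name_or_code: str,
                 languages: Mapping[str, str] = SAMPLE_LANGUAGES) -> str:
    """Return the language code for a language name or code (case-insensitive)."""
    key = name_or_code.strip().lower()
    if key in languages:
        return key  # already a known code
    for code, name in languages.items():
        if name.lower() == key:
            return code
    raise ValueError(f"unsupported destination language: {name_or_code!r}")


print(resolve_dest("Korean"))                # ko
print(resolve_dest("chinese (simplified)"))  # zh-cn
```

The returned code is what would be passed as the `dest` argument of the googletrans `Translator.translate` call.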
# Supported URLs
gallery-dl is used for downloading. The sites it supports are listed here:
- [gallery-dl supported sites](https://github.com/mikf/gallery-dl/blob/master/docs/supportedsites.rst)
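
For reference, a download can be started from Python by calling the gallery-dl command-line tool. The following is a minimal sketch, assuming gallery-dl is installed and on the `PATH`; the helper names and the example URL are hypothetical, and the notebook may invoke gallery-dl differently.

```python
import subprocess
from typing import List


def build_download_command(url: str, dest_dir: str) -> List[str]:
    """Build the gallery-dl command line that downloads `url` under `dest_dir`."""
    # -d sets the base directory that gallery-dl downloads into.
    return ["gallery-dl", "-d", dest_dir, url]


def download_gallery(url: str, dest_dir: str = "downloads") -> None:
    """Run gallery-dl; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_download_command(url, dest_dir), check=True)


# Hypothetical URL, used only to show the resulting command.
print(build_download_command("https://example.org/gallery/123", "downloads"))
```

Building the argument list separately from running it keeps the command easy to inspect and avoids shell quoting issues with unusual URLs.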
# Acknowledgement and References
- [SickZil-Machine](https://github.com/KUR-creative/SickZil-Machine)
- [OpenCV with Python wrapper](https://pypi.org/project/opencv-python/)
- [Google Translate API for Python](https://pypi.org/project/googletrans/)
- [Tesseract](https://github.com/tesseract-ocr/tesseract)
- [Pytesseract](https://pypi.python.org/pypi/pytesseract)
- [nhocr](https://github.com/fireae/nhocr)
- [text-detection](https://github.com/qzane/text-detection)