# Kindai-OCR
OCR system for recognizing modern Japanese magazines
## About
This repo contains an OCR system for converting images of modern Japanese documents to text.

This is a result of the [N2I project](http://codh.rois.ac.jp/collaboration/#n2i) for the digitization of modern Japanese documents.
The system has two main modules: text line extraction and text line recognition. The overall architecture is shown in the figure below.

For text line extraction, we retrained CRAFT (Character Region Awareness for Text Detection) on our dataset.

For text line recognition, we employ the attention-based encoder-decoder from our previous publication.
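
The two-module flow described above can be pictured with a small sketch. This is only an illustration of the structure (detect text lines, then recognize each cropped line), not the code in this repo: `run_ocr`, `crop`, `fake_detector`, and `fake_recognizer` are hypothetical names, the stand-in stages do no real detection or recognition, and axis-aligned boxes are an assumed simplification.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# (left, top, right, bottom) in pixel coordinates. Assumed simplification:
# a real detector may return polygons rather than axis-aligned boxes.
Box = Tuple[int, int, int, int]
Image = Sequence[Sequence[int]]  # toy grayscale image as rows of pixel values


@dataclass
class RecognizedLine:
    box: Box
    text: str


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the region covered by `box` out of `image`."""
    left, top, right, bottom = box
    return [list(row[left:right]) for row in image[top:bottom]]


def run_ocr(
    image: Image,
    detect_lines: Callable[[Image], List[Box]],  # hypothetical stage 1
    recognize_line: Callable[[Image], str],      # hypothetical stage 2
) -> List[RecognizedLine]:
    """Detect text lines, then recognize each cropped line independently."""
    results = []
    for box in detect_lines(image):
        results.append(RecognizedLine(box, recognize_line(crop(image, box))))
    return results


# Stand-in stages so the sketch runs without any trained model.
def fake_detector(image: Image) -> List[Box]:
    width = len(image[0])
    return [(0, 0, width, 2), (0, 2, width, 4)]


def fake_recognizer(line_image: Image) -> str:
    return f"line of {len(line_image)}x{len(line_image[0])} pixels"


page = [[0] * 8 for _ in range(4)]
lines = run_ocr(page, fake_detector, fake_recognizer)
for line in lines:
    print(line.box, line.text)
```

Passing the two stages in as callables keeps the sketch independent of any particular model; in practice each would wrap a trained network.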
## Installing Kindai OCR
## Running Kindai OCR