This repo contains scripts and datasets for processing Telugu language data.
See the module docstrings of the individual scripts for how to use them.
te.pyrnn.gz - Telugu language model (LSTM + CTC) trained with ocropy
Sample training data. You can use the scripts to generate customized training data.
Isolated Handwritten Telugu Character Dataset
Telugu and other South Asian language data
tesseract-te - Tesseract Open Source OCR Engine
banti_telugu_ocr - End-to-end OCR system for Telugu, based on convolutional neural networks.
Chamanti_ocr - Telugu OCR framework using RNN and CTC, in Theano and Python 3.
http://docs.cltk.org/en/latest/telugu.html
http://www.tdil-dc.in/index.php?option=com_download&task=showresourceDetails&toolid=264&lang=en
http://www.tdil-dc.in/index.php?option=com_download&task=showresourceDetails&toolid=1892&lang=en