This repo contains scripts and datasets for processing Telugu language data.
See the module docstrings of the individual scripts for usage instructions.
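If you would rather read a script's docstring without opening or running it, a minimal sketch along these lines may help. The helper name `module_docstring` is made up for illustration and is not part of this repo. It also assumes each script parses under the Python version you run it with; older Python 2 style scripts may raise a `SyntaxError`.

```python
import ast
import sys
from pathlib import Path


def module_docstring(script_path):
    """Return the module-level docstring of a Python script, or None.

    The file is only parsed, never imported or executed.
    """
    source = Path(script_path).read_text(encoding="utf-8")
    return ast.get_docstring(ast.parse(source))


if __name__ == "__main__":
    # Print the docstring of every script path given on the command line.
    for path in sys.argv[1:]:
        print(f"== {path} ==")
        print(module_docstring(path) or "(no module docstring)")
```

Saved under a name of your choice, it can be run with one or more script paths as arguments. Python's built-in `pydoc` tool shows the same text, but it imports the module first.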
te.pyrnn.gz - Telugu language model (LSTM + CTC) trained with ocropy
Sample training data. You can use the scripts in this repo to generate customized training data.
Isolated Handwritten Telugu Character Dataset
Telugu and other South Asian language data
tessaract-te - Tesseract Open Source OCR Engine
banti_telugu_ocr - End-to-end OCR system for Telugu, based on convolutional neural networks.
Chamanti_ocr - Telugu OCR framework using an RNN with CTC, written in Theano and Python 3.