IJIGSP Vol.4, No.6, Jul. 2012

Segmentation of Ancient Telugu Text Documents

Srinivasa Rao A.V

Index Terms

Segmentation, Profile, Line, Syllable, Gaussian derivative kernel


OCR of ancient document images remains a challenging task to date. The scanning process itself deforms the document images, and cleaning these images results in information loss. Segmentation is an integral stage of OCR, and complex scripts, such as those derived from Brahmi, pose many problems for it. Segmenting meaningful units, rather than isolated patterns, revealed interesting trends. This paper proposes a technique for segmenting ancient Telugu document images into meaningful units. The topological features of the meaningful units within a script line serve as the basis for segmenting that text line. The horizontal profile pattern is convolved with a Gaussian kernel. The statistical properties of meaningful units are explored through extensive analysis of their geometric patterns. The proposed segmentation algorithm achieves an efficiency of 73.5% on uncleaned document images.
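The profile-smoothing step can be illustrated with a minimal sketch. It assumes a binarized text-line image (1 = ink) and takes the "horizontal profile" to be the ink count in each column along the line's horizontal axis. The profile is convolved with a sampled Gaussian kernel, and each run of columns whose smoothed value exceeds a threshold is treated as one candidate unit. The function names, `sigma`, `threshold`, and the run-based cutting rule are hypothetical illustrations, not the paper's method, which additionally relies on topological features and statistics of meaningful units that are not reproduced here.

```python
import math


def gaussian_kernel(sigma, radius=None):
    """Sampled 1-D Gaussian kernel, normalised so the weights sum to 1."""
    if radius is None:
        radius = max(1, int(3 * sigma))
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def column_profile(binary_line):
    """Ink-pixel count of each column of a binary (0/1) text-line image."""
    return [sum(col) for col in zip(*binary_line)]


def convolve_same(signal, kernel):
    """1-D convolution with edge replication; output length equals input.

    The Gaussian kernel is symmetric, so correlation and convolution coincide.
    """
    r = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - r, 0), n - 1)  # clamp index at the borders
            acc += w * signal[j]
        out.append(acc)
    return out


def segment_line(binary_line, sigma=1.0, threshold=1.0):
    """Return (first_col, last_col) spans where the smoothed profile > threshold.

    Hypothetical rule: smoothing bridges narrow intra-unit gaps, while wide
    inter-unit gaps stay below the threshold and become cut regions.
    """
    smooth = convolve_same(column_profile(binary_line), gaussian_kernel(sigma))
    segments, start = [], None
    for i, v in enumerate(smooth):
        if v > threshold and start is None:
            start = i                          # entering an ink region
        elif v <= threshold and start is not None:
            segments.append((start, i - 1))    # leaving an ink region
            start = None
    if start is not None:                      # region touching the right edge
        segments.append((start, len(smooth) - 1))
    return segments


ink = [0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0]
line = [ink, ink]  # two identical rows: profile is 2 in every inked column
print(segment_line(line, sigma=1.0, threshold=1.0))  # [(2, 8), (15, 17)]
```

With `sigma=1.0` the one-column gap at column 5 is bridged, so the two blobs on the left form one unit. A near-zero `sigma` such as 0.1 leaves the profile essentially unsmoothed and splits them into `(2, 4)` and `(6, 8)`. In practice `sigma` would have to be tuned to the script's typical intra-unit gap width.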

Srinivasa Rao A.V, "Segmentation of Ancient Telugu Text Documents", IJIGSP, vol. 4, no. 6, pp. 8-14, 2012.


