Feedback on Ph.D. Thesis

Review comments (snippets) on Shiva's thesis by Prof. George Nagy, Professor Emeritus, Rensselaer Polytechnic Institute (RPI), thesis examiner:

    • It contains original contributions in document image enhancement, layout analysis, line/word/character detection, feature extraction, classification, feedback correction, and post-processing.

    • The many design choices that were made in the course of implementing such a large and complex system are sound, and often original. They include (1) re-digitization after skew-correction; (2) global, local and, where necessary, multi-alternative binarization; (3) an initially simple and elegant line segmentation scheme based on an efficient tree data structure (that must, however, be supplemented by regionalization and many Kannada-specific routines); (4) systematic combination of horizontal lines in adjacent regions; (5) component-level symbol and sub-symbol extraction based on detailed analysis of the multitude of Kannada consonant-vowel allograph combinations; (6) Haar and correlation coefficient features augmented by size, position and aspect ratio; (7) off-the-shelf SVM classifier; (8) word recognition based on bigram transitions and maximum path probability; (9) rule-based reordering of the output according to Unicode conventions; and, worthy of special attention, feedback from word-level analysis of the output Unicode sequence to (10) re-digitization, (11) gray-scale re-segmentation and (12) alternative classifier for Latin script.

    • The research is informed by a thorough examination of Kannada script morphology as well as of other Dravidian scripts and languages. Latin script and European language processing is incorporated without a hitch. The extensive I/O facilities developed to support transcription include (a) interactive text-region selection, (b) simultaneous synchronized display of text-image and image-text for editing the output, (c) audio, (d) E-book and (e) Braille outputs. The inclusion of a user-friendly editing interface is important because the reported error rates still imply several dozen errors on a densely printed page. To promote ease of adoption by lay users, there are ready-to-run versions for Windows, Unix, and Linux platforms. Also described are examples of actual applications of the system, primarily for the benefit of the blind, by various organizations, schools, government, and charitable agencies.

    • Every aspect of the system is carefully evaluated on benchmarks of the pan-Indian OCR consortium or previous document image analysis competitions. Most of the evaluations are based on string matching with Levenshtein distance for alignment. The results of comparisons of partial and final output with ground truth and competitive systems are reported in a sensible and uniform manner in narrative and graphic forms.

    • There is enough interesting and original material here for at least three dissertations.

    • The narrative is clear and easy to read. The illustrations are outstanding, including the many excellent flowcharts. The examples are chosen with care and the figure captions are clear and simple. The candidate is a talented communicator.

One of the reviewers of Shiva's journal publication “Lipi Gnani - A Versatile OCR for Documents in any Language Printed in Kannada Script” commented that “This paper can be considered as a benchmark for Kannada document recognition”.