: Venu Govindaraju, Srirangaraj Setlur
: Venu Govindaraju, Srirangaraj Ranga Setlur
: Guide to OCR for Indic Scripts: Document Recognition and Retrieval
: Springer-Verlag
: 9781848003309
: Advances in Computer Vision and Pattern Recognition
: 1
: CHF 135.30
:
: Application Software
: English
: 325
: Watermark
: PC/MAC/eReader/Tablet
: PDF

This is the first comprehensive text on Optical Character Recognition (OCR) for Indic scripts. It covers both the recognition of Indic scripts and the retrieval of Indic documents, and describes OCR systems for eight scripts: Bangla, Devanagari, Gurmukhi, Gujarati, Kannada, Malayalam, Tamil, and Urdu.

Foreword
Preface
1 Part I: Recognition of Indic Scripts
2 Part II: Retrieval of Indic Documents
3 Target Audience
Acknowledgments
Contents
Contributors
Part I Recognition of Indic Scripts
Building Data Sets for Indian Language OCR Research
1 Introduction
2 Datasets
2.1 Image Corpus
2.1.1 Digitization
2.1.2 Processing and Storage
2.2 Text Corpus
2.3 Annotated Data Sets
3 Annotation
3.1 Hierarchical Annotation
3.1.1 Different Levels of Annotation
3.1.2 Methods of Annotation
3.2 Annotation Process
3.2.1 Segmentation
3.2.2 Components Labeling
3.2.3 Annotation Tools
4 Representation and Access
4.1 Sources of Metainformation
4.2 Recognizer-Specific Metainformation
4.3 Digitization Meta Information
4.4 Annotation Data
4.4.1 Page Structure Information
4.4.2 Text Block Structure Information
4.4.3 Akshara Structure Information
4.5 Representation Issues
4.5.1 Complex Layout
4.5.2 Indian Language Script Issues
4.6 Data Access
5 Implementation and Execution
5.1 Organization of Tasks
5.2 Status of the Data Sets
6 Conclusions
References
On OCR of Major Indian Scripts: Bangla and Devanagari
1 Introduction
2 Basic OCR System
2.1 Group and Individual Character Classifiers
3 Quantification of Errors
4 Post-recognition Error Correction
4.1 Forward-Backward Error Correction Scheme
5 Discussion
References
A Complete Machine-Printed Gurmukhi OCR System
1 Introduction
2 Characteristics of Gurmukhi Script
2.1 Character Set
2.2 Connectivity of Symbols
2.3 Word Partitioning into Zones
2.4 Frequently Touching Characters
2.5 Broken Characters and Headlines
2.6 Similarity of Group of Symbols
3 System Overview
4 Digitization and Pre-processing
5 Splitting Text into Horizontal Text Strips
6 Word Segmentation
7 Sub-division of Strips into Smaller Units
8 Repairing the Word Shape
9 Thinning
10 Repairing Broken Characters
11 Character Segmentation
11.1 Touching Characters
12 Recognition Stage
12.1 Feature Extraction
12.2 Classification
12.2.1 Design of the Binary Tree Classifier
12.3 Merging Sub-symbols
13 Post-Processing
13.1 Check for the Existence of a Word in the Corpus
13.2 Perform Holistic Recognition of a Word
14 Experimental Results
15 Conclusion
References
Progress in Gujarati Document Processing and Character Recognition
1 Introduction
2 Gujarati Script: OCR Perspective
3 Segmentation
4 Zone Boundary Identification
4.1 Using Slopes of the Imaginary Lines Joining Top Left (Bottom Right) Corners
4.2 Dynamic Programming Approach
5 Extracting Recognizable Units
6 Recognition
6.1 Feature Extraction
6.1.1 Fringe Map
6.1.2 Discrete Cosine Transform
6.1.3 Wavelet Transform
6.1.4 Zone Information
6.1.5 Aspect Ratio
6.2 Classification
6.2.1 Nearest Neighbor Classifier
6.2.2 Artificial Neural Networks [25, 26]
6.2.3 Multi-layer Perceptron (MLP) [25]
6.2.4 Radial Basis Functions (RBF) networks
6.2.5 General Regression Neural Network (GRNN)
6.3 Experimental Setup and Results
7 Text Generation
8 Post-processing
9 Conclusion
References
Design of a Bilingual Kannada-English OCR
1