Guide to OCR for Indic Scripts: Document Recognition and Retrieval

Editors:	Venu Govindaraju, Srirangaraj (Ranga) Setlur
Title:	Guide to OCR for Indic Scripts: Document Recognition and Retrieval
Publisher:	Springer-Verlag
ISBN:	9781848003309
Series:	Advances in Computer Vision and Pattern Recognition
Edition:	1
Price:	CHF 135.30

Category:	Application software
Language:	English

Pages:	325
Copy protection:	Watermark
Devices:	PC/Mac/eReader/tablet
Format:	PDF

This is the first comprehensive text on Optical Character Recognition (OCR) for Indic scripts. It covers both the recognition of Indic scripts and the retrieval of Indic documents, and describes OCR systems for eight scripts: Bangla, Devanagari, Gurmukhi, Gujarati, Kannada, Malayalam, Tamil, and Urdu.

	Foreword	4
	Preface	6
	1 Part I: Recognition of Indic Scripts	9
	2 Part II: Retrieval of Indic Documents	11
	3 Target Audience	11
	Acknowledgments	13
	Contents	14
	Contributors	16
	Part I Recognition of Indic Scripts	19
	Building Data Sets for Indian Language OCR Research	20
	1 Introduction	20
	2 Datasets	21
	2.1 Image Corpus	21
	2.1.1 Digitization	22
	2.1.2 Processing and Storage	22
	2.2 Text Corpus	23
	2.3 Annotated Data Sets	23
	3 Annotation	24
	3.1 Hierarchical Annotation	26
	3.1.1 Different Levels of Annotation	26
	3.1.2 Methods of Annotation	27
	3.2 Annotation Process	28
	3.2.1 Segmentation	28
	3.2.2 Components Labeling	29
	3.2.3 Annotation Tools	31
	4 Representation and Access	32
	4.1 Sources of Metainformation	33
	4.2 Recognizer-Specific Metainformation	34
	4.3 Digitization Meta Information	34
	4.4 Annotation Data	35
	4.4.1 Page Structure Information	36
	4.4.2 Text Block Structure Information	36
	4.4.3 Akshara Structure Information	37
	4.5 Representation Issues	37
	4.5.1 Complex Layout	37
	4.5.2 Indian Language Script Issues	37
	4.6 Data Access	38
	5 Implementation and Execution	39
	5.1 Organization of Tasks	39
	5.2 Status of the Data Sets	40
	6 Conclusions	40
	References	41
	On OCR of Major Indian Scripts: Bangla and Devanagari	43
	1 Introduction	43
	2 Basic OCR System	45
	2.1 Group and Individual Character Classifiers	48
	3 Quantification of Errors	50
	4 Post-recognition Error Correction	52
	4.1 Forward–Backward Error Correction Scheme	53
	5 Discussion	57
	References	57
	A Complete Machine-Printed Gurmukhi OCR System	59
	1 Introduction	59
	2 Characteristics of Gurmukhi Script	60
	2.1 Character Set	60
	2.2 Connectivity of Symbols	60
	2.3 Word Partitioning into Zones	61
	2.4 Frequently Touching Characters	62
	2.5 Broken Characters and Headlines	62
	2.6 Similarity of Group of Symbols	62
	3 System Overview	62
	4 Digitization and Pre-processing	62
	5 Splitting Text into Horizontal Text Strips	64
	6 Word Segmentation	67
	7 Sub-division of Strips into Smaller Units	68
	8 Repairing the Word Shape	69
	9 Thinning	70
	10 Repairing Broken Characters	72
	11 Character Segmentation	74
	11.1 Touching Characters	77
	12 Recognition Stage	78
	12.1 Feature Extraction	78
	12.2 Classification	80
	12.2.1 Design of the Binary Tree Classifier	81
	12.3 Merging Sub-symbols	81
	13 Post-Processing	84
	13.1 Check for the Existence of a Word in the Corpus	84
	13.2 Perform Holistic Recognition of a Word	84
	14 Experimental Results	85
	15 Conclusion	86
	References	87
	Progress in Gujarati Document Processing and Character Recognition	88
	1 Introduction	88
	2 Gujarati Script: OCR Perspective	89
	3 Segmentation	91
	4 Zone Boundary Identification	92
	4.1 Using Slopes of the Imaginary Lines Joining Top Left (Bottom Right) Corners	93
	4.2 Dynamic Programming Approach	95
	5 Extracting Recognizable Units	98
	6 Recognition	98
	6.1 Feature Extraction	99
	6.1.1 Fringe Map	100
	6.1.2 Discrete Cosine Transform	100
	6.1.3 Wavelet Transform	101
	6.1.4 Zone Information	102
	6.1.5 Aspect Ratio	102
	6.2 Classification	102
	6.2.1 Nearest Neighbor Classifier	102
	6.2.2 Artificial Neural Networks [25, 26]	103
	6.2.3 Multi-layer Perceptron (MLP) [25]	103
	6.2.4 Radial Basis Functions (RBF) networks	103
	6.2.5 General Regression Neural Network (GRNN)	104
	6.3 Experimental Setup and Results	106
	7 Text Generation	107
	8 Post-processing	108
	9 Conclusion	108
	References	109
	Design of a Bilingual Kannada–English OCR	111
	1