| Foreword | 6 |
---|
| Personal Note | 9 |
---|
| Contents | 10 |
---|
| Contributors | 12 |
---|
| Part I Introduction | 14 |
---|
| 1 Riding the Rough Waves of Genre on the Web | 15 |
| 1.1 Why Is Genre Important? | 15 |
| 1.1.1 Zooming In: Information on the Web | 16 |
| 1.2 Trying to Grasp the Ungraspable? | 18 |
| 1.2.1 In Quest of a Definition of Web Genre for Empirical Studies and Computational Applications | 20 |
| 1.3 Empirical and Computational Approaches to Genre: Open Issues | 21 |
| 1.3.1 Web Documents | 21 |
| 1.3.2 Corpora, Genres and the Web | 26 |
| 1.3.3 Empirical and Computational Models of Web Genres | 30 |
| 1.4 Conclusions | 34 |
| 1.5 Outline of the Volume | 35 |
| References | 37 |
| Part II Identifying the Sources of Web Genres | 43 |
---|
| 2 Conventions and Mutual Expectations | 44 |
| 2.1 Genres Are Not Rule-Bound | 44 |
| 2.2 So, Let's Ask the Readers | 46 |
| 2.3 An Editorial, Third Party, View of Genres on the Web | 51 |
| 2.4 Data Source: Observation of User Actions | 53 |
| 2.5 Conclusions | 56 |
| References | 56 |
| 3 Identification of Web Genres by User Warrant | 58 |
| 3.1 Introduction | 58 |
| 3.2 Criteria for the Identification of Web Genre | 60 |
| 3.3 Operationalizing Traditional Genre Theory for the World Wide Web | 61 |
| 3.3.1 A Genre's User Group | 61 |
| 3.3.2 Genre: Function, Form and Substance | 63 |
| 3.3.3 Genres on the Web: Further Implications for Research | 66 |
| 3.4 Developing a Web Genre Palette | 66 |
| 3.4.1 Collecting Genre Terminology in the Users' Own Words | 67 |
| 3.4.2 Users Choose the Best of the Collected Genre Terminology | 69 |
| 3.4.3 User Validation of the Genre Palette | 72 |
| 3.4.4 A Fourth Study: Determining the Genres' Usefulness for Web Search | 75 |
| 3.5 Conclusion | 76 |
| References | 77 |
| 4 Problems in the Use-Centered Development of a Taxonomy of Web Genres | 79 |
| 4.1 Introduction | 79 |
| 4.1.1 What Is the Purpose of a Genre Taxonomy? | 80 |
| 4.2 Why Is It Hard to Develop a Web Genre Taxonomy? | 81 |
| 4.2.1 Difficulties in Defining Genres | 81 |
| 4.2.2 Difficulties in Developing the Scope and Expressiveness of the Taxonomy | 83 |
| 4.3 A Use-Centered Development of a Taxonomy of Web Genres | 85 |
| 4.3.1 Research Design: Naturalistic Field Study | 85 |
| 4.3.2 Research Informants | 85 |
| 4.3.3 Data Elicitation | 86 |
| 4.3.4 Data Analysis | 87 |
| 4.4 Results | 88 |
| 4.5 Discussion | 89 |
| 4.6 Conclusions | 92 |
| References | 93 |
| Part III Automatic Web Genre Identification | 95 |
---|
| 5 Cross-Testing a Genre Classification Model for the Web | 96 |
| 5.1 Introduction | 96 |
| 5.2 Approximating Genre Population on the Web | 99 |
| 5.2.1 Noise | 100 |
| 5.2.2 Description of the Corpora Used for Cross-Testing | 101 |
| 5.3 The Web as Communication | 105 |
| 5.3.1 Genre Palette | 105 |
| 5.3.2 Linguistically- and Functionally-Motivated Features | 107 |
| 5.4 The Genre Model | 107 |
| 5.4.1 Methodology | 110 |
| 5.4.2 Flow and Hypotheses | 111 |
| 5.5 Results | 113 |
| 5.5.1 Cross-Testing Performance on Single Labels: BBC and 7-Webgenre Collections | 114 |
| 5.5.2 Performances of Other Single-Label Models on the 7-Webgenre Collection | 117 |
| 5.5.3 Cross-Testing Performance on Single Labels: Mapped Web Genres | 120 |
| 5.5.4 Cross-Testing Performance on Single Labels: HCG and MCG in Isolation | 122 |
| 5.5.5 The SPIRIT Sample: An Attempt to Assess Multilabelling | 122 |
| 5.6 Discussion | 126 |
| 5.7 Conclusion and Future Work | 127 |
| References | 135 |
| 6 Formulating Representative Features with Respect to Genre Classification | 138 |
| 6.1 Introduction | 138 |
| 6.2 Defining Genre Classification | 141 |
| 6.2.1 Document Representation in Conventional Text Classification | 141 |
| 6.2.2 Harmonic Descriptor Representation (HDR) of Documents | 141 |
| 6.2.3 Defining Genre | 145 |
| 6.3 Classifiers | 146 |
| 6.4 Dataset | 147 |
| 6.5 Features | 149 |
| 6.6 Results | 151 |
| 6.6.1 Overall Accuracy | 151 |
| 6.6.2 Precision and Recall | 152 |
| 6.7 Conclusions | 154 |
| References | 155 |
| 7 In the Garden and in the Jungle | 157 |
| 7.1 Introduction | 157 |
| 7.2 Text Typology for the Web | 159 |
| 7.3 An Experiment in Automatic Classification of the Web | 163 |
| 7.4 Analysis of Results | 167 |
| 7.4.1 Qualitative Assessment of Texts in Each Category | 167 |
| 7.4.2 Assessing the Composition of ukWac | 169 |
| 7.5 Conclusions and Future Research | 170 |
| References | 173 |
| 8 Web Genre Analysis: Use Cases, Retrieval Models, and Implementation Issues | 175 |
| 8.1 Introduction | 175 |
| 8.1.1 Contributions | 176 |
| 8.2 Use Cases: Genre Analysis in the Retrieval Practice | 176 |
| 8.2.1 Genre-Enabled Web Search | 177 |
| 8.2.2 Information Extraction Based on Genre Information | 177 |
| 8.2.3 Organizing Collections in Both Topic and Genre Dimensions |