1 Introduction /1
1.1 XML Data Model /1
1.2 Emergenceof XML Database /3
1.2.1 Flat File Storage /3
1.2.2 Relational and Object Relational Storage /3
1.2.3 Native Storage of XML Data /4
1.3 XML Query Languageand Processing /5
1.4 XML Keyword Search/5
1.5 Book Outline /6
References /7
2 XML Labeling Scheme /9
2.1 IntroducingXML LabelingScheme /9
2.2 Region EncodingScheme /10
2.3 Dewey and Extended Dewey Scheme/11
2.3.1 Dewey ID Labeling Scheme /11
2.3.2 ExtendedDewey and FST /12
2.4 Dynamic Labeling Scheme/16
2.4.1 Region-Based Dynamic Labeling Scheme/17
2.4.2 Pre.x-Based Dynamic Labeling Scheme /18
2.4.3 PrimeLabelingScheme/19
2.4.4 The EncodingSchemes /20
2.5 Summary /31
References /31
3 XML Data Indexing /33
3.1 IntroducingXML Data Indexing/33
3.2 IndexesonXMLTreeStructure/34
3.2.1 DataGuides /34
3.2.2 1-Index/39
3.2.3 F&B-Index/43
3.3 Index Based on XML Sequencing /54
3.3.1 PRIX: Indexing and Querying XML Using Prufer Sequences /54
3.3.2 ViST: A Dynamic Index Method for Querying XML Data by Tree Structures /65
3.3.3 APEX: An AdaptivePath Index for XML Data /75
3.4 Summary /87
References /88
4 XML Tree Pattern Processing /91
4.1 IntroducingXML Tree Pattern Processing /91
4.2 XML Structural Join /92
4.2.1 Tree-MergeJoinAlgorithms/94
4.2.2 Stack-TreeJoinAlgorithms/97
4.3 XML Holistic Twig Pattern Processing /103
4.3.1 PathStack /104
4.3.2 TwigStack/108
4.3.3 TwigStackList /112
4.3.4 TJFast /122
4.3.5 ExperimentalEvaluation/128
4.4 XML Query Processing Based on VariousStreaming Schemes/134
4.4.1 TagCLevelStreamingandPre.x-PathStreaming(PPS)/135
4.4.2 iTwigJoin Algorithm /144
4.5 Summary /155
References /155
5 Ordered and Generalized XML Tree Pattern Processing /157
5.1 IntroducingOrdered and GeneralizedXML Tree Pattern Processing 157
5.2 XML Ordered Query Processing/158
5.2.1 Data Model and Ordered Twig Pattern /159
5.2.2 XML Ordered Query Processing Algorithm/160
5.2.3 Analysis of OrderedTJ /163
5.2.4 ExperimentalEvaluation/165
5.3 XML Generalized XML Tree Pattern/167
5.3.1 GTJFast Algorithm/168
5.3.2 AnalysisofGTJFast/171
5.3.3 Experiments /173
5.4 ExtendedXML Tree Pattern/174
5.4.1 ExtendedTree Pattern Query /176
5.4.2 MatchingCross/177
5.4.3 Holistic Algorithms /183
5.4.4 Experiments /193
5.5 Summary /200
References /200
Contents
6 Effective XML Keyword Search /203
6.1 IntroducingEffective XML Keyword Search/203
6.2 XMLKeywordSearchSemantics/204
6.2.1 LCA and the Meet Operator /205
6.2.2 MLCA and MLCAS /205
6.2.3 SLCA /206
6.2.4 GDMCT /207
6.2.5 ICA (Interested Common Ancestor) and IRA (InterestedRelatedAncestors)/209
6.2.6 ELCA (Exclusive Lowest Common Ancestor) /210
6.2.7 VLCA (Valuable Lowest Common Ancestor) /210
6.2.8 MCN /211
6.2.9 MeaningfulSLCA/211
6.2.10 LCEA (Lowest Common Entity Ancestor) /213
6.2.11 MLCEA (MeaningfulLCEA) /213
6.3 XML Keyword Search Algorithms/213
6.3.1 DIL (Dewey InvertedList) Query Processing Algorithm /213
6.3.2 The Stack Algorithm /216
6.3.3 Basic Multiway-SLCA Algorithm (BMS) /218
6.3.4 IncrementalMultiway-SLCA Algorithm(IMS) /220
6.3.5 IndexedStack Algorithm/221
6.3.6 Stack-Based Query Re.nement Algorithm /223
6.4 XML Keyword Search Ranking Strategy/224
6.4.1 TF*IDF Cosine Similarity /225
6.4.2 Data Model /226
6.4.3 XML TF&DF/226
6.4.4 Inferringthe Node Type to Search For /227
6.4.5 Inferringthe Node Types to Search Via /227
6.4.6 Capturing KeywordCo-occurrence /228
6.4.7 Relevance-OrientedRanking /228
6.5 Summary /231
References /231
7 XML Keyword Pattern Re.nement/233
7.1 IntroducingXML Keyword Pattern Re.nement/233
7.2 Related Work/237
7.3 Preliminaries /238
7.3.1 MeaningfulSLCA/238
7.3.2 Re.nementOperations/240
7.4 Ranking of Re.ned Queries /242
7.4.1 Similarity Score of a RQ/243
7.4.2 DependenceScore of a RQ /245
7.5 Exploringthe Re.ned Query /247
7.5.1 ProblemFormulation/247
7.5.2 Subproblems/247
7.5.3 Notations/247
7.5.4 Initialization /248
7.5.5 RecurrenceFunction /248
7.5.6 Time Complexity/248
7.6 Content-Aware Query Re.nement /249
7.6.1 Partition-Based Algorithm/250
7.6.2 Short-ListEagerAlgorithm/253
7.7 Experiments /256
7.7.1 Equipment /256
7.7.2 Dataset and Query Set/256
7.7.3 Ef.ciency /257
7.7.4 Scalability/259
7.7.5 Effectivenessof Query Re.nement/261
7.8 Summary /264
References /264
8 LCRA, XML Keyword Search System, and LotusX, Graphical Query Processing System /267
8.1 Introductionof LCRA and LotusX/267
8.2 LCRA: Search Semantics/268
8.2.1 SLCA and LRA /268
8.2.2 Backgroundand Data Model /269
8.2.3 SearchSemantics/270
8.3 LCRA, System Architecture, and Ranking Techniques/272
8.3.1 Tree Model/272
8.3.2 RankingTechniques/273
8.3.3 SystemArchitecture/274
8.3.4 OverviewofOnlineDemoFeatures/274
8.4 A Position-Aware XML Graphical Search System with Auto-completion/276
8.4.1 System Features /276
8.4.2 LotusX:ArchitectureandAlgorithms/278
8.5 Summary /282
References /283
9 Summary and the Road Ahead /285
9.1 Summary of This Book /285
9.2 Future Work /286
9.2.1 Full-FledgedXML Query Engine /286
9.2.2 Directed Graph XML Model/286
9.2.3 ExtendedDewey Labeling Scheme for OrderedQuery /287
9.2.4 IndexStructure Based on TJFast /287
9.2.5 MapReduce-BasedXML Twig Pattern Matching /288
References /288
Index /289