Search Engines: Information Retrieval in Practice (English Edition) / Classic Original Editions Library, Croft (USA), China Machine Press

1 Search Engines and Information Retrieval

　1.1 What Is Information Retrieval?

　1.2 The Big Issues

　1.3 Search Engines

　1.4 Search Engineers

2 Architecture of a Search Engine

　2.1 What Is an Architecture?

　2.2 Basic Building Blocks

　2.3 Breaking It Down

　　2.3.1 Text Acquisition

　　2.3.2 Text Transformation

　　2.3.3 Index Creation

　　2.3.4 User Interaction

　　2.3.5 Ranking

　　2.3.6 Evaluation

　2.4 How Does It Really Work?

3 Crawls and Feeds

　3.1 Deciding What to Search

　3.2 Crawling the Web

　　3.2.1 Retrieving Web Pages

　　3.2.2 The Web Crawler

　　3.2.3 Freshness

　　3.2.4 Focused Crawling

　　3.2.5 Deep Web

　　3.2.6 Sitemaps

　　3.2.7 Distributed Crawling

　3.3 Crawling Documents and Email

　3.4 Document Feeds

　3.5 The Conversion Problem

　　3.5.1 Character Encodings

　3.6 Storing the Documents

　　3.6.1 Using a Database System

　　3.6.2 Random Access

　　3.6.3 Compression and Large Files

　　3.6.4 Update

　　3.6.5 BigTable

　3.7 Detecting Duplicates

　3.8 Removing Noise

4 Processing Text

　4.1 From Words to Terms

　4.2 Text Statistics

　　4.2.1 Vocabulary Growth

　　4.2.2 Estimating Collection and Result Set Sizes

　4.3 Document Parsing

　　4.3.1 Overview

　　4.3.2 Tokenizing

　　4.3.3 Stopping

　　4.3.4 Stemming

　　4.3.5 Phrases and N-grams

　4.4 Document Structure and Markup

　4.5 Link Analysis

　　4.5.1 Anchor Text

　　4.5.2 PageRank

　　4.5.3 Link Quality

　4.6 Information Extraction

　　4.6.1 Hidden Markov Models for Extraction

　4.7 Internationalization

5 Ranking with Indexes

　5.1 Overview

　5.2 Abstract Model of Ranking

　5.3 Inverted Indexes

　　5.3.1 Documents

　　5.3.2 Counts

　　5.3.3 Positions

　　5.3.4 Fields and Extents

　　5.3.5 Scores

　　5.3.6 Ordering

　5.4 Compression

　　5.4.1 Entropy and Ambiguity

　　5.4.2 Delta Encoding

　　5.4.3 Bit-Aligned Codes

　　5.4.4 Byte-Aligned Codes

　　5.4.5 Compression in Practice

　　5.4.6 Looking Ahead

　　5.4.7 Skipping and Skip Pointers

　5.5 Auxiliary Structures

　5.6 Index Construction

　　5.6.1 Simple Construction

　　5.6.2 Merging

　　5.6.3 Parallelism and Distribution

　　5.6.4 Update

　5.7 Query Processing

　　5.7.1 Document-at-a-time Evaluation

　　5.7.2 Term-at-a-time Evaluation

　　5.7.3 Optimization Techniques

　　5.7.4 Structured Queries

　　5.7.5 Distributed Evaluation

　　5.7.6 Caching

6 Queries and Interfaces

　6.1 Information Needs and Queries

　6.2 Query Transformation and Refinement

　　6.2.1 Stopping and Stemming Revisited

　　6.2.2 Spell Checking and Suggestions

　　6.2.3 Query Expansion

　　6.2.4 Relevance Feedback

　　6.2.5 Context and Personalization

　6.3 Showing the Results

　　6.3.1 Result Pages and Snippets

　　6.3.2 Advertising and Search

　　6.3.3 Clustering the Results

　6.4 Cross-Language Search

7 Retrieval Models

　7.1 Overview of Retrieval Models

　　7.1.1 Boolean Retrieval

　　7.1.2 The Vector Space Model

　7.2 Probabilistic Models

　　7.2.1 Information Retrieval as Classification

　　7.2.2 The BM25 Ranking Algorithm

　7.3 Ranking Based on Language Models

　　7.3.1 Query Likelihood Ranking

　　7.3.2 Relevance Models and Pseudo-Relevance Feedback

　7.4 Complex Queries and Combining Evidence

　　7.4.1 The Inference Network Model

　　7.4.2 The Galago Query Language

　7.5 Web Search

　7.6 Machine Learning and Information Retrieval

　　7.6.1 Learning to Rank

　　7.6.2 Topic Models and Vocabulary Mismatch

　7.7 Application-Based Models

8 Evaluating Search Engines

　8.1 Why Evaluate?

　8.2 The Evaluation Corpus

　8.3 Logging

　8.4 Effectiveness Metrics

　　8.4.1 Recall and Precision

　　8.4.2 Averaging and Interpolation

　　8.4.3 Focusing on the Top Documents

　　8.4.4 Using Preferences

　8.5 Efficiency Metrics

　8.6 Training, Testing, and Statistics

　　8.6.1 Significance Tests

　　8.6.2 Setting Parameter Values

　　8.6.3 Online Testing

　8.7 The Bottom Line

9 Classification and Clustering

　9.1 Classification and Categorization

　　9.1.1 Naive Bayes

　　9.1.2 Support Vector Machines

　　9.1.3 Evaluation

　　9.1.4 Classifier and Feature Selection

　　9.1.5 Spam, Sentiment, and Online Advertising

　9.2 Clustering

　　9.2.1 Hierarchical and K-Means Clustering

　　9.2.2 K Nearest Neighbor Clustering

　　9.2.3 Evaluation

　　9.2.4 How to Choose K

　　9.2.5 Clustering and Search

10 Social Search

　10.1 What Is Social Search?

　10.2 User Tags and Manual Indexing

　　10.2.1 Searching Tags

　　10.2.2 Inferring Missing Tags

　　10.2.3 Browsing and Tag Clouds

　10.3 Searching with Communities

　　10.3.1 What Is a Community?

　　10.3.2 Finding Communities

　　10.3.3 Community-Based Question Answering

　　10.3.4 Collaborative Searching

　10.4 Filtering and Recommending

　　10.4.1 Document Filtering

　　10.4.2 Collaborative Filtering

　10.5 Peer-to-Peer and Metasearch

　　10.5.1 Distributed Search

　　10.5.2 P2P Networks

11 Beyond Bag of Words

　11.1 Overview

　11.2 Feature-Based Retrieval Models

　11.3 Term Dependence Models

　11.4 Structure Revisited

　　11.4.1 XML Retrieval

　　11.4.2 Entity Search

　11.5 Longer Questions, Better Answers

　11.6 Words, Pictures, and Music

　11.7 One Search Fits All?

References

Index

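Title	Search Engines: Information Retrieval in Practice (English Edition) / Classic Original Editions Library
Category	Humanities and Social Sciences - Social Sciences - General Social Sciences
Author	Croft (USA)
Publisher	China Machine Press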
Description	Editor's recommendation: This is an English-language text on information retrieval. It introduces 11 key issues in information retrieval (IR) and how they affect the design and implementation of search engines, and it reinforces important concepts with mathematical models. The book is rich in content, focused, and practical, and is suitable as a textbook for undergraduate and graduate students in computer science or computer engineering. Content summary: This book introduces the key issues in information retrieval (IR) and how they affect the design and implementation of search engines, and it reinforces important concepts with mathematical models. On the important topic of web search engines, it mainly covers the search techniques widely used on the web. The book is suitable for undergraduate and graduate students in computer science or computer engineering, and it also serves professionals well as an introductory text.