This book introduces the key issues in information retrieval (IR) and how they affect the design and implementation of search engines, using mathematical models to reinforce the important concepts. On the important topic of web search engines, it focuses on the search techniques in wide use on the web. The book is suitable for undergraduate and graduate students in computer science or computer engineering, and it is also an ideal introductory text for professionals.

About the Author:
W. Bruce Croft is Distinguished Professor of Computer Science at the University of Massachusetts Amherst and an ACM Fellow. He founded the Center for Intelligent Information Retrieval, has published more than 200 papers, and has received numerous awards, including the 2003 Gerard Salton Award from ACM SIGIR.

Contents:
1 Search Engines and Information Retrieval
1.1 What Is Information Retrieval?
1.2 The Big Issues
1.3 Search Engines
1.4 Search Engineers
2 Architecture of a Search Engine
2.1 What Is an Architecture?
2.2 Basic Building Blocks
2.3 Breaking It Down
2.3.1 Text Acquisition
2.3.2 Text Transformation
2.3.3 Index Creation
2.3.4 User Interaction
2.3.5 Ranking
2.3.6 Evaluation
2.4 How Does It Really Work?
3 Crawls and Feeds
3.1 Deciding What to Search
3.2 Crawling the Web
3.2.1 Retrieving Web Pages
3.2.2 The Web Crawler
3.2.3 Freshness
3.2.4 Focused Crawling
3.2.5 Deep Web
3.2.6 Sitemaps
3.2.7 Distributed Crawling
3.3 Crawling Documents and Email
3.4 Document Feeds
3.5 The Conversion Problem
3.5.1 Character Encodings
3.6 Storing the Documents
3.6.1 Using a Database System
3.6.2 Random Access
3.6.3 Compression and Large Files
3.6.4 Update
3.6.5 BigTable
3.7 Detecting Duplicates
3.8 Removing Noise
4 Processing Text
4.1 From Words to Terms
4.2 Text Statistics
4.2.1 Vocabulary Growth
4.2.2 Estimating Collection and Result Set Sizes
4.3 Document Parsing
4.3.1 Overview
4.3.2 Tokenizing
4.3.3 Stopping
4.3.4 Stemming
4.3.5 Phrases and N-grams
4.4 Document Structure and Markup
4.5 Link Analysis
4.5.1 Anchor Text
4.5.2 PageRank
4.5.3 Link Quality
4.6 Information Extraction
4.6.1 Hidden Markov Models for Extraction
4.7 Internationalization
5 Ranking with Indexes
6 Queries and Interfaces
7 Retrieval Models
8 Evaluating Search Engines
9 Classification and Clustering
10 Social Search
11 Beyond Bag of Words
References
Index