title: Inverted Index Compression and Query Processing with Optimized Document Ordering
creator: Yan, Hao
creator: Ding, Shuai
creator: Suel, Torsten
description: Web search engines use highly optimized compression schemes to decrease inverted index size and improve query throughput, and many index compression techniques have been studied in the literature. One approach taken by several recent studies [7, 23, 25, 6, 24] first performs a renumbering of the document IDs in the collection that groups similar documents together, and then applies standard compression techniques. It is known that this can significantly improve index compression compared to a random document ordering. We study index compression and query processing techniques for such reordered indexes. Previous work has focused on determining the best possible ordering of documents. In contrast, we assume that such an ordering is already given, and focus on how to optimize compression methods and query processing for this case. We perform an extensive study of compression techniques for document IDs and present new optimizations of existing techniques which can achieve significant improvement in both compression and decompression performance. We also propose and evaluate techniques for compressing frequency values for this case. Finally, we study the effect of this approach on query processing performance. Our experiments show very significant improvements in index size and query processing speed on the TREC GOV2 collection of 25.2 million web pages.
date: 2009-04
type: Conference or Workshop Item
type: PeerReviewed
format: application/pdf
identifier: http://www2009.eprints.org/41/1/p401.pdf
format: application/vnd.ms-powerpoint
identifier: http://www2009.eprints.org/41/2/comp-www.ppt
identifier: Yan, Hao <http://www2009.eprints.org/view/author/Yan=3AHao=3A=3A.html> and Ding, Shuai <http://www2009.eprints.org/view/author/Ding=3AShuai=3A=3A.html> and Suel, Torsten <http://www2009.eprints.org/view/author/Suel=3ATorsten=3A=3A.html> (2009) Inverted Index Compression and Query Processing with Optimized Document Ordering. In: 18th International World Wide Web Conference, April 20th-24th, 2009, Madrid, Spain.
relation: http://www2009.eprints.org/41/