creators_name: Yan, Hao
creators_name: Ding, Shuai
creators_name: Suel, Torsten
type: conference_item
datestamp: 2009-04-06 19:09:33
lastmod: 2009-05-11 11:11:52
metadata_visibility: show
title: Inverted Index Compression and Query Processing with Optimized Document Ordering
ispublished: pub
full_text_status: public
pres_type: paper
abstract: Web search engines use highly optimized compression schemes
to decrease inverted index size and improve query throughput,
and many index compression techniques have been studied
in the literature. One approach taken by several recent
studies [7, 23, 25, 6, 24] first performs a renumbering of the
document IDs in the collection that groups similar documents
together, and then applies standard compression techniques.
It is known that this can significantly improve index
compression compared to a random document ordering.
   We study index compression and query processing
techniques for such reordered indexes. Previous work has focused
on determining the best possible ordering of documents. In
contrast, we assume that such an ordering is already given,
and focus on how to optimize compression methods and query
processing for this case. We perform an extensive study of
compression techniques for document IDs and present new
optimizations of existing techniques which can achieve
significant improvement in both compression and decompression
performances. We also propose and evaluate techniques for
compressing frequency values for this case. Finally, we study
the effect of this approach on query processing performance.
Our experiments show very significant improvements in
index size and query processing speed on the TREC GOV2
collection of 25.2 million web pages.

date: 2009-04
pagerange: 401-401
event_title: 18th International World Wide Web Conference
event_location: Madrid, Spain
event_dates: April 20th-24th, 2009
event_type: conference
refereed: TRUE
citation: Yan, Hao <http://www2009.eprints.org/view/author/Yan=3AHao=3A=3A.html> and Ding, Shuai <http://www2009.eprints.org/view/author/Ding=3AShuai=3A=3A.html> and Suel, Torsten <http://www2009.eprints.org/view/author/Suel=3ATorsten=3A=3A.html> (2009) Inverted Index Compression and Query Processing with Optimized Document Ordering. In: 18th International World Wide Web Conference, April 20th-24th, 2009, Madrid, Spain.
document_url: http://www2009.eprints.org/41/1/p401.pdf
document_url: http://www2009.eprints.org/41/2/comp-www.ppt