Number of items: 2.
Yan, Hao and
Ding, Shuai and
Suel, Torsten. Inverted Index Compression and Query Processing with Optimized Document Ordering.
Web search engines use highly optimized compression schemes to decrease inverted index size and improve query throughput, and many index compression techniques have been studied in the literature. One approach taken by several recent studies [7, 23, 25, 6, 24] first performs a renumbering of the document IDs in the collection that groups similar documents together, and then applies standard compression techniques. It is known that this can significantly improve index compression compared to a random document ordering. We study index compression and query processing techniques for such reordered indexes. Previous work has focused on determining the best possible ordering of documents. In contrast, we assume that such an ordering is already given, and focus on how to optimize compression methods and query processing for this case. We perform an extensive study of compression techniques for document IDs and present new optimizations of existing techniques which can achieve significant improvement in both compression and decompression performance. We also propose and evaluate techniques for compressing frequency values for this case. Finally, we study the effect of this approach on query processing performance. Our experiments show very significant improvements in index size and query processing speed on the TREC GOV2 collection of 25.2 million web pages.
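To illustrate why renumbering document IDs so that similar documents sit close together can help compression, the following is a minimal sketch that stores a posting list as gaps between consecutive IDs and encodes the gaps with variable-byte coding. Variable-byte coding is used here only as one example of a standard technique; it is not claimed to be the method the paper optimizes, and the function names and the two sample lists are invented for illustration.

```python
def to_gaps(doc_ids):
    """Convert a sorted list of document IDs into the first ID plus successive differences."""
    gaps = []
    prev = 0
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def from_gaps(gaps):
    """Rebuild the document IDs by taking a running sum of the gaps."""
    doc_ids = []
    total = 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids


def varbyte_encode(numbers):
    """Encode non-negative integers 7 bits per byte; the high bit marks the last byte of a number."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)  # low 7 bits, more bytes follow
            n >>= 7
        out.append(n | 0x80)      # final byte of this number
    return bytes(out)


def varbyte_decode(data):
    """Inverse of varbyte_encode."""
    numbers = []
    n = 0
    shift = 0
    for byte in data:
        if byte & 0x80:
            n |= (byte & 0x7F) << shift
            numbers.append(n)
            n = 0
            shift = 0
        else:
            n |= byte << shift
            shift += 7
    return numbers


# Hypothetical posting lists for one term (assumed data, not from the paper).
scattered = [3, 5000, 90000, 400000, 2000000]  # IDs spread across the collection
clustered = [1000, 1001, 1003, 1004, 1010]     # IDs close together after reordering

scattered_bytes = varbyte_encode(to_gaps(scattered))
clustered_bytes = varbyte_encode(to_gaps(clustered))
print(len(scattered_bytes), len(clustered_bytes))
```

In this toy case the clustered list needs fewer bytes because most of its gaps fit in a single byte. Real gains depend on the collection, the ordering, and the compression scheme.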
Ding, Shuai and
He, Jinru and
Yan, Hao and
Suel, Torsten. Using Graphics Processors for High Performance IR Query Processing.
Web search engines are facing formidable performance challenges due to data sizes and query loads. The major engines have to process tens of thousands of queries per second over tens of billions of documents. To deal with this heavy workload, such engines employ massively parallel systems consisting of thousands of machines. The significant cost of operating these systems has motivated a lot of recent research into more efficient query processing mechanisms. We investigate a new way to build such high performance IR systems using graphical processing units (GPUs). GPUs were originally designed to accelerate computer graphics applications through massive on-chip parallelism. Recently a number of researchers have studied how to use GPUs for other problem domains such as databases and scientific computing [9, 8, 12]. Our contribution here is to design a basic system architecture for GPU-based high-performance IR, to develop suitable algorithms for subtasks such as inverted list compression, list intersection, and top-k scoring, and to show how to achieve highly efficient query processing on GPU-based systems. Our experimental results for a prototype GPU-based system on 25.2 million web pages show promising gains in query throughput.
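For readers unfamiliar with the subtasks named in this abstract, the following is a minimal sequential sketch of list intersection and top-k selection. It runs on the CPU and is not the GPU algorithm from the paper; the function names, posting lists, and scores are all assumptions made for illustration.

```python
import heapq


def intersect(a, b):
    """Return the document IDs present in both sorted posting lists (two-pointer merge)."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return common


def top_k(scores, k):
    """Return the k (doc_id, score) pairs with the highest scores, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Hypothetical posting lists for two query terms (assumed data).
list_a = [2, 5, 9, 14, 21]
list_b = [5, 9, 10, 21, 30]
candidates = intersect(list_a, list_b)

# Hypothetical relevance scores for the candidate documents (assumed data).
scores = {5: 0.4, 9: 1.7, 21: 0.9}
best = top_k({doc: scores[doc] for doc in candidates}, 2)
print(candidates, best)
```

A parallel version would need to split this work across many threads, which is the kind of problem the abstract says the authors address.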
This list was generated on Fri Feb 15 08:46:08 2019 GMT.
About this site
This website has been set up for WWW2009 by Christopher Gutteridge of the University of Southampton, using our EPrints software.
Preservation
We (Southampton EPrints Project) intend to preserve the files and HTML pages of this site for many years; however, we will turn it into flat files for long-term preservation. This means that at some point in the months after the conference the search, metadata-export, JSON interface, OAI etc. will be disabled as we "fossilize" the site. Please plan accordingly. Feel free to ask nicely for us to keep the dynamic site online longer if there's a really good (or cool) use for it... [this has now happened, this site is now static]