The Hardware/Software Balancing Act for Information Retrieval on Symmetric Multiprocessors

Zhihong Lu, Kathryn S. McKinley, and Brendon Cahoon

Abstract
Web search engines, such as AltaVista and Infoseek, handle tremendous loads by exploiting the parallelism implicit in their tasks and using symmetric multiprocessors to support their services. The web searching problem that they solve is a special case of the more general information retrieval (IR) problem of locating documents relevant to the information need of users. In this paper, we investigate how to exploit a symmetric multiprocessor to build high-performance IR servers. Although the problem can be solved by throwing lots of CPU and disk resources at it, the important questions are how much of which hardware is needed, and what software structure is needed to exploit those hardware resources effectively. We have found, to our surprise, that in some cases adding hardware degrades performance rather than improving it. We show that multiple threads are needed to fully utilize hardware resources. Our investigation is based on InQuery, a state-of-the-art full-text information retrieval engine that is widely used in Web search engines, large libraries, companies, and government agencies such as Infoseek, Library of Congress, White House, West Publishing, and Lotus.
Contact
Zhihong Lu
Department of Computer Science, University of Massachusetts at Amherst, Amherst, MA 01002, U.S.A.
zlu@cs.umass.edu