creators_name: Zheng, Shuyi
creators_name: Dmitriev, Pavel
creators_name: Lee Giles, C.
type: conference_item
datestamp: 2009-04-06 19:12:27
lastmod: 2009-04-14 04:37:10
metadata_visibility: show
title: Graph Based Crawler Seed Selection
ispublished: pub
full_text_status: public
pres_type: poster
abstract: This paper identifies and explores the problem of seed selection in a web-scale crawler. We argue that seed selection is a nontrivial and very important problem. Selecting proper seeds can increase the number of pages a crawler discovers and can yield a collection with more “good” and fewer “bad” pages. Based on an analysis of the graph structure of the web, we propose several seed selection algorithms. Our experimental results on real web data demonstrate the effectiveness of these algorithms.
date: 2009-04
pagerange: 1089-1089
event_title: 18th International World Wide Web Conference
event_location: Madrid, Spain
event_dates: April 20th-24th, 2009
event_type: conference
refereed: TRUE
citation: Zheng, Shuyi <http://www2009.eprints.org/view/author/Zheng=3AShuyi=3A=3A.html> and Dmitriev, Pavel <http://www2009.eprints.org/view/author/Dmitriev=3APavel=3A=3A.html> and Lee Giles, C. <http://www2009.eprints.org/view/author/Lee_Giles=3AC=2E=3A=3A.html> (2009) Graph Based Crawler Seed Selection. In: 18th International World Wide Web Conference, April 20th-24th, 2009, Madrid, Spain.
document_url: http://www2009.eprints.org/125/1/p1089.pdf