This item is a Poster.
- Zheng, Shuyi - The Pennsylvania State University
- Dmitriev, Pavel - Yahoo! Laboratories
- Giles, C. Lee - The Pennsylvania State University
Abstract
This paper identifies and explores the problem of seed selection in a web-scale crawler. We argue that seed selection is not a trivial problem but a very important one. Selecting proper seeds can increase the number of pages a crawler discovers and can yield a collection with more “good” and fewer “bad” pages. Based on an analysis of the graph structure of the web, we propose several seed selection algorithms. Our experimental results on real web data demonstrate the effectiveness of these algorithms.
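The abstract does not describe the proposed algorithms. As a rough illustration of why the choice of seeds affects how many pages a crawler can discover, the following sketch treats seed selection as greedy maximum coverage over link reachability: at each step it picks the page whose out-link closure adds the most pages not already reachable from the seeds chosen so far. This is an assumed, generic heuristic, not the authors' method; the function names `reachable` and `greedy_seeds` and the dict-of-lists graph format are hypothetical.

```python
from collections import deque


def reachable(graph, start):
    """Return the set of pages reachable from `start` by following out-links (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in graph.get(page, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def greedy_seeds(graph, k):
    """Hypothetical greedy seed picker: choose up to k seeds, each time taking the
    page that makes the most not-yet-covered pages reachable."""
    # Collect every page, including pages that only appear as link targets.
    pages = set(graph)
    for targets in graph.values():
        pages.update(targets)
    # Precompute each page's reachable set (fine for toy graphs, not for web scale).
    reach = {p: reachable(graph, p) for p in pages}
    seeds, covered = [], set()
    for _ in range(k):
        # sorted() makes tie-breaking deterministic (alphabetically first page wins).
        candidates = sorted(pages - set(seeds))
        best = max(candidates, key=lambda p: len(reach[p] - covered), default=None)
        # Stop early if no remaining page adds any newly reachable page.
        if best is None or not (reach[best] - covered):
            break
        seeds.append(best)
        covered |= reach[best]
    return seeds, covered


if __name__ == "__main__":
    # Two disconnected link chains: a -> b -> c and x -> y.
    toy = {"a": ["b"], "b": ["c"], "c": [], "x": ["y"], "y": []}
    print(greedy_seeds(toy, 2))
```

On this toy graph the first pick is `a` (it reaches three pages) and the second is `x`, since `b` and `c` add nothing new once `a` is a seed. Precomputing a full reachability set per page costs roughly O(N·(N+E)) and is only meant for small examples; a web-scale crawler would need sampled or approximate reachability estimates, and a real selector would also weigh page quality, which this sketch ignores.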