WWW 2001
Conference paper
An adaptive model for optimizing performance of an incremental web crawler
Abstract
This paper outlines the design of a web crawler implemented for IBM Almaden's WebFountain project and describes an optimization model for controlling the crawl strategy. This crawler is scalable and incremental. The model makes no assumptions about the statistical behaviour of web page changes, but rather uses an adaptive approach to maintain data on actual change rates, which are in turn used as inputs for the optimization. Computational results with simulated but realistic data show that there is no 'magic bullet': different, but equally plausible, objectives lead to conflicting 'optimal' strategies. However, we find that there are compromise objectives which lead to good strategies that are robust against a number of criteria.
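To make the adaptive idea concrete, the sketch below shows one plausible way to maintain per-page change-rate estimates from revisit observations and use them to rank pages for recrawling. This is not the paper's actual model; the class and function names (ChangeRateEstimator, recrawl_order), the exponential-smoothing update, and the rate-times-staleness priority are all illustrative assumptions.

```python
# Illustrative sketch only: assumes exponential smoothing of observed change
# rates and a staleness-weighted recrawl order. All names and parameters here
# are hypothetical, not taken from the paper.

class ChangeRateEstimator:
    """Adaptively estimates each page's change rate from revisit observations."""

    def __init__(self, alpha: float = 0.2, prior_rate: float = 0.5):
        self.alpha = alpha            # weight given to the newest observation
        self.prior_rate = prior_rate  # assumed changes/day before any data
        self.rates: dict[str, float] = {}

    def observe(self, url: str, changed: bool, days_since_visit: float) -> None:
        """Fold one revisit into the estimate; no change distribution is assumed."""
        observed = (1.0 / days_since_visit) if changed else 0.0
        previous = self.rates.get(url, self.prior_rate)
        self.rates[url] = (1 - self.alpha) * previous + self.alpha * observed

    def rate(self, url: str) -> float:
        return self.rates.get(url, self.prior_rate)


def recrawl_order(est: ChangeRateEstimator,
                  last_visit: dict[str, float],
                  now: float) -> list[str]:
    """Rank URLs by expected unseen changes: estimated rate times days stale."""
    return sorted(last_visit,
                  key=lambda u: est.rate(u) * (now - last_visit[u]),
                  reverse=True)


if __name__ == "__main__":
    est = ChangeRateEstimator()
    est.observe("http://example.com/news", changed=True, days_since_visit=1.0)
    est.observe("http://example.com/about", changed=False, days_since_visit=7.0)
    # The news page's higher estimated rate pushes it to the front of the queue.
    print(recrawl_order(est, {"http://example.com/news": 3.0,
                              "http://example.com/about": 3.0}, now=5.0))
```

A single scalar priority like this cannot capture the paper's finding that equally plausible objectives conflict; it stands in for just one of the candidate objectives that the optimization model would trade off.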