Publication
WWW 2001
Conference paper

An adaptive model for optimizing performance of an incremental web crawler

Abstract

This paper outlines the design of a web crawler implemented for IBM Almaden's WebFountain project and describes an optimization model for controlling the crawl strategy. This crawler is scalable and incremental. The model makes no assumptions about the statistical behaviour of web page changes, but rather uses an adaptive approach to maintain data on actual change rates, which are in turn used as inputs for the optimization. Computational results with simulated but realistic data show that there is no 'magic bullet': different, but equally plausible, objectives lead to conflicting 'optimal' strategies. However, we find that there are compromise objectives which lead to good strategies that are robust against a number of criteria.
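
The adaptive approach described in the abstract can be pictured with a short sketch (a hypothetical illustration, not the paper's WebFountain implementation): per-page change-rate estimates are updated from what the crawler actually observes on each visit, and those estimates feed a simple priority score for deciding which pages to recrawl next. The class name, smoothing parameter, and staleness score below are assumptions made for illustration only.

```python
from typing import Dict, List, Optional
import heapq
import time


class AdaptiveRecrawlScheduler:
    """Illustrative sketch: track per-page change rates adaptively and use
    them to rank pages for recrawling. Not the WebFountain crawler's code;
    all names and the scoring rule here are assumptions."""

    def __init__(self, smoothing: float = 0.2):
        self.smoothing = smoothing                # how quickly estimates adapt
        self.change_rate: Dict[str, float] = {}   # url -> estimated chance of change per visit
        self.last_crawl: Dict[str, float] = {}    # url -> timestamp of the last crawl

    def record_visit(self, url: str, changed: bool, now: Optional[float] = None) -> None:
        """Update the change-rate estimate after crawling `url`."""
        now = time.time() if now is None else now
        prev = self.change_rate.get(url, 0.5)     # uninformative starting estimate
        obs = 1.0 if changed else 0.0
        # Exponentially weighted update: no distributional assumption about
        # page changes, only adaptation to the observed history.
        self.change_rate[url] = (1.0 - self.smoothing) * prev + self.smoothing * obs
        self.last_crawl[url] = now

    def next_to_crawl(self, k: int, now: Optional[float] = None) -> List[str]:
        """Return the k URLs with the highest expected-staleness score
        (estimated change rate times elapsed time since the last crawl)."""
        now = time.time() if now is None else now
        scores = {
            url: self.change_rate[url] * (now - self.last_crawl[url])
            for url in self.change_rate
        }
        return heapq.nlargest(k, scores, key=scores.get)
```

The exponentially weighted update mirrors the abstract's point that no statistical model of change behaviour is assumed up front; the crawl strategy simply adapts to the change rates actually measured during crawling.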
