Publication
ACM HT 2004
Conference paper
Automatic categorization of Web sites based on source types
Abstract
An important issue with the Web is verification of the accuracy, currency and authenticity of the information associated with Web sites. One way to address this problem is to identify the "source" or "sponsor" of the Web site. However, source identification is non-trivial because the source of a Web site cannot always be determined by the URL or content of the site. In this paper, we propose a method for source identification that uses various types of inbound, outbound and internal interactions that arise due to hyperlinks between and within Web sites.
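As a rough illustration of the three kinds of link interaction named in the abstract, the sketch below counts inbound, outbound and internal hyperlinks for a single site. It is not the paper's method: the abstract does not say which features or classifier are used. The names `site_of` and `link_profile`, the choice of the hostname as the site boundary, and the example URLs are all assumptions made for illustration.

```python
from urllib.parse import urlparse


def site_of(url: str) -> str:
    """Return the lowercased hostname, used here as the site identifier.

    Assumption: one hostname equals one Web site. Real sites may span
    several hostnames or share one.
    """
    return (urlparse(url).hostname or "").lower()


def link_profile(links, site):
    """Count inbound, outbound and internal hyperlinks for `site`.

    links: iterable of (source_url, target_url) pairs.
    site:  hostname of the site being profiled.
    """
    counts = {"inbound": 0, "outbound": 0, "internal": 0}
    for src, dst in links:
        s, d = site_of(src), site_of(dst)
        if s == site and d == site:
            counts["internal"] += 1   # link within the site
        elif s == site:
            counts["outbound"] += 1   # link leaving the site
        elif d == site:
            counts["inbound"] += 1    # link arriving from another site
    return counts


# Hypothetical link data, for illustration only.
EXAMPLE_LINKS = [
    ("http://example.edu/a", "http://example.edu/b"),
    ("http://example.edu/a", "http://example.org/"),
    ("http://example.com/x", "http://example.edu/"),
    ("http://example.com/x", "http://example.org/"),
]

print(link_profile(EXAMPLE_LINKS, "example.edu"))
```

Counts of this kind could serve as input features for a categorizer, but which link-based features actually distinguish source types is a question for the paper itself.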