Publication
CIKM 1999
Conference paper

Mining the web for acronyms using the duality of patterns and relations

Abstract

The Web is a rich source of information, but this information is scattered and hidden in the diversity of web pages. Search engines are windows to the web. However, the current search engines, designed to identify pages with specified phrases, have very limited power. For example, they cannot search for phrases related in a particular way (e.g. books and their authors). In this paper we present a solution for identifying a set of inter-related information on the web using the duality concept. Duality problems arise when one tries to identify a pair of inter-related phrases such as (book, author), (name, email) or (acronym, expansion) relations. We propose a solution to this problem that iteratively refines mutually dependent approximations to their identifications. Specifically, we iteratively refine i) pairs of phrases related in a specific way, and ii) the patterns of their occurrences in web pages, i.e. the ways in which the related phrases are marked in the pages. We cast light on the general solution of the duality problems in the web by concentrating on one paradigmatic duality problem, i.e. identifying (acronym, expansion) pairs in terms of the patterns of their occurrences in the web pages. The solution to this problem involves two mutually dependent duality problems of 1) the duality between the related pairs and their patterns, and 2) the duality between the related pairs and the acronym formulation rules.
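
The iterative refinement described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the (middle, suffix) pattern representation, the toy pages, and the fixed "initial letters" formation rule are all assumptions made for illustration. In particular, the sketch only alternates between pairs and patterns; it keeps the formation rule fixed and does not refine it.

```python
import re

def initials_match(acronym, expansion):
    # Assumed formation rule: the acronym is the initials of the expansion's words.
    words = expansion.split()
    return len(words) == len(acronym) and all(
        w[0].upper() == c for w, c in zip(words, acronym.upper())
    )

def learn_patterns(pairs, pages):
    # A pattern here is (middle, suffix): the text between an expansion and
    # its acronym, and the single non-word character (if any) after the acronym.
    patterns = set()
    for acronym, expansion in pairs:
        regex = re.compile(
            re.escape(expansion) + r"(\W{1,3})" + re.escape(acronym) + r"(\W?)"
        )
        for page in pages:
            for m in regex.finditer(page):
                patterns.add((m.group(1), m.group(2)))
    return patterns

def apply_patterns(patterns, pages):
    # Use each known pattern to propose new pairs, keeping only those
    # that satisfy the assumed formation rule.
    pairs = set()
    for middle, suffix in patterns:
        regex = re.compile(
            r"((?:[A-Za-z]+ )*[A-Za-z]+)" + re.escape(middle)
            + r"([A-Z]{2,})" + re.escape(suffix)
        )
        for page in pages:
            for m in regex.finditer(page):
                acronym = m.group(2)
                # Keep only as many trailing words as the acronym has letters.
                expansion = " ".join(m.group(1).split()[-len(acronym):])
                if initials_match(acronym, expansion):
                    pairs.add((acronym, expansion))
    return pairs

def bootstrap(seed_pairs, pages, max_rounds=5):
    # Alternate between the two mutually dependent estimates until no new pair appears.
    pairs, patterns = set(seed_pairs), set()
    for _ in range(max_rounds):
        patterns |= learn_patterns(pairs, pages)
        new_pairs = apply_patterns(patterns, pages) - pairs
        if not new_pairs:
            break
        pairs |= new_pairs
    return pairs, patterns

# Toy stand-ins for web pages (hypothetical data).
pages = [
    "The World Wide Web (WWW) grew quickly.",
    "Each page has a Uniform Resource Locator (URL) as its address.",
    "A Central Processing Unit (CPU) executes instructions.",
]
pairs, patterns = bootstrap({("WWW", "World Wide Web")}, pages)
print(sorted(pairs))
print(patterns)
```

Starting from a single seed pair, the loop learns the "expansion (ACRONYM)" pattern from the first page and then uses it to pick up the pairs on the other two pages. A real system would need noisier pattern matching and a way to score patterns and rules, which this sketch leaves out.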
