Scalable topic-specific influence analysis on microblogs
Bin Bi, Yuanyuan Tian, et al.
WSDM 2014
Wikipedia infoboxes is an example of a seemingly structured, yet extraordinarily heterogenous dataset, where any given record has only a tiny fraction of all possible fields. Such data cannot be queried using traditional means without a massive a priori integration effort, since even for a simple request the result values span many record types and fields. On the other hand, the solutions based on keyword search are too imprecise to capture user's intent. To address these limitations, we propose a system, referred to herein as WIKIANALYTICS, that utilizes a novel search paradigm in order to derive tables of precise and complete results from Wikipedia infobox records. The user starts with a keyword search query that finds a superset of the result records, and then browses clusters of records deciding which are and are not relevant. WIKIANALYTICS uses three categories of clustering features based on record types, fields, and values that matched the query keywords, respectively. Since the system cannot predict which combination of features will be important to the user, it efficiently generates all possible clusters of records by all sets of features. We utilize a novel data structure, universal navigational lattice (UNL), that compactly encodes all possible clusters. WIKIANALYTICS provides a dynamic and intuitive interface that lets the user explore the UNL and construct homogeneous structured tables, which can be further queried and aggregated using the conventional tools.
Bin Bi, Yuanyuan Tian, et al.
WSDM 2014
Alain Biem, Eric Bouillet, et al.
SIGMOD 2010
Laura Chiticariu, Yunyao Li, et al.
SIGMOD 2010
Andrey Balmin, Tim Kaldewey, et al.
SIGMOD 2012