Comparing apples to oranges: A scalable solution with heterogeneous hashing
Abstract
Although hashing techniques have been popular for the large scale similarity search problem, most of the existing methods for designing optimal hash functions focus on homogeneous similarity assessment, i.e., the data entities to be indexed are of the same type. Realizing that heterogeneous entities and relationships are also ubiquitous in the real world applications, there is an emerging need to retrieve and search similar or relevant data entities from multiple heterogeneous domains, e.g., recommending relevant posts and images to a certain Facebook user. In this paper, we address the problem of "comparing apples to oranges" under the large scale setting. Specifically, we propose a novel Relation-Aware Heterogeneous Hashing (RaHH), which provides a general framework for generating hash codes of data entities sitting in multiple heterogeneous domains. Unlike some existing hashing methods that map heterogeneous data in a common Hamming space, the RaHH approach constructs a Hamming space for each type of data entities, and learns optimal mappings between them simultaneously. This makes the learned hash codes flexibly cope with the characteristics of different data domains. Moreover, the RaHH framework encodes both homogeneous and heterogeneous relationships between the data entities to design hash functions with improved accuracy. To validate the proposed RaHH method, we conduct extensive evaluations on two large datasets; one is crawled from a popular social media sites, Tencent Weibo, and the other is an open dataset of Flickr(NUS-WIDE). The experimental results clearly demonstrate that the RaHH outperforms several state-of-The-Art hashing methods with significant performance gains.