Abstract
Sparse regression methods, such as l1-regularized linear regression, or Lasso [18], are commonly used for the analysis of high-dimensional, small-sample datasets due to their good generalization and feature-selection properties. However, the predictive accuracy of sparse regression can be further improved by incorporating more realistic data-modeling assumptions (e.g., nonlinearity) and by fully exploiting all available data, as suggested by the transductive approach [6], which makes instance-specific predictions based on both labeled (training) data and unlabeled (test) data, instead of learning a single fixed model from the training data alone. Based on these ideas, we develop a novel method, called Transductive HSIC Lasso, that incorporates transduction into a nonlinear sparse regression approach known as HSIC Lasso [19]. Unlike the existing transductive Lasso algorithm of [1], our approach does not rely on imputation, i.e., estimation of unknown labels using a predictor built on the training data; the latter can result in poor overall performance when the label estimates are unreliable. Instead, our method exploits the structure of the HSIC Lasso, which maximizes the relevance between the selected features and the label while minimizing the redundancy among the selected features; transduction is achieved by including the unlabeled samples in the redundancy computation. Our experiments demonstrate the advantages of the proposed method over state-of-the-art approaches, on both simulated and real-life data, including the prediction of phenotypic traits from genomic data and the prediction of a subject's pain level from functional MRI data.
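For orientation, the following is a minimal sketch of the HSIC Lasso objective that the method builds on; the notation follows the standard formulation of [19] and is an assumption on our part, not quoted from this paper. With centered Gram matrices $\bar{K}^{(k)}$ for each feature $k = 1, \dots, d$ and $\bar{L}$ for the output, HSIC Lasso solves
\[
\min_{\alpha \in \mathbb{R}^d,\; \alpha \ge 0} \;
\frac{1}{2}\Big\| \bar{L} - \sum_{k=1}^{d} \alpha_k \bar{K}^{(k)} \Big\|_F^2
+ \lambda \|\alpha\|_1,
\]
which, up to additive constants, maximizes the relevance terms $\sum_k \alpha_k \,\mathrm{HSIC}(f_k, y)$ while minimizing the redundancy terms $\sum_{k,l} \alpha_k \alpha_l \,\mathrm{HSIC}(f_k, f_l)$. In the transductive variant described above, the feature Gram matrices entering the redundancy term would be computed over both labeled and unlabeled samples, while $\bar{L}$ is computed from the labeled samples only.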