A parallel, block greedy method for sparse inverse covariance estimation for ultra-high dimensions
Abstract
Discovering the graph structure of a Gaus- sian Markov Random Field is an important problem in application areas such as com- putational biology and atmospheric sciences. This task, which translates to estimating the sparsity pattern of the inverse covariance ma- trix, has been extensively studied in the lit- erature. However, the existing approaches are unable to handle ultra-high dimensional datasets and there is a crucial need to de- velop methods that are both highly scal- able and memory-efficient. In this paper, we present GINCO, a blocked greedy method for sparse inverse covariance matrix estima- tion. We also present detailed description of a highly-scalable and memory-efficient imple- mentation of GINCO, which is able to oper- ate on both shared- and distributed-memory architectures. Our implementation is able re- cover the sparsity pattern of 25, 000 vertex random and chain graphs with 87% and 84% accuracy in ≤ 5 minutes using ≤ 10GB of memory on a single 8-core machine. Fur- thermore, our method is statistically consis- tent in recovering the sparsity pattern of the inverse covariance matrix, which we demon- strate through extensive empirical studies.