Making REML computationally feasible for large data sets: Use of the Gibbs sampler
Abstract
REML (restricted maximum likelihood) has become the preferred method for estimating variance components. Except for relatively simple special cases, the computation of REML estimates requires the use of an iterative algorithm. A number of algorithms have been proposed; they can be classified as derivative-free, first-order, or second-order. The computational requirements of a first-order algorithm are only moderately greater than those of a derivative-free algorithm and are considerably less than those of a second-order algorithm. First-order algorithms include the EM algorithm and various algorithms derived from the REML likelihood equations by the method of successive approximations. They also include so-called linearized algorithms, which appear to have superior convergence properties. With conventional numerical methods, the computations required to obtain the REML iterates can be so extensive as to be infeasible for very large data sets (those with very large numbers of random effects). The Gibbs sampler can instead be used to compute the iterates of a first-order REML algorithm. This is accomplished by adapting, extending, and enhancing results on the use of the Gibbs sampler to invert positive definite matrices. In computing the REML iterates for large data sets, the Gibbs sampler provides an appealing alternative to conventional numerical methods.
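
The basic idea of using the Gibbs sampler to invert a positive definite matrix can be illustrated with a minimal sketch. This is not the algorithm of the paper, only the textbook construction it builds on: if x ~ N(0, A^{-1}), then each full conditional of x_i given the other elements is univariate normal with mean -(sum over j != i of a_ij x_j) / a_ii and variance 1 / a_ii, so draws can be generated from the rows of A alone, and the average of the outer products x x' approximates A^{-1}. The function name gibbs_inverse, the 3 x 3 test matrix, and the sample counts are assumptions made for illustration.

```python
import numpy as np

def gibbs_inverse(A, n_samples=20000, burn_in=1000, seed=0):
    """Monte Carlo estimate of inv(A) for a symmetric positive definite A.

    Hypothetical helper: runs a Gibbs sampler whose stationary
    distribution is N(0, inv(A)) and averages the outer products
    of the draws.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    d = np.diag(A)                 # diagonal elements a_ii
    x = np.zeros(n)                # starting point of the chain
    acc = np.zeros((n, n))         # running sum of x x'
    for t in range(burn_in + n_samples):
        for i in range(n):
            # sum over j != i of a_ij * x_j
            s = A[i] @ x - d[i] * x[i]
            # draw x_i from its full conditional N(-s / a_ii, 1 / a_ii)
            x[i] = -s / d[i] + rng.standard_normal() / np.sqrt(d[i])
        if t >= burn_in:
            acc += np.outer(x, x)
    return acc / n_samples

# Small illustrative matrix (an assumption, not data from the paper).
A = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])
approx = gibbs_inverse(A)
exact = np.linalg.inv(A)
print(np.round(approx, 3))
print(np.round(exact, 3))
```

Each sweep touches only the rows of A, which is what makes the approach attractive when A is large and sparse. In such a setting one would presumably accumulate only the elements or sums of elements of the inverse that the REML iterates require, rather than the full matrix as is done here.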