Faster and cheaper: Parallelizing large-scale matrix factorization on GPUs. Wei Tan, Liangliang Cao, et al. 2016. HPDC 2016.