A nonmonotone learning rate strategy for SGD training of deep neural networks. Nitish Shirish Keskar, George Saon. ICASSP 2015.