Guaranteed Convergence of Training Convolutional Neural Networks via Accelerated Gradient Descent
Abstract
In this paper, we study the regression problem of training a one-hidden-layer non-overlapping convolutional neural network (ConvNN) with the rectified linear unit (ReLU) activation function. Given a set of training data consisting of inputs (feature vectors) and outputs (labels), the outputs are assumed to be generated from a ConvNN with unknown weights, and our goal is to recover the ground-truth weights by solving a non-convex optimization problem whose objective function is the empirical loss. We prove that if the inputs follow a Gaussian distribution, then this optimization problem can be solved by the accelerated gradient descent (AGD) algorithm with a well-designed initial point and sufficiently many samples, and that the iterates of the AGD algorithm converge linearly to the ground-truth weights.
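For concreteness, the following is a minimal sketch of the setting described above: labels generated by a planted one-hidden-layer non-overlapping ConvNN with a shared ReLU filter applied to non-overlapping patches of Gaussian inputs, and a Nesterov-style accelerated gradient descent loop on the empirical squared loss. The patch size, step size, momentum parameter, and random initialization here are illustrative assumptions, not the paper's prescribed initialization or constants.

import numpy as np

def forward(w, X, k):
    # Average of ReLU responses of the shared filter w over k non-overlapping patches.
    n, d = X.shape
    patches = X.reshape(n, k, d // k)
    return np.maximum(patches @ w, 0.0).mean(axis=1)

def loss_and_grad(w, X, y, k):
    # Empirical squared loss and its (sub)gradient with respect to the shared filter w.
    n, d = X.shape
    patches = X.reshape(n, k, d // k)
    pre = patches @ w                              # (n, k) pre-activations
    pred = np.maximum(pre, 0.0).mean(axis=1)
    resid = pred - y
    active = (pre > 0).astype(float)               # ReLU subgradient indicator
    grad = np.einsum('i,ij,ijp->p', resid, active, patches) / (n * k)
    return 0.5 * np.mean(resid ** 2), grad

def agd(X, y, k, eta=0.1, beta=0.9, iters=500, seed=0):
    # Nesterov-style accelerated gradient descent; the random initial point below is a
    # placeholder, whereas the paper relies on a carefully designed initialization.
    rng = np.random.default_rng(seed)
    p = X.shape[1] // k
    w = rng.normal(scale=0.1, size=p)
    w_prev = w.copy()
    for _ in range(iters):
        v = w + beta * (w - w_prev)                # momentum extrapolation
        _, g = loss_and_grad(v, X, y, k)
        w_prev, w = w, v - eta * g                 # gradient step at the extrapolated point
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, k, p = 2000, 4, 8
    w_star = rng.normal(size=p)                    # unknown ground-truth filter
    X = rng.normal(size=(n, k * p))                # Gaussian inputs, as assumed above
    y = forward(w_star, X, k)                      # labels from the planted ConvNN
    w_hat = agd(X, y, k)
    print("relative error:", np.linalg.norm(w_hat - w_star) / np.linalg.norm(w_star))

The printed relative error only illustrates the recovery objective; the linear convergence guarantee in the paper is established under its specific initialization and sample-size conditions, not for this generic setup.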