Scaling Up the Training of Deep CNNs for Human Action Recognition
Abstract
Convolutional neural networks (CNNs) have been shown to perform well on difficult learning tasks such as object recognition, but they are computationally intensive. Two-dimensional CNNs, typically trained on massive datasets for image classification and recognition, consume substantial computational time. For applications such as human action recognition, which involve video inputs, their 3D counterparts, termed 3D convolutional neural networks (3D-CNNs), are employed. Scaling the computation to large datasets and accelerating training on these models is especially pressing for 3D deep learning models, since extending the CNN's connectivity into the time domain greatly increases training time. It is also necessary to examine the model parameters and hyperparameters that determine both the computational performance and the accuracy of the network. Accelerators such as Graphics Processing Units (GPUs) and multi-core CPUs provide a means of speeding up the training of CNNs by exploiting data and model parallelism. In this work we use multi-core CPUs and GPUs to scale up the training of 3D-CNNs. We achieve a faster implementation, report how various network parameters affect the performance of the model, and provide recommendations for initializing them. The code scales well on multi-cores and GPUs, with a speedup of 10x on CPUs and almost 12x on GPUs over the serial version. Our results indicate that 3D-CNN code scales best on CPUs when the convolution step is implemented with a highly parallel FFT-based approach, with the OpenMP version achieving performance comparable to that of the GPUs.
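As a minimal sketch of the FFT-based convolution idea the abstract credits for the CPU speedup, the snippet below implements 3D convolution via the convolution theorem in NumPy. This is not the authors' OpenMP/C code; all names and shapes are hypothetical, and NumPy stands in for the parallel FFT library assumed in the paper.

```python
# Illustrative sketch (not the authors' implementation): FFT-based 3D
# convolution via the convolution theorem. In the paper's setting the
# forward/inverse transforms and the element-wise product are the
# parallelizable steps that OpenMP would distribute across CPU cores.
import numpy as np

def conv3d_fft(volume, kernel):
    """Convolve a 3D input volume with a 3D kernel in the frequency domain.

    By the convolution theorem, spatial-domain convolution equals an
    element-wise product after a forward FFT, followed by an inverse FFT.
    """
    # Pad both operands to the full linear-convolution output size.
    out_shape = tuple(v + k - 1 for v, k in zip(volume.shape, kernel.shape))
    vol_f = np.fft.fftn(volume, out_shape)
    ker_f = np.fft.fftn(kernel, out_shape)
    # Element-wise product in the frequency domain == convolution in space.
    return np.real(np.fft.ifftn(vol_f * ker_f))

# Hypothetical usage: a 16-frame 32x32 clip convolved with a 3x3x3 kernel.
clip = np.random.rand(16, 32, 32)
kern = np.random.rand(3, 3, 3)
feature_map = conv3d_fft(clip, kern)  # shape (18, 34, 34)
```

The design point this illustrates is why the FFT approach scales: the cost of direct 3D convolution grows with the product of volume and kernel sizes, whereas the FFT route reduces it to transforms plus an embarrassingly parallel point-wise multiply.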