Design of Novel Deep Learning Models for Real-time Human Activity Recognition with Mobile Phones
Abstract
In this paper we present deep-learning-based techniques for human activity classification that are designed to run in real time on mobile devices. Our methods minimize model size and computational overhead so that they can run on an embedded processor while preserving battery life. Prior work shows that inertial measurement unit (IMU) data from waist-mounted mobile phones can be used to develop accurate classification models for human activities such as walking, running, and stair-climbing. However, these models have largely been based on hand-crafted features derived from temporal and spectral statistics. More recently, deep learning has been applied to IMU sensor data, but the resulting models have not been optimized for resource-constrained devices. We present a detailed study of the traditional hand-crafted features used for shallow/statistical models, which consist of a manually chosen set of over 561 dimensions. We show, through principal component analysis (PCA) and application of a published support vector machine (SVM) pipeline, that the number of features can be significantly reduced: fewer than 100 features give the same performance. In addition, we show that features derived from frequency-domain transformations do not contribute to the accuracy of these models. Finally, we provide details of our learning technique, which creates 2D signal images from windowed samples of IMU data. Our pipeline includes a convolutional neural network (CNN) with several layers (one convolutional layer, one averaging layer, and a fully connected layer). We show that by removing steps in the pipeline and layers in the CNN, we can still achieve an F1 score of 0.98 with a much smaller memory footprint and correspondingly lower computational cost. To increase the classification accuracy of our pipeline, we added a hybrid bi-class SVM trained on the labeled, flattened output of the convolutional layer after each training image was processed.
The learned feature set is almost half the size of the original hand-crafted feature set, and combining the CNN with the SVM results in an F1 score of 0.99. We also investigate a novel application of transfer learning by using the time-series 2D signal images to re-train two different publicly available networks, Inception/ImageNet and MobileNet. We find that re-trained ImageNet networks smaller than 5.5 MB (suitable for mobile phones) can be created, with classification accuracy (F1 score) ranging from 0.83 to 0.93, indicating that re-training can be a useful future direction for quickly building new classifiers for continuously evolving activities while remaining applicable to mobile device classification. Finally, we show that these deep learning models may be generalizable enough that classifiers built from a given set of users for a specified set of activities can also be used for a new user/subject.
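As a rough illustration of the idea of building 2D signal images from windowed IMU samples, the following minimal Python sketch segments a sample stream into overlapping windows and stacks the channels of each window as image rows. The abstract does not specify the window length, the overlap, the channel ordering, or any scaling, so the six-channel layout, the min-max scaling, and the function names `sliding_windows` and `to_signal_image` are all assumptions made for illustration only, not the pipeline described in this paper.

```python
from typing import List, Sequence

# Assumed layout of one IMU reading: (ax, ay, az, gx, gy, gz).
Sample = Sequence[float]


def sliding_windows(samples: Sequence[Sample], size: int, step: int) -> List[List[Sample]]:
    """Split a stream of IMU samples into fixed-length, possibly overlapping windows."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    return [list(samples[start:start + size])
            for start in range(0, len(samples) - size + 1, step)]


def to_signal_image(window: Sequence[Sample]) -> List[List[float]]:
    """Turn one window into a 2D image: one row per channel, one column per time step.

    Each row is min-max scaled to [0, 1] (an assumed normalization) so that
    every channel uses the same pixel range. Constant channels map to 0.0.
    """
    rows = [list(channel) for channel in zip(*window)]  # transpose time x channel
    image = []
    for row in rows:
        lo, hi = min(row), max(row)
        span = hi - lo
        image.append([(v - lo) / span if span else 0.0 for v in row])
    return image


# Synthetic six-channel stream of 10 samples, used only to exercise the functions.
stream = [(float(t), 0.0, 1.0, -float(t), 2.0 * t, 0.5) for t in range(10)]
windows = sliding_windows(stream, size=4, step=2)  # 50% overlap (assumed)
images = [to_signal_image(w) for w in windows]
print(len(images), len(images[0]), len(images[0][0]))
```

Each resulting image has one row per sensor channel and one column per time step, so it could be passed to a small CNN as a single-channel input. A real pipeline would likely choose the window length from the sensor sampling rate and might arrange or repeat rows differently.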