A human activity classifier that categorizes actions into six classes (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING), based on the dataset hosted on Kaggle.
The Human Activity Recognition database was built from the recordings of 30 study participants performing activities of daily living (ADL) while carrying a waist-mounted smartphone with embedded inertial sensors. The objective is to classify activities into one of the six activities performed.

Description of experiment
The experiments were carried out with a group of 30 volunteers aged 19-48 years. Each person performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) wearing a smartphone (Samsung Galaxy S II) on the waist. Using its embedded accelerometer and gyroscope, 3-axial linear acceleration and 3-axial angular velocity were captured at a constant rate of 50 Hz. The experiments were video-recorded so that the data could be labeled manually. The resulting dataset was randomly partitioned into two sets: 70% of the volunteers were selected to generate the training data and 30% the test data.
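The subject-wise 70/30 partition described above can be illustrated with a short sketch. This is only an assumption about how such a split might be done; the function name `split_subjects`, the seed and the rounding rule are hypothetical and are not taken from the dataset authors.

```python
import numpy as np

def split_subjects(subject_ids, train_fraction=0.7, seed=0):
    """Randomly assign whole subjects (not single samples) to train or test."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.unique(subject_ids))
    n_train = int(round(train_fraction * len(shuffled)))
    train = set(shuffled[:n_train].tolist())
    test = set(shuffled[n_train:].tolist())
    return train, test

# 30 volunteers numbered 1..30, as in the experiment description.
train_subjects, test_subjects = split_subjects(np.arange(1, 31))
print(len(train_subjects), len(test_subjects))  # 21 9
```

Splitting by volunteer rather than by sample keeps all recordings of one person on the same side of the split, so the test set measures generalization to unseen people.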
The sensor signals (accelerometer and gyroscope) were pre-processed by applying noise filters and then sampled in fixed-width sliding windows of 2.56 sec and 50% overlap (128 readings/window). The sensor acceleration signal, which has gravitational and body motion components, was separated using a Butterworth low-pass filter into body acceleration and gravity. The gravitational force is assumed to have only low frequency components, therefore a filter with 0.3 Hz cutoff frequency was used. From each window, a vector of features was obtained by calculating variables from the time and frequency domain.
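The windowing arithmetic above (50 Hz sampling, 2.56 s windows, 50% overlap, hence 128 readings per window and a step of 64) can be sketched as follows. The signal is synthetic, `sliding_windows` is a hypothetical helper, and the two features computed at the end are only illustrative stand-ins for the 561 variables of the real dataset. The noise and Butterworth filtering steps are not shown.

```python
import numpy as np

def sliding_windows(signal, window_size=128, overlap=0.5):
    """Cut a 1-D signal into fixed-width windows with the given overlap."""
    step = int(window_size * (1 - overlap))  # 64 samples for 50% overlap
    n_windows = (len(signal) - window_size) // step + 1
    starts = [i * step for i in range(n_windows)]
    return np.stack([signal[s : s + window_size] for s in starts])

sampling_rate_hz = 50
# 10 seconds of synthetic sensor readings (assumption: real data would be filtered first).
signal = np.random.default_rng(0).normal(size=10 * sampling_rate_hz)

windows = sliding_windows(signal)
print(windows.shape)  # (6, 128): each window spans 128 / 50 = 2.56 s

# Two illustrative time-domain features per window: mean and standard deviation.
features = np.column_stack([windows.mean(axis=1), windows.std(axis=1)])
print(features.shape)  # (6, 2)
```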
For each record in the dataset the following is provided:
- Triaxial acceleration from the accelerometer (total acceleration) and the estimated body acceleration.
- Triaxial angular velocity from the gyroscope.
- A 561-feature vector with time and frequency domain variables.
- Its activity label.
- An identifier of the subject who carried out the experiment.

The project is divided into 5 parts:
- Part 1: Data Analysis
- Part 2: Dimensionality Reduction using PCA
- Part 3: Further Dimensionality Reduction using LDA
- Part 4: Classification using KNN
- Part 5: Report with the results of the project, compared against some implementations from the sklearn library

The dataset is loaded with pandas and analyzed with pandas and matplotlib. We start with some general data analysis to get a better understanding of the dataset, such as a pie chart of the number of samples in each class and a bar chart of the number of samples for each subject.

import pandas as pd

training_set = pd.read_csv("dataset/train.csv")
print(training_set.shape)  # (7352, 563)

Using PCA for dimensionality reduction, we go from 561 features to 155. The implementation is in the file pca.py; it returns the projection matrix and the number of components that retain 99% of the variance.

projection_matrix, component_num = PCA(x, show_plots=True)
pca = x @ projection_matrix.T

Inside the PCA function we calculate the number of components that retain 99% of the variance and plot the cumulative sum of the explained variance ratio.

# returns the number of components that retain 99% of the variance
component_num = np.argmax(cumulative_variance >= 0.99) + 1

We do a scatter plot of the first 3 components to get a better understanding of the data.

The data is still not clearly separable, so we reduce the dimensionality further with LDA, a supervised method that uses the labels to find the best projection matrix.
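The PCA step described above can be sketched with NumPy. This is a minimal sketch and not the contents of pca.py: the name `pca_sketch`, the use of an eigendecomposition of the covariance matrix, and the synthetic data are all assumptions made for illustration. Only the 99% threshold and the `argmax` rule come from the text.

```python
import numpy as np

def pca_sketch(x, variance_to_keep=0.99):
    """Return a (components x features) projection matrix and the component count."""
    centered = x - x.mean(axis=0)
    covariance = np.cov(centered, rowvar=False)
    # eigh suits symmetric matrices; it returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    cumulative_variance = np.cumsum(eigenvalues) / np.sum(eigenvalues)
    # First index at which the retained variance reaches the threshold.
    component_num = int(np.argmax(cumulative_variance >= variance_to_keep)) + 1
    projection_matrix = eigenvectors[:, :component_num].T
    return projection_matrix, component_num

# Synthetic data: 10 features driven by 2 latent directions plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2)) * np.array([10.0, 5.0])
x = latent @ rng.normal(size=(2, 10)) + 0.01 * rng.normal(size=(200, 10))

projection_matrix, component_num = pca_sketch(x)
reduced = (x - x.mean(axis=0)) @ projection_matrix.T
print(reduced.shape == (200, component_num))  # True
```

The sketch centers the data before projecting; whether pca.py does the same is not stated in the text.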
The implementation of LDA is in the file lda.py. Like pca.py, it returns a projection matrix, here computed from the data and the labels. Since we have k = 6 classes, we project onto k − 1 = 5 axes.

lda_proj = LDA(pca, y_train, n_classes=6)
lda = np.matmul(pca, lda_proj.T)

We have managed to reduce the number of features from 561 to 5. We do a scatter plot of the first 3 components to look at the current situation.

The data is now clearly separable (except SITTING and STANDING, which are still close), so we can start the classification process, for which I decided to use KNN.
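The LDA projection onto k − 1 axes described above can be sketched as follows. This is a hedged illustration of Fisher's criterion, not the contents of lda.py: the name `lda_sketch`, the use of a pseudo-inverse of the within-class scatter matrix, and the synthetic three-class data are assumptions.

```python
import numpy as np

def lda_sketch(x, y, n_components):
    """Return a (n_components x features) projection matrix from Fisher's criterion."""
    overall_mean = x.mean(axis=0)
    n_features = x.shape[1]
    within = np.zeros((n_features, n_features))   # within-class scatter
    between = np.zeros((n_features, n_features))  # between-class scatter
    for label in np.unique(y):
        members = x[y == label]
        class_mean = members.mean(axis=0)
        deviations = members - class_mean
        within += deviations.T @ deviations
        mean_shift = (class_mean - overall_mean).reshape(-1, 1)
        between += len(members) * (mean_shift @ mean_shift.T)
    # Directions that maximize between-class scatter relative to within-class scatter.
    eigenvalues, eigenvectors = np.linalg.eig(np.linalg.pinv(within) @ between)
    order = np.argsort(eigenvalues.real)[::-1][:n_components]
    return eigenvectors[:, order].real.T

# Synthetic data: 3 classes in 4 dimensions, so we project onto 3 - 1 = 2 axes.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [5, 5, 0, 0], [0, 5, 5, 0]], dtype=float)
y = np.repeat(np.arange(3), 50)
x = centers[y] + 0.5 * rng.normal(size=(150, 4))

projection = lda_sketch(x, y, n_components=2)
projected = x @ projection.T
print(projected.shape)  # (150, 2)
```

With k classes the between-class scatter matrix has rank at most k − 1, which is why only k − 1 projection axes carry discriminative information.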
The implementation of KNN is in the file knn.py. It takes the data, the labels and the number of neighbors as input and returns the predicted class. Running the code takes a while (around 2 min), but the accuracy is pretty good.

knn = KNN(lda, y_train, n_neighbors=21)

The number of neighbors, 21, was chosen empirically. The resulting accuracy is about 0.95:

Accuracy: 0.9548693586698337

I decided to try the well-known sklearn library and compare its results with my implementation.

Accuracy: 0.9548693586698337 (me)
Accuracy: 0.9562266711910418 (RandomForestClassifier)
Accuracy: 0.9416355615880556 (DecisionTreeClassifier)
Accuracy: 0.9548693586698337 (KNeighborsClassifier)
Accuracy: 0.9572446555819477 (SVMClassifier)

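For readers unfamiliar with the KNN classification used in Part 4, the idea can be sketched in a few lines of NumPy. This is not the code in knn.py: the function name `knn_predict`, the Euclidean distance, the unweighted majority vote and the synthetic 5-feature data are assumptions. Only the choice of 21 neighbors comes from the text.

```python
import numpy as np

def knn_predict(train_x, train_y, query_x, n_neighbors=21):
    """Predict a label for each query row by majority vote among its nearest neighbors."""
    predictions = []
    for query in query_x:
        distances = np.linalg.norm(train_x - query, axis=1)  # Euclidean distance
        nearest = np.argsort(distances)[:n_neighbors]
        labels, counts = np.unique(train_y[nearest], return_counts=True)
        # On a tie, argmax picks the smallest label because np.unique sorts labels.
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)

# Synthetic data with 5 features, mimicking the dimensionality after LDA.
rng = np.random.default_rng(0)
train_x = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(4, 1, (50, 5))])
train_y = np.array([0] * 50 + [1] * 50)
test_x = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(4, 1, (20, 5))])
test_y = np.array([0] * 20 + [1] * 20)

predicted = knn_predict(train_x, train_y, test_x)
accuracy = float(np.mean(predicted == test_y))
print(f"Accuracy: {accuracy}")
```

The loop computes the distance from every query to every training sample, which is one plausible reason a plain implementation takes minutes on the full dataset.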
- Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. Human Activity Recognition on Smartphones using a Multiclass Hardware-Friendly Support Vector Machine. International Workshop of Ambient Assisted Living (IWAAL 2012). Vitoria-Gasteiz, Spain. Dec 2012.
- Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. Energy Efficient Smartphone-Based Activity Recognition using Fixed-Point Arithmetic. Journal of Universal Computer Science. Special Issue in Ambient Assisted Living: Home Care. Volume 19, Issue 9. May 2013.
- Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. Human Activity Recognition on Smartphones using a Multiclass Hardware-Friendly Support Vector Machine. 4th International Workshop of Ambient Assisted Living, IWAAL 2012, Vitoria-Gasteiz, Spain, December 3-5, 2012. Proceedings. Lecture Notes in Computer Science 2012, pp 216-223.
- Jorge Luis Reyes-Ortiz, Alessandro Ghio, Xavier Parra-Llanas, Davide Anguita, Joan Cabestany and Andreu Català. Human Activity and Motion Disorder Recognition: Towards Smarter Interactive Cognitive Environments. 21st European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2013. Bruges, Belgium, 24-26 April 2013.




