The aim of this university project is to implement a CNN (convolutional neural network) for gesture recognition on the ESP32-CAM. Keras is used to train the model, and TensorFlow Lite makes it possible to deploy the model on a microcontroller such as the ESP32. An American Sign Language data set is used to train the model. Several optimizations were made to improve accuracy in practice: merging labels and artificially augmenting the data set.
Input: 28*28 = 784 pixels
Output: 24 labels (but merged to 4 due to optimization)
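The label merge mentioned above can be sketched as a lookup table from original class index to merged class. This is only an illustration: the grouping below (four consecutive blocks of six letters) and the names `LETTERS`, `MERGE_MAP`, and `merge_label` are hypothetical, since the actual grouping used in the project is not stated here. It assumes the usual Sign Language MNIST convention, where labels run from 0 to 25 and J (9) and Z (25) are absent because they require motion.

```python
import string

# Sign Language MNIST convention (assumed): label i corresponds to the i-th
# letter of the alphabet; J (9) and Z (25) are absent, leaving 24 classes.
LETTERS = [c for c in string.ascii_uppercase if c not in "JZ"]

# Hypothetical grouping: four consecutive blocks of six letters each.
# The real project may merge different letters together.
MERGE_MAP = {
    string.ascii_uppercase.index(letter): position // 6
    for position, letter in enumerate(LETTERS)
}


def merge_label(original_label: int) -> int:
    """Map an original label (0-25, without 9 and 25) to a merged class 0-3."""
    return MERGE_MAP[original_label]


if __name__ == "__main__":
    for label in (0, 8, 10, 24):
        print(label, "->", merge_label(label))
```

Applying `merge_label` to every training and test label before fitting would turn the 24-class problem into a 4-class one, so the final softmax layer only needs 4 outputs.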
Hidden layers
- 3 Convolution + ReLU + Max-pooling
- 1 Flatten
- 1 Fully-connected + ReLU
- 1 Fully-connected + Softmax
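To see how the 28*28 input shrinks through the layers listed above, the following sketch traces the tensor shapes by hand. The filter counts (32, 64, 128), the "same" convolution padding, the 2x2 pooling window, and the 256 dense units are assumptions chosen for illustration, not values taken from the notebook. Only the layer order, the input size, and the 4 merged outputs come from the description above.

```python
def trace_shapes(size=28, filters=(32, 64, 128), dense_units=256, classes=4):
    """Return (layer name, output shape) pairs for the assumed architecture."""
    shapes = [("input", (size, size, 1))]
    for i, f in enumerate(filters, start=1):
        # 'same' padding (assumed): the convolution keeps height and width.
        shapes.append((f"conv{i} + relu", (size, size, f)))
        # 2x2 max-pooling with stride 2 (assumed): halves each side, rounding down.
        size //= 2
        shapes.append((f"maxpool{i}", (size, size, f)))
    shapes.append(("flatten", (size * size * filters[-1],)))
    shapes.append(("dense + relu", (dense_units,)))
    shapes.append(("dense + softmax", (classes,)))
    return shapes


if __name__ == "__main__":
    for name, shape in trace_shapes():
        print(f"{name:16s} {shape}")
```

Under these assumptions the spatial size goes 28, 14, 7, 3, so the flatten layer feeds 3 * 3 * 128 = 1152 values into the first fully-connected layer.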
At the end of training in the notebook, ASL_256_lite.tflite is created by TensorFlow Lite. To create the model's binary data file model_data.cc for the ESP32 (p_det_model.cpp in the source code), enter this command in a terminal opened in the folder containing the .tflite file:
xxd -i ASL_256_lite.tflite > model_data.cc
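Where `xxd` is not available, an equivalent C array can be produced with a few lines of Python. This is a minimal sketch, not part of the project: the function names `to_c_array` and `convert_file` are hypothetical, and the output only approximates the layout of `xxd -i` (an `unsigned char` array plus an `unsigned int` length, 12 bytes per line).

```python
import re


def to_c_array(data: bytes, filename: str) -> str:
    """Render bytes as C source in a layout similar to `xxd -i`."""
    # Similar to xxd: derive the variable name from the file name by replacing
    # every character that is not a letter or digit with an underscore.
    name = re.sub(r"[^0-9A-Za-z]", "_", filename)
    lines = []
    for start in range(0, len(data), 12):  # 12 bytes per output line
        chunk = data[start:start + 12]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk))
    body = ",\n".join(lines)
    return (
        f"unsigned char {name}[] = {{\n{body}\n}};\n"
        f"unsigned int {name}_len = {len(data)};\n"
    )


def convert_file(src_path: str, dst_path: str) -> None:
    """Read a .tflite file and write the matching C source file."""
    with open(src_path, "rb") as src:
        source = to_c_array(src.read(), src_path)
    with open(dst_path, "w") as dst:
        dst.write(source)


if __name__ == "__main__":
    # Small demonstration on four bytes instead of a real model file.
    print(to_c_array(b"TFL3", "demo.tflite"))
```

Calling `convert_file("ASL_256_lite.tflite", "model_data.cc")` from the folder containing the model should give a file comparable to the one produced by the `xxd` command.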
- ~100% with test data (ASL data set) but biased
- ~70% with real capture (ESP32-CAM)
Board: AI Thinker ESP32-CAM
Data Set: MNIST ASL
CNN modeling and training: Keras
CNN post-training implementation on the microcontroller: TensorFlow Lite