End-to-end keyword spotting (KWS) project built on the Google Speech Commands dataset. The project supports:
- training a baseline KWS model
- evaluation on the test split
- single-file prediction
- TensorFlow Lite export for Float32, Int8, and experimental Int4 variants
- robustness evaluation under local noise samples at multiple SNR levels
The 12 output classes are:
yes, no, up, down, left, right, on, off, stop, go, unknown, silence.
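A minimal sketch of how raw Speech Commands words could be collapsed into these 12 classes. The names `TARGET_WORDS`, `CLASSES`, and `label_to_index` are illustrative and are not taken from src/dataset.py; mapping every non-target word to "unknown" is an assumption about how the project builds its labels.

```python
# Hypothetical label mapping; names are illustrative, not from src/dataset.py.
TARGET_WORDS = ["yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go"]
CLASSES = TARGET_WORDS + ["unknown", "silence"]


def label_to_index(word: str) -> int:
    """Map a spoken word (or 'silence') to one of the 12 class indices."""
    if word in CLASSES:
        return CLASSES.index(word)
    # Assumption: any word outside the 10 targets is treated as "unknown".
    return CLASSES.index("unknown")


if __name__ == "__main__":
    for word in ["yes", "go", "bed", "silence"]:
        print(word, "->", label_to_index(word))
```

Keeping the class order in one list means the model's output index and the printed label cannot drift apart.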
Project files:
- train.py: trains the baseline Keras model and saves metrics/plots.
- evaluate.py: evaluates the saved Keras model on the clean test split.
- predict.py: predicts the class for one WAV file.
- quantize.py: exports the trained model to Float32, Int8, and experimental Int4 TFLite formats.
- evaluate_robustness.py: evaluates TFLite models on clean audio and noisy audio at multiple SNR levels.
- src/config.py: paths and model/audio constants.
- src/dataset.py: dataset extraction, manifests, TensorFlow datasets, and log-mel preprocessing.
- src/model.py: CNN model definition.
- src/noise.py: local noise loading and SNR-based audio mixing helpers.
You need:
- The Google Speech Commands archive.
- Optional local noise files for robustness testing.
Default archive path expected by the project:
D:\...\speech_commands_v0.02.tar.gz
Local noise folder expected by the robustness script:
D:\...\data\noise_samples
Example noise files already supported:
- Babble_1.wav
- Cafe_1.wav
- Traffic_1.wav
- AirConditioner_1.wav
Create a "data" subfolder in the project folder and extract the Speech Commands archive into it.
Open the project folder in VS Code and make sure the terminal is already inside:
D:\CVprojects\DL-Project
pip install -r requirements.txt

Quick verification run:
python train.py --epochs 3 --limit-train 6000 --limit-val 1200 --limit-test 1200

Full training run:
python train.py --epochs 12

This saves:
- artifacts/models/best_model.keras
- artifacts/plots/training_history.png
- artifacts/plots/confusion_matrix.png
- artifacts/reports/metrics.json
- artifacts/reports/classification_report.txt

python evaluate.py

python predict.py --wav "D:\CVprojects\DL-Project\data\yes\004ae714_nohash_0.wav" --plot

python quantize.py
This writes:
- artifacts/models/model_float32.tflite
- artifacts/models/model_int8.tflite
- artifacts/models/model_int4.tflite
- artifacts/models/quantization_summary.json
Note: the Int4 path is experimental and depends on TensorFlow Lite support in your local build.
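To make the Int8 export less of a black box, here is a dependency-free sketch of the affine mapping that TFLite-style int8 quantization is based on: real_value ≈ scale * (q - zero_point). It illustrates per-tensor arithmetic only and is not the code in quantize.py; the function names are hypothetical, and the real converter chooses its own ranges (for example from calibration data).

```python
from __future__ import annotations

# Illustrative affine int8 quantization: real_value ~= scale * (q - zero_point).
# General scheme only; function names are hypothetical, not from quantize.py.
QMIN, QMAX = -128, 127


def quant_params(min_val: float, max_val: float) -> tuple[float, int]:
    """Derive scale and zero point so [min_val, max_val] maps onto [-128, 127]."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    min_val, max_val = min(min_val, 0.0), max(max_val, 0.0)
    span = max_val - min_val
    scale = span / (QMAX - QMIN) if span > 0 else 1.0
    zero_point = round(QMIN - min_val / scale)
    return scale, max(QMIN, min(QMAX, zero_point))


def quantize(x: float, scale: float, zero_point: int) -> int:
    """Round to the nearest int8 code and saturate at the int8 limits."""
    q = round(x / scale) + zero_point
    return max(QMIN, min(QMAX, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Recover the approximate real value from an int8 code."""
    return scale * (q - zero_point)


if __name__ == "__main__":
    scale, zp = quant_params(0.0, 2.55)
    for x in (0.0, 0.5, 1.0, 2.55, 10.0):
        q = quantize(x, scale, zp)
        print(x, "->", q, "->", dequantize(q, scale, zp))
```

Values inside the calibrated range come back with an error of at most half a quantization step, while values outside it saturate. The same idea with only 16 levels is why the Int4 variant is expected to lose more accuracy.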
Quick demo run:
python evaluate_robustness.py --limit-test 100

Full robustness run:
python evaluate_robustness.py

This writes:
- artifacts/reports/robustness_results.csv
- artifacts/reports/robustness_accuracy_table.csv
- artifacts/reports/deployment_summary_table.csv
- artifacts/reports/robustness_summary.json
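
The robustness run mixes noise into clean clips at a target SNR. The sketch below shows one common way to do that with plain Python lists: scale the noise so that 20*log10(speech_rms / noise_rms) equals the requested value. The helper names `rms` and `mix_at_snr` are hypothetical, and src/noise.py may differ in detail (for example in how it crops or loops the noise).

```python
import math


def rms(samples):
    """Root-mean-square level of a non-empty sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise scaled to the requested SNR in dB.

    Hypothetical helper: tiling the noise to cover the clip is an assumption.
    """
    speech, noise = list(speech), list(noise)
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Loop the noise and trim it so it covers the whole speech clip.
    reps = math.ceil(len(speech) / len(noise))
    noise = (noise * reps)[: len(speech)]
    speech_rms, noise_rms = rms(speech), rms(noise)
    if speech_rms == 0.0 or noise_rms == 0.0:
        return speech  # SNR is undefined for a silent signal; leave unchanged.
    # Solve snr_db = 20*log10(speech_rms / (gain * noise_rms)) for gain.
    gain = speech_rms / (noise_rms * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    hiss = [((i * 7919) % 1000) / 500 - 1 for i in range(4000)]
    for snr in (20, 10, 0, -5):
        mixed = mix_at_snr(tone, hiss, snr)
        print(snr, "dB -> peak", round(max(abs(m) for m in mixed), 3))
```

At 0 dB the noise is as loud as the speech, and at -5 dB it is louder, which is why accuracy is expected to fall fastest at the low end of the SNR sweep.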
If you want to demonstrate the full project quickly, run these in order:
python quantize.py
python evaluate_robustness.py --limit-test 100

If your instructor wants baseline training too, run this before the above two commands:
python train.py --epochs 3 --limit-train 6000 --limit-val 1200 --limit-test 1200

Main output files:
- artifacts/models/best_model.keras
- artifacts/models/model_float32.tflite
- artifacts/models/model_int8.tflite
- artifacts/models/model_int4.tflite
- artifacts/plots/training_history.png
- artifacts/plots/confusion_matrix.png
- artifacts/reports/metrics.json
- artifacts/reports/classification_report.txt
- artifacts/reports/robustness_accuracy_table.csv
- artifacts/reports/deployment_summary_table.csv
- Audio is processed as 16 kHz mono and padded or trimmed to 1 second.
- Features are normalized log-mel spectrograms.
- The baseline model is a CNN trained with TensorFlow/Keras.
- The robustness script evaluates clean audio and noisy audio at 20 dB, 10 dB, 0 dB, and -5 dB SNR.
- The TFLite Int4 path should be treated as experimental unless separately validated in your exact TensorFlow build.
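
The fixed-length step in the first note can be sketched as follows. This is a dependency-free illustration, not the project's preprocessing code: the constant and function names are hypothetical, and padding with trailing zeros while keeping the start of long clips is an assumption (the project could equally center or randomly crop).

```python
SAMPLE_RATE = 16_000            # 16 kHz mono, as stated in the notes
CLIP_SAMPLES = SAMPLE_RATE * 1  # 1 second of audio


def pad_or_trim(samples, target_len=CLIP_SAMPLES):
    """Return exactly target_len samples.

    Assumption: short clips get trailing zeros, long clips keep their start.
    """
    samples = list(samples)
    if len(samples) >= target_len:
        return samples[:target_len]
    return samples + [0.0] * (target_len - len(samples))


if __name__ == "__main__":
    short_clip = [0.1] * 8_000   # 0.5 s
    long_clip = [0.1] * 20_000   # 1.25 s
    print(len(pad_or_trim(short_clip)), len(pad_or_trim(long_clip)))
```

A fixed input length is what lets every clip produce a log-mel spectrogram of the same shape for the CNN.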