This repository contains data and code used in the paper "Predicting Moral Values in Lyrics Through Audio" submitted to the 2025 Content-Based Multimedia Indexing (CBMI) conference.
- We utilise a dataset of 200 English-language song lyrics, each annotated by two bilingual annotators with 10 moral values (the virtue and vice polarities are treated as separate labels) (link here).
- The preview URLs are gathered using the `utils_get_previews` script, and the previews themselves are downloaded using the `utils_download_previews` script.
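A minimal sketch of the download step, assuming the gathered preview URLs are stored as a mapping from a track identifier to a URL (or `None` when a track has no preview). The function name `download_previews`, the `.mp3` suffix, and the one-file-per-track output layout are hypothetical and are not taken from `utils_download_previews`.

```python
from pathlib import Path
from urllib.request import urlretrieve


def download_previews(preview_urls, out_dir, suffix=".mp3"):
    """Download each preview URL to out_dir/<track_id><suffix> (hypothetical helper).

    preview_urls maps a track identifier to a URL, or to None when no preview
    is available (assumption); such tracks are skipped.
    Returns the paths of the preview files now on disk.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    available = []
    for track_id, url in preview_urls.items():
        if not url:
            # No preview for this track: nothing to download.
            continue
        target = out_dir / f"{track_id}{suffix}"
        if not target.exists():
            # Only fetch files that are missing, so the step can be re-run safely.
            urlretrieve(url, target)
        available.append(target)
    return available
```

Skipping files that already exist keeps the step resumable if a long batch of downloads is interrupted.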
The `utils_construct_dataset` script extracts features and constructs the dataset using the `audiofeatureextractor` class:
- Three groups of features are extracted: custom-designed features, Essentia features, and MELODIA features.
- The extracted features are saved in a dictionary format, categorized by type for easier filtering or elimination.
- The class can also convert these dictionaries into pandas DataFrames, ready for use with XGBoost.
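The categorized-dictionary layout can be illustrated with a short sketch. It assumes features are stored as `{track_id: {category: {feature_name: value}}}`; the function name `flatten_features`, the `category__feature` column naming, and the category and feature names in the example are hypothetical, not the actual interface of the `audiofeatureextractor` class.

```python
def flatten_features(features_by_track, exclude=()):
    """Flatten {track_id: {category: {feature: value}}} into table rows.

    Column names are prefixed with their category ("category__feature"),
    so a whole category can be dropped by name before training.
    Returns (track_ids, rows), where rows is a list of {column: value} dicts.
    """
    excluded = set(exclude)
    track_ids, rows = [], []
    for track_id, categories in features_by_track.items():
        row = {}
        for category, feats in categories.items():
            if category in excluded:
                continue  # skip every feature in an excluded category
            for name, value in feats.items():
                row[f"{category}__{name}"] = value
        track_ids.append(track_id)
        rows.append(row)
    return track_ids, rows


# Hypothetical example: category and feature names are illustrative only.
features = {
    "song_001": {
        "custom": {"voiced_ratio": 0.62},
        "essentia": {"bpm": 118.0, "loudness": -9.4},
        "melodia": {"median_pitch_hz": 220.0},
    },
    "song_002": {
        "custom": {"voiced_ratio": 0.48},
        "essentia": {"bpm": 96.0, "loudness": -12.1},
        "melodia": {"median_pitch_hz": 196.0},
    },
}

ids, rows = flatten_features(features, exclude={"melodia"})
print(sorted(rows[0]))
# ['custom__voiced_ratio', 'essentia__bpm', 'essentia__loudness']
```

With pandas installed, `pandas.DataFrame(rows, index=ids)` turns these rows into a DataFrame; a feature missing for some track becomes `NaN`, which XGBoost treats as a missing value by default.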
- The `audio-mft.ipynb` notebook is used to predict moral foundations from the extracted audio features.
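One common way to predict several labels at once is binary relevance: train one independent binary classifier per label. The sketch below assumes that setup, and the notebook may frame the task differently. `fit_per_label`, `predict_per_label`, the label names, and the `MajorityClassifier` stand-in are hypothetical. Any zero-argument factory returning a scikit-learn-style model with `fit`/`predict`, such as `lambda: xgboost.XGBClassifier()`, could be passed as `make_model`.

```python
def fit_per_label(X, Y, label_names, make_model):
    """Binary relevance: train one independent classifier per label.

    X: array-like feature matrix; Y: per-track label rows with
    Y[i][j] in {0, 1} for label_names[j].
    make_model: zero-argument factory returning an object with fit/predict.
    """
    models = {}
    for j, name in enumerate(label_names):
        y = [row[j] for row in Y]  # binary target for this label only
        model = make_model()
        model.fit(X, y)
        models[name] = model
    return models


def predict_per_label(models, X):
    """Return {label: predictions} from the per-label models."""
    return {name: list(model.predict(X)) for name, model in models.items()}


class MajorityClassifier:
    """Stand-in model for illustration: always predicts the most frequent class."""

    def fit(self, X, y):
        values = list(y)
        self.label_ = max(set(values), key=values.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


labels = ["virtue_1", "vice_1"]  # hypothetical label names
X = [[0.1], [0.2], [0.3]]
Y = [[1, 0], [1, 0], [0, 0]]
models = fit_per_label(X, Y, labels, MajorityClassifier)
preds = predict_per_label(models, [[0.5], [0.6]])
print(preds)  # {'virtue_1': [1, 1], 'vice_1': [0, 0]}
```

Training each label separately keeps the models simple and lets per-label performance be inspected directly, at the cost of ignoring correlations between labels (for example, between the virtue and vice polarities of the same value).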