Project Title : Make your song Billboard Top 100

Authors : Boseong Kang, Geonho Lee

Description of Question and Research Topic

This project analyzes lyric embeddings and the characteristics of commonly used instruments in Billboard Top 100 songs to reveal trends over time.
For example, if an artist is trying to compose a song, our model can predict which combination of instruments and lyrics is likely to enter the Billboard Top 100.
We analyze trends by clustering lyrics and examining word similarity.
We then analyze instrument usage ratios by converting each song into a frequency spectrum.
Finally, we combine these two analyses to build a machine learning model that predicts chart entry probability.

Project Outline/Plan

Data Collection Plan (two parts, one for each author)

Part 1: Boseong Kang
Use the billboard.py Python library to get the title, rank, and artist of each Billboard Top 100 song.
Scraping lyrics from websites raises copyright issues, so we use lyrics data from Kaggle instead.
~~https://www.kaggle.com/datasets/bwandowando/spotify-songs-with-attributes-and-lyrics~~
~~This dataset is licensed under CC BY-NC-SA 4.0, which means we are free to share and adapt it as long as the use is non-commercial and we give appropriate credit.~~
https://www.kaggle.com/datasets/suparnabiswas/billboard-hot-1002000-2023-data-with-features (new dataset)
The new dataset is licensed under CC0: Public Domain, so we can copy, modify, distribute, and perform the work, even for commercial purposes, without asking permission.
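
The chart collection step could look roughly like the sketch below. It assumes billboard.py's `ChartData` class and the `title`, `artist`, and `rank` attributes of its entries; the helper names `chart_to_rows` and `fetch_hot_100` are hypothetical, and the demo entry is a placeholder rather than real chart data.

```python
from types import SimpleNamespace


def chart_to_rows(entries):
    """Turn chart entries into plain dicts with rank, title, and artist."""
    return [
        {"rank": e.rank, "title": e.title, "artist": e.artist}
        for e in entries
    ]


def fetch_hot_100(date=None):
    """Fetch one week of the Hot 100 (needs network access and billboard.py).

    `date` is a 'YYYY-MM-DD' string; None asks for the latest chart.
    """
    import billboard  # imported lazily so chart_to_rows works offline

    chart = billboard.ChartData("hot-100", date=date)
    return chart_to_rows(chart.entries)


# Offline demo with a placeholder entry (not real chart data).
demo_rows = chart_to_rows(
    [SimpleNamespace(rank=1, title="Example Song", artist="Example Artist")]
)
```

The resulting rows can then be joined with the Kaggle lyrics dataset on title and artist.
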
Part 2: Geonho Lee
Use the songs collected in Part 1 to analyze their audio characteristics.
Convert each song’s WAV data into a frequency spectrum using NumPy and SciPy, and extract features such as band energy ratios (bass, vocal, cymbal).
Estimate approximate per-instrument energy ratios to identify which frequency ranges are dominant in each song.
Visualize the results using Matplotlib to compare how different tracks emphasize different sound bands.
Compare audio patterns with lyrical patterns to analyze overall music trends.
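
A minimal sketch of the band energy ratio feature described above, using NumPy's FFT and SciPy's WAV reader. The band edges in `BANDS` are assumptions chosen for illustration (the plan names the bands but not their limits), and the function names are hypothetical.

```python
import numpy as np

# Assumed band edges in Hz; the plan names the bands but not their limits.
BANDS = {
    "bass": (20.0, 250.0),
    "vocal": (250.0, 4000.0),
    "cymbal": (4000.0, 16000.0),
}


def band_energy_ratios(samples, sample_rate, bands=BANDS):
    """Return each band's share of the total energy across the given bands."""
    samples = np.asarray(samples, dtype=np.float64)
    if samples.ndim == 2:  # stereo -> mono by averaging channels
        samples = samples.mean(axis=1)
    power = np.abs(np.fft.rfft(samples)) ** 2
    freqs = np.fft.rfftfreq(samples.size, d=1.0 / sample_rate)
    energy = {
        name: float(power[(freqs >= lo) & (freqs < hi)].sum())
        for name, (lo, hi) in bands.items()
    }
    total = sum(energy.values())
    return {name: (e / total if total > 0 else 0.0) for name, e in energy.items()}


def ratios_from_wav(path):
    """Load a WAV file with SciPy and compute its band energy ratios."""
    from scipy.io import wavfile  # lazy import: only needed for file input

    sample_rate, samples = wavfile.read(path)
    return band_energy_ratios(samples, sample_rate)
```

The ratios sum to 1 across the chosen bands, so they can be compared between tracks of different loudness and length.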

Model Plans (two parts, one for each author)

Part 1: Boseong Kang
Logistic Regression: As with the MNIST dataset, preprocess the data first, then use one-hot encoding or TF-IDF from scikit-learn to classify top 10 songs versus all other songs.
MLP: With the preprocessed words, classify top 10 songs versus other songs using ReLU and sigmoid activation functions.
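
The TF-IDF plus Logistic Regression idea could be sketched as below with a scikit-learn pipeline. The lyrics and labels are toy placeholders invented for illustration, not data from the project, and the variable names are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy placeholder lyrics and labels (1 = top 10, 0 = other); not real chart data.
lyrics = [
    "love you baby tonight",
    "dance all night baby",
    "rain on the empty road",
    "cold wind empty town",
]
is_top10 = [1, 1, 0, 0]

# TF-IDF turns each lyric into a sparse vector; the classifier is fit on top.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(lyrics, is_top10)

# Column 1 of predict_proba is the estimated probability of the top 10 class.
proba = model.predict_proba(["baby love tonight"])[:, 1]
```

Keeping the vectorizer inside the pipeline means the same vocabulary is reused at prediction time.
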
Part 2: Geonho Lee
Visualization: Use Matplotlib to visualize each song’s frequency spectrum and energy distribution.
Model: Apply Logistic Regression from scikit-learn to analyze relationships between the extracted audio features (band energy ratios) and the data from Part 1. Additionally, since Logistic Regression can’t properly capture non-linear relationships, we will use an MLP implemented in PyTorch to explore non-linear sound patterns across frequency bands.
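
As a rough sketch of the PyTorch MLP mentioned above, the block below trains a small network on three band energy ratios. The class name `BandMLP`, the layer sizes, the learning rate, and the random inputs are all assumptions for illustration; real features and labels would come from the earlier steps.

```python
import torch
from torch import nn


class BandMLP(nn.Module):
    """Small MLP mapping band energy ratios to a single top 10 logit."""

    def __init__(self, n_features=3, hidden=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)  # raw logits, one per song


torch.manual_seed(0)
# Random placeholder inputs: 8 songs x 3 band ratios (bass, vocal, cymbal).
x = torch.rand(8, 3)
x = x / x.sum(dim=1, keepdim=True)  # make each row sum to 1, like ratios
y = torch.randint(0, 2, (8,)).float()  # placeholder top 10 labels

model = BandMLP()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.BCEWithLogitsLoss()  # applies the sigmoid internally

for _ in range(20):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    probs = torch.sigmoid(model(x))  # probabilities in [0, 1]
```

The network outputs logits and the sigmoid is applied in the loss and at prediction time, which is numerically more stable than adding a sigmoid layer before a plain binary cross-entropy loss.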

Project Timeline

Our Project Roadmap link

It may take time to load our Roadmap.
Open the Roadmap on GitHub
