Kaggle Projects

This repository contains notebooks with data from Kaggle or from different APIs. I try to take up a new dataset each week and work on the following:

Data Cleaning
Exploratory Data Analysis
Time Series Analysis
2D and 3D Data Visualization (with Matplotlib, Seaborn and Plotly)
Detection and Removal of Outliers and Influential Points (using statistical measures)
Feature Engineering
Training ML algorithms

Tech Stack

Python (pandas, matplotlib, Plotly)
MySQL
R (caret)
Flask

Projects

As I took up more datasets, I experimented with new methods, learnt new skills and took on new challenges. I mention some of them here; you can read about them in detail in the individual projects.

Cyclist

This was part of the capstone project for the Google Data Analytics Certificate. It involved heavy data cleaning and combing through multiple CSV files to extract useful features and perform EDA. While this was not a complicated project, it tested my analysis skills and my ability to draw useful business decisions from the data.

For this project, I answered questions on how to improve the marketing and outreach of the product based on its sales and production strategy.

Stroke Prediction

This project was my first dive into imbalanced data and how to work with it. Only 5% of the data provided was positive, which can bias an ML model toward the majority class. It involved extensive data cleaning and data visualization to understand the correlations between variables.
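To make the imbalance problem concrete, here is a minimal sketch using a hypothetical label set with the same 5% positive ratio (the sample count and the always-negative "model" are illustrative assumptions, not taken from the project notebooks). It shows why plain accuracy is a misleading metric on data like this.

```python
# Hypothetical labels: 1,000 samples with 5% positives, mirroring the ratio described above.
labels = [1] * 50 + [0] * 950

# A trivial "model" that ignores its input and always predicts the negative class.
predictions = [0] * len(labels)

# Accuracy: fraction of predictions that match the labels.
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Recall: fraction of actual positives that the model found.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / sum(labels)

print(f"accuracy = {accuracy:.2f}, recall = {recall:.2f}")
# prints "accuracy = 0.95, recall = 0.00"
```

A model that never detects a single positive case still scores 95% accuracy, which is why metrics such as recall, precision or F1 are usually more informative for this kind of dataset.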

This project also involved intensive research into handling class imbalance and efficient outlier detection, which led me to use SMOTE (for class imbalance) and Z-scores (for outlier detection). Following this, I performed some data analysis and feature engineering before building the ML model.
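The two techniques named above can be sketched roughly as follows. This is a simplified illustration using only NumPy, not the code from the project notebooks: the function names `remove_outliers_zscore` and `smote_sketch`, the threshold of 3, the neighbour count `k` and the toy data are all assumptions made for the example. A real project would more likely use a maintained implementation such as the `SMOTE` class from the imbalanced-learn package.

```python
import numpy as np

def remove_outliers_zscore(X, y, threshold=3.0):
    """Drop rows where any feature is more than `threshold` standard
    deviations from that feature's mean (assumes no constant columns)."""
    z = np.abs((X - X.mean(axis=0)) / X.std(axis=0))
    keep = (z <= threshold).all(axis=1)
    return X[keep], y[keep]

def smote_sketch(X_minority, n_new, k=5, seed=0):
    """Simplified SMOTE: create each synthetic sample by interpolating
    between a minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    n = len(X_minority)
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class.
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    # Pick a random base sample and one of its neighbours for each new point.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_minority[base] + gap * (X_minority[chosen] - X_minority[base])

# Toy data (hypothetical): 200 samples, 3 features, 5% positives.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = np.array([1] * 10 + [0] * 190)
X[0] = [50.0, 50.0, 50.0]  # inject one obvious outlier

X_clean, y_clean = remove_outliers_zscore(X, y)

# Oversample the minority class until both classes have the same size.
X_min = X_clean[y_clean == 1]
n_new = int((y_clean == 0).sum() - (y_clean == 1).sum())
X_syn = smote_sketch(X_min, n_new)

X_bal = np.vstack([X_clean, X_syn])
y_bal = np.concatenate([y_clean, np.ones(n_new, dtype=int)])
print(X_clean.shape, X_bal.shape, int(y_bal.sum()))
```

Oversampling is normally applied to the training split only, so that synthetic points never leak into the data used for evaluation.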

F1

Extracted from the F1 API, this dataset includes data from 1950 to 2021 on all races, drivers, pit stops, etc. Again, I performed data cleaning and EDA to extract useful insights into the drivers and teams and how they have progressed over the years. I also tried to find correlations between pit stop times, race results, qualifying times and other factors.

Notably, in this project I presented these visualisations as interactive 3D charts using Plotly, a skill that proved useful in several later projects.

Kaggle Playground

Every month, Kaggle releases a "Playground" dataset for data scientists to test their skills and try new models. I participated in the Playground for July 2021 and this project records my progress.

