Harsh-2420/kaggle

Kaggle Projects

This repository contains notebooks with data from Kaggle or from different APIs. I try to take up a new dataset each week and work on the following:

  • Data Cleaning
  • Exploratory Data Analysis
  • Time Series Analysis
  • 2D and 3D Data Visualization (with Matplotlib, Seaborn and Plotly)
  • Detection and Removal of Outliers and Influential Points (using statistical measures)
  • Feature Engineering
  • Training ML algorithms

Tech Stacks

  • Python (pandas, matplotlib, Plotly)
  • MySQL
  • R (caret)
  • Flask

Projects

As I took up more datasets, I experimented with new methods, learnt new skills and took on new challenges. I mention some of them below, and you can read about them in detail in the individual projects.

Cyclist

This was part of the capstone project for the Google Data Analytics Certificate. It involved heavy data cleaning and combing through multiple CSV files to extract useful features and perform EDA. While this was not a complicated project, it tested my analysis skills and my ability to turn the data into useful business decisions.
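The "multiple CSV files" step usually means stacking one file per period into a single table before cleaning. The sketch below is illustrative only: the two in-memory "files" and the column names (`ride_id`, `started_at`, `ended_at`, `member_casual`) are hypothetical stand-ins for bike-share trip data, not the project's actual files, and the cleaning rules (drop duplicates, drop rides that end before they start) are assumptions about what such cleaning typically involves.

```python
import io

import pandas as pd

# Two tiny in-memory stand-ins for monthly trip files (hypothetical columns and values)
monthly_csvs = [
    io.StringIO(
        "ride_id,started_at,ended_at,member_casual\n"
        "a1,2021-06-01 08:00:00,2021-06-01 08:25:00,member\n"
        "a2,2021-06-05 14:10:00,2021-06-05 15:40:00,casual\n"
    ),
    io.StringIO(
        "ride_id,started_at,ended_at,member_casual\n"
        "b1,2021-07-03 09:00:00,2021-07-03 08:55:00,casual\n"  # ends before it starts: bad record
        "b2,2021-07-06 17:30:00,2021-07-06 17:50:00,member\n"
        "b2,2021-07-06 17:30:00,2021-07-06 17:50:00,member\n"  # exact duplicate row
    ),
]

# Read every file and stack them into one DataFrame
trips = pd.concat(
    [pd.read_csv(f, parse_dates=["started_at", "ended_at"]) for f in monthly_csvs],
    ignore_index=True,
)

# Cleaning: remove exact duplicates, then rides with a non-positive duration
trips = trips.drop_duplicates()
trips["ride_length_min"] = (trips["ended_at"] - trips["started_at"]).dt.total_seconds() / 60
trips = trips[trips["ride_length_min"] > 0].copy()

# Feature engineering: weekday on which each ride started
trips["day_of_week"] = trips["started_at"].dt.day_name()

# A typical EDA question: how do ride lengths differ between rider types?
summary = trips.groupby("member_casual")["ride_length_min"].agg(["count", "mean"])
print(summary)
```

With real data the list of `StringIO` objects would be replaced by file paths (for example from `glob`); `pd.read_csv` accepts either.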

For this project, I answered questions on how to improve the marketing and outreach of the product based on its sales and production strategy. I also

Stroke Prediction

This project was my first dive into imbalanced data and how to work with it. Only 5% of the records were positive, so a model trained on the raw data would tend to be biased towards the majority (negative) class. The project involved extensive data cleaning and data visualization to understand the correlations between variables.
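To make the imbalance problem concrete: with 5% positives, a "model" that ignores the features and always predicts the negative class is already 95% accurate while detecting none of the positive cases. A tiny sketch with made-up labels (not the project's data):

```python
import numpy as np

# Hypothetical labels with the same 5% positive rate as described above
y = np.zeros(1000, dtype=int)
y[:50] = 1

# Baseline that always predicts the majority (negative) class
y_pred = np.zeros_like(y)

accuracy = (y_pred == y).mean()
recall = (y_pred[y == 1] == 1).mean()  # share of true positives that were caught
print(f"accuracy={accuracy:.2f}, recall on positives={recall:.2f}")
# accuracy=0.95, recall on positives=0.00
```

This is why metrics such as recall, precision or F1-score are usually reported alongside accuracy on imbalanced data.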

This project also involved in-depth research into handling class imbalance and detecting outliers efficiently, which led me to use SMOTE (Synthetic Minority Over-sampling Technique) for the class imbalance and Z-scores for outlier detection. After that, I performed some further data analysis and feature engineering before building the ML model.

F1

Extracted from the F1 API, this dataset covers all races, drivers, pit stops, etc. from 1950 to 2021. Again, I performed data cleaning and EDA to extract useful insights into the drivers and teams and how they have progressed over the years. I also looked for correlations between pit stop times, race results, qualifying times and other factors.
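One way to relate pit stops to results is to total each driver's pit time per race, join it to the results table, and compute a rank (Spearman) correlation, which suits finishing positions because they are ordinal. The sketch below assumes hypothetical column names (`raceId`, `driverId`, `milliseconds`, `positionOrder`) and uses made-up numbers; it is not the project's code or real F1 data.

```python
import pandas as pd

# Hypothetical pit stop records: one row per stop (a driver can stop more than once)
pit_stops = pd.DataFrame({
    "raceId":       [1, 1, 1, 1, 2, 2, 2],
    "driverId":     [10, 10, 20, 30, 10, 20, 30],
    "milliseconds": [22000, 23000, 24500, 21000, 22500, 30000, 23500],
})

# Hypothetical results: finishing position per driver per race
results = pd.DataFrame({
    "raceId":        [1, 1, 1, 2, 2, 2],
    "driverId":      [10, 20, 30, 10, 20, 30],
    "positionOrder": [3, 2, 1, 1, 3, 2],
})

# Total pit time per driver per race
pit_totals = (
    pit_stops.groupby(["raceId", "driverId"], as_index=False)["milliseconds"]
    .sum()
    .rename(columns={"milliseconds": "total_pit_ms"})
)

# Keep only driver-race pairs present in both tables
merged = pit_totals.merge(results, on=["raceId", "driverId"], how="inner")

# Spearman correlation, computed as the Pearson correlation of the ranks
rho = merged["total_pit_ms"].rank().corr(merged["positionOrder"].rank())
print(f"Spearman rho between pit time and finishing position: {rho:.2f}")
```

A positive value would mean that longer total pit time tends to go with a worse (higher-numbered) finishing position; with real data, confounders such as safety cars and race strategy would need to be considered before reading anything causal into it.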

Notably, in this project I built these visualisations as interactive 3D charts with Plotly, a skill that proved useful in several later projects. I

Kaggle Playground

Every month, Kaggle releases a "Playground" dataset for data scientists to test their skills and try new models. I participated in the Playground for July 2021 and this project records my progress.

About

Working on different Kaggle notebooks, exploring the data and building algorithms