Stroke Prediction Model

This repository contains the code and resources used to build a Stroke Prediction model and deploy it as a Streamlit web application. The model predicts the risk of stroke based on medical and demographic information provided by users.

Streamlit App Link: Stroke Prediction App

Overview The Stroke Prediction model was developed using Python and various machine learning libraries. The following sections provide an overview of the steps taken to create the model:

Data Collection

The dataset used for this project was obtained from kaggle. It contains a collection of medical and demographic data points, including age, hypertension, heart disease, glucose levels, BMI, gender, marital status, employment type, residency, and smoking status. This dataset served as the foundation for training and evaluating the model.

Exploratory Data Analysis (EDA)

Before building the model, exploratory data analysis (EDA) was conducted to gain insights into the dataset. This involved:

Visualizing data distributions. Identifying missing values. Analyzing correlations between features. Exploring the balance of the target variable (stroke/no-stroke).

Data Cleaning

To prepare the data for modeling, essential data cleaning steps were performed:

Handling missing values by imputation or removal. Encoding categorical variables for machine learning compatibility (e.g., one-hot encoding).

Preprocessing

Data preprocessing was a crucial step to ensure the data was ready for machine learning. This included:

Scaling numeric features to a standard range. Handling class imbalance through oversampling or undersampling techniques. Preparing the data for model input.

Model Selection

Choosing the right machine learning algorithm is essential. Several models were evaluated, including Logistic Regression and XGBoost. The model that demonstrated the best performance in terms of accuracy and other relevant metrics was selected as the final model.

Pipeline Creation

A machine learning pipeline was constructed to streamline the entire process from data preprocessing to model training. This pipeline included:

Data preprocessing steps. Feature scaling. Model training and evaluation.

Streamlit App Deployment

The selected model was deployed as a Streamlit web application. Users can input their medical and demographic information, and the app provides a real-time prediction of stroke risk. The app's user interface was designed to be intuitive and user-friendly.

Notice

It's important to note that the Stroke Prediction model has limitations:

The model's performance is significantly affected by the imbalance in the dataset. It is far more likely to classify individuals as at risk of stroke due to the dataset's class distribution. Due to this, the model is not a valid indicator of stroke predictions in real-life scenarios. The model's predictions should not be considered a definitive diagnosis. They are intended for informational purposes only.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
app		app
data		data
models		models
.gitignore		.gitignore
README.md		README.md
StrokePred.ipynb		StrokePred.ipynb
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Stroke Prediction Model

Data Collection

Exploratory Data Analysis (EDA)

Data Cleaning

Preprocessing

Model Selection

Pipeline Creation

Streamlit App Deployment

Notice

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Stroke Prediction Model

Data Collection

Exploratory Data Analysis (EDA)

Data Cleaning

Preprocessing

Model Selection

Pipeline Creation

Streamlit App Deployment

Notice

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages