Project for Fundamental Big Data Course
This repository contains a project developed for the Fundamental Big Data course. The project focuses on analyzing a stroke dataset to explore potential risk factors and patterns associated with stroke occurrences.
The project aims to:
- Perform data preprocessing and cleaning.
- Conduct exploratory data analysis (EDA).
- Visualize relationships between health and lifestyle factors and stroke risk.
- Identify insights and patterns from the dataset.
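As a rough illustration of the preprocessing and first EDA steps, the sketch below cleans a tiny hand-made table. The column names (`gender`, `age`, `bmi`, `smoking_status`, `stroke`), the sample rows, and the imputation choices (median for BMI, an explicit "Unknown" label for smoking status) are assumptions made for this example, not necessarily what the project notebook does.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the column names are assumptions, not the confirmed schema.
raw = pd.DataFrame(
    {
        "gender": ["Male", "Female", "Female", "Male", "Male"],
        "age": [67.0, 61.0, 49.0, 80.0, 67.0],
        "bmi": [36.6, np.nan, 34.4, 24.0, 36.6],
        "smoking_status": ["formerly smoked", "never smoked", "smokes", None, "formerly smoked"],
        "stroke": [1, 1, 0, 1, 1],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows, impute missing BMI, and label missing categories."""
    out = df.drop_duplicates().copy()
    # Fill missing BMI with the median of the observed values.
    out["bmi"] = out["bmi"].fillna(out["bmi"].median())
    # Make a missing smoking status explicit instead of leaving it empty.
    out["smoking_status"] = out["smoking_status"].fillna("Unknown")
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned.isna().sum())  # remaining missing values per column
print(cleaned.describe())    # quick numeric summary as a first EDA step
```

Whether median imputation is appropriate depends on how much data is missing and why; dropping rows or imputing per group are equally plausible alternatives.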
Tools used:
- Python (Jupyter Notebook)
- Pandas, NumPy → Data manipulation
- Matplotlib, Seaborn → Data visualization
The dataset used in this project contains health and lifestyle attributes related to stroke risk factors, such as:
- Gender, age, and marital status
- Hypertension and heart disease
- Work type and residence type
- BMI, smoking status, and average glucose level
- Stroke occurrence (target variable)
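Before analysis it can help to confirm that a loaded table actually contains the attributes listed above. The following is a minimal sketch: the exact column names in `EXPECTED_COLUMNS` and the 0/1 coding of the target are assumptions, so adjust them to match the real CSV header.

```python
import pandas as pd

# Assumed column names for the attributes listed above (hypothetical, adjust as needed).
EXPECTED_COLUMNS = [
    "gender", "age", "ever_married", "hypertension", "heart_disease",
    "work_type", "residence_type", "bmi", "smoking_status",
    "avg_glucose_level", "stroke",
]


def check_schema(df: pd.DataFrame) -> list:
    """Return a list of problems found; an empty list means the frame looks usable."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    # The target is assumed to be coded as 0 (no stroke) or 1 (stroke).
    if "stroke" in df.columns and not df["stroke"].dropna().isin([0, 1]).all():
        problems.append("stroke must be coded as 0 or 1")
    return problems


# One made-up record, only to show the call.
sample = pd.DataFrame([{
    "gender": "Female", "age": 61, "ever_married": "Yes", "hypertension": 0,
    "heart_disease": 0, "work_type": "Private", "residence_type": "Urban",
    "bmi": 28.1, "smoking_status": "never smoked",
    "avg_glucose_level": 105.9, "stroke": 0,
}])
print(check_schema(sample))
```

In a notebook, the same function could be applied to the frame returned by `pd.read_csv` on the project's dataset file.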
Based on the data exploration results:
- Age shows a strong relationship with stroke risk (the older the individual, the higher the risk).
- Hypertension and heart disease are linked to an increased risk of stroke.
- Higher average glucose levels are positively correlated with stroke.
- Lifestyle factors such as smoking status also show variations in stroke risk.
- The analysis highlights that medical and lifestyle variables can serve as important indicators for understanding stroke risk.
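Observations like the ones above are typically derived by comparing the share of stroke cases across groups. The sketch below shows one way to do that with pandas. The ten rows are toy values invented for this example (they are not the project's data or results), and the column names, age bands, and the `stroke_rate_by` and `plot_stroke_rate` helpers are hypothetical.

```python
import pandas as pd

# Toy values invented for illustration only; not taken from the project's dataset.
df = pd.DataFrame(
    {
        "age": [34, 45, 52, 58, 63, 67, 71, 75, 79, 82],
        "hypertension": [0, 0, 0, 1, 0, 1, 1, 0, 1, 1],
        "stroke": [0, 0, 0, 0, 0, 1, 0, 1, 1, 1],
    }
)

# Bin age into bands; right=False makes the intervals [0, 50), [50, 65), [65, 120).
df["age_band"] = pd.cut(
    df["age"], bins=[0, 50, 65, 120], labels=["<50", "50-64", "65+"], right=False
)


def stroke_rate_by(frame: pd.DataFrame, column: str) -> pd.Series:
    """Share of records with stroke == 1 within each group of `column`."""
    return frame.groupby(column, observed=True)["stroke"].mean()


def plot_stroke_rate(rates: pd.Series, path: str = "stroke_rate.png") -> None:
    """Save a bar chart of grouped stroke rates (requires Matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    rates.plot(kind="bar", ax=ax)
    ax.set_ylabel("Share of records with stroke")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


by_age = stroke_rate_by(df, "age_band")
by_hypertension = stroke_rate_by(df, "hypertension")
print(by_age)
print(by_hypertension)
```

Calling `plot_stroke_rate(by_age)` would write the chart to `stroke_rate.png`. Group rates like these describe association only; they do not by themselves establish that a factor causes stroke.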
To run the project:
- Clone this repository
  git clone https://github.com/ruslialwin/stroke-analysis.git
- Navigate to the project folder
  cd stroke-analysis
- Install Jupyter Notebook
  pip install notebook
- Install requirements
  - Contents of the requirements.txt file
    numpy
    pandas
    matplotlib
    seaborn
    jupyter
  - Install command
    pip install -r requirements.txt
- Run Jupyter Notebook
  jupyter notebook
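After installation, a quick check that the packages can be imported can save a confusing error inside the notebook. This is a small sketch using only the standard library; the `REQUIRED` list mirrors the packages named above (with `notebook` as the import name for Jupyter Notebook), and `missing_packages` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Import names of the packages this README asks you to install.
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn", "notebook"]


def missing_packages(names):
    """Return the names that cannot be found in the current Python environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

Run it with the same Python interpreter that launches Jupyter, since packages installed into a different environment will not be visible to the notebook.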