Data Science Enthusiast | Informatics Student at the University of Washington
Welcome to my GitHub! I'm an Informatics student at the University of Washington, focusing on Data Science with a minor in Statistics. My journey in tech has allowed me to develop a strong foundation in data analysis, machine learning, and software development.
- Education: Pursuing a BS in Informatics (Data Science) with a minor in Statistics at the University of Washington, Seattle.
- GPA: 3.95
- Technical Skills:
  - Languages: Python, SQL, R, JavaScript, Java
  - Technologies: pandas, scikit-learn, XGBoost, PyTorch, TensorFlow, Matplotlib, Streamlit, AWS, Azure, Docker
  - Tools: Git, Linux, Jupyter, Tableau, Snowflake, MLflow, DVC
- Developed a machine learning model to classify scoliosis from X-ray images using transfer learning with ResNet-50 CNN.
- Achieved 95% accuracy with various regularization techniques.
- Implemented a robust training pipeline with PyTorch, leveraging transfer learning techniques, model validation, and optimization.
- Deployed TensorBoard using AWS Fargate for scalable, cost-effective real-time monitoring.
- Created a predictive model using XGBoost to forecast house prices with 85% accuracy on the test set.
- Performed extensive feature engineering and automated hyperparameter optimization using Optuna.
- Hosted the model via Streamlit for real-time user inferences.
- Developed a machine learning model to predict the risk of stroke based on medical and demographic information.
- Utilized Logistic Regression and XGBoost algorithms for model training.
- Addressed the imbalance in the dataset, which made achieving an accurate model challenging.
- Implemented SMOTE (Synthetic Minority Over-sampling Technique) to balance the training dataset and improve model performance.
- Conducted extensive data preprocessing, including scaling numeric features and encoding categorical variables.
- Improved model performance further by tuning hyperparameters with Optuna.
- Deployed the model as a Streamlit web application for real-time stroke risk prediction.
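
For readers unfamiliar with SMOTE, the core idea fits in a few lines of plain Python: pick a minority-class sample, pick one of its nearest minority-class neighbors, and create a new point somewhere on the line between them. The sketch below is an illustration only, not the project's code; the function name `smote_oversample`, its parameters, and the toy data are all hypothetical, and a real pipeline would more likely rely on an existing library implementation.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points from a list of minority-class feature vectors.

    Hypothetical helper for illustration; not taken from the project.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Choose a random minority sample as the starting point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance and keep the k nearest.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbor = minority[rng.choice(others[:k])]
        # Interpolate: a random point on the segment between base and neighbor.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy minority class with two numeric features (made-up values).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_oversample(minority, n_new=6)
```

As in the project description above, oversampling of this kind should only be applied to the training split, so that the validation and test sets keep the original class balance.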
- Developed an interactive web application for creating and managing vocabulary decks to enhance study routines.
- Implemented user authentication and data persistence using Firebase, ensuring personalized study experiences.
- Utilized React for building dynamic user interfaces and React Router for seamless navigation.
- Integrated Bootstrap and Material UI for responsive and visually appealing design.
- Created a robust feature set including deck creation, card management, and search functionality.
- Deployed the application with a focus on performance and user experience.
This project was developed for the INFO340: Client Side Web Development class at the University of Washington iSchool.
- Developed data clustering models in Python, leveraging K-means and dimensionality reduction techniques to segment dealers, leading to a predicted 20% increase in targeted marketing campaign effectiveness.
- Optimized data ingestion and cleaning processes using SQL within Snowflake and SQLAlchemy, resulting in a 20% reduction in data processing time for clustered dealer analysis.
- Developed Tableau dashboards enabling real-time filtering, saving 30 hours per month on warranty claim reporting and providing instant access to key summary statistics and KPIs.
- Presented final insights and impact to executive leadership, highlighting the effectiveness of data-driven solutions and the overall business value delivered.
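
The K-means step behind the dealer segmentation can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the code used in the project: the `kmeans` function, the deterministic farthest-point initialization (chosen here for reproducibility instead of the more common randomized initialization), and the toy data are all hypothetical, and the dimensionality reduction step is left out.

```python
import math


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization.

    Hypothetical helper for illustration; not taken from the project.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Initialization: start from the first point, then repeatedly add the
    # point that is farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Two made-up, well-separated groups of two-feature records.
data = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.9, 8.3]]
labels, centroids = kmeans(data, k=2)
# labels -> [0, 0, 0, 1, 1, 1]
```

Features on very different scales would normally be standardized before clustering, since K-means relies on Euclidean distance.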