A collection of machine learning projects completed during the Master's program at Lviv Polytechnic National University, demonstrating practical skills in data preprocessing, supervised learning, unsupervised learning, and anomaly detection.
Course: Intelligent Data Analysis
Institution: Lviv Polytechnic National University
Semester: 1st
This repository contains five lab assignments covering fundamental machine learning techniques:
- Data Preprocessing - Feature scaling, normalization, outlier detection
- Classification - Supervised learning with SVM and Naive Bayes
- Regression - Predictive modeling with regularization
- Clustering - Unsupervised pattern discovery
- Anomaly Detection - Fraud detection in imbalanced datasets
Intelligent-Data-Analysis/
├── lab1-data-preprocessing/
│ ├── data_preprocessing.ipynb
│ ├── CloudWatch_Traffic_Web_Attack_.csv
│ └── README.md
├── lab2-classification/
│ ├── mobile_price_classification.ipynb
│ ├── mobile_price.csv
│ └── README.md
├── lab3-regression/
│ ├── movie_revenue_regression.ipynb
│ ├── imdb_movies.csv
│ └── README.md
├── lab4-clustering/
│ ├── clustering_analysis.ipynb
│ ├── clustering_data.csv
│ └── README.md
├── lab5-anomaly-detection/
│ ├── credit_fraud_detection.ipynb
│ ├── README.md
│ └── (dataset auto-downloads from Google Drive)
├── requirements.txt
├── .gitignore
└── README.md
Open any lab directly in your browser - no installation required!
| Lab | Topic | Open in Colab |
|---|---|---|
| 1 | Data Preprocessing | |
| 2 | Classification | |
| 3 | Regression | |
| 4 | Clustering | |
| 5 | Anomaly Detection | |
All datasets load automatically in Colab!
Lab 1 - Data Preprocessing
Focus: Feature scaling, outlier detection, normalization
Key Techniques: Z-score, statistical methods
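The Z-score technique named above can be sketched in plain Python. This is a minimal illustration, not the notebook's code: the function names and the request-size values are hypothetical, and it uses the population standard deviation. It shows Z-score standardization, min-max normalization, and flagging values whose absolute Z-score exceeds a threshold.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Constant column: every value sits exactly at the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale values linearly into the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def zscore_outliers(values, threshold=3.0):
    """Return the indices whose absolute Z-score exceeds the threshold."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > threshold]


# Hypothetical request sizes with one obvious spike at index 9.
sizes = [120, 130, 125, 118, 122, 127, 121, 124, 119, 5000]
print(zscore_outliers(sizes))                 # the common threshold of 3 flags nothing here
print(zscore_outliers(sizes, threshold=2.5))  # a lower threshold catches the spike
print(min_max([0, 5, 10]))
```

One design caveat: with the population standard deviation, no point in a sample of n values can have |z| above sqrt(n - 1). For 10 values that ceiling is exactly 3, so the usual "|z| > 3" rule cannot fire on very small samples no matter how extreme the spike.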
Lab 2 - Classification
Focus: Multi-class classification
Algorithms: LinearSVC, Gaussian Naive Bayes
Metrics: Accuracy, Precision, Recall, F1-Score
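For a multi-class problem, precision, recall and F1 are computed per class and then averaged. The sketch below uses macro averaging (an unweighted mean over classes, with macro F1 taken as the mean of the per-class F1 scores, as `f1_score(average="macro")` does in scikit-learn). It is an illustration only: the README does not say which averaging the notebook uses, and the four price classes and predictions are hypothetical.

```python
def multiclass_metrics(y_true, y_pred):
    """Return accuracy, macro (precision, recall, F1) and per-class scores."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = {}
    for c in classes:
        # One-vs-rest counts for class c.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = (precision, recall, f1)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # Macro average: every class counts equally, regardless of its size.
    macro = tuple(sum(scores[k] for scores in per_class.values()) / len(classes) for k in range(3))
    return accuracy, macro, per_class


# Hypothetical labels for four price classes (0 = cheapest, 3 = most expensive).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 3, 3, 3]
accuracy, (macro_p, macro_r, macro_f1), per_class = multiclass_metrics(y_true, y_pred)
print(accuracy, macro_p, macro_r, macro_f1)
```

Here accuracy is 0.75, while the per-class view shows that classes 0 and 2 are recalled only half the time.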
Lab 3 - Regression
Focus: Revenue prediction
Algorithm: Ridge Regression with L2 regularization
Evaluation: Explained variance, MSE, R²
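To show what the L2 penalty does, here is a one-feature ridge fit in closed form, together with the three evaluation metrics. It is a sketch under simplifying assumptions (a single hypothetical feature, toy data, an unpenalized intercept), not the notebook's multi-feature model. The objective, sum((y - w*x - b)^2) + alpha * w^2, has the same form as scikit-learn's `Ridge`.

```python
def fit_ridge_1d(x, y, alpha=1.0):
    """Closed-form ridge fit of y ~ w*x + b; only the slope w is penalized."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Centering removes the intercept from the penalized problem.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    w = sxy / (sxx + alpha)  # alpha > 0 shrinks the slope toward zero
    b = y_mean - w * x_mean
    return w, b


def regression_metrics(y_true, y_pred):
    """Return (MSE, R^2, explained variance) for paired observations."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    y_mean = sum(y_true) / n
    res_mean = sum(residuals) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    mse = ss_res / n
    r2 = 1 - ss_res / ss_tot
    # Explained variance ignores a constant bias in the residuals; R^2 does not.
    var_res = sum((r - res_mean) ** 2 for r in residuals) / n
    explained_variance = 1 - var_res / (ss_tot / n)
    return mse, r2, explained_variance


# Toy data following y = 2x exactly.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
w0, b0 = fit_ridge_1d(x, y, alpha=0.0)   # no penalty: recovers slope 2, intercept 0
w, b = fit_ridge_1d(x, y, alpha=10.0)    # strong penalty: slope shrinks to 1
preds = [w * xi + b for xi in x]
mse, r2, ev = regression_metrics(y, preds)
print(w, b, mse, r2, ev)
```

On training data with a fitted intercept the residuals average to zero, so explained variance equals R². The two diverge on held-out data when predictions are systematically biased.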
Lab 4 - Clustering
Focus: Unsupervised pattern discovery
Algorithms: K-Means, MiniBatchKMeans, GMM
Metrics: Silhouette score, visual analysis
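The silhouette score compares each point's mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b), as s = (b - a) / max(a, b), averaged over all points. The from-scratch sketch below follows that definition, including the convention of scoring points in singleton clusters as 0. The points and labels are hypothetical; the notebook presumably calls `sklearn.metrics.silhouette_score` instead.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        own = [q for j, (q, l) in enumerate(zip(points, labels)) if l == lab and j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: score defined as 0
            continue
        a = sum(math.dist(p, q) for q in own) / len(own)
        # b: mean distance to the closest cluster the point does not belong to.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Two tight, well-separated hypothetical groups.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette(pts, [0, 0, 1, 1])  # close to 1: compact, separated clusters
bad = silhouette(pts, [0, 1, 0, 1])   # negative: points sit closer to the other cluster
print(good, bad)
```

Values near 1 indicate compact, well-separated clusters; negative values mean many points would fit better in a neighbouring cluster.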
Lab 5 - Anomaly Detection
Focus: Fraud detection in highly imbalanced data (0.172% fraud)
Algorithms: DBSCAN, Isolation Forest, statistical methods
Challenge: Real-world financial transaction data with extreme class imbalance
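At a 0.172% fraud rate, accuracy alone is misleading. The sketch below uses hypothetical counts (100,000 transactions, 172 of them fraudulent) to compare two predictors: a trivial one that labels everything legitimate, and a hypothetical detector that catches 150 frauds at the cost of 150 false alarms. This is not one of the lab's DBSCAN or Isolation Forest models. It only illustrates why precision and recall matter for this dataset.

```python
def summarize(y_true, y_pred):
    """Return (accuracy, precision, recall) with 1 = fraud, 0 = legitimate."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


n_total = 100_000
n_fraud = 172  # 0.172% of 100,000 (hypothetical sample size)
y_true = [1] * n_fraud + [0] * (n_total - n_fraud)

# Baseline: never flag anything.
baseline = [0] * n_total
# Hypothetical detector: 150 frauds caught, 22 missed, 150 legitimate rows flagged.
detector = [1] * 150 + [0] * 22 + [1] * 150 + [0] * (n_total - n_fraud - 150)

base = summarize(y_true, baseline)
det = summarize(y_true, detector)
print("baseline accuracy=%.5f precision=%.3f recall=%.3f" % base)
print("detector accuracy=%.5f precision=%.3f recall=%.3f" % det)
```

Both predictors score 99.828% accuracy, yet the baseline catches no fraud at all, while the detector recalls about 87% of it. Recall and precision expose this difference; accuracy hides it.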
- Python 3.x - Primary programming language
- pandas - Data manipulation and analysis
- numpy - Numerical computing
- scikit-learn - Machine learning algorithms
- scipy - Scientific computing and statistics
- matplotlib - Data visualization
- seaborn - Statistical graphics
- Jupyter Notebook - Interactive development
- Google Colab - Cloud-based execution
This project is licensed under the MIT License - see the LICENSE file for details.