NAME : DAKSH LODHA
DOMAIN : MACHINE LEARNING
COMPANY : CODTECH IT SOLUTIONS
DURATION : JUNE TO JULY 2024
ID : CT04ML2337
Overview of Credit Card Fraud Detection Project

Project Goal

The primary goal of this project is to develop a machine learning model that accurately detects fraudulent credit card transactions. Because the dataset is highly imbalanced, the project covers data preprocessing, model training, and evaluation, with the aim of identifying fraudulent activity reliably.

Dataset

The dataset used for this project is the Credit Card Fraud Detection Dataset from Kaggle. It consists of 284,807 transactions, of which only 492 (about 0.17%) are fraudulent, making it highly imbalanced. The features include:
- Time: Seconds elapsed between this transaction and the first transaction.
- V1 to V28: Principal component analysis (PCA) components.
- Amount: Transaction amount.
- Class: Target variable (1 for fraud, 0 for non-fraud).

Methodology
- Data Preprocessing
  - Data Cleaning: Handle missing values and remove irrelevant features.
  - Data Normalization: Scale the numerical features so that they contribute comparably to the model.
  - Handling Imbalanced Data: Apply techniques such as random oversampling, random undersampling, and SMOTE (Synthetic Minority Over-sampling Technique) to address class imbalance.
- Exploratory Data Analysis (EDA)
  - Perform EDA to understand data distribution and relationships.
  - Visualize data using histograms, box plots, scatter plots, and correlation matrices to identify patterns and anomalies.
- Feature Engineering
  - Feature Selection: Identify and select the most relevant features based on their importance and correlation with the target variable.
  - Feature Creation: Create new features that might enhance model performance.
- Model Training
  - Train various machine learning models, such as:
    - Logistic Regression
    - Decision Trees
    - Random Forest
    - Gradient Boosting
    - Support Vector Machines (SVM)
    - Neural Networks
  - Use techniques like cross-validation to ensure the models generalize well to unseen data.
- Model Evaluation
  - Evaluate models using metrics including:
    - Accuracy
    - Precision
    - Recall
    - F1 Score
    - Area Under the Receiver Operating Characteristic Curve (ROC-AUC)
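
The Data Preprocessing step can be illustrated with a minimal sketch using only the Python standard library. It standardizes a column of values and then balances the classes by random oversampling. The helper names `standardize` and `random_oversample` and the toy `amounts` data are assumptions made for illustration, not the project's actual code. A real pipeline would more likely rely on a library scaler and a SMOTE implementation, and would fit the scaler and oversample on the training split only, so that no information leaks into the test data.

```python
import random
import statistics


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until both classes are equal in size.

    Assumes binary labels (0 and 1) and at least one row of each class.
    This is plain random oversampling, not SMOTE: no synthetic rows are created.
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    minority = min(by_class, key=lambda c: len(by_class[c]))
    majority = 1 - minority
    deficit = len(by_class[majority]) - len(by_class[minority])
    extra = rng.choices(by_class[minority], k=deficit)
    out_rows = by_class[majority] + by_class[minority] + extra
    out_labels = ([majority] * len(by_class[majority])
                  + [minority] * (len(by_class[minority]) + deficit))
    return out_rows, out_labels


# Toy data (hypothetical): five legitimate transactions and one fraud.
amounts = [10.0, 20.0, 30.0, 40.0, 1000.0, 25.0]
labels = [0, 0, 0, 0, 1, 0]

scaled = standardize(amounts)
rows = [[a] for a in scaled]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(balanced_labels.count(0), balanced_labels.count(1))
```

After oversampling, both classes contain the same number of rows, which keeps a classifier from ignoring the fraud class entirely during training.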
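
For the cross-validation mentioned under Model Training, the class imbalance matters: with so few fraud cases, a purely random split can leave a fold with almost none. The sketch below shows one way to build stratified folds so that each fold keeps roughly the overall fraud ratio. The function name `stratified_kfold` and the toy labels are assumptions for illustration. The model fitting itself is omitted, and in practice a library implementation of stratified k-fold splitting would usually be used instead.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds preserve the class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        # Shuffle the indices of one class, then deal them out round-robin.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    for f in range(k):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, test_idx


# Toy labels (hypothetical): 10 frauds among 100 transactions.
labels = [1] * 10 + [0] * 90
splits = list(stratified_kfold(labels, k=5))

# Number of fraud cases that land in each test fold.
fraud_per_fold = [sum(labels[i] for i in test_idx) for _, test_idx in splits]
print(fraud_per_fold)  # -> [2, 2, 2, 2, 2]
```

Each model would be trained on `train_idx` and scored on `test_idx`, and the per-fold scores averaged to estimate how well it generalizes.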
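
The Model Evaluation metrics can be computed by hand to show why accuracy alone is misleading on this dataset. The sketch below uses the class counts stated in the Dataset section (492 frauds out of 284,807 transactions) and a hypothetical baseline that labels every transaction as non-fraud. The function names `classification_metrics` and `roc_auc` are assumptions for illustration. The pairwise ROC-AUC is written for clarity and is only practical on small inputs.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """Probability that a random fraud outscores a random non-fraud (ties count half).

    Compares every positive with every negative, so it is quadratic in the input size.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical baseline: predict "non-fraud" for every transaction.
y_true = [1] * 492 + [0] * (284_807 - 492)
y_pred = [0] * 284_807
baseline = classification_metrics(y_true, y_pred)
print(f"accuracy={baseline['accuracy']:.4f} recall={baseline['recall']:.1f}")

# Small ranking example for ROC-AUC.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # -> 0.75
```

The baseline reaches roughly 99.8% accuracy while catching no fraud at all (recall of 0), which is why precision, recall, F1, and ROC-AUC are the more informative metrics here.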