StanChiangTW · kirandeepgaur · Apr 1, 2025
diff --git a/.DS_Store b/.DS_Store
diff --git a/.gitignore b/.gitignore
@@ -1,3 +1,5 @@
 *.pyc
 .env
-models/
+models/
+mlops/
+# Compiled Python bytecode files are already ignored by the *.pyc pattern above
diff --git a/src/api/Dockerfile.api → Dockerfile.api b/src/api/Dockerfile.api → Dockerfile.api
@@ -1,15 +1,15 @@
-# This Dockerfile is meant to be run from the root of the repo
+# Base Python image (Python 3.12.4)
+FROM python:3.12.4
 
-# Base Python image (Python 3.12)
-FROM python:3.12-slim
 
 # Set the working directory in the container
 WORKDIR /app
 
 # Copy the current folder (on the host machine) that contains the whole app project into the container /app folder
 COPY . /app
 
-RUN pip install .
+# install the required packages
+RUN pip install --no-cache-dir -r requirements.txt
 
 # Expose the port on which FastAPI will run
 EXPOSE 8000
@@ -18,4 +18,4 @@ EXPOSE 8000
 ENV DSBA_MODELS_ROOT_PATH=/app/models
 
 # Define the default command run when starting the container: Run the FastAPI app using Uvicorn
-CMD ["uvicorn", "src.api.api:app", "--host", "0.0.0.0", "--port", "8000"]
+CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"]
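The `CMD` change from `src.api.api:app` to `api:app` goes with `api.py` now living at the repository root, which `COPY . /app` places directly under the working directory. The string is a module path and an attribute name separated by a colon. A rough sketch of that kind of lookup, using only the standard library: `resolve_import_string` is a hypothetical helper written for illustration (not uvicorn's actual loader), and `json:dumps` stands in for `api:app`.

```python
import importlib
from typing import Any


def resolve_import_string(import_string: str) -> Any:
    """Resolve a "module.path:attribute" string to the object it names."""
    module_path, sep, attribute = import_string.partition(":")
    if not sep or not module_path or not attribute:
        raise ValueError(f'Expected "module:attribute", got {import_string!r}')
    # The module must be importable from the current working directory or
    # sys.path, which is why moving api.py changes the string in CMD.
    module = importlib.import_module(module_path)
    return getattr(module, attribute)


# Standard-library stand-in for "api:app": module "json", attribute "dumps".
dumps = resolve_import_string("json:dumps")
print(dumps({"ok": True}))
```

With `api.py` at the root of `/app`, the module part is just `api`; when it lived under `src/api/`, the dotted path `src.api.api` was needed instead.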
diff --git a/Project_Report.pdf b/Project_Report.pdf
diff --git a/README.md b/README.md
@@ -1,10 +1,9 @@
 # DSBA Platform
 
-A toy MLOps Platform for educational purposes
+A toy MLOps Project
 
 ## Project Structure
 
-This project has a fairly standard structure (but it is still adapted to be simplified compared to a slightly more typical structure):
 
 - a `pyproject.toml` file contains the project metadata, including the dependencies. It is common to see a "setup.py" file in Python projects but we use this more modern approach to define the project metadata.
 - The `src` folder contains the core code (dsba) as well as the code for the CLI, the API, the web app, the notebooks, as well as the Dockerfiles.
@@ -98,9 +97,52 @@ For windows, something of the sort may work:
 set DSBA_MODELS_ROOT_PATH="C:\path\to\your\models"
 ```
 
-## CLI
 
-List models registered on your system:
+
+
+
+# MLOps student project
+This project aims to build an interface that lets bank employees choose a prediction model and a customer dataset, then see a prediction of whether each customer will churn.
+
+To do this, we built on a machine learning project on bank churners that one of us had previously created. That project used the dataset from a Kaggle challenge, available at https://www.kaggle.com/datasets/thedevastator/predicting-credit-card-customer-attrition-with-m
+
+
+## Important Folders and Files
+
+### Data
+
+```bash
+data/
+```
+
+This contains:
+- the original `BankChurners.csv` file of the Kaggle challenge used for the original ML project
+- the `X_test.csv` and `y_test.csv` files created by preprocessing the original dataset; these files are the ones used for the rest of the MLOps project
+
+
+### Models
+
+```bash
+models/
+```
+
+This contains the trained models available for prediction. The four models are:
+- `lgbm_model.pkl`
+- `rf_model.pkl`
+- `svm_model.pkl`
+- `xgb_model.pkl`
+
+
+### Src
+
+```bash
+src/
+```
+This is the main source directory where the code resides.
+
+#### CLI
+
+Not used in this MLOps project. It is normally used to list the models registered on your system:
 
 ```bash
 src/cli/dsba_cli list
@@ -112,17 +154,82 @@ Use a model to predict on a file:
 src/cli/dsba_cli predict --input /path/to/your/data/file.csv --output /path/to/your/output/file.csv --model-id your_model_id
 ```
 
-### Notebook
+### API
+An API is provided to interact with the models. You can start it by running:
+```bash
+uvicorn api:app --reload
+```
+
+#### Dockerized API
+To run the API in a Docker container, follow these steps:
+1. Build the Docker image:
+```bash
+docker build -f Dockerfile.api -t fastapi .
+```
+2. Run the Docker container:
+```bash
+docker run -d -p 8000:8000 fastapi
+```
+The API will be available at http://127.0.0.1:8000/
+
+Note: Ensure Docker is installed on your machine.
+3. Tag the image:
+```bash
+docker tag fastapi stanchiangtw/fastapi
+```
+
+Note: Make sure you are logged in to Docker Hub before pushing:
+```bash
+docker login
+```
+
+4. Push the tagged image to Docker Hub:
+```bash
+docker push stanchiangtw/fastapi:latest
+```
+
+
+#### AWS ECS & EC2
+We successfully implemented the deployment and scaling of Docker containers on AWS ECS. The process involved the following steps:
+- Creating an ECS cluster
+- Defining a Task Definition
+- Configuring a Security Group
+- Setting up a service
+- Accessing the running service
+
+The API was successfully running at http://13.37.241.233:8000/.
+
+However, the service quickly exceeded the free tier quota, resulting in costs ($12.46). Before stopping the service, we took a screenshot to document the progress made.
+![](https://drive.google.com/uc?export=view&id=1YI7dOU-cK-AJIUZxjXoJGsSZiGhT2f55)
+
+
+### Templates
+The `templates/` directory contains HTML templates used for rendering web views in our application. The main file `dashboard.html` serves as the user interface for our ML model evaluation dashboard.
+![](./static/dashboard.gif)
+
+
+### Static
+The `static/` directory stores static resources such as generated plots (`.png` files). These images are served directly to the client browser and do not change during runtime. The visualization image created during model evaluation is stored here so that it can be displayed in the dashboard.
+
+
+### Dockerfile
+`Dockerfile.api` contains the instructions Docker needs to build the API image, listed in the order Docker executes them.
+
+### Requirements
+`requirements.txt` lists the Python packages required to process the data, train the models, and run the API.
+
+
+#### DSBA
+This contains the core functionality of the MLOps project, including model handling, data preprocessing, and utilities for training and predicting.
 
-...
 
-### API
 
-...
+#### Notebooks
+This contains:
+- `model_training_example.ipynb`: the original example notebook from the MLOps platform project
+- `Bank_MLOps.ipynb`: the notebook from the earlier ML project that we reused for this MLOps project. The original code from that project has not been modified; it may contain some LLM-generated elements. To use the notebook, navigate to the `notebooks/` folder and open the file. You can use the provided utilities to train models, preprocess data, and evaluate performance.
 
-### Dockerized API
 
-...
 
-### REMARKS
-remove extra preprocessing files 
+## REMARKS
+No remarks
diff --git a/api.py b/api.py
@@ -0,0 +1,182 @@
+from fastapi import FastAPI, UploadFile, File
+import pandas as pd
+import joblib
+import pickle
+from src.dsba.data_ingestion.data_ingestion import load_csv
+from src.dsba.preprocessing_a import preprocess_data
+from src.dsba.model_evaluation import evaluate_models
+import io
+from pathlib import Path
+from typing import Optional
+import sys
+from fastapi.responses import HTMLResponse
+from fastapi.staticfiles import StaticFiles
+from fastapi import Request
+import matplotlib.pyplot as plt
+from fastapi.templating import Jinja2Templates
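+import numpy
+
+# Compatibility shim: alias numpy._core (the NumPy 2.x module name) to numpy.core
+# so that pickled models referencing numpy._core can still be unpickled.
+sys.modules['numpy._core'] = numpy.core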
+
+
+
+
+
+df = load_csv("data/BankChurners.csv")
+X_train, X_test, y_train, y_test = preprocess_data("data/BankChurners.csv")
+
+lgbm_model = joblib.load('models/lgbm_model.pkl')
+rf_model = joblib.load('models/rf_model.pkl')
+xgb_model = joblib.load('models/xgb_model.pkl')
+svm_model = joblib.load('models/svm_model.pkl')
+
+models = {
+    "LGBM": lgbm_model,
+    "RandomForest": rf_model,
+    "XGBoost": xgb_model,
+    "SVM": svm_model
+}
+
+results, model_comparison = evaluate_models(models, X_train, y_train, X_test, y_test)
+
+png_path = "static/model_comparison.png"
+model_comparison.savefig(png_path)
+
+
+
+data_info =  df.iloc[:5].to_html(classes='table table-striped table-bordered', index=False)
+columns_df = pd.DataFrame({'Column Name': df.columns.tolist()})
+columns_info = columns_df.to_html(classes='table table-striped table-bordered', index=False)
+model_info = {"Show the model info": {}}
+for model_name, model in models.items():
+    model_info["Show the model info"][model_name] = model.__class__.__name__
+
+results_dict = results.to_dict(orient="records")
+metrics_summary = {}
+
+for record in results_dict:
+    model_name = record["Model"]
+    dataset_type = record["Dataset"]
+
+    if model_name not in metrics_summary:
+        metrics_summary[model_name] = {}
+
+    metrics_summary[model_name][dataset_type] = {
+        "accuracy": record["accuracy"],
+        "precision": record["precision"],
+        "recall": record["recall"],
+        "f1_score": record["f1_score"]
+    }
+
+app = FastAPI()
+
+template = Jinja2Templates(directory="templates")
+app.mount("/static", StaticFiles(directory="static"), name="static")
+
+async def tmp_save_file(upload_file: UploadFile) -> Path:
+    try:
+        temp_dir = Path("temp_uploads")
+        temp_dir.mkdir(exist_ok=True)
+
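+        # Build the temporary file path from the base name only, so directory
+        # components in the client-supplied filename are ignored
+        file_path = temp_dir / Path(upload_file.filename or "upload.csv").name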
+
+        # Read the contents
+        contents = await upload_file.read()
+
+        # Write the temporary file
+        with open(file_path, "wb") as f:
+            f.write(contents)
+
+        return file_path
+
+    except Exception as e:
+        raise ValueError(f"Error when saving the file: {str(e)}")
+
+
+@app.get("/")
+async def read_root(request: Request):
+    return template.TemplateResponse("dashboard.html", {
+        "request": request,
+        "data_info": data_info,
+        "columns_info": columns_info,
+        "model_info": model_info,
+        "metrics_summary": metrics_summary,
+        "plot_image_path": "static/model_comparison.png"
+
+    })
+
+@app.post("/upload-and-evaluate/")
+async def upload_and_evaluate(request: Request, file: Optional[UploadFile] = None):
+    try:
+
+        if file is None:
+            return template.TemplateResponse("dashboard.html", {
+                "request": request,
+                "data_info": data_info,
+                "columns_info": columns_info,
+                "model_info": model_info,
+                "metrics_summary": metrics_summary,
+                 "plot_image_path": "static/model_comparison.png"
+
+            })
+
+
+        # Need HTML to design the layout to upload the file
+        temp_file_path = await tmp_save_file(file)
+
+        df_upload = pd.read_csv(temp_file_path)
+
+        X_train_upload, X_test_upload, y_train_upload, y_test_upload = preprocess_data(temp_file_path)
+
+
+        # Same argument order as the startup call to evaluate_models above
+        results_upload, model_comparison_upload = evaluate_models(models, X_train_upload, y_train_upload, X_test_upload, y_test_upload)
+
+        png_path_upload = "static/model_comparison_upload.png"
+        model_comparison_upload.savefig(png_path_upload)  
+
+        # Need HTML to show the plot for model_comparison_upload
+
+        # Show only the first 5 records; with more, the layout gets messy
+        data_upload = df_upload.iloc[:5].to_html(classes='table table-striped table-bordered', index=False)
+
+
+        columns_upload_df = pd.DataFrame({'Column Name': df_upload.columns.tolist()})
+        columns_info_upload = columns_upload_df.to_html(classes='table table-striped table-bordered', index=False)
+
+        model_info_upload = {"Show the model info": {}}
+        for model_name, model in models.items():
+            model_info_upload["Show the model info"][model_name] = model.__class__.__name__
+
+        results_dict_upload = results_upload.to_dict(orient="records")
+        metrics_summary_upload = {}
+
+
+        for record in results_dict_upload:
+            model_name = record["Model"]
+            dataset_type = record["Dataset"]
+
+            if model_name not in metrics_summary_upload:
+                metrics_summary_upload[model_name] = {}
+
+            metrics_summary_upload[model_name][dataset_type] = {
+                "accuracy": record["accuracy"],
+                "precision": record["precision"],
+                "recall": record["recall"],
+                "f1_score": record["f1_score"]
+            }
+
+        return template.TemplateResponse("dashboard.html", {
+            "request": request,
+            "data_info": data_upload,
+            "columns_info": columns_info_upload,
+            "model_info": model_info_upload,
+            "metrics_summary": metrics_summary_upload,
+            "plot_image_path": "static/model_comparison_upload.png"
+
+        })
+
+    except ValueError as e:
+        return {"error": str(e)}
+
+    except Exception as e:
+        return {"error": f"Unexpected Error: {str(e)}"}
+
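The loop that builds the nested metrics dictionary appears twice in `api.py`, once at startup and once inside `upload_and_evaluate`. A minimal sketch of how it could be shared, assuming the records have the shape that `results.to_dict(orient="records")` yields in the diff (keys `Model`, `Dataset`, `accuracy`, `precision`, `recall`, `f1_score`). The helper name `build_metrics_summary` is hypothetical, and the numbers in the example are made up for illustration.

```python
from typing import Any

METRIC_KEYS = ("accuracy", "precision", "recall", "f1_score")


def build_metrics_summary(records: list[dict[str, Any]]) -> dict[str, dict[str, dict[str, Any]]]:
    """Group per-row metrics as summary[model_name][dataset_type][metric]."""
    summary: dict[str, dict[str, dict[str, Any]]] = {}
    for record in records:
        # Create the per-model entry on first sight, then fill in the dataset.
        per_dataset = summary.setdefault(record["Model"], {})
        per_dataset[record["Dataset"]] = {key: record[key] for key in METRIC_KEYS}
    return summary


# Illustrative records only; these are not results from the project.
example_records = [
    {"Model": "LGBM", "Dataset": "train", "accuracy": 0.9, "precision": 0.8, "recall": 0.7, "f1_score": 0.75},
    {"Model": "LGBM", "Dataset": "test", "accuracy": 0.85, "precision": 0.75, "recall": 0.65, "f1_score": 0.7},
]
print(build_metrics_summary(example_records))
```

Both call sites could then reduce to a single call each, for example `metrics_summary = build_metrics_summary(results.to_dict(orient="records"))`.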