34 commits
6449526
Merge pull request #1 from StanChiangTW/kiran
kirandeepgaur Apr 1, 2025
7d2c331
Merge pull request for models pkl
kirandeepgaur Apr 1, 2025
511eb1b
Update model_prediction.py
Prashant160498 Apr 1, 2025
f0b9f34
model_prediction.py
Prashant160498 Apr 1, 2025
dc69218
model_evaluation.py
Prashant160498 Apr 4, 2025
1a96cbe
model_evaluation.py
Prashant160498 Apr 4, 2025
0d542ad
model_evaluation.py
Prashant160498 Apr 4, 2025
2f82644
fastapi
StanChiangTW Apr 5, 2025
9472b39
Delete mlops directory
StanChiangTW Apr 5, 2025
26b1559
Dockerfile.api
Prashant160498 Apr 5, 2025
ff1fc94
Update requirements.txt
Prashant160498 Apr 5, 2025
536aae1
requirements.txt
Prashant160498 Apr 5, 2025
d1ebd5f
Readme completed and cleaning of the code
JulienD00 Apr 6, 2025
07307eb
AWS Deployment
StanChiangTW Apr 7, 2025
a2cdc44
Docker&AWS Readme
StanChiangTW Apr 8, 2025
9e1dd8c
AWS Photo Update
StanChiangTW Apr 8, 2025
e87625e
Update README.md
StanChiangTW Apr 8, 2025
65cbc08
add dashboard
StanChiangTW Apr 9, 2025
14dd677
Merge pull request #3 from StanChiangTW/kiran
kirandeepgaur Apr 10, 2025
8e2de25
move dashboard.html location
StanChiangTW Apr 10, 2025
922dc1d
change json to html table
StanChiangTW Apr 10, 2025
8ac5b79
Changes in Model section
Apr 11, 2025
7fc1aa3
Merge pull request #6 from StanChiangTW/dashboard
kirandeepgaur Apr 11, 2025
c10db3d
Changes in Poetry to add Jinja2
Apr 11, 2025
cce698e
Merge pull request #7 from StanChiangTW/dashboard
kirandeepgaur Apr 11, 2025
f81ecc2
fix data info format
StanChiangTW Apr 11, 2025
b995a68
update README and fix html table overflow issue
StanChiangTW Apr 11, 2025
1c38b10
fix dashboard showing issue
StanChiangTW Apr 11, 2025
c443844
fix dashboard showing issue with Youtube
StanChiangTW Apr 11, 2025
c52b928
fix dashboard showing issue with gif
StanChiangTW Apr 11, 2025
0d23a45
fix dashboard showing issue with gif another method
StanChiangTW Apr 11, 2025
c063129
fix dashboard showing issue with imgur
StanChiangTW Apr 11, 2025
6ad1953
fix dashboard showing issue with repo
StanChiangTW Apr 11, 2025
8999c7c
Add files via upload
Prashant160498 Apr 11, 2025
Binary file added .DS_Store
Binary file not shown.
4 changes: 3 additions & 1 deletion .gitignore
@@ -1,3 +1,5 @@
*.pyc
.env
models/
models/
mlops/
*.pyc # Python virtual environment files
10 changes: 5 additions & 5 deletions src/api/Dockerfile.api → Dockerfile.api
@@ -1,15 +1,15 @@
# This Dockerfile is meant to be run from the root of the repo
# Base Python image (Python 3.9.6)
FROM python:3.12.4

# Base Python image (Python 3.12)
FROM python:3.12-slim

# Set the working directory in the container
WORKDIR /app

# Copy the current folder (on the host machine) that contains the whole app project into the container /app folder
COPY . /app

RUN pip install .
# install the required packages
RUN pip install --no-cache-dir -r requirements.txt

# Expose the port on which FastAPI will run
EXPOSE 8000
@@ -18,4 +18,4 @@ EXPOSE 8000
ENV DSBA_MODELS_ROOT_PATH=/app/models

# Define the default command run when starting the container: Run the FastAPI app using Uvicorn
CMD ["uvicorn", "src.api.api:app", "--host", "0.0.0.0", "--port", "8000"]
CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"]
Binary file added Project_Report.pdf
Binary file not shown.
131 changes: 119 additions & 12 deletions README.md
@@ -1,10 +1,9 @@
# DSBA Platform

A toy MLOps Platform for educational purposes
A toy MLOps Project

## Project Structure

This project has a fairly standard structure (but it is still adapted to be simplified compared to a slightly more typical structure):

- a `pyproject.toml` file contains the project metadata, including the dependencies. It is common to see a "setup.py" file in Python projects but we use this more modern approach to define the project metadata.
- The `src` folder contains the core code (dsba) as well as the code for the CLI, the API, the web app, the notebooks, and the Dockerfiles.
@@ -98,9 +97,52 @@ For windows, something of the sort may work:
set DSBA_MODELS_ROOT_PATH="C:\path\to\your\models"
```

## CLI

List models registered on your system:



# MLOps student project
This project aimed to create an interface for bank employees that lets them choose a prediction model and a customer dataset, and then returns a prediction of whether each customer will churn.

To do this, we built on a machine learning project on bank churners that one of us had previously created. That project used the dataset from a Kaggle challenge, available at: https://www.kaggle.com/datasets/thedevastator/predicting-credit-card-customer-attrition-with-m


## Important Folders and Files

### Data

```bash
data/
```

This contains:
- the original `BankChurners.csv` file from the Kaggle challenge, used for the original ML project
- the `X_test.csv` and `y_test.csv` files created by preprocessing the original dataset; these files are the ones used in the rest of the MLOps project


### Models

```bash
models/
```

This contains the trained models available for prediction. The four models are:
- `lgbm_model.pkl`
- `rf_model.pkl`
- `svm_model.pkl`
- `xgb_model.pkl`
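
As a minimal sketch (not part of the project code), the model files listed above could be discovered by name instead of being hard-coded. The helper name `find_model_files` and the `./models` fallback are assumptions made for illustration; `DSBA_MODELS_ROOT_PATH` is the environment variable set in `Dockerfile.api`.

```python
import os
from pathlib import Path


def find_model_files(models_root=None):
    """Map a short model name (e.g. "lgbm") to the path of its pickle file."""
    # Use the explicit argument first, then DSBA_MODELS_ROOT_PATH,
    # then ./models (an assumed default).
    root = Path(models_root or os.environ.get("DSBA_MODELS_ROOT_PATH", "models"))
    return {
        path.name.removesuffix("_model.pkl"): path
        for path in sorted(root.glob("*_model.pkl"))
    }
```

Each returned path could then be passed to `joblib.load`, which is how `api.py` loads the models.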


### Src

```bash
src/
```
This is the main source directory where the code resides.

#### CLI

Not used in this MLOps project. It is normally used to list the models registered on your system:

```bash
src/cli/dsba_cli list
@@ -112,17 +154,82 @@ Use a model to predict on a file:
src/cli/dsba_cli predict --input /path/to/your/data/file.csv --output /path/to/your/output/file.csv --model-id your_model_id
```

### Notebook
### API
An API is provided for interacting with the models. You can start it by running:
```bash
uvicorn api:app --reload
```
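
Once the server is running, a quick reachability check can confirm that the dashboard route answers. This is a minimal sketch using only the standard library; it assumes Uvicorn's default address (`127.0.0.1:8000`), and the helper names `dashboard_url` and `api_is_up` are hypothetical.

```python
from urllib.request import urlopen


def dashboard_url(host="127.0.0.1", port=8000):
    """Build the URL of the dashboard route ("/")."""
    return f"http://{host}:{port}/"


def api_is_up(host="127.0.0.1", port=8000, timeout=5.0):
    """Return True if GET / answers with HTTP 200, False on any connection or HTTP error."""
    try:
        with urlopen(dashboard_url(host, port), timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return False
```

Calling `api_is_up()` after starting Uvicorn should return `True` once the models have loaded and the app is serving requests.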

#### Dockerized API
To run the API in a Docker container, follow these steps:
1. Build the Docker image:
```bash
docker build -f Dockerfile.api -t fastapi .
```
2. Run the Docker container:
```bash
docker run -d -p 8000:8000 fastapi
```
The API will be available at http://127.0.0.1:8000/

Note: Ensure Docker is installed on your machine.
3. Tag the image:
```bash
docker tag fastapi stanchiangtw/fastapi
```

Note: Make sure you are logged in to Docker Hub by running:
```bash
docker login
```

4. Push the tagged image to Docker Hub:
```bash
docker push stanchiangtw/fastapi:latest
```


#### AWS ECS & EC2
We deployed and scaled the Docker containers on AWS ECS. The process involved the following steps:
- Creating an ECS cluster
- Defining a Task Definition
- Configuring a Security Group
- Setting up a service
- Accessing the running service

The API ran successfully at http://13.37.241.233:8000/.

However, the service quickly exceeded the free-tier quota and incurred $12.46 in charges. Before stopping the service, we took a screenshot to document the progress made.
![](https://drive.google.com/uc?export=view&id=1YI7dOU-cK-AJIUZxjXoJGsSZiGhT2f55)


### Templates
The `templates/` directory contains HTML templates used for rendering web views in our application. The main file `dashboard.html` serves as the user interface for our ML model evaluation dashboard.
![](./static/dashboard.gif)


### Static
The `static/` directory stores static resources such as generated plots (`.png` files). These files are served directly to the client browser and do not change at runtime. The visualization image created during model evaluation is stored here so that it can be displayed in the dashboard.


### Dockerfile
`Dockerfile.api` contains the instructions Docker needs to build the API image, listed in the order in which they must be executed.

### Requirements
`requirements.txt` lists the Python packages required to process the data, train the models, and run the API.


#### DSBA
This contains the core functionality of the MLOps project, including model handling, data preprocessing, and utilities for training and predicting.

...

### API

...
#### Notebooks
This contains:
- `model_training_example.ipynb`: the original example notebook from the MLOps platform project
- `Bank_MLOps.ipynb`: the notebook of the ML project on which we based our MLOps project. The original code from the ML project has not been modified; it may contain some LLM-generated elements. To use the notebook, navigate to the `notebooks/` folder and open the file. You can use the provided utilities to train models, preprocess data, and evaluate performance.

### Dockerized API

...

### REMARKS
remove extra preprocessing files
## REMARKS
No remarks
182 changes: 182 additions & 0 deletions api.py
@@ -0,0 +1,182 @@
from fastapi import FastAPI, UploadFile, File
import pandas as pd
import joblib
import pickle
from src.dsba.data_ingestion.data_ingestion import load_csv
from src.dsba.preprocessing_a import preprocess_data
from src.dsba.model_evaluation import evaluate_models
import io
from pathlib import Path
from typing import Optional
import sys
from fastapi.responses import HTMLResponse
from fastapi.staticfiles import StaticFiles
from fastapi import Request
import matplotlib.pyplot as plt
from fastapi.templating import Jinja2Templates
import numpy
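# Compatibility shim: presumably lets models pickled under NumPy 2.x
# (which reference `numpy._core`) load under NumPy 1.x.
sys.modules['numpy._core'] = numpy.core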





df = load_csv("data/BankChurners.csv")
X_train, X_test, y_train, y_test = preprocess_data("data/BankChurners.csv")

lgbm_model = joblib.load('models/lgbm_model.pkl')
rf_model = joblib.load('models/rf_model.pkl')
xgb_model = joblib.load('models/xgb_model.pkl')
svm_model = joblib.load('models/svm_model.pkl')

models = {
    "LGBM": lgbm_model,
    "RandomForest": rf_model,
    "XGBoost": xgb_model,
    "SVM": svm_model
}

results, model_comparison = evaluate_models(models, X_train, y_train, X_test, y_test)

png_path = "static/model_comparison.png"
model_comparison.savefig(png_path)



data_info = df.iloc[:5].to_html(classes='table table-striped table-bordered', index=False)
columns_df = pd.DataFrame({'Column Name': df.columns.tolist()})
columns_info = columns_df.to_html(classes='table table-striped table-bordered', index=False)
model_info = {"Show the model info": {}}
for model_name, model in models.items():
    model_info["Show the model info"][model_name] = model.__class__.__name__

results_dict = results.to_dict(orient="records")
metrics_summary = {}

# Regroup the flat evaluation records as {model: {dataset: {metric: value}}}
for record in results_dict:
    model_name = record["Model"]
    dataset_type = record["Dataset"]

    if model_name not in metrics_summary:
        metrics_summary[model_name] = {}

    metrics_summary[model_name][dataset_type] = {
        "accuracy": record["accuracy"],
        "precision": record["precision"],
        "recall": record["recall"],
        "f1_score": record["f1_score"]
    }

app = FastAPI()

template = Jinja2Templates(directory="templates")
app.mount("/static", StaticFiles(directory="static"), name="static")

async def tmp_save_file(upload_file: UploadFile) -> Path:
    try:
        temp_dir = Path("temp_uploads")
        temp_dir.mkdir(exist_ok=True)

        # Generate the temporary file path; keep only the base name so that
        # a crafted filename cannot write outside temp_dir
        file_path = temp_dir / Path(upload_file.filename).name

        # Read the contents
        contents = await upload_file.read()

        # Write the temporary file
        with open(file_path, "wb") as f:
            f.write(contents)

        return file_path

    except Exception as e:
        raise ValueError(f"Error when saving the file: {str(e)}")


@app.get("/")
async def read_root(request: Request):
    return template.TemplateResponse("dashboard.html", {
        "request": request,
        "data_info": data_info,
        "columns_info": columns_info,
        "model_info": model_info,
        "metrics_summary": metrics_summary,
        "plot_image_path": "static/model_comparison.png"
    })

@app.post("/upload-and-evaluate/")
async def upload_and_evaluate(request: Request, file: Optional[UploadFile] = None):
    try:
        # Without an uploaded file, fall back to the default dashboard
        if file is None:
            return template.TemplateResponse("dashboard.html", {
                "request": request,
                "data_info": data_info,
                "columns_info": columns_info,
                "model_info": model_info,
                "metrics_summary": metrics_summary,
                "plot_image_path": "static/model_comparison.png"
            })

        # Need HTML to design the layout to upload the file
        temp_file_path = await tmp_save_file(file)

        df_upload = pd.read_csv(temp_file_path)

        X_train_upload, X_test_upload, y_train_upload, y_test_upload = preprocess_data(temp_file_path)

        # Argument order aligned with the module-level call above, which passes
        # (models, X_train, y_train, X_test, y_test)
        results_upload, model_comparison_upload = evaluate_models(models, X_train_upload, y_train_upload, X_test_upload, y_test_upload)

        png_path_upload = "static/model_comparison_upload.png"
        model_comparison_upload.savefig(png_path_upload)

        # Need HTML to show the plot for model_comparison_upload

        # Show only the first 5 records; more would clutter the layout
        data_upload = df_upload.iloc[:5].to_html(classes='table table-striped table-bordered', index=False)

        columns_upload_df = pd.DataFrame({'Column Name': df_upload.columns.tolist()})
        columns_info_upload = columns_upload_df.to_html(classes='table table-striped table-bordered', index=False)

        model_info_upload = {"Show the model info": {}}
        for model_name, model in models.items():
            model_info_upload["Show the model info"][model_name] = model.__class__.__name__

        results_dict_upload = results_upload.to_dict(orient="records")
        metrics_summary_upload = {}

        # Regroup the flat evaluation records as {model: {dataset: {metric: value}}}
        for record in results_dict_upload:
            model_name = record["Model"]
            dataset_type = record["Dataset"]

            if model_name not in metrics_summary_upload:
                metrics_summary_upload[model_name] = {}

            metrics_summary_upload[model_name][dataset_type] = {
                "accuracy": record["accuracy"],
                "precision": record["precision"],
                "recall": record["recall"],
                "f1_score": record["f1_score"]
            }

        return template.TemplateResponse("dashboard.html", {
            "request": request,
            "data_info": data_upload,
            "columns_info": columns_info_upload,
            "model_info": model_info_upload,
            "metrics_summary": metrics_summary_upload,
            "plot_image_path": "static/model_comparison_upload.png"
        })

    except ValueError as e:
        return {"error": str(e)}

    except Exception as e:
        return {"error": f"Unexpected Error: {str(e)}"}
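
The regrouping of evaluation records into a nested metrics dictionary appears twice in `api.py`: once at module level and once in `upload_and_evaluate`. A minimal sketch of a shared helper follows. It assumes each record carries the keys `Model`, `Dataset`, `accuracy`, `precision`, `recall`, and `f1_score`, as in the code above; the name `summarize_metrics` is hypothetical.

```python
METRIC_KEYS = ("accuracy", "precision", "recall", "f1_score")


def summarize_metrics(records):
    """Group flat evaluation records as {model: {dataset: {metric: value}}}."""
    summary = {}
    for record in records:
        # Create the per-model dictionary on first sight of the model name
        per_model = summary.setdefault(record["Model"], {})
        per_model[record["Dataset"]] = {key: record[key] for key in METRIC_KEYS}
    return summary
```

Both call sites could then pass `results.to_dict(orient="records")` to this helper instead of repeating the loop.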
