utkarshg1/ml_project_structure

Iris Classification Project

A machine learning project for classifying Iris flower species using Logistic Regression with a Streamlit web interface.

📋 Table of Contents

  • Overview
  • Features
  • Project Structure
  • Installation
  • Usage
  • Model Details
  • Configuration
  • API Reference
  • Logging
  • Contributing
  • Author
  • License

🌸 Overview

This project implements a machine learning pipeline to classify Iris flowers into three species (Setosa, Versicolor, Virginica) based on their sepal and petal measurements. The project includes data preprocessing, model training, evaluation, and a user-friendly Streamlit web application for making predictions.

✨ Features

  • Automated Data Pipeline: Automatically downloads and processes the Iris dataset
  • Machine Learning Pipeline: Complete preprocessing with imputation and standardization
  • Model Training & Evaluation: Comprehensive model evaluation with cross-validation
  • Web Interface: Interactive Streamlit app for real-time predictions
  • Logging: Comprehensive logging system using Loguru
  • Modular Design: Well-structured, reusable code components

πŸ“ Project Structure

iris-classification/
├── .gitignore            # Git ignore rules
├── .python-version       # Python version specification
├── README.md             # Project documentation
├── app.py                # Streamlit web application
├── main.py               # Main training pipeline
├── template.py           # Project template/setup script
├── pyproject.toml        # Project configuration and dependencies (uv)
├── requirements.txt      # Dependencies list
├── uv.lock               # Dependency lock file (uv)
├── data/                 # Data directory (auto-generated)
├── models/               # Trained models directory (auto-generated)
├── logs/                 # Application logs (auto-generated)
└── src/
    ├── __init__.py
    ├── constants.py       # Project constants and configuration
    ├── data.py            # Data download functionality
    ├── logging_config.py  # Logging configuration
    ├── model_evaluator.py # Model evaluation utilities
    ├── model_trainer.py   # Model training pipeline
    └── predict.py         # Prediction utilities

🚀 Installation

Prerequisites

  • Python 3.8+ (as specified in .python-version)
  • uv package manager (recommended)

Setup

  1. Clone the repository

    git clone <repository-url>
    cd iris-classification
  2. Install uv (if not already installed)

    # On macOS and Linux
    curl -LsSf https://astral.sh/uv/install.sh | sh
    
    # On Windows
    powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
  3. Install dependencies using uv

    uv sync

    Alternative: Using pip

    pip install -r requirements.txt

📖 Usage

Training the Model

Run the complete training pipeline:

# Using uv
uv run main.py

# Or run with the dependencies listed in requirements.txt
uv run --with-requirements requirements.txt python main.py

This will:

  • Download the Iris dataset
  • Preprocess the data (handle duplicates, split features/target)
  • Train a Logistic Regression model with preprocessing pipeline
  • Evaluate the model performance
  • Save the trained model
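The data-preparation steps above can be sketched as follows. This is illustrative, not the project's actual code: the column and parameter values mirror the Configuration and Dataset sections of this README, but the function name and signature are assumptions.

```python
# Sketch of the data-preparation steps: deduplicate, split features from
# target, then make a stratified train/test split. Values mirror the
# constants documented in the Configuration section below.
import pandas as pd
from sklearn.model_selection import train_test_split


def prepare_data(df: pd.DataFrame, target: str = "species"):
    df = df.drop_duplicates()          # handle duplicate rows
    X = df.drop(columns=[target])      # the 4 measurement columns
    y = df[target]                     # species label
    return train_test_split(
        X, y, test_size=0.33, random_state=21, stratify=y
    )
```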

Running the Web Application

Launch the Streamlit app:

# Using uv
uv run streamlit run app.py

# Traditional method
streamlit run app.py

Then open your browser to http://localhost:8501 and:

  1. Enter the sepal length, width, petal length, and width measurements
  2. Click "Predict" to get the species classification and prediction probabilities

Example Usage

from src.predict import IrisPredictor

# Load the trained model
predictor = IrisPredictor()

# Make a prediction
prediction = predictor.predict(predictor.to_dataframe(5.1, 3.5, 1.4, 0.2))
probabilities = predictor.predict_proba(predictor.to_dataframe(5.1, 3.5, 1.4, 0.2))

print(f"Predicted species: {prediction}")
print(f"Prediction probabilities: {probabilities}")

🤖 Model Details

Algorithm

  • Model: Logistic Regression
  • Preprocessing Pipeline:
    • Simple Imputer (median strategy)
    • Standard Scaler for feature normalization

Performance Metrics

The model is evaluated using:

  • F1-score (macro average)
  • Classification report
  • 5-fold cross-validation
  • Training and testing performance comparison
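An evaluation helper covering these metrics might look like the sketch below; the function name and return shape are assumptions, not the project's actual ModelEvaluator:

```python
# Evaluation sketch: macro F1 on both splits, a classification report,
# and 5-fold cross-validation on the training data.
from sklearn.metrics import classification_report, f1_score
from sklearn.model_selection import cross_val_score


def evaluate(model, xtrain, ytrain, xtest, ytest) -> dict:
    return {
        "train_f1": f1_score(ytrain, model.predict(xtrain), average="macro"),
        "test_f1": f1_score(ytest, model.predict(xtest), average="macro"),
        "cv_f1": cross_val_score(model, xtrain, ytrain, cv=5,
                                 scoring="f1_macro").mean(),
        "report": classification_report(ytest, model.predict(xtest)),
    }
```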

Dataset

  • Source: Iris Dataset
  • Features: 4 numerical features (sepal_length, sepal_width, petal_length, petal_width)
  • Target: 3 classes (setosa, versicolor, virginica)
  • Size: ~150 samples

🔧 Configuration

Project Configuration (pyproject.toml)

This project uses pyproject.toml for modern Python packaging and dependency management with uv.

Application Configuration (src/constants.py)

Key configuration parameters:

URL = "https://raw.githubusercontent.com/utkarshg1/iris_data/refs/heads/main/iris.csv"
DATA_PATH = Path("data", "iris.csv")
MODEL_PATH = Path("models", "iris_model.joblib")
TARGET = "species"
IMPUTE_STRAT = "median"
TEST_SIZE = 0.33
RANDOM_STATE = 21

Dependencies

  • All dependencies are managed through uv.lock for reproducible builds
  • requirements.txt is also available for traditional pip installations

📊 API Reference

IrisPredictor Class

class IrisPredictor:
    def __init__(self, model_path: Path = MODEL_PATH)
    def to_dataframe(self, sep_len: float, sep_wid: float, pet_len: float, pet_wid: float) -> pd.DataFrame
    def predict(self, x: pd.DataFrame) -> str
    def predict_proba(self, x: pd.DataFrame) -> pd.DataFrame

ModelTrainer Class

class ModelTrainer:
    def __init__(self, model_path: Path = MODEL_PATH)
    def create_pipeline(self) -> Pipeline
    def train_model(self, xtrain: pd.DataFrame, ytrain: pd.Series)
    def save_model(self)
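
Since MODEL_PATH ends in .joblib (see the Configuration section), save_model most likely persists the fitted pipeline with joblib. A minimal sketch under that assumption:

```python
# Persist and restore a fitted pipeline with joblib.
from pathlib import Path

import joblib

MODEL_PATH = Path("models", "iris_model.joblib")


def save_model(model, path: Path = MODEL_PATH) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)  # create models/ if missing
    joblib.dump(model, path)


def load_model(path: Path = MODEL_PATH):
    return joblib.load(path)
```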

ModelEvaluator Class

class ModelEvaluator:
    def __init__(self, model: Pipeline)
    def evaluate(self, xtrain, ytrain, xtest, ytest)

πŸ“ Logging

The project uses Loguru for comprehensive logging:

  • Console Output: Colored, formatted logs for development
  • File Output: Rotating log files in logs/app.log
  • Log Rotation: 10MB rotation with 7-day retention
  • Compression: Automatic ZIP compression of old logs

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Update dependencies if needed:
    uv add <package-name>  # Add new dependency
    uv sync                # Sync dependencies
  5. Commit your changes (git commit -m 'Add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

Development Workflow with uv

# Run scripts
uv run main.py
uv run streamlit run app.py

# Update dependencies
uv sync --upgrade

👨‍💻 Author

Utkarsh Gaikwad

📄 License

This project is open source and available under the MIT License.


Note: This project uses uv for fast and reliable Python package management. Run uv run main.py first to train and save the model before launching the Streamlit application.

About

Modular code illustrating the project structure of a machine learning project.
