A machine learning project for classifying Iris flower species using Logistic Regression with a Streamlit web interface.
- Overview
- Features
- Project Structure
- Installation
- Usage
- Model Details
- Configuration
- API Reference
- Logging
- Author
- License
This project implements a machine learning pipeline to classify Iris flowers into three species (Setosa, Versicolor, Virginica) based on their sepal and petal measurements. The project includes data preprocessing, model training, evaluation, and a user-friendly Streamlit web application for making predictions.
- Automated Data Pipeline: Automatically downloads and processes the Iris dataset
- Machine Learning Pipeline: Complete preprocessing with imputation and standardization
- Model Training & Evaluation: Comprehensive model evaluation with cross-validation
- Web Interface: Interactive Streamlit app for real-time predictions
- Logging: Comprehensive logging system using Loguru
- Modular Design: Well-structured, reusable code components
```
iris-classification/
├── .gitignore          # Git ignore rules
├── .python-version     # Python version specification
├── README.md           # Project documentation
├── app.py              # Streamlit web application
├── main.py             # Main training pipeline
├── template.py         # Project template/setup script
├── pyproject.toml      # Project configuration and dependencies (uv)
├── requirements.txt    # Dependencies list
├── uv.lock             # Dependency lock file (uv)
├── data/               # Data directory (auto-generated)
├── models/             # Trained models directory (auto-generated)
├── logs/               # Application logs (auto-generated)
└── src/
    ├── __init__.py
    ├── constants.py          # Project constants and configuration
    ├── data.py               # Data download functionality
    ├── logging_config.py     # Logging configuration
    ├── model_evaluator.py    # Model evaluation utilities
    ├── model_trainer.py      # Model training pipeline
    └── predict.py            # Prediction utilities
```
- Python 3.8+ (as specified in `.python-version`)
- uv package manager (recommended)

- Clone the repository

  ```bash
  git clone <repository-url>
  cd iris-classification
  ```

- Install uv (if not already installed)

  ```bash
  # On macOS and Linux
  curl -LsSf https://astral.sh/uv/install.sh | sh

  # On Windows
  powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
  ```

- Install dependencies using uv

  ```bash
  uv sync
  ```

  Alternative: using pip

  ```bash
  pip install -r requirements.txt
  ```
Run the complete training pipeline:
```bash
# Using uv
uv run main.py

# Or install from requirements.txt on the fly
uv run --with-requirements requirements.txt python main.py
```

This will:
- Download the Iris dataset
- Preprocess the data (handle duplicates, split features/target)
- Train a Logistic Regression model with preprocessing pipeline
- Evaluate the model performance
- Save the trained model
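The download-and-preprocess steps above can be sketched as follows. This is a minimal sketch, not the actual `main.py`: the function name `load_and_split` is hypothetical, and the split parameters mirror the constants shown in the Configuration section.

```python
from pathlib import Path

import pandas as pd
from sklearn.model_selection import train_test_split

def load_and_split(data_path: Path, target: str = "species"):
    """Load the iris CSV, drop duplicate rows, and split features/target."""
    df = pd.read_csv(data_path).drop_duplicates()
    x = df.drop(columns=[target])
    y = df[target]
    # test_size and random_state follow TEST_SIZE / RANDOM_STATE in src/constants.py
    return train_test_split(x, y, test_size=0.33, random_state=21, stratify=y)
```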
Launch the Streamlit app:
```bash
# Using uv
uv run streamlit run app.py

# Traditional method
streamlit run app.py
```

Then open your browser to http://localhost:8501 and:
- Enter the sepal length, width, petal length, and width measurements
- Click "Predict" to get the species classification and prediction probabilities
```python
from src.predict import IrisPredictor

# Load the trained model
predictor = IrisPredictor()

# Make a prediction
features = predictor.to_dataframe(5.1, 3.5, 1.4, 0.2)
prediction = predictor.predict(features)
probabilities = predictor.predict_proba(features)

print(f"Predicted species: {prediction}")
print(f"Prediction probabilities: {probabilities}")
```

- Model: Logistic Regression
- Preprocessing Pipeline:
- Simple Imputer (median strategy)
- Standard Scaler for feature normalization
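A pipeline with those two preprocessing steps can be sketched in scikit-learn as follows. This is a sketch only; hyperparameters such as `max_iter` are assumptions, not taken from `src/model_trainer.py`.

```python
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

def create_pipeline() -> Pipeline:
    """Median imputation -> standard scaling -> Logistic Regression."""
    return Pipeline([
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
        ("model", LogisticRegression(max_iter=1000)),  # max_iter is an assumption
    ])
```

Because preprocessing lives inside the pipeline, the fitted imputer and scaler are saved together with the model, so the Streamlit app can feed raw measurements straight to `predict`.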
The model is evaluated using:
- F1-score (macro average)
- Classification report
- 5-fold cross-validation
- Training and testing performance comparison
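An evaluation covering those four points could look like the following sketch (the real `ModelEvaluator.evaluate` may differ; only the metrics listed above are taken from the project):

```python
from sklearn.metrics import classification_report, f1_score
from sklearn.model_selection import cross_val_score

def evaluate(model, xtrain, ytrain, xtest, ytest) -> float:
    """Report macro F1 on train/test splits plus 5-fold cross-validation."""
    train_f1 = f1_score(ytrain, model.predict(xtrain), average="macro")
    test_f1 = f1_score(ytest, model.predict(xtest), average="macro")
    cv = cross_val_score(model, xtrain, ytrain, cv=5, scoring="f1_macro")
    print(classification_report(ytest, model.predict(xtest)))
    print(f"Train F1 (macro): {train_f1:.3f} | Test F1 (macro): {test_f1:.3f}")
    print(f"5-fold CV F1 (macro): {cv.mean():.3f} +/- {cv.std():.3f}")
    return test_f1
```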
- Source: Iris Dataset
- Features: 4 numerical features (sepal_length, sepal_width, petal_length, petal_width)
- Target: 3 classes (setosa, versicolor, virginica)
- Size: ~150 samples
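The dataset is fetched from the `URL` shown in the Configuration section. A minimal download helper, using only the standard library, might look like this sketch (the actual `src/data.py` may differ; only `URL` and `DATA_PATH` come from the project's constants):

```python
import urllib.request
from pathlib import Path

URL = "https://raw.githubusercontent.com/utkarshg1/iris_data/refs/heads/main/iris.csv"
DATA_PATH = Path("data", "iris.csv")

def download_data(url: str = URL, path: Path = DATA_PATH) -> Path:
    """Download the iris CSV once, skipping the fetch if already cached."""
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        urllib.request.urlretrieve(url, path)
    return path
```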
This project uses pyproject.toml for modern Python packaging and dependency management with uv.
Key configuration parameters:
```python
URL = "https://raw.githubusercontent.com/utkarshg1/iris_data/refs/heads/main/iris.csv"
DATA_PATH = Path("data", "iris.csv")
MODEL_PATH = Path("models", "iris_model.joblib")
TARGET = "species"
IMPUTE_STRAT = "median"
TEST_SIZE = 0.33
RANDOM_STATE = 21
```

- All dependencies are managed through `uv.lock` for reproducible builds
- `requirements.txt` is also available for traditional pip installations
```python
class IrisPredictor:
    def __init__(self, model_path: Path = MODEL_PATH)
    def to_dataframe(self, sep_len: float, sep_wid: float, pet_len: float, pet_wid: float) -> pd.DataFrame
    def predict(self, x: pd.DataFrame) -> str
    def predict_proba(self, x: pd.DataFrame) -> pd.DataFrame

class ModelTrainer:
    def __init__(self, model_path: Path = MODEL_PATH)
    def create_pipeline(self) -> Pipeline
    def train_model(self, xtrain: pd.DataFrame, ytrain: pd.Series)
    def save_model(self)

class ModelEvaluator:
    def __init__(self, model: Pipeline)
    def evaluate(self, xtrain, ytrain, xtest, ytest)
```

The project uses Loguru for comprehensive logging:
- Console Output: Colored, formatted logs for development
- File Output: Rotating log files in `logs/app.log`
- Log Rotation: 10 MB rotation with 7-day retention
- Compression: Automatic ZIP compression of old logs
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes
- Update dependencies if needed:

  ```bash
  uv add <package-name>  # Add a new dependency
  uv sync                # Sync dependencies
  ```

- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
```bash
# Run scripts
uv run main.py
uv run streamlit run app.py

# Update dependencies
uv sync --upgrade
```

Utkarsh Gaikwad
This project is open source and available under the MIT License.
Note: This project uses uv for fast and reliable Python package management. Run `uv run main.py` first to train and save the model before launching the Streamlit application.