This project implements an automated feature engineering system that detects feature types, generates new features, and selects the most important ones from a dataset. It uses machine learning techniques to rank feature relevance and to create new features intended to improve model performance.
- Automatic Feature Detection: Identifies numerical, categorical, and datetime features
- Feature Generation: Creates new features using various transformations and combinations
- Neural Network-based Feature Selection: Uses a deep learning model to select the most important features
- Interactive Visualizations: Provides detailed visualizations of feature importance and relationships
.
├── src/
│   ├── feature_detector.py   # Feature type detection
│   ├── feature_generator.py  # Feature generation
│   ├── feature_selector.py   # Feature selection
│   ├── visualization.py      # Visualization tools
│   └── main.py               # Main pipeline
├── data/                     # Data directory
├── notebooks/                # Jupyter notebooks
├── plots/                    # Generated plots
├── output/                   # Output files
├── requirements.txt          # Dependencies
└── README.md                 # This file
- Clone the repository:
    git clone <repository-url>
    cd automated-feature-engineering
- Create and activate a virtual environment:
    python3 -m venv venv
    source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
    pip install -r requirements.txt
- Place your dataset in the data/ directory.
- Update the target column name in main.py if needed:
    pipeline = AutomatedFeatureEngineering(target_col='your_target_column')
- Run the pipeline:
    python src/main.py
- Check the results in:
  - plots/ directory for visualizations
  - output/ directory for processed data and feature importance scores
- Automatically identifies feature types
- Supports numerical, categorical, and datetime features
- Groups features by type for processing
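The detection step described above (identifying numerical, categorical, and datetime features, then grouping them by type) could be sketched roughly as follows. This is a dependency-free illustration, not the contents of `feature_detector.py`: the names `infer_type` and `group_by_type` are hypothetical, and it assumes every column arrives as a non-empty list of strings with datetimes in ISO 8601 form.

```python
from datetime import datetime


def infer_type(values):
    """Classify a column of strings as 'numerical', 'datetime', or 'categorical'."""

    def all_parse(parser):
        # True only if every value in the column parses without error.
        try:
            for value in values:
                parser(value)
            return True
        except ValueError:
            return False

    # Numbers are tried first, so a compact date such as "20240105"
    # would be treated as numerical in this sketch.
    if all_parse(float):
        return "numerical"
    if all_parse(datetime.fromisoformat):
        return "datetime"
    return "categorical"


def group_by_type(columns):
    """Group column names by inferred type, ready for per-type processing."""
    groups = {"numerical": [], "categorical": [], "datetime": []}
    for name, values in columns.items():
        groups[infer_type(values)].append(name)
    return groups


# Hypothetical example data, not taken from the project.
columns = {
    "age": ["31", "45.5"],
    "city": ["Oslo", "Bergen"],
    "signup": ["2024-01-05", "2023-11-30"],
}
groups = group_by_type(columns)
```

A real detector would likely also need to handle missing values and low-cardinality integer columns that are really categories; both are left out here.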
- Creates polynomial features for numerical data
- Generates interaction terms
- Handles categorical encoding
- Extracts datetime components
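To make the four generation steps above concrete, here is a minimal sketch that applies them to a single record. It is an assumption-laden illustration rather than the code in `feature_generator.py`: the function name `generate_features`, the output column naming, the degree-2 polynomial, and one-hot encoding as the categorical strategy are all choices made for this example.

```python
from datetime import datetime
from itertools import combinations


def generate_features(row, numeric, categorical, datetimes, levels):
    """Expand one record (a dict) into a wider dict of generated features."""
    out = {}
    # Polynomial features: keep each numeric value and add its square.
    for name in numeric:
        out[name] = row[name]
        out[f"{name}^2"] = row[name] ** 2
    # Interaction terms: product of every pair of numeric columns.
    for a, b in combinations(numeric, 2):
        out[f"{a}*{b}"] = row[a] * row[b]
    # Categorical encoding: one 0/1 indicator per known level (one-hot).
    for name in categorical:
        for level in levels[name]:
            out[f"{name}={level}"] = 1 if row[name] == level else 0
    # Datetime components: year, month, and day of week (Monday is 0).
    for name in datetimes:
        ts = datetime.fromisoformat(row[name])
        out[f"{name}_year"] = ts.year
        out[f"{name}_month"] = ts.month
        out[f"{name}_dayofweek"] = ts.weekday()
    return out


# Hypothetical record and category levels, for illustration only.
row = {"age": 3.0, "income": 4.0, "city": "Oslo", "signup": "2024-03-15"}
features = generate_features(
    row,
    numeric=["age", "income"],
    categorical=["city"],
    datetimes=["signup"],
    levels={"city": ["Oslo", "Bergen"]},
)
```

Note that pairwise interactions grow quadratically with the number of numeric columns, which is one reason a selection step usually follows generation.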
- Uses a neural network to evaluate feature importance
- Selects features based on importance scores
- Provides feature rankings
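The README does not say how the network's importance scores are derived. One common option is permutation importance: train the network, then measure how much the error grows when a single feature column is shuffled. The sketch below assumes that approach and uses a tiny pure-Python network on synthetic data; `train_mlp`, `permutation_importance`, and all hyperparameters are hypothetical and are not taken from `feature_selector.py`.

```python
import math
import random


def train_mlp(X, y, hidden=4, epochs=100, lr=0.02, seed=0):
    """Train a one-hidden-layer tanh network with plain SGD on squared error."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(sum(w * v for w, v in zip(W1[j], x)) + b1[j])
             for j in range(hidden)]
        return h, sum(w * a for w, a in zip(W2, h)) + b2

    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x = X[i]
            h, pred = forward(x)
            err = pred - y[i]
            for j in range(hidden):
                # Backpropagate through tanh using the pre-update output weight.
                grad_h = err * W2[j] * (1.0 - h[j] ** 2)
                W2[j] -= lr * err * h[j]
                b1[j] -= lr * grad_h
                for k in range(n_in):
                    W1[j][k] -= lr * grad_h * x[k]
            b2 -= lr * err

    return lambda x: forward(x)[1]


def mse(predict, X, y):
    return sum((predict(x) - t) ** 2 for x, t in zip(X, y)) / len(X)


def permutation_importance(predict, X, y, seed=0):
    """Score each feature by the MSE increase when its column is shuffled."""
    rng = random.Random(seed)
    base = mse(predict, X, y)
    scores = []
    for k in range(len(X[0])):
        col = [row[k] for row in X]
        rng.shuffle(col)
        X_perm = [row[:k] + [v] + row[k + 1:] for row, v in zip(X, col)]
        scores.append(mse(predict, X_perm, y) - base)
    return scores


# Synthetic data: the target depends strongly on "signal", weakly on "weak",
# and not at all on "noise".
names = ["signal", "weak", "noise"]
data_rng = random.Random(42)
X = [[data_rng.uniform(-1, 1) for _ in names] for _ in range(200)]
y = [2.0 * x[0] + 0.5 * x[1] for x in X]

predict = train_mlp(X, y)
scores = permutation_importance(predict, X, y)
ranking = sorted(range(len(names)), key=lambda k: scores[k], reverse=True)
selected = [names[k] for k in ranking[:2]]  # keep the two top-ranked features
```

Scores here are computed on the training data for brevity; scoring on a held-out split would give a less optimistic ranking.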
- Creates feature importance plots
- Generates correlation matrices
- Shows feature distributions
- Provides interactive Plotly visualizations
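As an illustration of the data behind a correlation-matrix plot, the sketch below computes pairwise Pearson correlations with the standard library only. The helper names `pearson` and `correlation_matrix` are hypothetical, and the rendering step (for example as a Plotly heatmap) is deliberately left out because the README does not show how `visualization.py` draws it.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    # A constant column has zero spread and would raise ZeroDivisionError here.
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical numeric columns, for illustration only.
columns = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "double_x": [2.0, 4.0, 6.0, 8.0],
    "neg_x": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(columns)
```

The nested dict maps directly onto the rows and columns of a heatmap, with the column names serving as both axis labels.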
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this code in your research, please cite:
@misc{automated-feature-engineering,
author = {Your Name},
title = {Automated Feature Engineering System},
year = {2024},
publisher = {GitHub},
url = {https://github.com/yourusername/automated-feature-engineering}
}