The library collects functions commonly needed in Kaggle and other machine learning competitions. It is divided into parts that follow the stages of a typical machine learning pipeline: Preprocessing -> Feature engineering -> Feature selection -> Model tuning -> Model training -> Model prediction -> Model ensembling
All dependencies are listed in requirements.txt
- clone repository
cd KaggleLib
pip install -r requirements.txt
pip install .
- check installation by running example:
python examples/example.py
The library contains the following parts:
- Model: generic class for machine learning models
  - type: model type (XGBoost, LightGBM, Keras or Scikit-Learn)
  - params: dictionary of model parameters
  - model: instance of model object
  - cv_score: cross-validation score
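As a rough illustration of the four attributes listed above, such a container could look like the sketch below. This is an assumption for illustration only: the defaults, type hints and constructor shape are hypothetical and the library's actual `Model` class may be defined differently.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Model:
    """Hypothetical container mirroring the attributes named in this README."""
    type: str                                             # "XGBoost", "LightGBM", "Keras" or "Scikit-Learn"
    params: Dict[str, Any] = field(default_factory=dict)  # model parameters
    model: Any = None                                     # model object, set once it is built or trained
    cv_score: Optional[float] = None                      # cross-validation score, set after evaluation


# Example: describe a LightGBM model before it has been trained or scored.
lgbm = Model(type="LightGBM", params={"num_leaves": 31, "learning_rate": 0.05})
```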
- Preprocessing
  - hash_data: hashing of categorical columns (one-hot)
  - normalize_data: numerical data normalization
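The two preprocessing ideas can be sketched in plain Python as follows. The helper names `hash_one_hot` and `standardize`, the bucket count and the choice of zero-mean, unit-variance scaling are assumptions made for this example; `hash_data` and `normalize_data` may work differently.

```python
import hashlib
from statistics import mean, pstdev


def hash_one_hot(value, n_buckets=8):
    """One-hot vector whose active index is a stable hash of the category."""
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    index = int(digest, 16) % n_buckets
    return [1 if i == index else 0 for i in range(n_buckets)]


def standardize(column):
    """Rescale a numeric column to zero mean and unit variance."""
    mu = mean(column)
    sigma = pstdev(column) or 1.0  # constant columns would otherwise divide by zero
    return [(x - mu) / sigma for x in column]


colors = ["red", "green", "red"]
encoded = [hash_one_hot(c) for c in colors]   # equal categories get equal vectors
scaled = standardize([1.0, 2.0, 3.0])
```

Hashing keeps the encoded width fixed no matter how many categories appear, at the cost of occasional collisions between different categories.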
- Feature engineering
  - make_numerical_interactions: second- and third-order feature interactions; operations: sum, multiplication, division
  - make_categorical_interactions: second- and third-order categorical feature interactions
  - categorical_target_encoding: target encoding of categorical features
  - logarithm: log feature transformation
  - exponent: exponent feature transformation
  - sigmoid: sigmoid feature transformation
  - trgonometry: sin, cos, tan feature transformation
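A second-order numerical interaction combines every pair of columns with each operation. The sketch below shows the idea for a single row; the function name `pairwise_interactions`, the generated feature names and the handling of division by zero are assumptions, not the behavior of `make_numerical_interactions`.

```python
from itertools import combinations


def pairwise_interactions(row):
    """Return second-order interaction features for one row given as {name: value}."""
    features = {}
    for (a, x), (b, y) in combinations(row.items(), 2):
        features[f"{a}_plus_{b}"] = x + y
        features[f"{a}_times_{b}"] = x * y
        # Guard against division by zero; None marks an undefined ratio.
        features[f"{a}_div_{b}"] = x / y if y != 0 else None
    return features


row = {"height": 2.0, "width": 4.0, "depth": 0.0}
new_features = pairwise_interactions(row)
```

Three columns give three pairs and therefore nine new features; third-order interactions would use `combinations(row.items(), 3)` in the same way, so the feature count grows quickly with the number of columns.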
- Feature selection
  - genetic_feature_selection: select the subset of features with the best cross-validation metric using a genetic algorithm (evolutionary changes to feature subsets)
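The evolutionary search over feature subsets can be sketched as below. Everything here is an illustrative assumption: the function `genetic_select`, its parameters, and the selection, crossover and mutation scheme are not taken from the library, and `toy_score` merely stands in for a real cross-validation metric.

```python
import random


def genetic_select(n_features, score_fn, population_size=20, generations=30,
                   mutation_rate=0.1, seed=0):
    """Evolve boolean feature masks and return the best (mask, score) found."""
    rng = random.Random(seed)
    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=score_fn, reverse=True)
        parents = ranked[:population_size // 2]    # the fitter half survives unchanged
        children = []
        while len(parents) + len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)      # single-point crossover
            child = mother[:cut] + father[cut:]
            # Flip each bit with a small probability.
            child = [(not bit) if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = parents + children
    best = max(population, key=score_fn)
    return best, score_fn(best)


# Stand-in for a cross-validation metric: features 0, 2 and 5 are "useful",
# every other selected feature costs a small penalty.
USEFUL = {0, 2, 5}


def toy_score(mask):
    selected = {i for i, keep in enumerate(mask) if keep}
    return len(selected & USEFUL) - 0.1 * len(selected - USEFUL)


best_mask, best_score = genetic_select(n_features=8, score_fn=toy_score)
```

Because the fitter half of each generation is carried over unchanged, the best score never decreases from one generation to the next. In practice `score_fn` would train and cross-validate a model on the selected columns, which makes every evaluation expensive.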
- Model tuning
  - cross_validation: calculate the cross-validation score of a model
  - tune_lgbm: find the best LightGBM parameters with HyperOpt
  - tune_xgb: find the best XGBoost parameters with HyperOpt
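A cross-validation score is the validation metric averaged over folds, each fold being held out once while the model is trained on the rest. The sketch below is a minimal stand-alone version with a toy model; the names `k_fold_indices` and `cross_val_mae`, the interleaved fold assignment and the use of mean absolute error are assumptions, not the library's `cross_validation` signature.

```python
from statistics import mean


def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, valid_idx) pairs; fold k holds out every sample i with i % n_folds == k."""
    for k in range(n_folds):
        valid = [i for i in range(n_samples) if i % n_folds == k]
        train = [i for i in range(n_samples) if i % n_folds != k]
        yield train, valid


def cross_val_mae(fit, predict, X, y, n_folds=3):
    """Mean absolute error averaged over the validation folds."""
    fold_scores = []
    for train, valid in k_fold_indices(len(X), n_folds):
        model = fit([X[i] for i in train], [y[i] for i in train])
        errors = [abs(predict(model, X[i]) - y[i]) for i in valid]
        fold_scores.append(mean(errors))
    return mean(fold_scores)


# Toy "model": always predict the mean of the training targets.
def fit_mean(X_train, y_train):
    return mean(y_train)


def predict_mean(model, x):
    return model


X = [[0], [1], [2], [3], [4], [5]]
y = [2.0, 4.0, 2.0, 4.0, 2.0, 4.0]
score = cross_val_mae(fit_mean, predict_mean, X, y)
```

A parameter search such as the one performed by the tuning functions would call a scoring routine like this once per candidate parameter set and keep the set with the best score.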
- Model training
  - train_keras: train Keras model
  - train_lgbm: train LightGBM model
  - train_xgb: train XGBoost model
  - train_sklearn: train Scikit-Learn model
- Model predicting
  - predict_keras: prediction by Keras model
  - predict_lgbm: prediction by LightGBM model
  - predict_xgb: prediction by XGBoost model
  - predict_sklearn: prediction by Scikit-Learn model
- Model ensembles
  - stacking: build a stack of models using the out-of-fold predictions technique
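In the out-of-fold technique, every training sample is predicted by a base model that did not see that sample during fitting, and those predictions become the input features of a second-level model. The sketch below shows only how the meta-features are assembled; the helper names and the two toy base models are assumptions, not the interface of `stacking`.

```python
from statistics import mean


def out_of_fold_predictions(fit, predict, X, y, n_folds=3):
    """Predict every sample with a model that never saw it during training."""
    oof = [None] * len(X)
    for k in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != k]
        valid = [i for i in range(len(X)) if i % n_folds == k]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in valid:
            oof[i] = predict(model, X[i])
    return oof


# Two deliberately simple base models, placeholders for real learners.
def fit_mean(X_train, y_train):
    return mean(y_train)


def fit_max(X_train, y_train):
    return max(y_train)


def predict_constant(model, x):
    return model


X = [[i] for i in range(6)]
y = [2.0, 4.0, 2.0, 4.0, 2.0, 4.0]

# Each base model contributes one column of meta-features.
oof_columns = [out_of_fold_predictions(fit, predict_constant, X, y)
               for fit in (fit_mean, fit_max)]
meta_features = [list(row) for row in zip(*oof_columns)]
# A second-level model would now be trained on (meta_features, y).
```

Using out-of-fold rather than in-sample predictions keeps the second-level model from learning on outputs that the base models produced for data they had already memorized.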
- Utils
  - make_folds: split data into folds
  - generate_keras_model: generate a Keras model from a dictionary
  - HistoryCallback: callback that records Keras training information after every epoch