techtana/ml-packaging-toolbox
# ML Training and Packaging Toolbox

Shared development toolbox for engineers. Provides reusable data + modeling pipelines and a unified packaging/deployment client for ml-deployment-ecosystem. Not for storage of models/data or high-frequency production extraction.

## Repo Structure

```
ml_packaging_toolbox/
├── README.md
├── requirements.txt
├── requirements-test.txt
├── examples/                         # Usage examples (start here)
│   └── end_to_end.py                 # Full pipeline → package (deploy optional)
├── ml_packaging_toolbox/
│   ├── data_pipeline/
│   │   ├── extraction/               # LT query builders + RT normalisation
│   │   ├── preprocess/               # sklearn-compatible preprocessors
│   │   └── assembly/                 # DataAssemblyPipeline (LT + RT paths)
│   ├── model_pipeline/
│   │   ├── filters/                  # Pre-modeling data filtering
│   │   ├── model_selection/          # CV / HPO utilities
│   │   └── modeling/                 # Model wrappers (sklearn estimators)
│   ├── model_deployment/
│   │   └── packaging/                # Integration seam with ml-deployment-ecosystem
│   └── utils/
└── tests/                            # Unit tests guarding the packaging contract
```

## Deployment Integration Contract

`model_deployment/packaging/packager.py` produces a zip containing exactly:

| File | Consumed by deployment server |
|---|---|
| `model.pkl` | `POST /deploy/` → stored in MongoDB; loaded for scoring |
| `preprocessors.pkl` | reconcile step (optional, depends on server wiring) |
| `context.yml` | `POST /deploy/` registration + context store |
| `features.json` | ordered feature alignment during inference |
| `test_data.json` | `POST /deploy/validate/<model_id>` smoke test |
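Because the tests guard this packaging contract, a quick structural check of a built artifact can be sketched with the standard library alone. The file set comes straight from the table above; the helper name `missing_contract_files` is hypothetical, not part of the toolbox API:

```python
import io
import zipfile

# Exact artifact set required by the deployment contract (see table above).
CONTRACT_FILES = {
    "model.pkl",
    "preprocessors.pkl",
    "context.yml",
    "features.json",
    "test_data.json",
}

def missing_contract_files(zip_bytes: bytes) -> set:
    """Return the contract files absent from the zip (empty set = valid package)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return CONTRACT_FILES - set(zf.namelist())

# Exercise the check against an in-memory stub package.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for name in CONTRACT_FILES:
        zf.writestr(name, b"stub")
missing = missing_contract_files(buf.getvalue())
```

A check like this makes a useful pre-flight step before uploading, since the server rejects (or mis-handles) packages missing any of the five files.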

### context.yml schema

```yaml
name: my_model
version: 1.0.0
device_id: deviceA          # primary device tag (informational)
required_contexts:
  - device_id: deviceA
    event_code: EVT_001
    features: [feat1, feat2, feat3]
  - device_id: deviceB      # optional: multi-device models list multiple entries
    event_code: EVT_002
    features: [feat4, feat5]
```

`required_contexts` drives the server's dispatch and inference logic:

- Each entry names a `device_id`, an `event_code`, and the feature names that device contributes.
- The deployment server waits until all listed entries have arrived for the same `material_id` before scoring.
- Feature vectors are assembled in the order the `required_contexts` entries are listed.
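The dispatch behaviour above can be sketched in a few lines. This is an illustration only — the real logic lives in ml-deployment-ecosystem, and the helper names (`ready`, `assemble`) and the `(device_id, event_code)` keying are assumptions, not the server's actual implementation:

```python
# required_contexts as it would be parsed from the context.yml example above.
required_contexts = [
    {"device_id": "deviceA", "event_code": "EVT_001",
     "features": ["feat1", "feat2", "feat3"]},
    {"device_id": "deviceB", "event_code": "EVT_002",
     "features": ["feat4", "feat5"]},
]

def ready(arrived: dict) -> bool:
    """True once every listed (device_id, event_code) has data for this material_id."""
    return all((c["device_id"], c["event_code"]) in arrived for c in required_contexts)

def assemble(arrived: dict) -> list:
    """Concatenate feature values in required_contexts order (the inference order)."""
    vector = []
    for c in required_contexts:
        payload = arrived[(c["device_id"], c["event_code"])]
        vector.extend(payload[f] for f in c["features"])
    return vector

arrived = {
    ("deviceA", "EVT_001"): {"feat1": 1.0, "feat2": 2.0, "feat3": 3.0},
    ("deviceB", "EVT_002"): {"feat4": 4.0, "feat5": 5.0},
}
vec = assemble(arrived) if ready(arrived) else None
```

Note that because ordering comes from the list, reordering entries in `context.yml` silently changes the feature vector the model receives — one reason the contract tests exist.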

Any changes to `data_pipeline/` or `packaging.py` require matching changes to `ml_deployment_server/webservice/mod/deploy/` (per README warning).

## Quick Start

### Install (editable) + run tests

```shell
py -m venv .venv
.\.venv\Scripts\python -m pip install -U pip
.\.venv\Scripts\python -m pip install -r requirements.txt -r requirements-test.txt
.\.venv\Scripts\python -m pip install -e .
.\.venv\Scripts\python -m pytest -q
```

### Run the end-to-end demo

```shell
.\.venv\Scripts\python examples\end_to_end.py
```

The demo always builds the artifact under `./packages/`. The deploy step is optional and requires a running ml-deployment-ecosystem service.
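If you do want to push the built artifact yourself, the endpoint paths come from the contract table above. Everything else here — the base URL, the multipart field name, and the response shape — is an assumption about the service, so treat this as a sketch rather than the client API:

```python
from urllib.parse import urljoin

BASE_URL = "http://localhost:8000"  # assumed service address; adjust to your deployment

def deploy_url(base: str) -> str:
    """URL for package upload/registration (path from the contract table)."""
    return urljoin(base, "/deploy/")

def validate_url(base: str, model_id: str) -> str:
    """URL for the smoke test against test_data.json."""
    return urljoin(base, f"/deploy/validate/{model_id}")

# Actual upload (requires a running ml-deployment-ecosystem service):
#
#   import requests
#   with open("packages/my_model.zip", "rb") as f:
#       resp = requests.post(deploy_url(BASE_URL), files={"package": f})
#   # "package" field name and response shape are assumptions:
#   model_id = resp.json()["model_id"]
#   requests.post(validate_url(BASE_URL, model_id))
```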

## Git Etiquette

- Keep changes small (one bug / one feature per branch)
- One new feature = one branch → PR to `main`
- Changes to `data_pipeline` or `packaging.py` → notify the deployment server team
