
appliedalgorithmslab/distributed-ml-dask

Distributed ML with Dask

A demonstration repository for scalable machine learning workflows using Dask on larger-than-memory tabular datasets.

Purpose

This repository illustrates practical machine learning engineering patterns for:

  • distributed data loading
  • scalable preprocessing
  • larger-than-memory workflows
  • structured ML pipelines
  • configuration-driven experimentation
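The scalable-preprocessing and larger-than-memory patterns listed above rest on one idea: compute statistics partition by partition, then combine the small partial results, so no single step needs the whole dataset in memory. Dask's dataframe reductions follow the same split-and-combine shape. The following is a minimal, dependency-free sketch of that idea for feature standardization. It is not code from this repository, and the names `RunningStats`, `fit_scaler`, and `transform` are hypothetical. Partial statistics are merged with the standard pairwise (parallel) mean/variance update.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class RunningStats:
    """Count, mean and sum of squared deviations (M2) for one numeric column."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0

    @classmethod
    def from_partition(cls, values: List[float]) -> "RunningStats":
        # Statistics for a single partition, computed where the data lives.
        n = len(values)
        if n == 0:
            return cls()
        mean = sum(values) / n
        m2 = sum((v - mean) ** 2 for v in values)
        return cls(n, mean, m2)

    def merge(self, other: "RunningStats") -> "RunningStats":
        # Pairwise combination of two partial results; only three numbers move.
        if self.count == 0:
            return other
        if other.count == 0:
            return self
        n = self.count + other.count
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta ** 2 * self.count * other.count / n
        return RunningStats(n, mean, m2)

    @property
    def std(self) -> float:
        # Population standard deviation.
        return math.sqrt(self.m2 / self.count) if self.count else 0.0


def fit_scaler(partitions: Iterable[List[float]]) -> RunningStats:
    """Pass 1: reduce per-partition statistics into global ones."""
    total = RunningStats()
    for part in partitions:
        total = total.merge(RunningStats.from_partition(part))
    return total


def transform(partitions: Iterable[List[float]], stats: RunningStats) -> Iterator[List[float]]:
    """Pass 2: standardize each partition independently with the global statistics."""
    scale = stats.std or 1.0  # constant column: avoid division by zero
    for part in partitions:
        yield [(v - stats.mean) / scale for v in part]


if __name__ == "__main__":
    parts = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]  # stand-ins for on-disk partitions
    stats = fit_scaler(parts)
    print(stats.count, stats.mean)  # 6 3.5
```

Both passes iterate over `partitions`, so in a real out-of-core setting they must be re-iterable (for example, re-read from disk) rather than a one-shot generator.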

It is designed as a public demonstration repository using synthetic or public-style tabular data. Proprietary systems and internal datasets are not included.
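Synthetic data at scale is usually generated one partition at a time, with each partition seeded independently so any worker can rebuild it without coordination. A minimal sketch follows, assuming a simple linear-regression target. `generate_partition`, `TRUE_WEIGHTS`, and `BIAS` are hypothetical and are not taken from `src/data_generation.py`.

```python
import random
from typing import Dict, List

# Hypothetical ground truth for the synthetic target; the real generator may differ.
TRUE_WEIGHTS = (2.0, -1.0, 0.5)
BIAS = 3.0


def generate_partition(partition_id: int, n_rows: int, base_seed: int = 42) -> List[Dict[str, float]]:
    """Build one partition of synthetic regression rows.

    The RNG seed is derived from (base_seed, partition_id), so the same partition
    is reproduced exactly no matter which worker generates it or in what order.
    """
    rng = random.Random(base_seed * 1_000_003 + partition_id)
    rows = []
    for _ in range(n_rows):
        features = [rng.gauss(0.0, 1.0) for _ in TRUE_WEIGHTS]
        noise = rng.gauss(0.0, 0.1)
        target = BIAS + sum(w * x for w, x in zip(TRUE_WEIGHTS, features)) + noise
        row = {f"x{i}": x for i, x in enumerate(features)}
        row["y"] = target
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Four independent partitions; in a cluster each call could run on a different worker.
    partitions = [generate_partition(i, n_rows=1_000) for i in range(4)]
    print(len(partitions), len(partitions[0]))
```

Deriving the seed from the partition index, rather than drawing from one shared RNG, is what keeps generation embarrassingly parallel.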

Repository Structure

src/
    data_generation.py
    distributed_preprocessing.py
    distributed_training.py
    run_pipeline.py

experiments/
    pipeline_config.yaml

results/
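One common way to train on partitioned data is data-parallel gradient descent: each partition computes a partial gradient locally, and only those small vectors are combined. A minimal sketch of that pattern for least-squares linear regression follows, in plain Python. It is an illustrative assumption, not the contents of `src/distributed_training.py`, and `partition_gradient` and `train` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Partition = List[Tuple[List[float], float]]  # (features, target) pairs


def partition_gradient(part: Partition, weights: Sequence[float], bias: float) -> Tuple[List[float], float, int]:
    """Unnormalized squared-error gradient for one partition, plus its row count."""
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for x, y in part:
        err = sum(w * xi for w, xi in zip(weights, x)) + bias - y
        for j, xj in enumerate(x):
            grad_w[j] += err * xj
        grad_b += err
    return grad_w, grad_b, len(part)


def train(partitions: Sequence[Partition], n_features: int, lr: float = 0.1, epochs: int = 200) -> Tuple[List[float], float]:
    """Full-batch gradient descent on mean squared error across all partitions."""
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        # "Map": each partition computes its partial gradient independently.
        partials = [partition_gradient(p, weights, bias) for p in partitions]
        # "Reduce": sum the partials and normalize by the total row count.
        total = sum(n for _, _, n in partials)
        grad_w = [sum(gw[j] for gw, _, _ in partials) / total for j in range(n_features)]
        grad_b = sum(gb for _, gb, _ in partials) / total
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias
```

Because the combined gradient is a sum normalized by the total row count, the result matches training on one unpartitioned dataset up to floating-point rounding; how the rows are split only affects where the work runs.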

Quick Start

Install dependencies:

pip install -r requirements.txt

Run the pipeline:

python src/run_pipeline.py
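The structure pairs `src/run_pipeline.py` with `experiments/pipeline_config.yaml`. Assuming the runner reads its settings from that file, a minimal sketch of configuration-driven orchestration might look like the following. It takes an already-parsed dict (for example, the output of PyYAML's `yaml.safe_load`); the `PipelineConfig` fields and the stage interface are hypothetical, not the repository's actual schema.

```python
from dataclasses import dataclass, fields
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical schema; the real pipeline_config.yaml may use other keys."""
    n_partitions: int = 4
    rows_per_partition: int = 10_000
    seed: int = 42
    learning_rate: float = 0.1
    epochs: int = 100

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "PipelineConfig":
        # Reject unknown keys so a typo fails fast instead of silently using a default.
        unknown = set(raw) - {f.name for f in fields(cls)}
        if unknown:
            raise ValueError(f"unknown config keys: {sorted(unknown)}")
        return cls(**raw)


# A stage reads the config and the shared context and writes its outputs into the context.
Stage = Callable[[PipelineConfig, Dict[str, Any]], None]


def run_pipeline(config: PipelineConfig, stages: List[Tuple[str, Stage]]) -> Dict[str, Any]:
    """Run stages in order, recording the name of each stage that completed."""
    context: Dict[str, Any] = {"completed": []}
    for name, stage in stages:
        stage(config, context)
        context["completed"].append(name)
    return context
```

Keeping every tunable in one frozen config object means an experiment is fully described by its config file, and changing a run never requires editing code.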

Notes

This repository contains demonstration implementations illustrating scalable machine learning engineering patterns using Dask.
