Distributed ML with Dask

A demonstration repository for scalable machine learning workflows using Dask on larger-than-memory tabular datasets.

Purpose

This repository illustrates practical machine learning engineering patterns for:

distributed data loading
scalable preprocessing
larger-than-memory workflows
structured ML pipelines
configuration-driven experimentation

It is a public demonstration repository that uses synthetic or public-style tabular data. No proprietary systems or internal datasets are included.

Repository Structure

src/
    data_generation.py
    distributed_preprocessing.py
    distributed_training.py
    run_pipeline.py

experiments/
    pipeline_config.yaml

results/
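The contents of experiments/pipeline_config.yaml are not shown here, so the following is only a sketch of how a configuration-driven run might parse and validate such a file. Every key in `EXAMPLE_CONFIG`, the `PipelineConfig` class, and the `parse_config` function are hypothetical. The only library call used is `yaml.safe_load` from PyYAML.

```python
from dataclasses import dataclass

import yaml

# Hypothetical configuration; the real pipeline_config.yaml may use
# entirely different keys.
EXAMPLE_CONFIG = """
data:
  n_rows: 100000
  n_partitions: 8
preprocessing:
  scale_columns: [feature_a, feature_b]
training:
  test_fraction: 0.2
  random_seed: 42
"""


@dataclass(frozen=True)
class PipelineConfig:
    n_rows: int
    n_partitions: int
    scale_columns: tuple
    test_fraction: float
    random_seed: int


def parse_config(text):
    """Parse YAML text into a typed, validated configuration object."""
    raw = yaml.safe_load(text)
    config = PipelineConfig(
        n_rows=int(raw["data"]["n_rows"]),
        n_partitions=int(raw["data"]["n_partitions"]),
        scale_columns=tuple(raw["preprocessing"]["scale_columns"]),
        test_fraction=float(raw["training"]["test_fraction"]),
        random_seed=int(raw["training"]["random_seed"]),
    )
    # Fail early on values that would break later pipeline stages.
    if config.n_partitions < 1:
        raise ValueError("n_partitions must be at least 1")
    if not 0.0 < config.test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    return config


config = parse_config(EXAMPLE_CONFIG)
```

Validating the file at load time means a bad experiment setting surfaces before any distributed work is scheduled.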

Quick Start

Install dependencies:

pip install -r requirements.txt

Run the pipeline:

python src/run_pipeline.py

The pipeline summary is written to:

results/pipeline_summary.txt
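The format of the summary file is not documented here. As a minimal sketch, assuming a simple "name: value" line per metric, a pipeline step could write it as follows. The `write_summary` function and the metric names are hypothetical, and the example writes to a temporary directory rather than results/ so it can be run without side effects.

```python
import tempfile
from pathlib import Path


def write_summary(metrics, output_dir="results"):
    """Write one 'name: value' line per metric to pipeline_summary.txt."""
    out_dir = Path(output_dir)
    # Create the output directory if the repository checkout lacks it.
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "pipeline_summary.txt"
    # Sort keys so repeated runs produce files that are easy to compare.
    lines = [f"{name}: {value}" for name, value in sorted(metrics.items())]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


# Demonstration with placeholder metrics in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    summary_path = write_summary(
        {"rows_processed": 10000, "partitions": 4}, output_dir=tmp_dir
    )
    summary_text = summary_path.read_text(encoding="utf-8")
```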

Notes

The implementations in this repository are demonstrations of scalable machine learning engineering patterns with Dask.
