Add MLflow experiment tracking integration by buntingj-vt · Pull Request #79 · replicate/flux-fine-tuner

buntingj-vt · 2025-11-30T17:39:51Z

Successfully integrated MLflow experiment tracking into the Flux fine-tuner project, providing an alternative/complementary tracking solution to Weights & Biases.

Files Created:

- mlflow_client.py: New module handling MLflow tracking operations
  - MLflowClient class for experiment and run management
  - Logs training parameters, metrics, and artifacts
  - Error handling to prevent training interruption
- MLFLOW_INTEGRATION.md: Comprehensive documentation covering setup, usage, features, and troubleshooting
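The error-handling idea behind `mlflow_client.py` can be illustrated with a minimal sketch. This is not the module from this PR: the class name `SafeMLflowClient`, its method names, and the injectable `backend` argument are hypothetical. It uses only standard MLflow fluent calls (`mlflow.set_tracking_uri`, `mlflow.set_experiment`, `mlflow.start_run`, `mlflow.log_params`, `mlflow.log_metric`, `mlflow.log_artifact`, `mlflow.end_run`) and wraps each one so that a tracking failure is logged as a warning instead of being raised into the training loop.

```python
import logging
from typing import Any, Callable, Mapping, Optional

logger = logging.getLogger(__name__)


class SafeMLflowClient:
    """Hypothetical fault-tolerant wrapper around MLflow's fluent API."""

    def __init__(self, tracking_uri: str, experiment_name: str,
                 run_name: Optional[str] = None, backend: Any = None):
        if backend is None:
            import mlflow  # imported lazily so this module loads without MLflow
            backend = mlflow
        self._mlflow = backend
        self.tracking_uri = tracking_uri
        self.experiment_name = experiment_name
        self.run_name = run_name
        self.active = False  # True only while a run was started successfully

    def _safe(self, action: str, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> bool:
        # Run one tracking call; on any error, log a warning and report failure.
        try:
            fn(*args, **kwargs)
            return True
        except Exception as exc:  # broad on purpose: tracking must never crash training
            logger.warning("MLflow %s failed: %s", action, exc)
            return False

    def start(self, params: Mapping[str, Any]) -> None:
        # Configure the server and experiment, open a run, then record hyperparameters.
        ok = (self._safe("set_tracking_uri", self._mlflow.set_tracking_uri, self.tracking_uri)
              and self._safe("set_experiment", self._mlflow.set_experiment, self.experiment_name)
              and self._safe("start_run", self._mlflow.start_run, run_name=self.run_name))
        self.active = ok
        if ok:
            self._safe("log_params", self._mlflow.log_params, dict(params))

    def log_loss(self, loss: float, step: int) -> None:
        if self.active:
            self._safe("log_metric", self._mlflow.log_metric, "loss", loss, step=step)

    def log_artifact(self, path: str, artifact_path: Optional[str] = None) -> None:
        if self.active:
            self._safe("log_artifact", self._mlflow.log_artifact, path, artifact_path=artifact_path)

    def end(self) -> None:
        if self.active:
            self._safe("end_run", self._mlflow.end_run)
            self.active = False
```

Passing a stand-in `backend` lets the wrapper be exercised without an MLflow server; in normal use the default lazily imports `mlflow`.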

Files Modified:

- train.py:
  - Added MLflowClient import
  - Updated CustomSDTrainer to support MLflow logging in:
    - hook_train_loop(): Loss logging
    - sample(): Sample image logging
    - post_save_hook(): Weight saving
  - Updated CustomJob to accept mlflow_client parameter
  - Added three new input parameters:
    - mlflow_tracking_uri: MLflow server URI
    - mlflow_experiment_name: Experiment name (default: flux-lora-training)
    - mlflow_run_name: Optional run name
  - Created unified tracking_config dict for both W&B and MLflow
  - Added MLflow client initialization and cleanup
- cog.yaml: Added mlflow==2.18.0 dependency
- README.md: Added MLflow integration to features list
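The unified `tracking_config` and the client initialization and cleanup step might look roughly like the sketch below. The helper names `build_tracking_config` and `run_with_optional_mlflow`, the dict layout, the `wandb_api_key` argument, and the client's `end()` method are assumptions for illustration, not the code in `train.py`; only the `mlflow_*` parameter names and the `flux-lora-training` default come from this PR.

```python
from typing import Any, Callable, Dict, Optional


def build_tracking_config(
    wandb_api_key: Optional[str] = None,  # hypothetical W&B switch
    mlflow_tracking_uri: Optional[str] = None,
    mlflow_experiment_name: str = "flux-lora-training",
    mlflow_run_name: Optional[str] = None,
) -> Dict[str, Any]:
    # Each tracker is switched on only by its key input (API key or URI).
    return {
        "wandb": {"enabled": bool(wandb_api_key)},
        "mlflow": {
            "enabled": bool(mlflow_tracking_uri),
            "tracking_uri": mlflow_tracking_uri,
            "experiment_name": mlflow_experiment_name,
            "run_name": mlflow_run_name,
        },
    }


def run_with_optional_mlflow(
    config: Dict[str, Any],
    train_fn: Callable[[Optional[Any]], Any],
    client_factory: Callable[[str, str, Optional[str]], Any],
) -> Any:
    # Create a client only when MLflow is enabled; the trainer receives None otherwise.
    mlflow_cfg = config["mlflow"]
    client = None
    if mlflow_cfg["enabled"]:
        client = client_factory(
            mlflow_cfg["tracking_uri"],
            mlflow_cfg["experiment_name"],
            mlflow_cfg["run_name"],
        )
    try:
        return train_fn(client)
    finally:
        # Cleanup runs even if training raises, so an open run is always closed.
        if client is not None:
            client.end()
```

Using `try`/`finally` keeps the cleanup guarantee independent of how training ends, which matches the "optional, non-invasive" goal: with no tracking URI, nothing is created and nothing needs closing.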

Integration Architecture:

- Non-invasive: Works alongside the existing W&B integration
- Optional: Only activated when mlflow_tracking_uri is provided
- Comprehensive tracking: hyperparameters, loss, samples, weights
- All logging wrapped in try-except so that tracking failures cannot interrupt training
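Running MLflow alongside W&B without either tracker being able to break training can be pictured as a small fan-out logger. This is a hypothetical pattern, not necessarily how the PR wires the two trackers together: `FanOutLogger` and the `MetricSink` interface are illustrative names, and each sink is any callable taking `(metrics, step)`, for example `lambda m, s: wandb.log(m, step=s)` or `lambda m, s: mlflow.log_metrics(m, step=s)`. Every sink call is isolated in its own try/except.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger(__name__)

# Hypothetical sink interface: any callable taking (metrics, step).
MetricSink = Callable[[Dict[str, float], int], None]


class FanOutLogger:
    """Send the same metrics to every enabled tracker.

    A failing tracker is counted and logged, but it does not stop the
    other trackers or the training loop.
    """

    def __init__(self, sinks: Dict[str, MetricSink]):
        self.sinks = dict(sinks)
        self.failures: Dict[str, int] = {name: 0 for name in self.sinks}

    def log(self, metrics: Dict[str, float], step: int) -> List[str]:
        delivered = []
        for name, sink in self.sinks.items():
            try:
                sink(metrics, step)
                delivered.append(name)
            except Exception as exc:  # isolate each tracker's failures
                self.failures[name] += 1
                logger.warning("%s logging failed at step %d: %s", name, step, exc)
        return delivered  # names of trackers that accepted this step
```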

Key Features:

- Track all training hyperparameters
- Log training loss at each step
- Save sample images during training
- Archive final LoRA weights as artifacts
- Compatible with concurrent W&B usage
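Logging loss at each step, sample images, and saved LoRA weights maps naturally onto the three `CustomSDTrainer` hooks named in this PR. The sketch below shows that shape as overrides that call the parent implementation and then forward results to an optional client. `BaseTrainer` is a stand-in; its method signatures, the `{"loss": ...}` return value, the artifact folder names (`samples/step_NNNNNN`, `weights`), and the client methods `log_loss(loss, step=...)` and `log_artifact(path, artifact_path=...)` are hypothetical, since the real trainer API is not shown in this PR description.

```python
from typing import Any, Dict, List, Optional


class BaseTrainer:
    """Stand-in for the real trainer base class (hypothetical signatures)."""

    def hook_train_loop(self, batch: Any) -> Dict[str, float]:
        return {"loss": 0.0}

    def sample(self, step: int) -> List[str]:
        return []  # paths of generated sample images

    def post_save_hook(self, save_path: str) -> None:
        pass


class MLflowTrackingTrainer(BaseTrainer):
    """Forward loss, samples, and saved weights to an optional tracking client."""

    def __init__(self, mlflow_client: Optional[Any] = None):
        self.mlflow_client = mlflow_client
        self.step_num = 0

    def hook_train_loop(self, batch: Any) -> Dict[str, float]:
        loss_dict = super().hook_train_loop(batch)
        if self.mlflow_client is not None:
            self.mlflow_client.log_loss(loss_dict["loss"], step=self.step_num)
        self.step_num += 1
        return loss_dict

    def sample(self, step: int) -> List[str]:
        paths = super().sample(step)
        if self.mlflow_client is not None:
            # Group each round of samples under a per-step artifact folder.
            for path in paths:
                self.mlflow_client.log_artifact(path, artifact_path=f"samples/step_{step:06d}")
        return paths

    def post_save_hook(self, save_path: str) -> None:
        super().post_save_hook(save_path)
        if self.mlflow_client is not None:
            self.mlflow_client.log_artifact(save_path, artifact_path="weights")
```

Because every hook checks `mlflow_client is not None`, the trainer behaves exactly like its parent when MLflow is not configured.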

Usage Example:
```python
from pathlib import Path

train(
    input_images=Path("images.zip"),
    mlflow_tracking_uri="http://localhost:5000",
    mlflow_experiment_name="flux-lora-experiment",
    mlflow_run_name="baseline-run",
    # ... other training parameters
)
```



buntingj-vt and others added 2 commits November 21, 2025 01:43

Fix unused variable in mlflow_client.py

324bc76

Remove unused artifact_name variable to fix ruff check failure. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

buntingj-vt wants to merge 2 commits into main from jon/add-support-for-mlflow

buntingj-vt commented Nov 30, 2025
