Add MLflow experiment tracking integration #79
Draft
buntingj-vt wants to merge 2 commits into
Integrates MLflow experiment tracking into the Flux fine-tuner project
as an alternative or complement to Weights & Biases.
Files Created:
- mlflow_client.py: New module handling MLflow tracking operations
  - MLflowClient class for experiment and run management
  - Logs training parameters, metrics, and artifacts
  - Error handling to prevent training interruption
- MLFLOW_INTEGRATION.md: Comprehensive documentation covering setup,
  usage, features, and troubleshooting
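The contents of mlflow_client.py are not shown in this description, so the following is only a minimal sketch of what a wrapper with these responsibilities could look like. The method names (`log_loss`, `finish`), the `_safe` helper, and the `backend` parameter are assumptions made for illustration; the `mlflow` calls themselves (`set_tracking_uri`, `set_experiment`, `start_run`, `log_params`, `log_metric`, `log_artifact`, `end_run`) are the standard fluent API.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


class MLflowClient:
    """Hypothetical wrapper: a tracking failure is logged, never raised."""

    def __init__(self, tracking_uri: str, experiment_name: str,
                 run_name: Optional[str] = None, backend: Any = None):
        if backend is None:
            import mlflow  # lazy import keeps the dependency optional
            backend = mlflow
        self._mlflow = backend  # `backend` is an illustrative seam for testing
        self.active = False
        self._safe(self._start, tracking_uri, experiment_name, run_name)

    def _start(self, tracking_uri, experiment_name, run_name):
        self._mlflow.set_tracking_uri(tracking_uri)
        self._mlflow.set_experiment(experiment_name)
        self._mlflow.start_run(run_name=run_name)
        self.active = True  # only reached if all three calls succeeded

    def _safe(self, fn, *args, **kwargs):
        """Run one tracking call; log and swallow any exception."""
        try:
            fn(*args, **kwargs)
        except Exception as exc:
            logger.warning("MLflow call failed: %s", exc)

    def log_params(self, params: dict) -> None:
        if self.active:
            self._safe(self._mlflow.log_params, params)

    def log_loss(self, loss: float, step: int) -> None:
        if self.active:
            self._safe(self._mlflow.log_metric, "loss", loss, step=step)

    def log_artifact(self, local_path: str,
                     artifact_path: Optional[str] = None) -> None:
        if self.active:
            self._safe(self._mlflow.log_artifact, local_path,
                       artifact_path=artifact_path)

    def finish(self) -> None:
        if self.active:
            self._safe(self._mlflow.end_run)
            self.active = False
```

If the server is unreachable at start-up, `active` stays False and every later call becomes a no-op, so the training loop needs no special handling.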
Files Modified:
- train.py:
  - Added MLflowClient import
  - Updated CustomSDTrainer to support MLflow logging in:
    - hook_train_loop(): Loss logging
    - sample(): Sample image logging
    - post_save_hook(): Weight saving
  - Updated CustomJob to accept mlflow_client parameter
  - Added three new input parameters:
    - mlflow_tracking_uri: MLflow server URI
    - mlflow_experiment_name: Experiment name (default: flux-lora-training)
    - mlflow_run_name: Optional run name
  - Created unified tracking_config dict for both W&B and MLflow
  - Added MLflow client initialization and cleanup
- cog.yaml: Added mlflow==2.18.0 dependency
- README.md: Added MLflow integration to features list
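The layout of the unified tracking_config dict is not given here. As a rough sketch of the idea, the function below gathers both trackers' settings in one place and enables MLflow only when a tracking URI is supplied. The function name, the dict keys, and the `wandb_api_key` parameter are hypothetical; the three `mlflow_*` parameter names and the default experiment name come from the list above.

```python
from typing import Any, Dict, Optional


def build_tracking_config(
    wandb_api_key: Optional[str] = None,  # hypothetical W&B input
    mlflow_tracking_uri: Optional[str] = None,
    mlflow_experiment_name: str = "flux-lora-training",
    mlflow_run_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect tracker settings; each tracker is enabled only if configured."""
    return {
        "wandb": {"enabled": bool(wandb_api_key)},
        "mlflow": {
            "enabled": bool(mlflow_tracking_uri),
            "tracking_uri": mlflow_tracking_uri,
            "experiment_name": mlflow_experiment_name,
            "run_name": mlflow_run_name,
        },
    }
```

Calling `build_tracking_config()` with no arguments leaves both trackers disabled, which matches the opt-in behavior described for mlflow_tracking_uri.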
Integration Architecture:
- Non-invasive: Works alongside existing W&B integration
- Optional: Only activated when mlflow_tracking_uri is provided
- Comprehensive tracking: hyperparameters, loss, samples, weights
- All logging wrapped in try-except for robustness
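One way to picture the side-by-side design is a fan-out helper that sends each metric update to whichever trackers are active and isolates their failures from one another. This is an illustrative sketch, not the code in this PR: `log_to_all` and the `MetricSink` type are invented names, and a sink could be, for example, a thin wrapper around `wandb.log` or `mlflow.log_metrics`.

```python
import logging
from typing import Callable, Iterable

logger = logging.getLogger(__name__)

# A sink is any callable taking (metrics, step). Illustrative type only.
MetricSink = Callable[[dict, int], None]


def log_to_all(sinks: Iterable[MetricSink], metrics: dict, step: int) -> int:
    """Send metrics to every sink and return how many succeeded."""
    succeeded = 0
    for sink in sinks:
        try:
            sink(metrics, step)
            succeeded += 1
        except Exception as exc:
            # One tracker failing must not affect the others or the training loop.
            logger.warning("tracker %r failed at step %d: %s", sink, step, exc)
    return succeeded
```

With an empty list of sinks the helper does nothing, so a run with neither tracker configured takes the same code path.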
Key Features:
- Track all training hyperparameters
- Log training loss at each step
- Save sample images during training
- Archive final LoRA weights as artifacts
- Compatible with concurrent W&B usage
Usage Example:
train(
    input_images=Path("images.zip"),
    mlflow_tracking_uri="http://localhost:5000",
    mlflow_experiment_name="flux-lora-experiment",
    mlflow_run_name="baseline-run",
    ...
)
Remove unused artifact_name variable to fix ruff check failure.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>