Question about when hf_ckpt is stored during training

Dear dFactory and LLaDA authors and developers,

Thanks for your great projects. I'm trying to follow the full training pipeline described in the README. However, when I tried to convert the checkpoints from the merged format, I could not find the `hf_ckpt` directory in the output dir as mentioned:

> **Important: Finding the Correct Input Path**
>
> The --input-path for the conversion script is the path to the saved Hugging Face checkpoint, not the root output directory you specified during training. The checkpoint is typically located in a subdirectory like:
>
> TRAIN_OUTPUT_DIR/checkpoints/global_step_XXX/hf_ckpt/

After checking `tasks/train_llada2_bd.py`, I found the code that saves it at line 558:

```python
    if args.train.global_rank == 0 and args.train.save_hf_weights and save_checkpoint_path is not None:
        hf_weights_path = os.path.join(save_checkpoint_path, "hf_ckpt")
        model_state_dict = ckpt_to_state_dict(
            save_checkpoint_path=save_checkpoint_path,
            output_dir=args.train.output_dir,
            ckpt_manager=args.train.ckpt_manager,
        )
        save_model_weights(hf_weights_path, model_state_dict, model_assets=model_assets)
        logger.info_rank0(f"Huggingface checkpoint saved at {hf_weights_path} successfully!")
```

This code seems to run only after the entire training is complete. Does this mean that the checkpoints are available only once training has fully finished? Or does it mean I can't use checkpoints for testing while training is still going on? Thanks for your kind answers!
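
To see which steps already contain an exported `hf_ckpt`, the output directory could be scanned with something like the sketch below. It assumes only the layout quoted from the README (`TRAIN_OUTPUT_DIR/checkpoints/global_step_XXX/hf_ckpt/`); `find_hf_checkpoints` is a hypothetical helper name and is not part of dFactory.

```python
import os
import re


def find_hf_checkpoints(train_output_dir):
    """List (step, step_dir, has_hf_ckpt) for each global_step_* directory.

    Hypothetical helper: assumes the README layout
    TRAIN_OUTPUT_DIR/checkpoints/global_step_XXX/hf_ckpt/.
    """
    ckpt_root = os.path.join(train_output_dir, "checkpoints")
    results = []
    if not os.path.isdir(ckpt_root):
        # Nothing has been saved yet, or the path is wrong.
        return results
    for name in os.listdir(ckpt_root):
        match = re.fullmatch(r"global_step_(\d+)", name)
        if match is None:
            continue  # skip anything that is not a step directory
        step_dir = os.path.join(ckpt_root, name)
        has_hf = os.path.isdir(os.path.join(step_dir, "hf_ckpt"))
        results.append((int(match.group(1)), step_dir, has_hf))
    # Sort by step number so the most recent checkpoint comes last.
    return sorted(results)
```

Calling `find_hf_checkpoints(TRAIN_OUTPUT_DIR)` while training is running would show whether any intermediate step has `hf_ckpt`, or whether it only appears for the final step.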
