Skip to content

scrept for ray cluster trining #150

@AliMamaghani1999

Description

@AliMamaghani1999

I created a ray cluster with Helm and tried running the scripts below for multi-node training, but it got stuck at the first step of training (see the image below). Could you please help me with this?

Image

train_1.5b_grpo_tool server.sh
train_1.5b_grpo.sh

ray-values-verl.yaml

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions