Question when reproducing the experiment in the paper #6

@Kevinstone-199898

Hi, thanks for sharing the code. I have a question about reproducing an experiment from the paper, specifically this figure:
Image
I followed the code in this repo and set the parameters of the 1B model according to the paper:
Image
I also set the global batch size to 512, the same as in the paper.
I am training on 8 H800 GPUs with 80 GB each. The validation loss I got during training is as follows:
Image
This looks very different from the figure in the paper. Am I doing something wrong, or is there another reason for the gap? I would appreciate your help. Thanks a lot!
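
One thing worth ruling out in a setup like this is the batch-size arithmetic. Below is a minimal sketch, assuming plain data parallelism where the global batch size equals the number of GPUs times the per-GPU micro-batch size times the gradient-accumulation steps. The function name and the micro-batch size of 16 are hypothetical and not taken from the repo or the paper; only the global batch size of 512 and the 8 GPUs come from the description above.

```python
def grad_accum_steps(global_batch: int, num_gpus: int, micro_batch_per_gpu: int) -> int:
    """Return the gradient-accumulation steps needed to reach `global_batch`.

    Assumes: global_batch = num_gpus * micro_batch_per_gpu * accumulation_steps.
    """
    samples_per_step = num_gpus * micro_batch_per_gpu
    if global_batch % samples_per_step != 0:
        raise ValueError(
            f"global batch {global_batch} is not a multiple of "
            f"{num_gpus} GPUs x {micro_batch_per_gpu} samples per GPU"
        )
    return global_batch // samples_per_step


# 512 and 8 come from the issue; the micro-batch size of 16 is a hypothetical value.
steps = grad_accum_steps(global_batch=512, num_gpus=8, micro_batch_per_gpu=16)
print(f"gradient accumulation steps: {steps}")
```

If the training script derives the effective batch size differently (for example, by counting tokens rather than sequences), the check would need to be adapted accordingly.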
