I failed to reproduce the Llama2-7b-4k (w/o SFT) in the paper #17

Description

@WNQzhu

Hi, I failed to reproduce the Llama2-7b-4k (w/o SFT) results reported in the paper.

Here are our results compared with the numbers reported for L-Eval:

| Methods | Tokens | Coursera | GSM | QuALITY | TOEFL | CodeU | SFiction | Avg |
|---|---|---|---|---|---|---|---|---|
| (L-Eval) Llama2-7b-4k (w/o SFT) | 4k | 20.05 | 2.0 | 28.71 | 24.53 | 0.00 | 40.62 | 19.31 |
| (Ours) Llama2-7b-4k (w/o SFT) | 4k | 15.26 | 19.0 | 30.69 | 13.01 | 3.33 | 35.93 | 19.54 |
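
As a quick sanity check on the numbers above, here is a minimal sketch that recomputes both averages and the per-task gaps. The scores are copied from the table; the variable and function names are illustrative assumptions and do not come from the L-Eval code base.

```python
# Scores copied from the table above (same task order for both rows).
tasks = ["Coursera", "GSM", "QuALITY", "TOEFL", "CodeU", "SFiction"]
paper = [20.05, 2.0, 28.71, 24.53, 0.00, 40.62]   # (L-Eval) row
ours = [15.26, 19.0, 30.69, 13.01, 3.33, 35.93]   # (Ours) row


def mean(values):
    """Plain arithmetic mean over the six tasks."""
    return sum(values) / len(values)


# Per-task difference: our score minus the reported score.
deltas = {t: round(o - p, 2) for t, p, o in zip(tasks, paper, ours)}

paper_avg = mean(paper)
ours_avg = mean(ours)

# Task with the largest absolute gap between the two runs.
largest_gap = max(deltas, key=lambda t: abs(deltas[t]))

if __name__ == "__main__":
    for task, delta in deltas.items():
        print(f"{task:10s} {delta:+.2f}")
    print(f"avg (paper) = {paper_avg:.2f}, avg (ours) = {ours_avg:.2f}")
    print(f"largest gap: {largest_gap}")
```

The two averages are nearly identical, but the per-task scores are not: GSM and TOEFL differ by more than 10 points in opposite directions, so the similar averages hide a real discrepancy.
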

Here is our experimental setting:
We modified the llama2-chat-test.py file to disable the NTK parameters and used Llama2-7b to run the evaluation.
We ran it like this:
python3 Baselines/llama2-chat-test.py --scale 7b --max_length 4k --metric exam_eval

What could be the reason for this difference? Should I adjust the prompt or other parameters?
