Improved logging to support lm-eval>=0.4.8 #154
hjenryin wants to merge 3 commits into mlfoundations:main
Conversation
neginraoof left a comment
Thanks a lot, this is very helpful. Have you performed any parity experiments with older and newer lm-eval and vllm versions? If so, can you share the results with us?
I'm also very interested to know if we can try data-parallel mode for parity experiments.
I don't think I have the resources to run everything, but I think this table should demonstrate that the performance is consistent. The official numbers come from https://huggingface.co/open-thoughts/OpenThinker-7B, where the authors reported using evalchemy.
The current code base is based on `lm-eval<=0.4.7`. `lm-eval` adopted logging best practices in 0.4.8 (EleutherAI/lm-evaluation-harness#2203), which makes it incompatible with the current version of evalchemy. Minimal changes need to be made to support newer versions (and to adopt better logging practices). With this patch I have no problem running `lm-eval==0.4.9.2` and `vllm==0.13.0`; the changes are also backward-compatible.

`lm-eval` 0.4.8 can be particularly helpful: it starts to support `vllm` 0.7+ (EleutherAI/lm-evaluation-harness#2706), with an easier local data-parallel setup. I've also included a section on how to run data parallel with vllm for faster evaluation; see the sketch below.

This is similar to #124, but I think it's better to decouple the logging so that people can tell where each message is coming from.
I also encourage the authors to set up versioned releases and publish this on PyPI; it would make this wonderful tool easier for everyone to use!