alkinun/SeDiR

SeDiR: Self-Distilled Reasoner for Self-Improving LLMs

News:

  • [Coming soon]: Full code, models, datasets, and further R&D.
  • [24/11/2024]: Uploaded the paper to the GitHub repo.

Abstract:

In recent months, OpenAI o1 has shown promising progress on complex reasoning tasks by generating a long chain of thought (CoT) before giving a final answer. This approach has demonstrated that increasing test-time compute can improve performance on reasoning and coding tasks. Existing open-source approaches remain limited by their need for human labeling, distilled datasets, or grounded verifiers; an open-ended self-improving framework has yet to be fully explored on open-ended reasoning tasks.

This paper introduces SeDiR, a novel framework for enabling fully open-ended self-improvement in reasoning LLMs. By leveraging the diversity of data at both the pretraining and post-training stages, SeDiR iteratively generates and scores high-quality reasoning traces without requiring human intervention or seed data. This report documents an effort to replicate o1-like reasoning capabilities with open-ended self-improving systems.
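The SeDiR code has not been released, so the following is only a generic sketch of what one "generate, then score" round could look like, not the method from the paper. Everything in it is an assumption made for illustration: the function `self_improvement_round`, the `generate` and `score` callables, the best-of-N selection, and the `keep_threshold` cutoff are all hypothetical, and the toy generator and scorer stand in for a real model.

```python
import random
from typing import Callable, List, Tuple


def self_improvement_round(
    prompts: List[str],
    generate: Callable[[str], str],      # hypothetical: prompt -> reasoning trace
    score: Callable[[str, str], float],  # hypothetical: (prompt, trace) -> quality score
    samples_per_prompt: int = 4,
    keep_threshold: float = 0.5,
) -> List[Tuple[str, str, float]]:
    """Sample several traces per prompt, score each one, and keep the
    best-scoring trace for a prompt only if it reaches the threshold."""
    kept = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(samples_per_prompt)]
        scored = [(score(prompt, trace), trace) for trace in candidates]
        best_score, best_trace = max(scored, key=lambda pair: pair[0])
        if best_score >= keep_threshold:
            kept.append((prompt, best_trace, best_score))
    return kept


# Toy stand-ins for a model and a scorer (assumptions, not SeDiR components).
rng = random.Random(0)


def toy_generate(prompt: str) -> str:
    # Produce a fake trace with a random number of steps.
    steps = rng.randint(1, 5)
    return " -> ".join(f"step {i + 1}" for i in range(steps))


def toy_score(prompt: str, trace: str) -> float:
    # Reward longer traces; a real scorer would judge reasoning quality.
    return len(trace.split(" -> ")) / 5.0


toy_prompts = ["What is 2 + 2?", "Sort [3, 1, 2]."]
selected = self_improvement_round(toy_prompts, toy_generate, toy_score)
for prompt, trace, trace_score in selected:
    print(f"{trace_score:.1f}  {prompt}  {trace}")
```

In an iterative setup, the kept traces would presumably be used to fine-tune the model before the next round; that step is omitted here because the source does not describe it.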

More testing, R&D, models, datasets, and code coming soon!
