SeDiR: Self-Distilled Reasoner for Self-Improving LLMs

News:

[Coming soon]: Full code, models, datasets, more R&D...
[24/11/2024]: Uploaded the paper to the GitHub repo.

Abstract:

In recent months, OpenAI o1 has shown promising progress in solving complex reasoning tasks by synthesizing long chains of thought (CoT) before giving a final answer. This approach has demonstrated the potential to enhance performance on reasoning and coding tasks by increasing test-time compute. Existing open-source approaches remain limited by the need for human labeling, distilled datasets, or grounded verifiers, and an open-ended self-improving framework has yet to be fully explored for open-ended reasoning tasks.

This paper introduces SeDiR, a novel framework for enabling fully open-ended self-improvement in reasoning LLMs. By leveraging the diversity of data at both the pretraining and post-training stages, SeDiR iteratively generates and scores high-quality reasoning traces without requiring human intervention or seed data. This report describes an effort to replicate o1-like reasoning capabilities with open-ended self-improving systems.
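The abstract describes an iterative "generate and score" loop but gives no algorithmic detail, and the code has not been released yet. As a rough illustration only, one round of a generic generate-then-score loop might look like the sketch below. This is not SeDiR's actual procedure: the function `self_improve_round`, its parameters (`samples_per_prompt`, `keep_threshold`), and the toy generator and scorer are all hypothetical stand-ins for a real model and a real scoring method.

```python
import random
from typing import Callable, List, Tuple

# A reasoning trace is represented as plain text in this sketch (assumption).
Trace = str


def self_improve_round(
    prompts: List[str],
    generate: Callable[[str], Trace],
    score: Callable[[str, Trace], float],
    samples_per_prompt: int = 4,
    keep_threshold: float = 0.5,
) -> List[Tuple[str, Trace, float]]:
    """Run one hypothetical round: sample traces, score them, keep the best.

    For each prompt, several candidate traces are sampled and scored. The
    highest-scoring trace is kept only if it reaches `keep_threshold`.
    The returned triples could serve as training data for a next round.
    """
    kept = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(samples_per_prompt)]
        scored = [(score(prompt, trace), trace) for trace in candidates]
        best_score, best_trace = max(scored, key=lambda pair: pair[0])
        if best_score >= keep_threshold:
            kept.append((prompt, best_trace, best_score))
    return kept


if __name__ == "__main__":
    rng = random.Random(0)

    # Toy stand-in for a model: emits a trace with 1 to 5 numbered steps.
    def toy_generate(prompt: str) -> Trace:
        n_steps = rng.randint(1, 5)
        return " -> ".join(f"step{i}" for i in range(1, n_steps + 1))

    # Toy stand-in for a scorer: longer traces score higher, capped at 1.0.
    def toy_score(prompt: str, trace: Trace) -> float:
        return min(1.0, trace.count("step") / 5)

    for prompt, trace, value in self_improve_round(
        ["What is 17 * 23?", "Sort [3, 1, 2]."], toy_generate, toy_score
    ):
        print(f"{value:.2f}  {prompt}  ::  {trace}")
```

In a real system the kept traces would be used to fine-tune the model before the next round; how SeDiR does this, and how it scores traces without a grounded verifier, is described in the paper rather than here.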

More testing, R&D, models, datasets, and code coming soon!
