
Benchmarking deep learning optimization with nanoGPT


This benchmark is dedicated to evaluating new deep learning optimization methods on the nanoGPT architecture. The optimization problem is defined as in the original nanoGPT speedrun (see modded-nanogpt):

  • Training and validation are performed on FineWeb -- do not change the dataloaders.
  • Training stops once the validation loss drops below 3.28 (still TODO); see the sketch after this list.
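
For reference, the stopping rule amounts to something like the following training-loop fragment. This is an illustrative sketch, not code from this repository: `train_until_target`, `evaluate`, and the loaders are hypothetical names, and the nanoGPT-style forward is assumed to return `(logits, loss)`.

import torch

TARGET_VAL_LOSS = 3.28  # speedrun target on the FineWeb validation set


@torch.no_grad()
def evaluate(model, val_loader, max_batches=20):
    # Average loss over a few validation batches (hypothetical helper).
    model.eval()
    losses = [model(x, y)[1].item()
              for _, (x, y) in zip(range(max_batches), val_loader)]
    model.train()
    return sum(losses) / len(losses)


def train_until_target(model, optimizer, train_loader, val_loader,
                       eval_interval=100, target=TARGET_VAL_LOSS):
    # Run optimizer steps until the validation loss drops below `target`.
    for step, (x, y) in enumerate(train_loader):
        optimizer.zero_grad(set_to_none=True)
        _, loss = model(x, y)  # nanoGPT-style forward: (logits, loss)
        loss.backward()
        optimizer.step()
        if step % eval_interval == 0 and evaluate(model, val_loader) < target:
            break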

For now, the repository contains a single solver, Adam, and runs on CPU. The dataloaders work but use a fixed sequence length of 128 tokens. We use the original model code from nanoGPT (GPT-2 from llm.c), together with the simple dataloader from `modded-nanogpt`_.
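
A benchopt solver wrapping Adam could look roughly like the sketch below. This is a sketch assuming benchopt's callback sampling strategy; the `set_objective` arguments and the hyper-parameter grid are placeholders, since the real interface is whatever this benchmark's objective passes through `get_objective()`.

import torch
from benchopt import BaseSolver


class Solver(BaseSolver):
    # Plain Adam baseline (sketch; learning-rate grid is illustrative).
    name = 'Adam'
    sampling_strategy = 'callback'
    parameters = {'lr': [3e-4]}

    def set_objective(self, model, train_loader):
        # Placeholder arguments: the actual names are defined by the
        # benchmark's Objective.get_objective().
        self.model, self.train_loader = model, train_loader
        self.optimizer = torch.optim.Adam(model.parameters(), lr=self.lr)

    def run(self, cb):
        data = iter(self.train_loader)
        while cb():  # benchopt records intermediate results and stops here
            try:
                x, y = next(data)
            except StopIteration:
                data = iter(self.train_loader)
                x, y = next(data)
            self.optimizer.zero_grad(set_to_none=True)
            _, loss = self.model(x, y)  # nanoGPT-style forward
            loss.backward()
            self.optimizer.step()

    def get_result(self):
        return dict(model=self.model)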

TODO:

  • Tweak the dataloaders to make them more efficient and less error-prone.
  • Decide whether to add improvements to the architecture (QK-norm, rotary embeddings, etc.); see the QK-norm sketch below.
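
As an example of what QK-norm would look like, here is a minimal PyTorch sketch (one common variant, normalizing queries and keys to unit L2 norm per head; the exact formulation used in modded-nanogpt may differ):

import torch
import torch.nn.functional as F


def qk_norm_attention(q, k, v):
    # q, k, v: (batch, n_heads, seq_len, head_dim).
    # Normalizing q and k bounds the attention logits, which tends to
    # stabilize training; a learnable temperature often replaces the
    # default 1/sqrt(head_dim) scaling in this setting.
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)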

Install

This benchmark can be run using the following commands:

$ pip install -U benchopt
$ git clone https://github.com/tomMoral/benchmark_nanogpt
$ benchopt run benchmark_nanogpt

Apart from the problem, options can be passed to benchopt run to restrict the benchmark to some solvers or datasets, e.g.:

$ benchopt run benchmark_nanogpt -s solver1 -d dataset2 --max-runs 10 --n-repetitions 10

Use benchopt run -h for more details about these options, or visit https://benchopt.github.io/api.html.
