
Add Parameter Golf submission: Depth12 Dim416 KV4 #71

Open
AntDX316 wants to merge 3 commits into openai:main from AntDX316:antdx316-parameter-golf-submission

Adds a 10-minute 8xH100 submission for the 16MB track.

Summary:

  • Layout: VOCAB_SIZE=1024 NUM_LAYERS=12 MODEL_DIM=416 NUM_HEADS=8 NUM_KV_HEADS=4 MLP_MULT=2
  • Tied embeddings: TIE_EMBEDDINGS=1
  • Global batch: TRAIN_BATCH_TOKENS=524288
  • Stop condition: MAX_WALLCLOCK_SECONDS=600
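As a rough sanity check on the layout, the weight count can be estimated from the values above. This is a minimal sketch, not the counting logic in `train_gpt.py`: it assumes a standard decoder block with bias-free Q/K/V/O projections, grouped-query attention, and a two-matrix MLP of hidden width `MLP_MULT * MODEL_DIM`, and it ignores norms and any other small parameters. The function name `estimate_params` is hypothetical.

```python
# Layout values copied from the summary above.
VOCAB_SIZE = 1024
NUM_LAYERS = 12
MODEL_DIM = 416
NUM_HEADS = 8
NUM_KV_HEADS = 4
MLP_MULT = 2


def estimate_params() -> int:
    """Estimate the weight count under the assumptions stated in the lead-in."""
    head_dim = MODEL_DIM // NUM_HEADS        # 416 / 8 = 52
    kv_dim = NUM_KV_HEADS * head_dim         # 4 * 52 = 208 (grouped-query K/V width)

    # Tied embeddings: the output head reuses the input embedding matrix,
    # so it is counted once.
    embed = VOCAB_SIZE * MODEL_DIM

    # Q and O projections are dim x dim; K and V are dim x kv_dim (assumed, no biases).
    attn = 2 * MODEL_DIM * MODEL_DIM + 2 * MODEL_DIM * kv_dim

    # Assumed MLP shape: one up-projection and one down-projection.
    mlp = 2 * MODEL_DIM * (MLP_MULT * MODEL_DIM)

    return embed + NUM_LAYERS * (attn + mlp)


if __name__ == "__main__":
    print(f"estimated parameters: {estimate_params():,}")
```

Under these assumptions the estimate is about 15.0M weights, so at one byte per weight the int8 payload would be of the same order of magnitude as the 16MB budget before compression.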

Results:

  • final_int8_zlib_roundtrip_exact val_loss: 2.28096783
  • final_int8_zlib_roundtrip_exact val_bpb: 1.35091763
  • Total submission size int8+zlib: 14301562 bytes
  • Timed training stopped at step 9067/20000 due to the wallclock cap
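The reported numbers can be related to each other with a little arithmetic. The sketch below makes two assumptions that the description does not confirm: that every step consumes exactly `TRAIN_BATCH_TOKENS` tokens, and that `val_bpb` is the per-token loss converted from nats to bits and divided by the average number of bytes per token. The variable names are illustrative.

```python
import math

# Values copied from the summary and results above.
TRAIN_BATCH_TOKENS = 524288
MAX_WALLCLOCK_SECONDS = 600
STEPS_COMPLETED = 9067
VAL_LOSS = 2.28096783   # assumed to be nats per token
VAL_BPB = 1.35091763    # bits per byte

# Assumption: each optimizer step processes one full global batch.
tokens_seen = STEPS_COMPLETED * TRAIN_BATCH_TOKENS
tokens_per_second = tokens_seen / MAX_WALLCLOCK_SECONDS

# Assumption: bpb = (loss / ln 2) / bytes_per_token, so the ratio below
# recovers the tokenizer's average bytes per token on the validation set.
bits_per_token = VAL_LOSS / math.log(2)
implied_bytes_per_token = bits_per_token / VAL_BPB

if __name__ == "__main__":
    print(f"tokens seen:             {tokens_seen:,}")
    print(f"tokens per second:       {tokens_per_second:,.0f}")
    print(f"bits per token:          {bits_per_token:.4f}")
    print(f"implied bytes per token: {implied_bytes_per_token:.3f}")
```

If both assumptions hold, the run saw roughly 4.75B tokens within the wallclock cap, and the 1024-entry vocabulary averages a little under 2.5 bytes per token.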

Included in record folder:

  • README.md
  • submission.json
  • train.log
  • train_gpt.py
