Add Parameter Golf submission prep tooling #72
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a6c2584b66
scripts/pg_lab.py
Outdated
```python
env.setdefault("DATA_PATH", dataset_path)
if tokenizer_path:
    env.setdefault("TOKENIZER_PATH", tokenizer_path)
env.setdefault("VOCAB_SIZE", args.variant.removeprefix("sp") if args.variant.startswith("sp") else env.get("VOCAB_SIZE", "1024"))
```
Set VOCAB_SIZE from selected variant
When --variant is changed (for example, to sp4096), cmd_command updates DATA_PATH but leaves VOCAB_SIZE at the profile default, because setdefault does not overwrite existing keys. The generated command can therefore pair a 4096-token dataset with VOCAB_SIZE=1024. That combination fails as soon as a token ID exceeds 1023 (embedding index out of range), and it makes non-sp1024 variants unusable from this helper.
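A minimal sketch of one possible fix follows. It assumes VOCAB_SIZE can always be read from an `sp<N>` variant name. The idea is to assign the value outright instead of calling setdefault, and to fall back to the existing or default value only for variants that do not encode a vocabulary size. The helper name `apply_variant_vocab_size` is hypothetical and is not part of scripts/pg_lab.py.

```python
def apply_variant_vocab_size(env: dict[str, str], variant: str, default: str = "1024") -> None:
    """Set VOCAB_SIZE from an 'sp<N>' variant name (hypothetical helper)."""
    suffix = variant.removeprefix("sp")
    if variant.startswith("sp") and suffix.isdigit():
        # Plain assignment: setdefault would keep a stale profile value such as "1024".
        env["VOCAB_SIZE"] = suffix
    else:
        # Variant carries no vocab size; keep any existing value, else use the default.
        env.setdefault("VOCAB_SIZE", default)


# The bug in miniature: setdefault never overwrites an existing key.
env = {"VOCAB_SIZE": "1024"}
env.setdefault("VOCAB_SIZE", "4096")
print(env["VOCAB_SIZE"])  # 1024

# With plain assignment the variant wins.
apply_variant_vocab_size(env, "sp4096")
print(env["VOCAB_SIZE"])  # 4096
```

Keeping setdefault in the fallback branch preserves any VOCAB_SIZE the user set explicitly for a variant that does not follow the `sp<N>` naming.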
scripts/pg_lab.py
Outdated
```python
if pre_quant and final:
    data["quant_delta_bpb"] = round(float(final["val_bpb"]) - float(pre_quant["val_bpb"]), 8)
    data["quant_delta_val_loss"] = round(float(final["val_loss"]) - float(pre_quant["val_loss"]), 8)
```
Compute quant delta from exact post-quant metric
parse_log derives quant_delta_bpb and quant_delta_val_loss from final_int8_zlib_roundtrip (4-decimal values), even when final_int8_zlib_roundtrip_exact is present and already parsed. This introduces rounding error into the comparison metric used to rank and gate runs, which can flip decisions near tight thresholds (e.g. 0.003).
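A sketch of the suggested change follows. It assumes the parsed `final_int8_zlib_roundtrip_exact` record exposes the same `val_bpb` and `val_loss` keys as the 4-decimal one. The function `quant_deltas` and its `final_exact` parameter are hypothetical. The metric values are made up to show how a 0.003 threshold can flip.

```python
from typing import Mapping, Optional


def quant_deltas(
    pre_quant: Mapping[str, float],
    final: Mapping[str, float],
    final_exact: Optional[Mapping[str, float]] = None,
) -> dict[str, float]:
    """Post-quant minus pre-quant deltas, preferring the exact roundtrip metrics."""
    # Use the 4-decimal record only when no exact record was parsed from the log.
    post = final_exact if final_exact is not None else final
    return {
        "quant_delta_bpb": round(float(post["val_bpb"]) - float(pre_quant["val_bpb"]), 8),
        "quant_delta_val_loss": round(float(post["val_loss"]) - float(pre_quant["val_loss"]), 8),
    }


# Illustrative numbers only: the exact post-quant values round to the 4-decimal ones.
pre_quant = {"val_bpb": 1.21234567, "val_loss": 2.04700000}
final = {"val_bpb": 1.2153, "val_loss": 2.0521}
final_exact = {"val_bpb": 1.21534621, "val_loss": 2.05206512}

THRESHOLD = 0.003
rounded_delta = quant_deltas(pre_quant, final)["quant_delta_bpb"]            # about 0.00295
exact_delta = quant_deltas(pre_quant, final, final_exact)["quant_delta_bpb"]  # about 0.00300
print(rounded_delta > THRESHOLD, exact_delta > THRESHOLD)  # False True
```

The source does not say whether the pre-quant metric is itself logged at full precision. If it is also rounded, the same reasoning applies to that side of the subtraction.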
No description provided.