Buffbench with only hard tasks across 4 eval sets #3751

Triggered via push December 8, 2025 02:08
Status Success
Total duration 8s
Artifacts

evals.yml

on: push
run-evals
5s