I see a lot of exciting updates on X https://x.com/METR_Evals/status/1950740117020389870
and on the top chart here: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
Would love to see all those juicy details added to data/external/all_runs.jsonl!
Thanks for this incredibly important project.