diff --git a/_data/blogs.yml b/_data/blogs.yml
index 51fcc20..da5d081 100644
--- a/_data/blogs.yml
+++ b/_data/blogs.yml
@@ -1,5 +1,5 @@
 - link: /blog/agents-last-exam
-  img: /blog/agents-last-exam/main-results.png
+  img: /blog/agents-last-exam/cover.png
   title: "Agents' Last Exam"
   date: "2026-06-15"
   description: "A rolling benchmark measuring whether AI agents can perform economically valuable work across 55 occupations. We evaluated Fable 5, GPT-5.5, Composer 2.5, and other frontier systems on 1,500+ expert-sourced tasks. Today's agents solve a meaningful fraction of professional work, but on ALE's hardest tier every frontier agent we tested, including Fable 5, scored 0%."
diff --git a/blog/agents-last-exam/cover.png b/blog/agents-last-exam/cover.png
new file mode 100644
index 0000000..09ecc0b
Binary files /dev/null and b/blog/agents-last-exam/cover.png differ
diff --git a/blog/agents-last-exam/index.md b/blog/agents-last-exam/index.md
index c050104..c6670f9 100644
--- a/blog/agents-last-exam/index.md
+++ b/blog/agents-last-exam/index.md
@@ -24,6 +24,8 @@ June 2026
 Also on LinkedIn and X
+Agents' Last Exam: a benchmark spanning 55 occupations, with real-world agent pipelines across Manufacturing and Game Development
+
 *Everyone says the latest AI agents will be "job-ready" soon, especially after the release of Fable 5 last week. But is that really the case?*