2 changes: 1 addition & 1 deletion _data/blogs.yml
@@ -1,5 +1,5 @@
- link: /blog/agents-last-exam
-  img: /blog/agents-last-exam/main-results.png
+  img: /blog/agents-last-exam/cover.png
title: "Agents' Last Exam"
date: "2026-06-15"
description: "A rolling benchmark measuring whether AI agents can perform economically valuable work across 55 occupations. We evaluated Fable 5, GPT-5.5, Composer 2.5, and other frontier systems on 1,500+ expert-sourced tasks. Today's agents solve a meaningful fraction of professional work, but on ALE's hardest tier every frontier agent we tested, including Fable 5, scored 0%."
Binary file added blog/agents-last-exam/cover.png
2 changes: 2 additions & 0 deletions blog/agents-last-exam/index.md
@@ -24,6 +24,8 @@ June 2026
Also on <a href="https://www.linkedin.com/pulse/introducing-agents-last-exam-ale-new-standard-evaluating-dawn-song-dntuc/" target="_blank">LinkedIn</a> and <a href="https://x.com/dawnsongtweets/status/2065095757988868190" target="_blank">X</a>
</div>

+<img src="cover.png" alt="Agents' Last Exam: a benchmark spanning 55 occupations, with real-world agent pipelines across Manufacturing and Game Development" class="content-image" style="width: 100%; max-width: 100%; padding: 10px;">
+

*Everyone says the latest AI agents will be "job-ready" soon, especially after the release of Fable 5 last week. But is that really the case?*
