From 11c76e8ae45b0d0a8f265c0a1822e152f037db50 Mon Sep 17 00:00:00 2001
From: Artem Zhuravel
Date: Sat, 11 Apr 2026 17:14:14 +0530
Subject: [PATCH] Revise test results for new models

Updated the weighted mean scores for newly added tests in the results.
---
 experiments/kdd 2026/new_tests_results.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/experiments/kdd 2026/new_tests_results.md b/experiments/kdd 2026/new_tests_results.md
index da7a0d1..6ebf840 100644
--- a/experiments/kdd 2026/new_tests_results.md
+++ b/experiments/kdd 2026/new_tests_results.md
@@ -3,7 +3,7 @@ Assertion-weighted mean scores (0-100) on the **36 newly added tests** only: 17 Box, 5 Google Calendar, 7 Linear, and 6 Slack. All runs included API documentation.


 | Model | Weighted Mean
-|---|---|---|
+|---|---|
 | openai/gpt-5 | 88.10
 | openai/gpt-5-mini | 87.61
 | deepseek/deepseek-v3.2 | 84.26