Add OpenClaw + DeepSeek V4 Pro results (49 tasks, 0.918) by konghuihua · Pull Request #12 · agentscope-ai/PawBench

konghuihua · 2026-06-12T07:06:31Z

No description provided.

konghuihua · 2026-06-12T07:18:55Z

Zhiyin OpenClaw + DeepSeek V4 Pro Submission

Overview

Model: DeepSeek V4 Pro
Harness: OpenClaw 2026.4.24 (zhiyin-pawbench Docker image)
Tasks scored: 49 out of 150 attempted
Overall score: 0.918
Date: 2026-06-12
Submitted by: Kong Huihua (孔会华) / chunyinengyuan@qq.com
Target repo for this PR: https://github.com/agentscope-ai/PawBench

Slice Averages

Slice	Score	Tasks
Text	0.941	32
Multimodal	0.895	9
Code Generation	0.801	6
Document Extraction	1.000	2
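As a quick consistency check, the slice rows above can be combined into a task-weighted mean. This is a minimal sketch that assumes the overall score is the per-task average across slices; the PR does not state how 0.918 was aggregated, but the table's numbers are consistent with that reading.

```python
# Slice scores and task counts copied from the Slice Averages table.
# Assumption: the overall score is the task-weighted mean of the slices.
slices = {
    "Text": (0.941, 32),
    "Multimodal": (0.895, 9),
    "Code Generation": (0.801, 6),
    "Document Extraction": (1.000, 2),
}

total_tasks = sum(count for _, count in slices.values())
weighted_sum = sum(score * count for score, count in slices.values())
overall = weighted_sum / total_tasks

print(total_tasks)        # 49
print(round(overall, 3))  # 0.918
```

The task counts sum to the 49 scored tasks, and the weighted mean rounds to the reported 0.918.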

Notes

Auto scores only from official graders (auto_avg field from _results_v2.jsonl)
LLM judge dimension not reproduced (requires separate judge run)
~100 tasks produced no output files or timed out; only scorable tasks are included
Docker-based evaluation with zhiyin-pawbench:latest image
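The "scorable tasks only" averaging described in these notes might look like the sketch below. Only the `auto_avg` field name and the `_results_v2.jsonl` file name come from the notes; the record layout, the `task_id` key, and the sample values are hypothetical, since the real schema is not shown in this PR.

```python
import json

# Hypothetical sample records standing in for lines of _results_v2.jsonl.
# Only the `auto_avg` field name is taken from the PR notes.
sample_jsonl = """\
{"task_id": "t1", "auto_avg": 1.0}
{"task_id": "t2", "auto_avg": 0.5}
{"task_id": "t3", "auto_avg": null}
{"task_id": "t4"}
"""

def mean_auto_avg(lines):
    """Average `auto_avg` over records that have a score; skip the rest."""
    scores = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        score = record.get("auto_avg")
        if score is not None:  # unscored or timed-out tasks are skipped
            scores.append(float(score))
    mean = sum(scores) / len(scores) if scores else 0.0
    return mean, len(scores)

mean, n_scored = mean_auto_avg(sample_jsonl.splitlines())
print(n_scored, mean)  # 2 0.75
```

Skipping unscored records is what makes the result a mean over scorable tasks rather than over the full task set.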

Submitted by

Zhiyin / CausalMind

helloml0326 · 2026-06-17T08:27:52Z

Thanks for the submission. A couple of quick questions:

We already have official deepseek-v4-pro × openclaw results (150 tasks, overall 0.754). This PR adds 49 tasks at 0.918 — what does it add beyond the existing data?

Also, ~101 tasks had no output or timed out. What caused that, and do you plan to re-run the full 150?

For leaderboard inclusion we need the complete task set (missing tasks count as 0). A partial run isn't directly comparable to existing submissions.
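To make the comparability point concrete, here is a rough sketch of the "missing tasks count as 0" rule applied to the figures quoted in this thread. It assumes the leaderboard score is a plain mean over all 150 tasks, which is how the rule reads but is not spelled out here.

```python
# Figures quoted in this PR thread.
TOTAL_TASKS = 150
scored_tasks = 49
mean_on_scored = 0.918    # submitted overall score on the 49 scored tasks
official_overall = 0.754  # existing official run over all 150 tasks

# Assumption: leaderboard score = mean over all tasks, missing tasks = 0.
missing_tasks = TOTAL_TASKS - scored_tasks
full_set_score = mean_on_scored * scored_tasks / TOTAL_TASKS

print(missing_tasks)             # 101
print(round(full_set_score, 3))  # 0.3
print(full_set_score < official_overall)  # True
```

Under that assumption the partial run would score about 0.300 on the full task set, well below the existing 0.754, which is why 0.918 on 49 tasks cannot be compared directly with full-set submissions.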

konghuihua added 2 commits June 12, 2026 15:03

Add OpenClaw + DeepSeek V4 Pro results (49 tasks, 0.918)

b5a9469

Merge branch 'agentscope-ai:main' into main

58c09bc

konghuihua wants to merge 2 commits into agentscope-ai:main from konghuihua:main
