Hardest — vague multi-turn proactive search in the wild.
Verifiable — schema-free knowledge graph evaluation.
Long-horizon — persona-driven progressive disclosure.
Browse the full leaderboard and individual task trajectories at vibebench.github.io/VibeSearchBench.github.io.
Evaluation:
- Primary metric: Triplet F1. Predicted knowledge graphs are matched against ground truth via LLM-as-judge node alignment and triplet semantic equivalence.
- Frameworks: ReAct and OpenClaw, evaluated on VibeSearch-Pro and VibeSearch-Daily.
- Best reported score: 30.3 triplet F1 (Claude Opus 4.6, OpenClaw).
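As a rough illustration of the primary metric, the sketch below computes triplet precision, recall, and F1 once predicted and ground-truth triplets have been aligned. It assumes a simple case-insensitive exact match in place of the LLM-as-judge alignment described above; the function names and the example triplets are hypothetical and not taken from the benchmark code.

```python
from typing import Iterable, Tuple

Triplet = Tuple[str, str, str]  # (head, relation, tail)


def normalize(triplet: Triplet) -> Triplet:
    """Lowercase and strip each part; a crude stand-in for LLM-based matching."""
    return tuple(part.strip().lower() for part in triplet)


def triplet_prf(predicted: Iterable[Triplet], gold: Iterable[Triplet]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over sets of normalized triplets."""
    pred_set = {normalize(t) for t in predicted}
    gold_set = {normalize(t) for t in gold}
    matched = len(pred_set & gold_set)
    precision = matched / len(pred_set) if pred_set else 0.0
    recall = matched / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


# Hypothetical example data, for illustration only.
gold = [
    ("Paper A", "proposes", "Method X"),
    ("Method X", "evaluated on", "Dataset Y"),
    ("Paper A", "published in", "2024"),
]
pred = [
    ("paper a", "proposes", "method x"),       # matches after normalization
    ("Method X", "evaluated on", "Dataset Z"),  # wrong tail, no match
]
p, r, f1 = triplet_prf(pred, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In the benchmark itself the matching step is semantic rather than string-based, so a real evaluator would replace `normalize` and the set intersection with judge-produced alignments.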
Explore: Leaderboard · Task trajectories · Paper
200 tasks across 2 subsets and 20 domains. Each task pairs a vague initial query with a ground-truth knowledge graph and a persona simulator.
| Split | Count | Description |
|---|---|---|
| pro | 100 | Professional research: literature reviews, market analysis, technical due diligence |
| daily | 100 | Daily-life search: shopping, travel, lifestyle with evolving preferences |
Real users rarely specify full intent upfront. VibeSearch captures bidirectional convergence: agents interleave partial results with follow-up questions while users progressively disclose needs. VibeSearchBench evaluates schema-free knowledge graphs via graph matching (Precision / Recall / F1).
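The interaction pattern described above can be pictured with a toy loop. This is only a sketch under stated assumptions: `PersonaSimulator`, `run_dialogue`, the candidate items, and the constraint keys are all hypothetical, and the benchmark's actual simulator is LLM-driven rather than a dictionary lookup.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class PersonaSimulator:
    """Hypothetical user simulator that reveals a hidden constraint only when asked."""
    hidden_constraints: Dict[str, str]
    disclosed: List[str] = field(default_factory=list)

    def answer(self, asked_key: str) -> Optional[str]:
        # Disclose a constraint the first time the agent asks about it.
        if asked_key in self.hidden_constraints and asked_key not in self.disclosed:
            self.disclosed.append(asked_key)
            return self.hidden_constraints[asked_key]
        return None


def run_dialogue(
    candidates: List[Dict[str, str]],
    persona: PersonaSimulator,
    questions: List[str],
) -> Tuple[Dict[str, str], List[List[str]]]:
    """Interleave partial results with follow-up questions, narrowing each turn."""
    known: Dict[str, str] = {}
    history: List[List[str]] = []
    remaining = list(candidates)
    for key in questions:
        history.append([c["name"] for c in remaining])  # partial results shown this turn
        value = persona.answer(key)
        if value is not None:
            known[key] = value
            remaining = [c for c in remaining if c.get(key) == value]
    history.append([c["name"] for c in remaining])  # final results
    return known, history


# Hypothetical daily-life task: the user starts with a vague request for "a tent".
candidates = [
    {"name": "Tent A", "season": "summer", "capacity": "2"},
    {"name": "Tent B", "season": "winter", "capacity": "2"},
    {"name": "Tent C", "season": "winter", "capacity": "4"},
]
persona = PersonaSimulator({"season": "winter", "capacity": "4"})
known, history = run_dialogue(candidates, persona, ["season", "budget", "capacity"])
print(known)
print(history)
```

Each turn shows the current partial results before asking the next question, which mirrors the idea that agent and user converge together rather than the user stating everything upfront.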
Available on Hugging Face: VibeSearchBench/VibeSearchBench
https://vibebench.github.io/VibeSearchBench.github.io/
Static project website for VibeSearchBench. This repo is under the VibeBench org as a project site.
The "Publish site to gh-pages" workflow builds the site and pushes the `gh-pages` branch. Then:
- Open Settings → Pages: https://github.com/VibeBench/VibeSearchBench.github.io/settings/pages
- Build and deployment → Source → Deploy from a branch
- Branch `gh-pages`, folder `/ (root)` → Save
- Wait 2–5 min → https://vibebench.github.io/VibeSearchBench.github.io/
If Actions cannot push, enable Settings → Actions → General → Workflow permissions → Read and write.

    cd /path/to/VibeSearchBench
    bash scripts/publish_github_io.sh

Or build only:

    SITE_DIR=../VibeSearchBench.github.io bash scripts/build_website.sh
    cd ../VibeSearchBench.github.io && git add -A && git commit -m "Update site" && git push

- Pro source (jsonl): `data/trajs/pro/*.jsonl` → viewer JSON via `scripts/convert_pro_trajs.py`
- Daily source (jsonl): `data/trajs/claude-opus-4.6_custom_serper_simulated/trajs_reextract/`
- Viewer (json): `data/trajs/pro/` (`001.json` …), `data/trajs/daily/` (`task_*.json`)
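For readers unfamiliar with the jsonl-to-JSON step, here is a minimal sketch of what such a conversion generally looks like. It is not the repository's `convert_pro_trajs.py`, whose actual behavior is not shown here; the function name `jsonl_to_json`, the `step` field, and the temporary file layout are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def jsonl_to_json(src: Path, dst: Path) -> int:
    """Read one JSON object per line from src and write them to dst as a JSON array."""
    records: List[dict] = []
    with src.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    dst.parent.mkdir(parents=True, exist_ok=True)
    with dst.open("w", encoding="utf-8") as fh:
        json.dump(records, fh, ensure_ascii=False, indent=2)
    return len(records)


# Demo on a throwaway directory with a hypothetical two-record trajectory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "001.jsonl"
    src.write_text('{"step": 1}\n\n{"step": 2}\n', encoding="utf-8")
    dst = Path(tmp) / "viewer" / "001.json"
    count = jsonl_to_json(src, dst)
    loaded = json.loads(dst.read_text(encoding="utf-8"))
    print(count, loaded)
```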

Run the following scripts, then commit and push this repository:

    python3 scripts/convert_pro_trajs.py
    python3 scripts/build_final_extractions.py
    python3 scripts/build_tasks_index.py
    python3 scripts/fetch_ground_truth.py
VibeSearchBench · Rednote-Hilab & Unipat AI