Fixing GRPO training collapse in long-horizon multi-tool agents. A lightweight PRM-Lite + LATA joint approach achieves +37% over vanilla GRPO on τ-bench airline (50-task, multi-turn).
Updated May 11, 2026 - Python
Open harness for running, measuring, and visualizing agent benchmarks. Adapters for AutomationBench, τ-bench, LeRobot, WorkArena.
Retail LLM agent optimization and evaluation showcase built on top of tau2-bench, focused on execution-chain improvements, route comparison, and reproducible benchmark demos.