Code that accompanies the paper release for "LLMs Corrupt Your Documents When You Delegate"
Updated Apr 20, 2026 - Python
Fixing GRPO training collapse in long-horizon multi-tool agents. A lightweight PRM-Lite + LATA joint approach achieves +37% over vanilla GRPO on τ-bench airline (50-task, multi-turn).
🧠 Awesome Memory-VLA: A curated list of Visual-Language-Action models with memory
SWE-Marathon: an ultra long-horizon SWE benchmark
Code for Scalable Offline Model-Based RL with Action chunking
Code for Tackling Long-Horizon Tasks with Model-based Offline Reinforcement Learning
The simplest way to build long-horizon environments
A family of long-horizon software-engineering environments for OpenEnv, adapted from https://github.com/Proximal-Labs/frontier-swe
VLM-RL Hierarchical Loco-Manipulation for Long-Horizon Tasks with the G1 robot in Isaac Lab/Sim
OpenClaw humanity infusions OtherPowers Creative Intelligence Agency. 🦞
A real-world inspired environment for selective context retention under noise. It evaluates an LLM's ability to manage a fixed-capacity memory buffer, retaining high-value information while filtering out distractors.
Dashboard for a real-world inspired environment for selective context retention under noise. The environment evaluates an LLM's ability to manage a fixed-capacity memory buffer, retaining high-value information while filtering out distractors.
TPipe is the agent operating environment for deterministic, multimodal AI systems. Built Kotlin-first, it composes runtime substrates into governed pipelines with rich tracing, disciplined context and token control, native function binding, and provider-agnostic execution for long-running, headless agents.
Long-horizon agent execution harness — reliable autonomous runs for Claude Code, Codex, OpenHands, and custom agents. Goal graphs, spin detection, HITL gates, fork/merge, 8 strategies, 6 validators.
bstack P12 — Persistent Loop Discipline. Cross-context restart loop with state in the filesystem. Closes the long-horizon context-rot failure mode. Composes with bstack P5/P6/P7/P10/P11.
Official repository for the paper: Can AI Agents Synthesize Scientific Conclusions?