Life-Harness

Adapting the interface, not the model, for deterministic LLM agents

News

2026/05/24: Released the paper and codebase. The second version of the paper has also been submitted to arXiv, and the code release includes the evolution prompts used to build the harness.

Life-Harness is the code release for "Adapting the Interface, Not the Model: Runtime Harness Adaptation for Deterministic LLM Agents." It targets a practical question: when a frozen LLM agent repeatedly fails in a deterministic environment, can we improve the runtime harness around the agent instead of retraining the model or modifying the environment?

The answer is yes. Life-Harness turns recurring failures into reusable runtime interventions across action realization, environment contracts, trajectory regulation, and procedural skills. The model remains frozen; the benchmark environment remains intact; only the harness interface adapts.

| Benchmarks | Model backbones | Settings improved | Avg. relative gain | Training-free |
| --- | --- | --- | --- | --- |
| 7 | 18 | 116 / 126 | 88.5% | Yes |

Why Life-Harness

| What changes? | What stays fixed? | Why it matters |
| --- | --- | --- |
| Runtime harness behavior | LLM weights | No finetuning or model-specific training pipeline |
| Prompted environment interface | Benchmark environment | Keeps deterministic evaluation comparable |

Results

Across 7 deterministic agent benchmarks and 18 model backbones, Life-Harness improves 116 / 126 model-environment settings, with an 88.5% average relative improvement reported in the paper.
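The settings count follows directly from the numbers above: 7 benchmarks times 18 backbones gives 126 model-environment settings. The short check below reproduces that count and the share of settings that improved. It uses only figures stated in this README and does not attempt to recompute the 88.5% average relative gain, whose exact definition is given in the paper.

```python
# Figures copied from the results summary in this README.
benchmarks = 7
backbones = 18
improved = 116

# One setting per (benchmark, backbone) pair.
total = benchmarks * backbones
share = improved / total

print(f"{improved} / {total} settings improved ({share:.1%})")
# prints "116 / 126 settings improved (92.1%)"
```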

Method

Life-Harness evolves a small set of runtime layers from observed failures, then reuses those layers during evaluation.

| Harness flag | Paper layer | Runtime role |
| --- | --- | --- |
| `h2` | Action Realization Layer | Helps convert model decisions into executable environment actions. |
| `h3` | Environment Contract Layer | Makes task and environment constraints explicit at runtime. |
| `h4` | Trajectory Regulation Layer | Regulates multi-step interaction traces to avoid repeated failure patterns. |
| `h5` | Procedural Skill Layer | Reuses procedural knowledge distilled from recurring successful recoveries. |

When the harness is disabled, these layers are not applied.
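To make the layering concrete, here is a minimal sketch of how flag-gated layers could wrap a frozen policy. It is an illustration under stated assumptions, not the repository's implementation: the `Harness` class, `realize_action`, `frozen_policy`, and the prompt strings are all hypothetical names invented for this example. Only the flag names `h2` to `h5` and their roles come from the table above.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical type: a frozen model maps prompt text to raw action text.
Policy = Callable[[str], str]


def realize_action(raw: str, valid_actions: List[str]) -> str:
    """h2-style repair: map loosely formatted output onto a valid action."""
    cleaned = raw.strip().lower()
    for action in valid_actions:
        if cleaned == action.lower():
            return action
    return raw  # no safe repair found; pass the output through unchanged


@dataclass
class Harness:
    """Hypothetical harness; the policy and the environment are never modified."""
    flags: List[str] = field(default_factory=list)
    skills: List[str] = field(default_factory=list)
    history: List[str] = field(default_factory=list)

    def step(self, policy: Policy, observation: str, valid_actions: List[str]) -> str:
        prompt = observation
        if "h5" in self.flags and self.skills:
            # Procedural skills: reuse hints distilled from earlier recoveries.
            prompt += "\nHints: " + " ".join(self.skills)
        if "h3" in self.flags:
            # Environment contract: state the constraints explicitly.
            prompt += "\nValid actions: " + ", ".join(valid_actions)
        if "h4" in self.flags and self.history:
            # Trajectory regulation: surface earlier actions to discourage loops.
            prompt += "\nAlready tried: " + ", ".join(self.history)
        action = policy(prompt)
        if "h2" in self.flags:
            # Action realization: repair the output into an executable action.
            action = realize_action(action, valid_actions)
        self.history.append(action)
        return action


def frozen_policy(prompt: str) -> str:
    """Stand-in for a frozen LLM that emits a badly formatted action."""
    return "  LOOK "


if __name__ == "__main__":
    actions = ["look", "open drawer"]
    # Harness disabled: the raw model output reaches the environment as-is.
    print(repr(Harness().step(frozen_policy, "You are in a room.", actions)))
    # h2 enabled: the same output is repaired into a valid action.
    print(repr(Harness(flags=["h2"]).step(frozen_policy, "You are in a room.", actions)))
```

With an empty flag list the harness passes the policy output through untouched, which mirrors the disabled case described above. Every intervention lives in the wrapper, so the same frozen policy can be evaluated with or without each layer.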

Benchmarks

This repository keeps the two benchmark families in separate folders because their environments and dependencies are intentionally different.

| Suite | Environments | Start here |
| --- | --- | --- |
| AgentBench-style harness | ALFWorld, DBBench, OS, WebShop | AgentBench/README.md |
| tau-bench-style harness | Airline, Retail, Telecom | TauBench/README.md |

Life-harness/
  AgentBench/      # Docker-based AgentBench-style tasks
  TauBench/        # uv-based tau-bench-style tasks
  assets/          # README figures

Quick Start

Clone the repository, then enter the benchmark suite you want to run:

cd Life-harness

# tau-bench-style tasks: Airline, Retail, Telecom
cd TauBench

# AgentBench-style tasks: ALFWorld, DBBench, OS, WebShop
cd ../AgentBench

Each subfolder README contains its own environment setup, evaluation commands, and harness switches. API keys and provider URLs should be configured locally through environment variables or .env files; do not commit them.
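As an illustration of keeping credentials out of the repository, the sketch below reads provider settings from the environment and fails early when the key is missing. The variable names `LLM_API_KEY` and `LLM_BASE_URL` and the function `load_provider_config` are assumptions made for this example; the actual names expected by each suite are documented in its subfolder README.

```python
import os
from typing import Mapping, Optional


def load_provider_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read provider settings from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    api_key = env.get("LLM_API_KEY")
    if not api_key:
        # Fail before any evaluation starts rather than midway through a run.
        raise RuntimeError(
            "LLM_API_KEY is not set; export it or add it to a local .env file"
        )
    # The base URL is optional; None means the provider default is used.
    return {"api_key": api_key, "base_url": env.get("LLM_BASE_URL")}


if __name__ == "__main__":
    try:
        config = load_provider_config()
        print("Provider configured; base URL:", config["base_url"])
    except RuntimeError as err:
        print("Configuration error:", err)
```

Passing a mapping instead of relying on `os.environ` keeps the function easy to test without touching real credentials.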

Citation

If you use this repository, please cite the paper:

@article{xu2026adapting,
  title={Adapting the Interface, Not the Model: Runtime Harness Adaptation for Deterministic LLM Agents},
  author={Xu, Tianshi and Wen, Huifeng and Li, Meng},
  journal={arXiv preprint arXiv:2605.22166},
  year={2026}
}
