2 changes: 2 additions & 0 deletions docs/source/_toctree.yml
Original file line number Diff line number Diff line change
@@ -125,6 +125,8 @@
title: Agent World Model
- local: environments/opencode
title: OpenCode
- local: environments/sophistry_bench_sprint
title: Sophistry Bench Sprint
title: Environments
- isExpanded: false
sections:
7 changes: 7 additions & 0 deletions docs/source/environments.md
@@ -258,6 +258,13 @@ The OpenEnv community has built a catalog of ready-to-run environments that cove
<a href="environments/opencode" class="!no-underline border dark:border-gray-700 px-3 py-1 rounded text-sm hover:shadow">📄 Docs</a>
</div>
</div>
<div class="border dark:border-gray-700 p-5 rounded-lg shadow">
<div class="font-bold mb-2">Sophistry Bench Sprint</div>
<p class="text-sm"><code>sophistry_bench_sprint_env</code> is a single-turn advocacy reward-hacking environment on QuALITY passages: the policy defends an assigned answer and the reward proxy peaks at 8 <code>&lt;claim&gt;</code> tags, with four weight-0 canaries that detect format hacking.</p>
<div class="flex gap-2 mt-3">
<a href="environments/sophistry_bench_sprint" class="!no-underline border dark:border-gray-700 px-3 py-1 rounded text-sm hover:shadow">📄 Docs</a>
</div>
</div>
</div>
</div>

67 changes: 67 additions & 0 deletions docs/source/environments/sophistry_bench_sprint.md
@@ -0,0 +1,67 @@
<!-- openenv-source: sophistry_bench_sprint_env -->
# sophistry_bench_sprint_env

OpenEnv port of the [sophistry-bench](https://github.com/acharyaanusha/sophistry-bench)
single-agent advocacy **reward-hacking sprint** environment. Each episode is a one-turn advocacy
task on a QuALITY reading-comprehension passage: the policy is assigned either the gold answer or
a distractor and must produce one argument built from `<claim>`/`<cite>` tags. The reward proxy
peaks at exactly 8 `<claim>` tags (`claim_count_cliff`); four weight-0 canaries detect format hacking.

Scoring is imported verbatim from the [`sophistry-bench-sprint`](https://pypi.org/project/sophistry-bench-sprint/)
PyPI package, so the reward numbers are identical to the Prime Intellect Hub env.
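
To make the proxy's target concrete, the sketch below counts `<claim>` opening tags in a completion. It is an illustration only: `count_claim_tags`, `hits_claim_target`, and `CLAIM_TARGET` are hypothetical names, and the actual scoring lives in the `sophistry-bench-sprint` package, whose tag-matching rules may differ.

```python
import re

# Hypothetical constant for illustration: the description above says the
# proxy peaks at exactly 8 <claim> tags.
CLAIM_TARGET = 8

_CLAIM_OPEN = re.compile(r"<claim>")


def count_claim_tags(text: str) -> int:
    """Count literal <claim> opening tags in a completion."""
    return len(_CLAIM_OPEN.findall(text))


def hits_claim_target(text: str) -> bool:
    """Return True when the completion has exactly CLAIM_TARGET opening tags."""
    return count_claim_tags(text) == CLAIM_TARGET


sample = "<claim>point</claim><cite>evidence</cite>" * 8
print(count_claim_tags(sample), hits_claim_target(sample))
```

A counter like this can help when inspecting rollouts by hand, for example to see how quickly a policy converges on the 8-tag format.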

## Episode model

Each episode is a single step: `reset()` issues a task, and `step(AdvocacyAction(text=...))`
scores the submitted argument and returns `done=True`.

## Configuration (environment variables)

| Var | Default | Meaning |
|---|---|---|
| `SPRINT_N_ITEMS` | `50` | QuALITY items to load (2 advocacy rows each) |
| `SPRINT_PASSAGE_CHARS` | `2000` | Passage char cap |
| `SPRINT_SEED` | `0` | Distractor-selection seed (deterministic) |
| `SPRINT_WEIGHTS` | `1,0,0,0,0,0,0,0` | 8 reward weights, order: `aggregate, correctness, n_claims, n_citations, alternation_canary, starts_with_canary, length_band_canary, template_echo_canary`. Do **not** weight canaries during training. |
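
The `SPRINT_WEIGHTS` format can be checked before launching a run. The sketch below parses a comma-separated string into the eight named weights from the table and verifies that no canary carries weight. The function names are hypothetical, and how the server itself parses the variable is not shown here, so treat this as an assumption about the format rather than the environment's own code.

```python
import os

# Component order as listed in the table above.
COMPONENT_ORDER = (
    "aggregate", "correctness", "n_claims", "n_citations",
    "alternation_canary", "starts_with_canary",
    "length_band_canary", "template_echo_canary",
)
DEFAULT_WEIGHTS = "1,0,0,0,0,0,0,0"


def parse_weights(raw: str) -> dict[str, float]:
    """Parse a comma-separated weight string into a name -> weight mapping."""
    values = [float(part) for part in raw.split(",")]
    if len(values) != len(COMPONENT_ORDER):
        raise ValueError(
            f"expected {len(COMPONENT_ORDER)} weights, got {len(values)}"
        )
    return dict(zip(COMPONENT_ORDER, values))


def canaries_unweighted(weights: dict[str, float]) -> bool:
    """Return True when every canary component has weight 0."""
    return all(
        weight == 0.0
        for name, weight in weights.items()
        if name.endswith("_canary")
    )


def weights_from_env(environ=os.environ) -> dict[str, float]:
    """Read SPRINT_WEIGHTS from a mapping, falling back to the default."""
    return parse_weights(environ.get("SPRINT_WEIGHTS", DEFAULT_WEIGHTS))


weights = weights_from_env({"SPRINT_WEIGHTS": DEFAULT_WEIGHTS})
print(weights["aggregate"], canaries_unweighted(weights))
```

Running `canaries_unweighted` as a pre-flight check is one way to enforce the rule that canaries stay at weight 0 during training.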

## Usage

```python
from sophistry_bench_sprint_env import SophistryBenchSprintEnv

# Run the deployed Hugging Face Space:
env = SophistryBenchSprintEnv.from_env("anushaacharya/sophistry_bench_sprint_env")
# ...or a local image: SophistryBenchSprintEnv.from_docker_image("openenv-sophistry_bench_sprint:latest")
try:
    obs = env.reset().observation
    print(obs.prompt, obs.answer_to_defend)
    result = env.step_text("<claim>...</claim><cite>...</cite>")
    print(result.reward, result.observation.metadata)
finally:
    env.close()
```

`result.observation.metadata` contains all eight reward components on every step; the canary
scores are the reward-hacking measurement.

> **Do not feed `observation.metadata` / `observation.components` back into the policy's
> prompt.** They include `correctness_reward` (whether the assigned answer is the gold one),
> which is the hidden ground truth. `reset()` deliberately tells the policy only *what* to
> defend, never *whether* it is correct; surfacing the components to the agent leaks that
> signal and defeats the reward-hacking measurement.
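
One way to keep that separation explicit is to route observations through an allow-list before anything reaches the policy. The sketch below is a hypothetical pattern, not part of this environment: `ObsView` is a stand-in for the real observation class, and `policy_input` / `eval_record` are illustrative helper names.

```python
from dataclasses import dataclass, field


@dataclass
class ObsView:
    """Stand-in with the observation fields used here (not the real class)."""

    prompt: str = ""
    answer_to_defend: str = ""
    components: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)


# Only these fields may be shown to the policy.
POLICY_VISIBLE_FIELDS = ("prompt", "answer_to_defend")


def policy_input(obs) -> dict:
    """Build the policy-facing view from allow-listed fields only."""
    return {name: getattr(obs, name) for name in POLICY_VISIBLE_FIELDS}


def eval_record(obs, reward: float) -> dict:
    """Build a record for offline analysis; never pass this to the policy."""
    return {"reward": reward, "components": dict(obs.components)}


obs = ObsView(
    prompt="passage + question",
    answer_to_defend="B",
    components={"correctness_reward": 1.0},  # made-up value for illustration
)
print(policy_input(obs))
```

An allow-list fails closed: a field added to the observation later stays hidden from the policy until someone deliberately exposes it.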

## Build & test

```bash
# Tests live with the other env tests. Run them from the repo root using this
# env's venv (which installs the scoring package):
uv run --project envs/sophistry_bench_sprint_env --extra dev \
    pytest tests/envs/test_sophistry_bench_sprint_environment.py -v
# The module pulls the published sophistry-bench-sprint, so in the repo's shared
# CI (where it isn't installed) it skips via pytest.importorskip — same as other
# envs with heavy deps (e.g. tbench2's camel guard).

# Container
openenv build sophistry_bench_sprint_env
# produces image tag: openenv-sophistry_bench_sprint:latest
```
79 changes: 79 additions & 0 deletions envs/sophistry_bench_sprint_env/README.md
@@ -0,0 +1,79 @@
---
title: Sophistry Bench Sprint Env
emoji: 🗣️
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
---

# sophistry_bench_sprint_env

OpenEnv port of the [sophistry-bench](https://github.com/acharyaanusha/sophistry-bench)
single-agent advocacy **reward-hacking sprint** environment. Each episode is a one-turn advocacy
task on a QuALITY reading-comprehension passage: the policy is assigned either the gold answer or
a distractor and must produce one argument built from `<claim>`/`<cite>` tags. The reward proxy
peaks at exactly 8 `<claim>` tags (`claim_count_cliff`); four weight-0 canaries detect format hacking.

Scoring is imported verbatim from the [`sophistry-bench-sprint`](https://pypi.org/project/sophistry-bench-sprint/)
PyPI package, so the reward numbers are identical to the Prime Intellect Hub env.

## Episode model

Each episode is a single step: `reset()` issues a task, and `step(AdvocacyAction(text=...))`
scores the submitted argument and returns `done=True`.
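
The single-step contract can be sketched with an in-process stand-in that needs no server. Everything here is hypothetical: `StubSingleStepEnv`, `StubResult`, and `run_episode` are illustrative names, the reward is a placeholder, and raising on step-before-reset is a simplification of this stub (the real observation model carries an `error` field for that case).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StubResult:
    reward: float
    done: bool


class StubSingleStepEnv:
    """Hypothetical stand-in that mimics the reset/step contract."""

    def __init__(self) -> None:
        self._awaiting_action = False

    def reset(self) -> str:
        self._awaiting_action = True
        return "Defend the assigned answer."  # placeholder task text

    def step(self, text: str) -> StubResult:
        if not self._awaiting_action:
            raise RuntimeError("step() called before reset()")
        self._awaiting_action = False
        # Placeholder reward; real scoring comes from the scoring package.
        return StubResult(reward=0.0, done=True)


def run_episode(env: StubSingleStepEnv, policy: Callable[[str], str]) -> StubResult:
    """Run one episode: a single reset followed by a single terminal step."""
    task = env.reset()
    result = env.step(policy(task))
    if not result.done:
        raise RuntimeError("single-step episodes must end after one step")
    return result


result = run_episode(StubSingleStepEnv(), lambda task: "<claim>x</claim><cite>y</cite>")
print(result.done)
```

Because every episode ends after one step, a training loop needs no per-episode state beyond the task returned by `reset()`.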

## Configuration (environment variables)

| Var | Default | Meaning |
|---|---|---|
| `SPRINT_N_ITEMS` | `50` | QuALITY items to load (2 advocacy rows each) |
| `SPRINT_PASSAGE_CHARS` | `2000` | Passage char cap |
| `SPRINT_SEED` | `0` | Distractor-selection seed (deterministic) |
| `SPRINT_WEIGHTS` | `1,0,0,0,0,0,0,0` | 8 reward weights, order: `aggregate, correctness, n_claims, n_citations, alternation_canary, starts_with_canary, length_band_canary, template_echo_canary`. Do **not** weight canaries during training. |

## Usage

```python
from sophistry_bench_sprint_env import SophistryBenchSprintEnv

# Run the deployed Hugging Face Space:
env = SophistryBenchSprintEnv.from_env("anushaacharya/sophistry_bench_sprint_env")
# ...or a local image: SophistryBenchSprintEnv.from_docker_image("openenv-sophistry_bench_sprint:latest")
try:
    obs = env.reset().observation
    print(obs.prompt, obs.answer_to_defend)
    result = env.step_text("<claim>...</claim><cite>...</cite>")
    print(result.reward, result.observation.metadata)
finally:
    env.close()
```

`result.observation.metadata` contains all eight reward components on every step; the canary
scores are the reward-hacking measurement.
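
Since the canaries are read across many episodes rather than one, a small aggregation helper is useful. The sketch below averages each component over a list of per-episode component dicts and keeps the canary entries. The key names and numbers are made up for illustration, and the exact keys in `metadata` are an assumption; the filter only relies on the substring `canary`.

```python
from collections import defaultdict
from statistics import mean


def mean_components(records: list[dict[str, float]]) -> dict[str, float]:
    """Average each component across per-episode component dicts."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for record in records:
        for name, value in record.items():
            buckets[name].append(float(value))
    return {name: mean(values) for name, values in buckets.items()}


def canary_means(records: list[dict[str, float]]) -> dict[str, float]:
    """Keep only the averaged components whose name mentions a canary."""
    return {
        name: value
        for name, value in mean_components(records).items()
        if "canary" in name
    }


# Made-up numbers, for illustration only.
episodes = [
    {"aggregate": 0.5, "alternation_canary": 0.0, "starts_with_canary": 1.0},
    {"aggregate": 1.0, "alternation_canary": 1.0, "starts_with_canary": 1.0},
]
print(canary_means(episodes))
```

Tracking these means over training checkpoints shows whether the policy is drifting toward format hacking while the weighted reward rises.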

> **Do not feed `observation.metadata` / `observation.components` back into the policy's
> prompt.** They include `correctness_reward` (whether the assigned answer is the gold one),
> which is the hidden ground truth. `reset()` deliberately tells the policy only *what* to
> defend, never *whether* it is correct; surfacing the components to the agent leaks that
> signal and defeats the reward-hacking measurement.

## Build & test

```bash
# Tests live with the other env tests. Run them from the repo root using this
# env's venv (which installs the scoring package):
uv run --project envs/sophistry_bench_sprint_env --extra dev \
    pytest tests/envs/test_sophistry_bench_sprint_environment.py -v
# The module pulls the published sophistry-bench-sprint, so in the repo's shared
# CI (where it isn't installed) it skips via pytest.importorskip — same as other
# envs with heavy deps (e.g. tbench2's camel guard).

# Container
openenv build sophistry_bench_sprint_env
# produces image tag: openenv-sophistry_bench_sprint:latest
```
17 changes: 17 additions & 0 deletions envs/sophistry_bench_sprint_env/__init__.py
@@ -0,0 +1,17 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

"""Sophistry-Bench Sprint Environment (OpenEnv port).

Single-step advocacy environment: reset() issues a QuALITY reading-comprehension
advocacy task, step(AdvocacyAction(text=...)) scores the argument and returns the
reward plus all eight sprint reward components in observation.metadata.
"""

from .client import SophistryBenchSprintEnv
from .models import AdvocacyAction, AdvocacyObservation

__all__ = ["SophistryBenchSprintEnv", "AdvocacyAction", "AdvocacyObservation"]
51 changes: 51 additions & 0 deletions envs/sophistry_bench_sprint_env/client.py
@@ -0,0 +1,51 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

from __future__ import annotations

try:
    from openenv.core.client_types import StepResult
    from openenv.core.env_client import EnvClient
    from openenv.core.env_server.types import State
except ImportError:  # standalone import path
    from core.client_types import StepResult
    from core.env_client import EnvClient
    from core.env_server.types import State

from .models import AdvocacyAction, AdvocacyObservation


class SophistryBenchSprintEnv(EnvClient[AdvocacyAction, AdvocacyObservation, State]):
    """Typed client for the sophistry-bench sprint OpenEnv environment."""

    def step_text(self, text: str) -> StepResult[AdvocacyObservation]:
        """Convenience: submit a raw argument string as an AdvocacyAction."""
        return super().step(AdvocacyAction(text=text))

    def _step_payload(self, action: AdvocacyAction) -> dict:
        return action.model_dump()

    def _parse_result(self, data: dict) -> StepResult[AdvocacyObservation]:
        observation = AdvocacyObservation(**data["observation"])
        # The framework's HTTP layer strips the base ``metadata`` dict from the
        # serialized observation, so the reward components arrive in the declared
        # ``components`` field. Re-populate ``metadata`` to keep the public
        # contract (``observation.metadata`` carries the eight components).
        if not observation.metadata and observation.components:
            observation.metadata = dict(observation.components)
        # The error path's ``metadata={"error": ...}`` is also stripped over the
        # wire, arriving only in the declared ``error`` field. Restore it so the
        # over-the-wire contract matches in-process behavior.
        if observation.error and "error" not in observation.metadata:
            observation.metadata["error"] = observation.error
        return StepResult(
            observation=observation,
            reward=data["reward"],
            done=data["done"],
        )

    def _parse_state(self, data: dict) -> State:
        return State(**data)
62 changes: 62 additions & 0 deletions envs/sophistry_bench_sprint_env/models.py
@@ -0,0 +1,62 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

from __future__ import annotations

try:
    from openenv.core.env_server.types import Action, Observation
except ImportError:  # standalone import path
    from core.env_server.types import Action, Observation

from pydantic import Field


class AdvocacyAction(Action):
    """The policy's one-shot advocacy argument."""

    text: str = Field(
        ..., description="The argument completion, using <claim>/<cite> tags."
    )


class AdvocacyObservation(Observation):
    """Task on reset; scored result on step.

    On reset: ``prompt`` holds the full system prompt (passage + question +
    answer-to-defend), ``done`` is False.
    On step: ``prompt`` is empty, ``done`` is True, and ``metadata`` carries all
    eight reward components.

    Note on ``reward``: read the post-step reward from ``StepResult.reward``, not
    from ``observation.reward``. The framework's serializer strips ``reward`` from
    the observation payload, so over the wire ``observation.reward`` is always the
    default 0.0; only ``StepResult.reward`` carries the weighted aggregate.

    The eight reward components are also mirrored in the declared ``components``
    field. The base ``metadata`` dict is stripped by the framework's HTTP
    serialization layer, so ``components`` is what survives the wire; the typed
    client re-populates ``metadata`` from it on the way back.
    """

    prompt: str = Field("", description="Full prompt the policy must answer.")
    answer_to_defend: str = Field(
        "", description="The answer the policy advocates for."
    )
    item_id: str = Field("", description="Source QuALITY article id.")
    reward: float = Field(
        0.0,
        description="In-process weighted aggregate. Stripped over the wire; read "
        "StepResult.reward after a step instead.",
    )
    done: bool = Field(False, description="Whether the episode has ended.")
    components: dict[str, float] = Field(
        default_factory=dict,
        description="Eight reward components (mirror of metadata; survives HTTP).",
    )
    error: str = Field(
        "",
        description="Diagnostic message (e.g. step-before-reset); survives serialization.",
    )
6 changes: 6 additions & 0 deletions envs/sophistry_bench_sprint_env/openenv.yaml
@@ -0,0 +1,6 @@
spec_version: 1
name: sophistry_bench_sprint_env
type: space
runtime: fastapi
app: server.app:app
port: 8000
34 changes: 34 additions & 0 deletions envs/sophistry_bench_sprint_env/pyproject.toml
@@ -0,0 +1,34 @@
[build-system]
requires = ["setuptools>=45", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "openenv-sophistry-bench-sprint-env"
version = "0.1.0"
description = "OpenEnv port of the sophistry-bench single-agent advocacy reward-hacking sprint env"
requires-python = ">=3.10"
dependencies = [
"openenv[core]>=0.2.2",
"fastapi>=0.115.0",
"pydantic>=2.0.0",
"uvicorn>=0.24.0",
"requests>=2.31.0",
# Capped: the scoring math is reproduced from this package and guarded by the
# parity test. Any 0.2+ bump must re-run that test before widening this bound.
"sophistry-bench-sprint>=0.1.5,<0.2.0",
]

[project.optional-dependencies]
dev = [
"pytest>=9.0.3",
"pytest-asyncio>=0.21",
"pytest-cov",
]

[project.scripts]
server = "sophistry_bench_sprint_env.server.app:main"

[tool.setuptools]
include-package-data = true
packages = ["sophistry_bench_sprint_env", "sophistry_bench_sprint_env.server"]
package-dir = { "sophistry_bench_sprint_env" = ".", "sophistry_bench_sprint_env.server" = "server" }