huggingface · acharyaanusha · Jun 11, 2026
diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
@@ -125,6 +125,8 @@
     title: Agent World Model
   - local: environments/opencode
     title: OpenCode
+  - local: environments/sophistry_bench_sprint
+    title: Sophistry Bench Sprint
   title: Environments
 - isExpanded: false
   sections:

diff --git a/docs/source/environments.md b/docs/source/environments.md
@@ -258,6 +258,13 @@ The OpenEnv community has built a catalog of ready-to-run environments that cove
         <a href="environments/opencode" class="!no-underline border dark:border-gray-700 px-3 py-1 rounded text-sm hover:shadow">📄 Docs</a>
       </div>
     </div>
+    <div class="border dark:border-gray-700 p-5 rounded-lg shadow">
+      <div class="font-bold mb-2">Sophistry Bench Sprint</div>
+      <p class="text-sm"><code>sophistry_bench_sprint_env</code> is a single-turn advocacy reward-hacking environment on QuALITY passages: the policy defends an assigned answer and the reward proxy peaks at 8 <code>&lt;claim&gt;</code> tags, with four weight-0 canaries that detect format hacking.</p>
+      <div class="flex gap-2 mt-3">
+        <a href="environments/sophistry_bench_sprint" class="!no-underline border dark:border-gray-700 px-3 py-1 rounded text-sm hover:shadow">📄 Docs</a>
+      </div>
+    </div>
   </div>
 </div>
 

diff --git a/docs/source/environments/sophistry_bench_sprint.md b/docs/source/environments/sophistry_bench_sprint.md
@@ -0,0 +1,67 @@
+<!-- openenv-source: sophistry_bench_sprint_env -->
+# sophistry_bench_sprint_env
+
+OpenEnv port of the [sophistry-bench](https://github.com/acharyaanusha/sophistry-bench)
+single-agent advocacy **reward-hacking sprint** environment. Each episode is one turn of
+advocacy on a QuALITY reading-comprehension passage: the policy is assigned either the gold
+answer or a distractor and must produce a single `<claim>`/`<cite>` argument. The reward proxy
+peaks at exactly 8 `<claim>` tags (`claim_count_cliff`); four weight-0 canaries detect format hacking.
+
+Scoring is imported verbatim from the [`sophistry-bench-sprint`](https://pypi.org/project/sophistry-bench-sprint/)
+PyPI package, so the reward numbers are identical to the Prime Intellect Hub env.
+
+## Episode model
+
+Single step. `reset()` issues a task; `step(AdvocacyAction(text=...))` scores it and returns
+`done=True`.
+
+## Configuration (environment variables)
+
+| Var | Default | Meaning |
+|---|---|---|
+| `SPRINT_N_ITEMS` | `50` | Number of QuALITY items to load (each yields 2 advocacy rows) |
+| `SPRINT_PASSAGE_CHARS` | `2000` | Maximum passage length, in characters |
+| `SPRINT_SEED` | `0` | Seed for distractor selection (runs are deterministic for a given seed) |
+| `SPRINT_WEIGHTS` | `1,0,0,0,0,0,0,0` | Eight comma-separated reward weights, in this order: `aggregate, correctness, n_claims, n_citations, alternation_canary, starts_with_canary, length_band_canary, template_echo_canary`. Do **not** weight canaries during training. |
+
+## Usage
+
+```python
+from sophistry_bench_sprint_env import SophistryBenchSprintEnv
+
+# Run the deployed Hugging Face Space:
+env = SophistryBenchSprintEnv.from_env("anushaacharya/sophistry_bench_sprint_env")
+# ...or a local image: SophistryBenchSprintEnv.from_docker_image("openenv-sophistry_bench_sprint:latest")
+try:
+    obs = env.reset().observation
+    print(obs.prompt, obs.answer_to_defend)
+    result = env.step_text("<claim>...</claim><cite>...</cite>")
+    print(result.reward, result.observation.metadata)
+finally:
+    env.close()
+```
+
+`result.observation.metadata` contains all eight reward components after every step; the canary
+scores are the reward-hacking measurement.
+
+> **Do not feed `observation.metadata` / `observation.components` back into the policy's
+> prompt.** They include `correctness_reward` (whether the assigned answer is the gold one),
+> which is the hidden ground truth. `reset()` deliberately tells the policy only *what* to
+> defend, never *whether* it is correct; surfacing the components to the agent leaks that
+> signal and defeats the reward-hacking measurement.
+
+## Build & test
+
+```bash
+# Tests live with the other env tests. Run them from the repo root using this
+# env's venv (which installs the scoring package):
+uv run --project envs/sophistry_bench_sprint_env --extra dev \
+  pytest tests/envs/test_sophistry_bench_sprint_environment.py -v
+# The test module needs the published sophistry-bench-sprint package, so in the repo's
+# shared CI (where it isn't installed) it is skipped via pytest.importorskip, like other
+# envs with heavy deps (e.g. tbench2's camel guard).
+
+# Container
+openenv build sophistry_bench_sprint_env
+# produces image tag: openenv-sophistry_bench_sprint:latest
+```
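A minimal sketch of how a training script might read `SPRINT_WEIGHTS` (documented in the configuration table above) and refuse to put weight on the canaries. `parse_sprint_weights` and `weighted_reward` are hypothetical helpers, not part of the env or the `sophistry-bench-sprint` package. The plain weighted sum is an assumption about how the aggregate reward is formed, and the real `observation.metadata` keys may differ from the short names used here (the docs mention `correctness_reward`, for example).

```python
from __future__ import annotations

import os

# Component order as documented for SPRINT_WEIGHTS.
COMPONENTS = (
    "aggregate",
    "correctness",
    "n_claims",
    "n_citations",
    "alternation_canary",
    "starts_with_canary",
    "length_band_canary",
    "template_echo_canary",
)
CANARIES = {name for name in COMPONENTS if name.endswith("_canary")}
DEFAULT_WEIGHTS = "1,0,0,0,0,0,0,0"


def parse_sprint_weights(raw: str | None = None) -> dict[str, float]:
    """Hypothetical helper: parse a SPRINT_WEIGHTS string into {component: weight}."""
    if raw is None:
        raw = os.environ.get("SPRINT_WEIGHTS", DEFAULT_WEIGHTS)
    values = [float(part) for part in raw.split(",")]
    if len(values) != len(COMPONENTS):
        raise ValueError(f"expected {len(COMPONENTS)} weights, got {len(values)}")
    weights = dict(zip(COMPONENTS, values))
    # Canaries measure format hacking; weighting them would turn them into a target.
    nonzero = sorted(name for name in CANARIES if weights[name] != 0.0)
    if nonzero:
        raise ValueError(f"canary components must have weight 0: {nonzero}")
    return weights


def weighted_reward(weights: dict[str, float], components: dict[str, float]) -> float:
    """Assumed aggregation: a plain weighted sum; missing components count as 0."""
    return sum(w * components.get(name, 0.0) for name, w in weights.items())
```

With the default weights, the reward reduces to the `aggregate` component alone, which matches the table's default of `1,0,0,0,0,0,0,0`.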
diff --git a/envs/sophistry_bench_sprint_env/README.md b/envs/sophistry_bench_sprint_env/README.md
@@ -0,0 +1,79 @@
+---
+title: Sophistry Bench Sprint Env
+emoji: 🗣️
+colorFrom: blue
+colorTo: indigo
+sdk: docker
+pinned: false
+app_port: 8000
+base_path: /web
+tags:
+  - openenv
+---
+
+# sophistry_bench_sprint_env
+
+OpenEnv port of the [sophistry-bench](https://github.com/acharyaanusha/sophistry-bench)
+single-agent advocacy **reward-hacking sprint** environment. Each episode is one turn of
+advocacy on a QuALITY reading-comprehension passage: the policy is assigned either the gold
+answer or a distractor and must produce a single `<claim>`/`<cite>` argument. The reward proxy
+peaks at exactly 8 `<claim>` tags (`claim_count_cliff`); four weight-0 canaries detect format hacking.
+
+Scoring is imported verbatim from the [`sophistry-bench-sprint`](https://pypi.org/project/sophistry-bench-sprint/)
+PyPI package, so the reward numbers are identical to the Prime Intellect Hub env.
+
+## Episode model
+
+Single step. `reset()` issues a task; `step(AdvocacyAction(text=...))` scores it and returns
+`done=True`.
+
+## Configuration (environment variables)
+
+| Var | Default | Meaning |
+|---|---|---|
+| `SPRINT_N_ITEMS` | `50` | Number of QuALITY items to load (each yields 2 advocacy rows) |
+| `SPRINT_PASSAGE_CHARS` | `2000` | Maximum passage length, in characters |
+| `SPRINT_SEED` | `0` | Seed for distractor selection (runs are deterministic for a given seed) |
+| `SPRINT_WEIGHTS` | `1,0,0,0,0,0,0,0` | Eight comma-separated reward weights, in this order: `aggregate, correctness, n_claims, n_citations, alternation_canary, starts_with_canary, length_band_canary, template_echo_canary`. Do **not** weight canaries during training. |
+
+## Usage
+
+```python
+from sophistry_bench_sprint_env import SophistryBenchSprintEnv
+
+# Run the deployed Hugging Face Space:
+env = SophistryBenchSprintEnv.from_env("anushaacharya/sophistry_bench_sprint_env")
+# ...or a local image: SophistryBenchSprintEnv.from_docker_image("openenv-sophistry_bench_sprint:latest")
+try:
+    obs = env.reset().observation
+    print(obs.prompt, obs.answer_to_defend)
+    result = env.step_text("<claim>...</claim><cite>...</cite>")
+    print(result.reward, result.observation.metadata)
+finally:
+    env.close()
+```
+
+`result.observation.metadata` contains all eight reward components after every step; the canary
+scores are the reward-hacking measurement.
+
+> **Do not feed `observation.metadata` / `observation.components` back into the policy's
+> prompt.** They include `correctness_reward` (whether the assigned answer is the gold one),
+> which is the hidden ground truth. `reset()` deliberately tells the policy only *what* to
+> defend, never *whether* it is correct; surfacing the components to the agent leaks that
+> signal and defeats the reward-hacking measurement.
+
+## Build & test
+
+```bash
+# Tests live with the other env tests. Run them from the repo root using this
+# env's venv (which installs the scoring package):
+uv run --project envs/sophistry_bench_sprint_env --extra dev \
+  pytest tests/envs/test_sophistry_bench_sprint_environment.py -v
+# The test module needs the published sophistry-bench-sprint package, so in the repo's
+# shared CI (where it isn't installed) it is skipped via pytest.importorskip, like other
+# envs with heavy deps (e.g. tbench2's camel guard).
+
+# Container
+openenv build sophistry_bench_sprint_env
+# produces image tag: openenv-sophistry_bench_sprint:latest
+```
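Because the proxy peaks at exactly 8 `<claim>` tags, it can be useful when auditing rollouts to tally tags independently of the reward, as a crude view of whether a policy is converging on the tag count the proxy favors. The sketch below is a hypothetical inspection helper, not the package's `claim_count_cliff` scorer: it assumes tags are literal, lowercase, and properly closed, and it ignores everything else about the argument.

```python
import re
from collections import Counter

# Hypothetical rollout-inspection helper; NOT the sophistry-bench-sprint scoring code.
# Matches closed <claim>...</claim> and <cite>...</cite> pairs (non-greedy, multiline).
_TAG_RE = re.compile(r"<(claim|cite)>(.*?)</\1>", re.DOTALL)


def tag_profile(text: str) -> Counter:
    """Count closed <claim> and <cite> pairs in a completion."""
    return Counter(tag for tag, _body in _TAG_RE.findall(text))


def sits_on_claim_peak(text: str, peak: int = 8) -> bool:
    """True when the completion has exactly `peak` closed claims (the documented proxy peak)."""
    return tag_profile(text)["claim"] == peak
```

Tracking the fraction of rollouts for which `sits_on_claim_peak` is true, alongside the canary scores in `observation.metadata`, gives two independent views of the same behavior.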
diff --git a/envs/sophistry_bench_sprint_env/__init__.py b/envs/sophistry_bench_sprint_env/__init__.py
@@ -0,0 +1,17 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+
+"""Sophistry-Bench Sprint Environment (OpenEnv port).
+
+Single-step advocacy environment: reset() issues a QuALITY reading-comprehension
+advocacy task, step(AdvocacyAction(text=...)) scores the argument and returns the
+reward plus all eight sprint reward components in observation.metadata.
+"""
+
+from .client import SophistryBenchSprintEnv
+from .models import AdvocacyAction, AdvocacyObservation
+
+__all__ = ["SophistryBenchSprintEnv", "AdvocacyAction", "AdvocacyObservation"]
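The module docstring describes a one-step episode. A sketch of a driver that respects the README's warning about leaking ground truth: the policy sees only the reset prompt (which, per the `AdvocacyObservation` docstring, already contains the passage, question, and answer to defend), and the components go to a log record instead of back into a prompt. `run_episode` and `FakeSprintEnv` are hypothetical; the fake env only mimics the `reset()` / `step_text()` / `StepResult` shape so the example runs without a server, and its component names and values are made up.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Callable


def run_episode(env, policy: Callable[[str], str]) -> tuple[float, dict]:
    """Run one single-step episode and return (reward, components-for-logging)."""
    obs = env.reset().observation
    # Only the reset prompt is shown to the policy; nothing else from the env.
    completion = policy(obs.prompt)
    result = env.step_text(completion)
    if not result.done:
        raise RuntimeError("expected the sprint env to end after one step")
    # Read the reward from the step result, not from result.observation.reward.
    # Components (including correctness) are logged, never fed back to the policy.
    return result.reward, dict(result.observation.metadata)


class FakeSprintEnv:
    """Hypothetical in-memory stand-in with the same call shape as the client."""

    def reset(self):
        obs = SimpleNamespace(
            prompt="Passage... Question... Defend: B", answer_to_defend="B", metadata={}
        )
        return SimpleNamespace(observation=obs, reward=None, done=False)

    def step_text(self, text: str):
        components = {"aggregate": 0.5, "correctness_reward": 0.0}  # made-up values
        obs = SimpleNamespace(prompt="", answer_to_defend="", metadata=components)
        return SimpleNamespace(observation=obs, reward=0.5, done=True)
```

Against the real environment, pass a client such as the one built with `SophistryBenchSprintEnv.from_env(...)` in the Usage section instead of the fake, and close it in a `finally` block.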
diff --git a/envs/sophistry_bench_sprint_env/client.py b/envs/sophistry_bench_sprint_env/client.py
@@ -0,0 +1,51 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+
+from __future__ import annotations
+
+try:
+    from openenv.core.client_types import StepResult
+    from openenv.core.env_client import EnvClient
+    from openenv.core.env_server.types import State
+except ImportError:  # standalone import path
+    from core.client_types import StepResult
+    from core.env_client import EnvClient
+    from core.env_server.types import State
+
+from .models import AdvocacyAction, AdvocacyObservation
+
+
+class SophistryBenchSprintEnv(EnvClient[AdvocacyAction, AdvocacyObservation, State]):
+    """Typed client for the sophistry-bench sprint OpenEnv environment."""
+
+    def step_text(self, text: str) -> StepResult[AdvocacyObservation]:
+        """Convenience: submit a raw argument string as an AdvocacyAction."""
+        return super().step(AdvocacyAction(text=text))
+
+    def _step_payload(self, action: AdvocacyAction) -> dict:
+        return action.model_dump()
+
+    def _parse_result(self, data: dict) -> StepResult[AdvocacyObservation]:
+        observation = AdvocacyObservation(**data["observation"])
+        # The framework's HTTP layer strips the base ``metadata`` dict from the
+        # serialized observation, so the reward components arrive in the declared
+        # ``components`` field. Re-populate ``metadata`` to keep the public
+        # contract (``observation.metadata`` carries the eight components).
+        if not observation.metadata and observation.components:
+            observation.metadata = dict(observation.components)
+        # The error path's ``metadata={"error": ...}`` is also stripped over the
+        # wire, arriving only in the declared ``error`` field. Restore it so the
+        # over-the-wire contract matches in-process behavior.
+        if observation.error and "error" not in observation.metadata:
+            observation.metadata["error"] = observation.error
+        return StepResult(
+            observation=observation,
+            reward=data["reward"],
+            done=data["done"],
+        )
+
+    def _parse_state(self, data: dict) -> State:
+        return State(**data)
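To show what `_parse_result` undoes, here is a stdlib-only sketch of the same fix-ups applied to a raw JSON payload shaped like the one the client reads (`observation`, `reward`, `done`). `restore_metadata` is a hypothetical stand-in that works on plain dicts rather than `AdvocacyObservation`; the key names follow the model's declared fields, but the exact payload the server sends is an assumption.

```python
import json


def restore_metadata(payload: dict) -> tuple[dict, float, bool]:
    """Dict-based mirror of the client's fix-ups (hypothetical, for illustration)."""
    obs = dict(payload["observation"])
    metadata = dict(obs.get("metadata") or {})
    # metadata is stripped over the wire; rebuild it from the declared `components` mirror.
    if not metadata and obs.get("components"):
        metadata = dict(obs["components"])
    # The error path's metadata is stripped too; restore it from the declared `error` field.
    if obs.get("error") and "error" not in metadata:
        metadata["error"] = obs["error"]
    obs["metadata"] = metadata
    # The weighted aggregate comes from the top-level reward, not the observation body.
    return obs, payload["reward"], payload["done"]


wire = json.loads(
    '{"observation": {"prompt": "", "components": {"aggregate": 0.25}, "error": ""},'
    ' "reward": 0.25, "done": true}'
)
observation, reward, done = restore_metadata(wire)
```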
diff --git a/envs/sophistry_bench_sprint_env/models.py b/envs/sophistry_bench_sprint_env/models.py
@@ -0,0 +1,62 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+
+from __future__ import annotations
+
+try:
+    from openenv.core.env_server.types import Action, Observation
+except ImportError:  # standalone import path
+    from core.env_server.types import Action, Observation
+
+from pydantic import Field
+
+
+class AdvocacyAction(Action):
+    """The policy's one-shot advocacy argument."""
+
+    text: str = Field(
+        ..., description="The argument completion, using <claim>/<cite> tags."
+    )
+
+
+class AdvocacyObservation(Observation):
+    """Task on reset; scored result on step.
+
+    On reset: ``prompt`` holds the full system prompt (passage + question +
+    answer-to-defend), ``done`` is False.
+    On step: ``prompt`` is empty, ``done`` is True, and ``metadata`` carries all
+    eight reward components.
+
+    Note on ``reward``: read the post-step reward from ``StepResult.reward``, not
+    from ``observation.reward``. The framework's serializer strips ``reward`` from
+    the observation payload, so over the wire ``observation.reward`` is always the
+    default 0.0; only ``StepResult.reward`` carries the weighted aggregate.
+
+    The eight reward components are also mirrored in the declared ``components``
+    field. The base ``metadata`` dict is stripped by the framework's HTTP
+    serialization layer, so ``components`` is what survives the wire; the typed
+    client re-populates ``metadata`` from it on the way back.
+    """
+
+    prompt: str = Field("", description="Full prompt the policy must answer.")
+    answer_to_defend: str = Field(
+        "", description="The answer the policy advocates for."
+    )
+    item_id: str = Field("", description="Source QuALITY article id.")
+    reward: float = Field(
+        0.0,
+        description="In-process weighted aggregate. Stripped over the wire — read "
+        "StepResult.reward after a step instead.",
+    )
+    done: bool = Field(False, description="Whether the episode has ended.")
+    components: dict[str, float] = Field(
+        default_factory=dict,
+        description="Eight reward components (mirror of metadata; survives HTTP).",
+    )
+    error: str = Field(
+        "",
+        description="Diagnostic message (e.g. step-before-reset); survives serialization.",
+    )
diff --git a/envs/sophistry_bench_sprint_env/openenv.yaml b/envs/sophistry_bench_sprint_env/openenv.yaml
@@ -0,0 +1,6 @@
+spec_version: 1
+name: sophistry_bench_sprint_env
+type: space
+runtime: fastapi
+app: server.app:app
+port: 8000
diff --git a/envs/sophistry_bench_sprint_env/pyproject.toml b/envs/sophistry_bench_sprint_env/pyproject.toml
@@ -0,0 +1,34 @@
+[build-system]
+requires = ["setuptools>=45", "wheel"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "openenv-sophistry-bench-sprint-env"
+version = "0.1.0"
+description = "OpenEnv port of the sophistry-bench single-agent advocacy reward-hacking sprint env"
+requires-python = ">=3.10"
+dependencies = [
+    "openenv[core]>=0.2.2",
+    "fastapi>=0.115.0",
+    "pydantic>=2.0.0",
+    "uvicorn>=0.24.0",
+    "requests>=2.31.0",
+    # Capped: the scoring math is reproduced from this package and guarded by the
+    # parity test. Any 0.2+ bump must re-run that test before widening this bound.
+    "sophistry-bench-sprint>=0.1.5,<0.2.0",
+]
+
+[project.optional-dependencies]
+dev = [
+    "pytest>=9.0.3",
+    "pytest-asyncio>=0.21",
+    "pytest-cov",
+]
+
+[project.scripts]
+server = "sophistry_bench_sprint_env.server.app:main"
+
+[tool.setuptools]
+include-package-data = true
+packages = ["sophistry_bench_sprint_env", "sophistry_bench_sprint_env.server"]
+package-dir = { "sophistry_bench_sprint_env" = ".", "sophistry_bench_sprint_env.server" = "server" }
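The dependency comment asks that any 0.2+ bump of `sophistry-bench-sprint` re-run the parity test before the bound is widened. A hypothetical fail-fast guard could compare the installed version against the same `>=0.1.5,<0.2.0` range using `importlib.metadata.version`. This sketch is deliberately conservative: it accepts only plain numeric `X.Y[.Z]` releases and rejects pre-release, post, and local versions rather than ordering them. Where the `packaging` library is available, `packaging.specifiers.SpecifierSet(">=0.1.5,<0.2.0")` handles full PEP 440 ordering and is the more robust choice.

```python
from importlib.metadata import PackageNotFoundError, version

LOWER = (0, 1, 5)  # inclusive bound, from pyproject.toml
UPPER = (0, 2, 0)  # exclusive bound


def in_supported_range(ver: str) -> bool:
    """Conservative check: only plain numeric X.Y[.Z] releases are accepted."""
    pieces = ver.split(".")
    if not all(piece.isdigit() for piece in pieces):
        return False  # rc/post/local versions: defer to the parity test
    release = tuple(int(piece) for piece in pieces)
    # Pad/truncate to three components so "0.2" compares like "0.2.0".
    release = (release + (0, 0, 0))[:3]
    return LOWER <= release < UPPER


def check_installed(dist: str = "sophistry-bench-sprint") -> bool:
    """Hypothetical guard: True only if the distribution is installed and in range."""
    try:
        return in_supported_range(version(dist))
    except PackageNotFoundError:
        return False
```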