From 2ef60ddfe5edc0b0e9a80cf8d86e126fda16708b Mon Sep 17 00:00:00 2001
From: burtenshaw
Date: Thu, 21 May 2026 15:27:35 +0200
Subject: [PATCH] docs: add hf rl environment datasets rfc

---
 rfcs/006-hf-rl-environment-datasets.md | 434 +++++++++++++++++++++++++
 rfcs/README.md                         |   3 +
 2 files changed, 437 insertions(+)
 create mode 100644 rfcs/006-hf-rl-environment-datasets.md

diff --git a/rfcs/006-hf-rl-environment-datasets.md b/rfcs/006-hf-rl-environment-datasets.md
new file mode 100644
index 000000000..2901875e9
--- /dev/null
+++ b/rfcs/006-hf-rl-environment-datasets.md
@@ -0,0 +1,434 @@
+# RFC 006: Hugging Face RL Environment Datasets
+
+**Status**: In Review
+**Created**: 2026-05-21
+**Authors**: @ben
+**RFC ID**: 006
+
+## Summary
+
+This RFC proposes that OpenEnv task sets should be represented as ordinary Hugging Face dataset repositories with a small `environment.yaml` declaration at the dataset repo root. A task set is not a new OpenEnv manifest format and not a new Hub repo type. It is a named RL environment declaration inside a dataset repo, backed by normal dataset files, Dataset Viewer configs, and optional framework-specific runtime metadata.
+
+OpenEnv should consume these declarations through `AutoEnv`. When `AutoEnv` is passed a dataset environment reference such as `hf://datasets/org/repo/env-id@revision`, it should return a dataset-bound environment client. The returned environment has the loaded dataset attached as `env.dataset`, and dataset rows are fed to the runtime automatically through a default row cursor. Users should not need to call a separate task-set resolver in normal evaluation or collection code.
+
+## Motivation
+
+### Problem Statement
+
+OpenEnv currently packages runtimes as Spaces or containers, while tasks are either generated inside an environment, embedded in environment-specific code, or supplied manually by harnesses.
+RFC 001 already describes tasks as reset-time inputs and separates datasets from the environment runtime, but there is no implemented discovery format for task sets.
+
+The broader ecosystem has converged on dataset-hosted environment artifacts:
+
+- Harbor datasets package task directories and environment requirements.
+- Verifiers tasksets load rows from Hugging Face datasets and bind them to a harness.
+- OpenEnv environments are deployable runtimes, usually as Spaces or containers.
+
+Without a shared dataset-side declaration, these assets are hard to discover, hard to run from framework clients, and likely to diverge into framework-specific registries.
+
+### Goals
+
+1. Represent OpenEnv task sets as ordinary Hugging Face dataset artifacts.
+2. Keep publishing unchanged: users push files to dataset repos as they do today.
+3. Move task-set definition into a dataset-root `environment.yaml`.
+4. Keep OpenEnv's implementation surface small: resolve, load rows, run runtime.
+5. Support framework facets for OpenEnv, Harbor, and Verifiers without requiring any framework to adopt another framework's task schema.
+6. Avoid arbitrary remote Python execution by default.
+7. Preserve backward compatibility for existing OpenEnv manifests and runtimes.
+
+### Non-Goals
+
+1. No new Hugging Face repo type.
+2. No requirement that the Hub execute environments.
+3. No replacement for Dataset Viewer configs, Croissant, or dataset cards.
+4. No attempt to standardize all RL frameworks or task schemas.
+5. No required Dataset Card YAML metadata for RL-environment discovery.
+6. No required remote code execution to load a task set.
+7. No requirement that all OpenEnv environments expose task sets.
+
+## Design
+
+### Architecture Overview
+
+An RL environment dataset is a Hugging Face dataset repository that includes an optional `environment.yaml` file at its root:
+
+```text
+dataset repo
+├── README.md
+├── environment.yaml
+├── data/
+│   ├── train-00000-of-00001.parquet
+│   └── validation-00000-of-00001.parquet
+└── optional assets...
+```
+
+The dataset repo owns task-set declarations. The runtime repo, Space, container, or package owns environment execution.
+
+```mermaid
+flowchart LR
+    Ref["hf://datasets/org/repo/env-id@rev"] --> Resolver["OpenEnv resolver"]
+    Resolver --> Manifest["dataset repo environment.yaml"]
+    Resolver --> Rows["load_dataset(repo, config, split, revision)"]
+    Manifest --> Runtime["OpenEnv runtime target"]
+    Runtime --> Space["HF Space"]
+    Runtime --> Image["Container image"]
+    Runtime --> Package["Python package"]
+    Rows --> BoundEnv["dataset-bound env"]
+    BoundEnv --> Reset["reset/step row cursor"]
+```
+
+### `environment.yaml`
+
+Dataset repos that want to expose RL environments add this file:
+
+```yaml
+spec_version: hf-rl-env-0.1
+
+environments:
+  - id: coding
+    title: Coding Tasks
+    config_name: default
+    splits: [train, validation]
+    frameworks:
+      openenv:
+        min_version: ">=0.3.0"
+        space_id: openenv/coding_env
+```
+
+Required top-level fields:
+
+- `spec_version`: currently `hf-rl-env-0.1`.
+- `environments`: non-empty list of environment declarations.
+
+Required per-environment fields:
+
+- `id`: stable identifier within the dataset repo.
+- `frameworks`: map of framework declarations the subset is expected to run with.
+
+Optional per-environment fields:
+
+- `title`: human-readable display name.
+- `description`: human-readable description.
+- `config_name`: matching Dataset Viewer / `datasets.load_dataset` config.
+- `splits`: valid split names for this environment subset.
+- `dataset`: explicit data locations when Dataset Viewer configs are not enough.
+- `runtime`: cross-framework isolation, resource, network, and secret expectations.
+- `reward`: scalar/object reward convention when non-standard.
+
+### OpenEnv Framework Declaration
+
+The OpenEnv declaration should be intentionally small. The dataset already implies the repo id and revision; Dataset Viewer metadata already describes config and split layout. OpenEnv only needs to know how to obtain a compatible runtime.
+
+Space-backed runtime:
+
+```yaml
+frameworks:
+  openenv:
+    min_version: ">=0.3.0"
+    space_id: openenv/coding_env
+```
+
+Container-backed runtime:
+
+```yaml
+frameworks:
+  openenv:
+    min_version: ">=0.3.0"
+    image: ghcr.io/openenv/coding_env:0.3.0
+```
+
+Package-backed runtime:
+
+```yaml
+frameworks:
+  openenv:
+    min_version: ">=0.3.0"
+    package: openenv-coding-env>=0.3.0
+```
+
+Exactly one of `space_id`, `image`, or `package` should be specified for the MVP. `manifest` defaults to `openenv.yaml` when a runtime package or Space needs manifest discovery.
+
+Optional OpenEnv fields:
+
+```yaml
+frameworks:
+  openenv:
+    image: ghcr.io/openenv/tbench2:0.3.0
+    resources:
+      cpus: 4
+      memory_gb: 8
+      gpus: 0
+    network: false
+    secrets:
+      - HF_TOKEN
+```
+
+These fields describe expected runtime needs. They are not executable hooks.
+
+### Environment References
+
+OpenEnv should support a stable reference syntax:
+
+```text
+hf://datasets/{repo_id}/{environment_id}@{revision}
+```
+
+Examples:
+
+```text
+hf://datasets/openenv/coding-tasksets/coding@main
+hf://datasets/harborframework/terminal-bench-2.0/terminal-bench-2@2.0
+```
+
+The revision is optional and defaults to the dataset repo default branch during experimentation. Reproducible runs should pin a tag or commit SHA.
+
+`AutoEnv` should parse these references directly.
+A structured reference type may exist internally, but it should not be the primary user-facing API:
+
+```python
+env = AutoEnv.from_env(
+    "hf://datasets/openenv/coding-tasksets/coding@main",
+    split="train",
+)
+```
+
+### Dataset-Bound Environment Behavior
+
+OpenEnv should avoid requiring users to manually resolve rows or call row-mapping helpers. A dataset environment reference returns a normal environment client with extra dataset attributes:
+
+```python
+env.dataset  # loaded Hugging Face dataset split
+env.dataset_ref  # parsed hf://datasets/... reference
+env.dataset_index  # current row cursor
+env.current_row  # row currently bound to the environment, if any
+```
+
+Default behavior is cursor-based. On `reset()` with no explicit task kwargs, the dataset-bound environment starts from row `0`. On each environment step, the wrapper advances the row cursor so row `0` corresponds to step `0`, row `1` corresponds to step `1`, and so on. Runtime implementations that need a different interpretation can override this behavior, but simple dataset-backed environments should work without custom client code.
+
+The default row mapping should use a small convention rather than a YAML column-mapping DSL:
+
+1. If the row has `openenv_reset`, pass those fields to `reset()` when the row is used for reset.
+2. If the row has `openenv_step`, pass those fields to `step()` when the row is used for a step.
+3. Else if the row has `task`, expose `row["task"]` to the runtime.
+4. Else expose the whole row to the runtime.
+
+This supports canonical row-based environments, step-indexed datasets, and legacy/specialized runtimes.
+
+Canonical row:
+
+```json
+{
+  "task_id": "example-001",
+  "task": {
+    "prompt": [{ "role": "user", "content": "Solve this problem."
+    }],
+    "answer": "42",
+    "metadata": { "difficulty": "easy" }
+  }
+}
+```
+
+Explicit reset row:
+
+```json
+{
+  "task_id": "headless-terminal",
+  "openenv_reset": {
+    "task_id": "headless-terminal"
+  },
+  "metadata": {
+    "source": "terminal-bench-2"
+  }
+}
+```
+
+The OpenEnv runtime must ensure reference answers, hidden tests, and verifier metadata are not leaked to the agent through observations or tool results unless the environment author intentionally exposes them.
+
+### Client API
+
+The primary API should instantiate an environment from either a runtime reference or a dataset environment reference. If the reference points to a dataset, `AutoEnv` returns a dataset-bound environment:
+
+```python
+from openenv import AutoEnv
+
+env = AutoEnv.from_env(
+    "hf://datasets/openenv/coding-tasksets/coding@main",
+    split="train",
+    trust_remote_code=False,
+)
+
+assert env.dataset is not None
+
+obs = env.reset()  # binds row 0
+obs = env.step(action)  # binds row 1 by default
+obs = env.step(action)  # binds row 2 by default
+```
+
+Evaluation and collection loops can use the environment directly:
+
+```python
+from openenv import AutoEnv
+
+env = AutoEnv.from_env(
+    "hf://datasets/openenv/coding-tasksets/coding@main",
+    split="train",
+)
+
+with env:
+    obs = env.reset()
+    done = False
+    while not done:
+        action = policy.act(obs)
+        result = env.step(action)
+        obs = result.observation
+        done = result.done
+```
+
+`openenv.yaml` inside an OpenEnv runtime may optionally reference a default task set, but should not duplicate the dataset-side schema:
+
+```yaml
+task_sets:
+  default: hf://datasets/openenv/coding-tasksets/coding@main
+```
+
+### Hub Behavior
+
+Hub support is outside the OpenEnv implementation, but this RFC is designed to align with a small Hub feature:
+
+1. Check for `environment.yaml` at the dataset repo root.
+2. Parse and validate `environment.yaml`.
+3. Add an `Environment` badge and framework facets such as `openenv`, `harbor`, and `verifiers`.
+4.
+   Generate framework snippets.
+5. Expose a machine-readable endpoint for parsed environment declarations.
+
+OpenEnv should not block on Hub UI support. The framework client can fetch `environment.yaml` directly from the dataset repo.
+
+## Key Design Decisions
+
+- A task set is a dataset-side environment declaration in `environment.yaml`.
+- `openenv.yaml` may contain a short reference to a default task set, but the detailed declaration lives in the dataset repo.
+- `AutoEnv` returns a dataset-bound environment when given a dataset environment reference. The environment owns the row cursor and applies a simple row convention.
+- `environment.yaml` does not execute arbitrary adapter code. Custom behavior belongs in the trusted OpenEnv runtime package, Space, or container image.
+- The `frameworks.openenv` block chooses a Space, image, or package. Harbor and Verifiers may use their own framework blocks in the same environment declaration.
+
+
+## Implementation Plan
+
+### Phase 1: Dataset Environment Resolver
+
+Add a small dataset environment resolver under `src/openenv/core/`:
+
+- dataset environment reference parser for `hf://datasets/...`.
+- `EnvironmentDatasetManifest`: Pydantic model for `environment.yaml`.
+- dataset-bound environment wrapper that attaches `dataset`, `dataset_ref`, `dataset_index`, and `current_row`.
+- row cursor logic for default reset/step row binding.
+
+The resolver should be internal to `AutoEnv`. It fetches `environment.yaml` from the Hugging Face dataset repo using `huggingface_hub`, then loads rows through `datasets.load_dataset` when `datasets` is installed. If `datasets` is not available, return a clear dependency error with the install extra to use.
+
+### Phase 2: AutoEnv Integration
+
+Add:
+
+```python
+AutoEnv.from_env(ref: str, *, split: str | None = None, trust_remote_code: bool = False)
+```
+
+For `frameworks.openenv.space_id`, this resolves the Space URL and returns a dataset-bound OpenEnv client for that runtime.
+For `image`, it starts a container through the existing provider path and wraps the client. For `package`, it installs or imports through the existing auto-discovery path and wraps the client.
+
+### Phase 3: CLI Support
+
+Add commands:
+
+```bash
+openenv env inspect hf://datasets/org/repo/env-id@rev
+openenv env validate path/to/environment.yaml
+openenv run hf://datasets/org/repo/env-id@rev --split train
+```
+
+`validate` should check schema shape, selected Dataset Viewer config/splits when available, and whether `frameworks.openenv` has exactly one runtime target.
+
+### Phase 4: Reference Environment
+
+Update one existing environment to demonstrate the pattern. Good candidates:
+
+- `reasoning_gym_env`: simple row-based tasks.
+- `tbench2_env`: validates explicit `openenv_reset` for task-id based runtimes.
+
+### Phase 5: Documentation
+
+Document:
+
+- How to author `environment.yaml`.
+- How to publish an OpenEnv-compatible task set as a HF dataset.
+- How to run by `hf://datasets/...`.
+- How to pin revisions for reproducible training/evaluation.
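A minimal sketch of two pieces from Phases 1 and 3, the `hf://datasets/...` reference parser and the exactly-one-runtime-target check, using only the standard library. `DatasetEnvRef`, `parse_dataset_env_ref`, and `openenv_runtime_target` are hypothetical names, not existing OpenEnv APIs, and the sketch assumes `repo_id` always has the form `org/name`.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed shape: hf://datasets/{org}/{name}/{environment_id}[@{revision}]
_REF_PATTERN = re.compile(
    r"^hf://datasets/(?P<org>[^/@]+)/(?P<name>[^/@]+)/(?P<env>[^/@]+)"
    r"(?:@(?P<rev>.+))?$"
)

# The three mutually exclusive runtime targets named in the RFC.
RUNTIME_KEYS = ("space_id", "image", "package")


@dataclass(frozen=True)
class DatasetEnvRef:
    """Hypothetical parsed form of a dataset environment reference."""

    repo_id: str
    environment_id: str
    revision: Optional[str]  # None means "use the repo default branch"


def parse_dataset_env_ref(ref: str) -> DatasetEnvRef:
    """Split a reference string into repo id, environment id, and revision."""
    match = _REF_PATTERN.match(ref)
    if match is None:
        raise ValueError(f"not a dataset environment reference: {ref!r}")
    repo_id = f"{match['org']}/{match['name']}"
    return DatasetEnvRef(repo_id, match["env"], match["rev"])


def openenv_runtime_target(openenv_block: dict) -> tuple:
    """Return (kind, value) for the single runtime target, or raise."""
    present = [key for key in RUNTIME_KEYS if openenv_block.get(key)]
    if len(present) != 1:
        raise ValueError(
            f"expected exactly one of {RUNTIME_KEYS}, found {present}"
        )
    return present[0], openenv_block[present[0]]
```

For example, `parse_dataset_env_ref("hf://datasets/openenv/coding-tasksets/coding@main")` yields a `repo_id` of `openenv/coding-tasksets`, an `environment_id` of `coding`, and a `revision` of `main`. A real implementation would presumably express the runtime check as a validator on the Pydantic `EnvironmentDatasetManifest` model instead.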
+
+## Examples
+
+### Dataset Repository Declaration
+
+```yaml
+spec_version: hf-rl-env-0.1
+
+environments:
+  - id: terminal-bench-2
+    title: Terminal-Bench 2.0
+    config_name: harbor_tasks
+    splits: [train, validation]
+    frameworks:
+      harbor:
+        min_version: ">=0.1.0"
+      verifiers:
+        min_version: ">=0.1.14"
+        adapter: verifiers.v1.packages.tasksets.HarborTaskset
+      openenv:
+        min_version: ">=0.3.0"
+        space_id: openenv/tbench2_env
+```
+
+### Running with OpenEnv
+
+```python
+from openenv import AutoEnv
+
+env = AutoEnv.from_env(
+    "hf://datasets/harborframework/terminal-bench-2.0/terminal-bench-2@2.0",
+    split="train",
+    trust_remote_code=False,
+)
+```
+
+### Dataset-Bound Evaluation Loop
+
+```python
+from openenv import AutoEnv
+
+env = AutoEnv.from_env(
+    "hf://datasets/openenv/reasoning-gym-tasksets/chain-sum@main",
+    split="train",
+)
+
+with env:
+    obs = env.reset()  # row 0
+    obs = env.step(action)  # row 1
+    obs = env.step(action)  # row 2
+```
+
+### Optional OpenEnv Runtime Default
+
+```yaml
+spec_version: 1
+name: reasoning_gym
+type: space
+runtime: fastapi
+app: server.app:app
+port: 8000
+
+task_sets:
+  default: hf://datasets/openenv/reasoning-gym-tasksets/chain-sum@main
+```
+
+## Open Questions
+
+1. Should OpenEnv define a package extra such as `openenv-core[tasksets]` for the `datasets` dependency?
+2. Should `environment.yaml` live only at the dataset root, or should a dataset config be allowed to carry a colocated declaration in a subdirectory?
+3. Should dataset-bound `step()` pass row data through kwargs, through an attached context object, or through an environment-side row provider?
+4. Are `openenv_reset` and `openenv_step` the right names for explicit method kwargs?
+5. How should private dataset repos and private Spaces coordinate auth and secrets without storing secret values in metadata?
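To make open questions 3 and 4 concrete, here is one possible sketch of the default row convention and cursor, taking the "pass row data through kwargs" option. `row_kwargs` and `RowCursor` are hypothetical names, the `task` and `row` kwarg names are assumptions, and raising `IndexError` on exhaustion is a choice the RFC does not specify. The sketch also assumes that a row carrying only `openenv_reset` falls through to the `task` and whole-row rules when it is used for a step.

```python
from typing import Any, Mapping, Optional, Sequence


def row_kwargs(row: Mapping[str, Any], method: str) -> dict:
    """Map a dataset row to kwargs for `reset` or `step`.

    `method` is "reset" or "step"; it selects the explicit
    `openenv_reset` / `openenv_step` field to look for.
    """
    explicit = row.get(f"openenv_{method}")
    if explicit is not None:
        return dict(explicit)  # rules 1 and 2: explicit kwargs win
    if "task" in row:
        return {"task": row["task"]}  # rule 3 (kwarg name assumed)
    return {"row": dict(row)}  # rule 4 (kwarg name assumed)


class RowCursor:
    """Tracks which row is bound: reset binds row 0, each step binds the next."""

    def __init__(self, rows: Sequence[Mapping[str, Any]]) -> None:
        self._rows = rows
        self.index: Optional[int] = None  # None until reset() is called

    def reset(self) -> Mapping[str, Any]:
        self.index = 0
        return self._rows[self.index]

    def advance(self) -> Mapping[str, Any]:
        if self.index is None:
            raise RuntimeError("call reset() before advance()")
        if self.index + 1 >= len(self._rows):
            raise IndexError("dataset rows exhausted")  # assumed behavior
        self.index += 1
        return self._rows[self.index]
```

A dataset-bound wrapper could then call `runtime.reset(**row_kwargs(cursor.reset(), "reset"))` and `runtime.step(action, **row_kwargs(cursor.advance(), "step"))`. The context-object and row-provider options from question 3 would replace the kwargs expansion and leave the cursor unchanged.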
diff --git a/rfcs/README.md b/rfcs/README.md
index 30e4d9afb..e14f5e888 100644
--- a/rfcs/README.md
+++ b/rfcs/README.md
@@ -93,6 +93,9 @@ Each RFC should include the following sections:
 ### Agentic Harnesses
 - [005-agentic-harnesses.md](./005-agentic-harnesses.md) - Agentic Harness Integration (OpenClaw, Claude Code, etc.)
 
+### Environment Datasets
+- [006-hf-rl-environment-datasets.md](./006-hf-rl-environment-datasets.md) - Hugging Face RL Environment Datasets
+
 ## Questions?
 
 For questions about the RFC process, reach out to the core team or open a discussion in the project repository.