RFC: Support multi-threading (e.g. subagents, multi-agents)

## Problem Statement

OpenAI Gym abstraction is built for serial MDP execution: one action yields one observation, so the execution order is action -> observation -> action -> observation -> ...

Parallel execution (subagents, multi-agents) is important for scaling test-time compute and keeping wall-clock time down.

As a simple example, if you take a `step()` and the action is to spawn two subagents, then naturally two observations will be produced: one for each subagent prompt. But the interface as-is returns only one observation, and you can't get two observations without submitting two actions.

## Proposed Solution

Change from a single `step()` function to a producer-consumer architecture with decoupled `act()` and `observe()`. The user can then maintain both an observe asyncio task and multiple act tasks at all times, and thus any execution order can happen such as action -> observation -> observation -> action -> action -> ...

For RL purposes, the MDP abstraction still fits mathematically if you take your trajectory to be ordered either by the observation time or the action time. Prefix merging for training also still works if you use a trie or rolling hash or similar, in order to match prefixes which are not directly the previous request.

## Alignment with Principles

Addresses "Design for LLMs" by acknowledging that LLM inference and tool execution has long and variable-length latency which makes serial MDP execution less viable for larger workflows

Bends "Simple Gymnasium-style API"; I claim that departing from the step() API is necessary to support concurrency.

### Principles Addressed
- [x] Minimize lifecycle deltas (training = production)
- [x] Minimize human-agent divergence
- [x] Design for LLMs

## Trade-offs

I'm not aware of any alternatives.

## Impact
Breaks the current API contract.

### Files/Systems Affected
All

### Related RFCs
The only search result for "concurrency" is https://github.com/huggingface/OpenEnv/issues/194 , which I don't believe is related.

## Implementation Sketch


```python
observations: asyncio.Queue[tuple[UUID, Observation]]
actions: asyncio.Queue[tuple[UUID, Action]]

async def observe() -> tuple[UUID, Observation]:
    return await self.observations.get()

def act(observation_uuid: UUID, action: Action) -> None:
    self.actions.put_nowait((observation_uuid, action))

async def event_loop() -> None: ...  # await self.actions.get(), self.observations.put_nowait()
```

## Open Questions
N/A


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: Support multi-threading (e.g. subagents, multi-agents) #779

Problem Statement

Proposed Solution

Alignment with Principles

Principles Addressed

Trade-offs

Impact

Files/Systems Affected

Related RFCs

Implementation Sketch

Open Questions

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

RFC: Support multi-threading (e.g. subagents, multi-agents) #779

Description

Problem Statement

Proposed Solution

Alignment with Principles

Principles Addressed

Trade-offs

Impact

Files/Systems Affected

Related RFCs

Implementation Sketch

Open Questions

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions