Skip to content

RFC: Support multi-threading (e.g. subagents, multi-agents) #779

@lawrence-harmonic

Description

@lawrence-harmonic

Problem Statement

OpenAI Gym abstraction is built for serial MDP execution: one action yields one observation, so the execution order is action -> observation -> action -> observation -> ...

Parallel execution (subagents, multi-agents) is important for scaling test-time compute and keeping wall-clock time down.

As a simple example, if you take a step() and the action is to spawn two subagents, then naturally two observations will be produced: one for each subagent prompt. But the interface as-is returns only one observation, and you can't get two observations without submitting two actions.

Proposed Solution

Change from a single step() function to a producer-consumer architecture with decoupled act() and observe(). The user can then maintain both an observe asyncio task and multiple act tasks at all times, and thus any execution order can happen such as action -> observation -> observation -> action -> action -> ...

For RL purposes, the MDP abstraction still fits mathematically if you take your trajectory to be ordered either by the observation time or the action time. Prefix merging for training also still works if you use a trie or rolling hash or similar, in order to match prefixes which are not directly the previous request.

Alignment with Principles

Addresses "Design for LLMs" by acknowledging that LLM inference and tool execution has long and variable-length latency which makes serial MDP execution less viable for larger workflows

Bends "Simple Gymnasium-style API"; I claim that departing from the step() API is necessary to support concurrency.

Principles Addressed

  • Minimize lifecycle deltas (training = production)
  • Minimize human-agent divergence
  • Design for LLMs

Trade-offs

I'm not aware of any alternatives.

Impact

Breaks the current API contract.

Files/Systems Affected

All

Related RFCs

The only search result for "concurrency" is #194 , which I don't believe is related.

Implementation Sketch

observations: asyncio.Queue[tuple[UUID, Observation]]
actions: asyncio.Queue[tuple[UUID, Action]]

async def observe() -> tuple[UUID, Observation]:
    return await self.observations.get()

def act(observation_uuid: UUID, action: Action) -> None:
    self.actions.put_nowait((observation_uuid, action))

async def event_loop() -> None: ...  # await self.actions.get(), self.observations.put_nowait()

Open Questions

N/A

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions