Summary
predict-rlm has a strong sandboxed execution model, which is a real advantage. But to embed it inside a larger agent system, the runtime needs a standard way to call host-provided capabilities across the sandbox boundary without breaking that security model.
The likely integration problem for external systems today is that the RLM can reason recursively inside the runtime, while many real tasks also need carefully controlled access to host resources such as:
- repository files
- search/index APIs
- patch application
- memory retrieval
- whitelisted command execution
- UI/event publishing
The ask here is not to allow arbitrary host escape. The ask is to make the host/runtime boundary explicit and safe.
Why this matters
For an embedding system, the ideal split is:
- host owns policy, approvals, capabilities, lifecycle
- predict-rlm owns recursive execution in a constrained environment
Without a host capability bridge, the embedding options are awkward:
- either the runtime is too constrained for many useful tasks
- or the host has to bypass the runtime with ad hoc glue
- or users are forced into an unsafe trusted mode too early
Proposed direction
Add a first-class API for host-registered capabilities/tools that the runtime can invoke.
Properties:
- capabilities are explicitly registered by the host
- each capability has a schema
- host controls permissions and availability per run
- capability calls return structured results
- failures are structured, not opaque strings
- timeouts and denial semantics are standardized
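To make the last three properties concrete, here is a minimal sketch of what a structured result with standardized timeout and denial outcomes could look like. None of these names exist in predict-rlm today: `Status`, `CapabilityResult`, and `invoke` are hypothetical, and mapping `PermissionError` to a denial is just one possible convention.

```python
from __future__ import annotations

import concurrent.futures
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable


class Status(Enum):
    """Hypothetical closed set of outcomes for a capability call."""
    OK = "ok"
    DENIED = "denied"
    TIMEOUT = "timeout"
    ERROR = "error"


@dataclass(frozen=True)
class CapabilityResult:
    """Structured result handed back across the sandbox boundary."""
    status: Status
    value: Any = None
    error_code: str | None = None
    message: str | None = None


def invoke(handler: Callable[..., Any], arguments: dict[str, Any],
           timeout_ms: int) -> CapabilityResult:
    """Run a host handler and always return a CapabilityResult, never raise."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(handler, **arguments)
    try:
        value = future.result(timeout=timeout_ms / 1000)
        return CapabilityResult(Status.OK, value=value)
    except concurrent.futures.TimeoutError:
        return CapabilityResult(Status.TIMEOUT, error_code="timeout",
                                message=f"no result within {timeout_ms} ms")
    except PermissionError as exc:
        # Assumed convention: handlers signal a policy denial this way.
        return CapabilityResult(Status.DENIED, error_code="denied",
                                message=str(exc))
    except Exception as exc:
        return CapabilityResult(Status.ERROR, error_code=type(exc).__name__,
                                message=str(exc))
    finally:
        # Do not block on a timed-out handler. A Python thread cannot be
        # killed, so a real host would need its own cancellation mechanism.
        pool.shutdown(wait=False)
```

The point of the sketch is that the sandboxed side only ever sees one result type, so a timeout or a denial is something the RLM can branch on rather than an opaque string to parse.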
Example host capabilities
Examples of what an embedding system might register:
- read_repo_file(path)
- search_repo(query)
- apply_patch(path, diff)
- run_test(command, args) with host-side allowlisting
- memory_lookup(query)
- emit_progress(message, metadata)
These should remain host-mediated, not direct sandbox escape hatches.
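As an illustration of "host-mediated", here is a rough sketch of how a host might implement two of these handlers so that policy lives entirely on the host side. The factory functions, the `CapabilityDenied` exception, and the `max_bytes` / `timeout_s` parameters are all assumptions made for this example, not part of predict-rlm.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


class CapabilityDenied(Exception):
    """Hypothetical error for requests that fall outside host policy."""


def make_read_repo_file(repo_root: Path, max_bytes: int = 1_000_000):
    """Build a read_repo_file handler confined to repo_root."""
    root = repo_root.resolve()

    def read_repo_file(path: str) -> str:
        # resolve() collapses ".." and follows symlinks before the check.
        target = (root / path).resolve()
        try:
            target.relative_to(root)
        except ValueError:
            raise CapabilityDenied(f"path escapes repository root: {path}") from None
        if target.stat().st_size > max_bytes:
            raise CapabilityDenied(f"file larger than {max_bytes} bytes: {path}")
        return target.read_text(encoding="utf-8")

    return read_repo_file


def make_run_test(allowed_commands: frozenset[str], timeout_s: float = 60.0):
    """Build a run_test handler that only executes allowlisted commands."""

    def run_test(command: str, args: list[str]) -> dict:
        if command not in allowed_commands:
            raise CapabilityDenied(f"command not allowlisted: {command}")
        # A list argv with the default shell=False avoids shell interpretation.
        completed = subprocess.run([command, *args], capture_output=True,
                                   text=True, timeout=timeout_s, check=False)
        return {"exit_code": completed.returncode,
                "stdout": completed.stdout,
                "stderr": completed.stderr}

    return run_test


if __name__ == "__main__":
    run_test = make_run_test(frozenset({sys.executable}))
    print(run_test(sys.executable, ["-c", "print('tests passed')"]))
```

The sandboxed code only ever supplies `path` or `command, args`; the repository root and the allowlist are closed over by the host and cannot be changed from inside the runtime.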
Example API shape
This is just illustrative:
runtime = PredictRLMRuntime(
    capabilities=[
        HostCapability(
            name="read_repo_file",
            description="Read a file from the mounted repository",
            input_schema={...},
            handler=read_repo_file,
            timeout_ms=5000,
        ),
        HostCapability(
            name="memory_lookup",
            description="Query host memory system",
            input_schema={...},
            handler=memory_lookup,
        ),
    ]
)
The runtime result/trace should also show capability invocations as first-class events.
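One possible shape for such events, sketched under the assumption that each invocation is recorded with its name, arguments, outcome, and duration. `CapabilityEvent` and `TraceRecorder` are hypothetical names chosen for this example; the actual trace format would be up to the maintainers.

```python
from __future__ import annotations

import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Callable


@dataclass
class CapabilityEvent:
    """One capability invocation as it might appear in a runtime trace."""
    capability: str
    arguments: dict[str, Any]
    status: str                      # "ok" or "error" in this sketch
    duration_ms: float
    error: str | None = None
    type: str = "capability_call"    # lets consumers filter trace events


@dataclass
class TraceRecorder:
    events: list[CapabilityEvent] = field(default_factory=list)

    def invoke(self, name: str, handler: Callable[..., Any],
               arguments: dict[str, Any]) -> Any:
        """Call the handler and record an event whether it succeeds or raises."""
        started = time.perf_counter()
        status, error = "ok", None
        try:
            return handler(**arguments)
        except Exception as exc:
            status, error = "error", f"{type(exc).__name__}: {exc}"
            raise
        finally:
            self.events.append(CapabilityEvent(
                capability=name,
                arguments=dict(arguments),
                status=status,
                duration_ms=(time.perf_counter() - started) * 1000,
                error=error,
            ))

    def to_json(self) -> str:
        # default=str keeps the dump from failing on non-JSON argument values.
        return json.dumps([asdict(e) for e in self.events], default=str)
```

Recording in a `finally` block means failed calls are as visible in the trace as successful ones, which is what makes the calls auditable by the host.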
Security model
This should preserve the sandbox model:
- no capability exists unless the host registers it
- host decides which capabilities are available for each run
- capability calls are auditable
- capability denials are explicit
- capabilities can be narrow and policy-controlled
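The first, second, and fourth points above could be sketched roughly as follows: a host-side registry, a per-run view that exposes only the granted subset, and an explicit structured denial for everything else. `CapabilityRegistry`, `RunCapabilities`, and the error codes are hypothetical and only illustrate the semantics being asked for.

```python
from __future__ import annotations

from typing import Any, Callable, Iterable


class RunCapabilities:
    """The only capability surface visible to a single run."""

    def __init__(self, granted: dict[str, Callable[..., Any]],
                 registered: frozenset[str]) -> None:
        self._granted = granted
        self._registered = registered

    def available(self) -> list[str]:
        return sorted(self._granted)

    def call(self, name: str, arguments: dict[str, Any]) -> dict[str, Any]:
        if name not in self._granted:
            # Registered but not granted for this run is an explicit denial;
            # anything else was never registered by the host.
            code = "denied" if name in self._registered else "unknown_capability"
            return {"ok": False, "error": {"code": code, "capability": name}}
        return {"ok": True, "value": self._granted[name](**arguments)}


class CapabilityRegistry:
    """Host-side registry: nothing is callable unless registered here."""

    def __init__(self) -> None:
        self._handlers: dict[str, Callable[..., Any]] = {}

    def register(self, name: str, handler: Callable[..., Any]) -> None:
        if name in self._handlers:
            raise ValueError(f"capability already registered: {name}")
        self._handlers[name] = handler

    def for_run(self, granted: Iterable[str]) -> RunCapabilities:
        names = set(granted)
        missing = names - set(self._handlers)
        if missing:
            raise ValueError(f"cannot grant unregistered capabilities: {sorted(missing)}")
        return RunCapabilities({n: self._handlers[n] for n in names},
                               registered=frozenset(self._handlers))
```

Whether the sandboxed side should be able to tell `denied` apart from `unknown_capability` is itself a policy question; a stricter host might collapse both into one code so that a run cannot probe for what exists.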
Use cases
- Coding worker embedded inside a larger system
- Research worker that needs access to host indexes/APIs
- Approval-gated flows where sensitive operations are host-mediated
- Systems that want to preserve sandbox security while still being useful
Acceptance criteria
- host can register named capabilities with schema + handler
- runtime can invoke those capabilities from inside the recursive execution loop
- capability results are structured
- capability failures are structured
- capability calls appear in structured runtime traces/events
- host can deny or omit capabilities safely
- docs include one embedded-system example