First-class host capability bridge for sandboxed RLM execution

### Summary

`predict-rlm` has a strong sandboxed execution model, which is a real advantage. But to embed it inside a larger agent system, the runtime needs a standard way to call host-provided capabilities across the sandbox boundary without breaking that security model.

For external systems, the likely integration problem today is this: the RLM can reason recursively inside the runtime, but many real tasks need carefully controlled access to host resources such as:
- repository files
- search/index APIs
- patch application
- memory retrieval
- whitelisted command execution
- UI/event publishing

The ask here is **not** to allow arbitrary host escape. The ask is to make the host/runtime boundary explicit and safe.

### Why this matters

For an embedding system, the ideal split is:
- host owns policy, approvals, capabilities, lifecycle
- `predict-rlm` owns recursive execution in a constrained environment

Without a host capability bridge, the embedding options are awkward:
- either the runtime is too constrained for many useful tasks
- or the host has to bypass the runtime with ad hoc glue
- or users are forced into an unsafe trusted mode too early

### Proposed direction

Add a first-class API for host-registered capabilities/tools that the runtime can invoke.

Properties:
- capabilities are explicitly registered by the host
- each capability has a schema
- host controls permissions and availability per run
- capability calls return structured results
- failures are structured, not opaque strings
- timeouts and denial semantics are standardized
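To make these properties concrete, here is a minimal sketch of what the host side of such a contract could look like. Everything in it is hypothetical: `HostCapability`, `CapabilityResult`, `invoke`, and the status values `ok`/`error`/`denied`/`timeout` are illustrative names, not existing `predict-rlm` API. It also assumes two simplifications: the schema is a plain mapping from argument name to Python type (a real implementation would more likely use JSON Schema), and a handler signals denial by raising `PermissionError`.

```python
"""Sketch of a host capability contract. All names are hypothetical."""
from __future__ import annotations

import concurrent.futures
from dataclasses import dataclass
from typing import Any, Callable, Literal

Status = Literal["ok", "error", "denied", "timeout"]


@dataclass(frozen=True)
class CapabilityResult:
    """Structured outcome of one capability call, never an opaque string."""
    status: Status
    value: Any = None
    error_code: str | None = None
    message: str | None = None


@dataclass
class HostCapability:
    name: str
    description: str
    # Simplified schema (assumption): argument name -> required Python type.
    input_schema: dict[str, type]
    handler: Callable[..., Any]
    timeout_ms: int = 10_000

    def validate(self, args: dict[str, Any]) -> str | None:
        """Return a problem description if args don't match the schema, else None."""
        missing = self.input_schema.keys() - args.keys()
        extra = args.keys() - self.input_schema.keys()
        if missing or extra:
            return f"missing={sorted(missing)} unexpected={sorted(extra)}"
        for key, expected in self.input_schema.items():
            if not isinstance(args[key], expected):
                return f"{key} must be {expected.__name__}"
        return None


_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def invoke(cap: HostCapability, args: dict[str, Any]) -> CapabilityResult:
    """Validate, run with a timeout, and map every outcome to a CapabilityResult."""
    problem = cap.validate(args)
    if problem:
        return CapabilityResult("error", error_code="invalid_arguments", message=problem)
    future = _POOL.submit(cap.handler, **args)
    try:
        return CapabilityResult("ok", value=future.result(timeout=cap.timeout_ms / 1000))
    except concurrent.futures.TimeoutError:
        # Stops waiting, but the worker thread keeps running the handler.
        return CapabilityResult("timeout", error_code="timeout",
                                message=f"exceeded {cap.timeout_ms} ms")
    except PermissionError as exc:
        # Convention assumed here: handlers raise PermissionError to deny.
        return CapabilityResult("denied", error_code="denied", message=str(exc))
    except Exception as exc:  # handler bug or host-side failure
        return CapabilityResult("error", error_code=type(exc).__name__, message=str(exc))
```

Routing every outcome through one result type lets code inside the sandbox branch on `status` and `error_code` instead of parsing error strings. The thread-based timeout only stops the caller from waiting; long-running capabilities would still need their own cancellation.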

### Example host capabilities

Examples of what an embedding system might register:
- `read_repo_file(path)`
- `search_repo(query)`
- `apply_patch(path, diff)`
- `run_test(command, args)` with host-side allowlisting
- `memory_lookup(query)`
- `emit_progress(message, metadata)`

These should remain host-mediated, not direct sandbox escape hatches.
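As an example of keeping these host-mediated, here is a sketch of two of the handlers listed above. The repository root, the allowlist contents, and the extra `root`, `max_bytes`, and `timeout_s` parameters are assumptions for illustration only. The point is that path confinement and command allowlisting are enforced on the host, so the sandbox can request an operation but never perform it directly.

```python
import subprocess
from pathlib import Path

# Hypothetical host configuration; neither value comes from predict-rlm.
REPO_ROOT = Path("/srv/checkout")
ALLOWED_COMMANDS = {"pytest", "ruff"}


def read_repo_file(path: str, root: Path = REPO_ROOT, max_bytes: int = 256_000) -> str:
    """Read a file under `root`, refusing paths that resolve outside it."""
    root = root.resolve()
    # resolve() follows symlinks and collapses "..", so escapes are caught here.
    target = (root / path).resolve()
    if not target.is_relative_to(root):
        raise PermissionError(f"{path!r} is outside the repository")
    with target.open("rb") as fh:
        data = fh.read(max_bytes)  # cap how much the sandbox can pull per call
    return data.decode("utf-8", errors="replace")


def run_test(command: str, args: list[str], root: Path = REPO_ROOT,
             timeout_s: float = 300.0) -> dict:
    """Run an allowlisted command without a shell and return a structured result."""
    if command not in ALLOWED_COMMANDS:
        raise PermissionError(f"command {command!r} is not allowlisted")
    # List-form argv with no shell: args cannot chain extra commands.
    proc = subprocess.run(
        [command, *args],
        cwd=root,
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,
    )
    # Keep only the output tails so large logs don't flood the sandbox.
    return {
        "exit_code": proc.returncode,
        "stdout": proc.stdout[-10_000:],
        "stderr": proc.stderr[-10_000:],
    }
```

The allowlist constrains which program runs, not what its arguments do; a real host would likely also restrict arguments per command.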

### Example API shape

This is just illustrative:

```python
runtime = PredictRLMRuntime(
    capabilities=[
        HostCapability(
            name="read_repo_file",
            description="Read a file from the mounted repository",
            input_schema={...},  # schema for the call's arguments (elided)
            handler=read_repo_file,  # host-side function; runs outside the sandbox
            timeout_ms=5000,
        ),
        HostCapability(
            name="memory_lookup",
            description="Query host memory system",
            input_schema={...},
            handler=memory_lookup,  # no timeout_ms given here
        ),
    ]
)
```

The runtime's result and trace should also record capability invocations as first-class events.
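One possible shape for those events, sketched under the assumption that each invocation produces a single JSON-serializable record. The field names (including `depth`, meant as the recursion depth of the RLM step that made the call) and the `traced_call` helper are hypothetical.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, Optional


@dataclass
class CapabilityEvent:
    capability: str
    args: dict
    status: str  # "ok", "denied", or "error" in this sketch
    duration_ms: float
    depth: int = 0  # hypothetical: recursion depth of the calling RLM step
    error_code: Optional[str] = None
    call_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    event: str = "capability_call"

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


def traced_call(name: str, args: dict, handler: Callable[..., Any],
                sink: list, depth: int = 0) -> Any:
    """Invoke `handler`, append one CapabilityEvent to `sink`, return its value (None on failure)."""
    start = time.perf_counter()
    value, status, code = None, "ok", None
    try:
        value = handler(**args)
    except PermissionError:
        status, code = "denied", "denied"
    except Exception as exc:
        status, code = "error", type(exc).__name__
    elapsed_ms = (time.perf_counter() - start) * 1000
    sink.append(CapabilityEvent(name, args, status, elapsed_ms, depth, code))
    return value
```

Writing one JSON object per event (JSON Lines) would let the host stream these into its own logging or UI without a custom format.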

### Security model

This should preserve the sandbox model:
- no capability exists unless the host registers it
- host decides which capabilities are available for each run
- capability calls are auditable
- capability denials are explicit
- capabilities can be narrow and policy-controlled
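These properties can be sketched as a per-run bridge that separates omitting a capability from denying it. This is a hypothetical design, not existing API: `RunPolicy`, `CapabilityBridge`, the `needs_approval` set, and the `approve` callback are illustrative names. An omitted capability is never advertised and behaves exactly like one that was never registered, while an approval-gated capability is visible but returns an explicit `denied` result when the host rejects the call. Every call, including failures, is appended to an audit log.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class RunPolicy:
    enabled: set  # capability names exposed to this run
    needs_approval: set = field(default_factory=set)
    # Host approval hook; rejecting by default keeps gated calls safe.
    approve: Callable[[str, dict], bool] = lambda name, args: False


class CapabilityBridge:
    def __init__(self, registry: dict, policy: RunPolicy):
        self._registry = registry  # everything the host has registered
        self._policy = policy
        self.audit: list = []

    def visible(self) -> list:
        """Names advertised to the sandbox: registered AND enabled for this run."""
        return sorted(self._policy.enabled.intersection(self._registry))

    def call(self, name: str, args: dict) -> dict:
        if name not in self.visible():
            # Omitted and never-registered look the same from inside the sandbox.
            outcome = {"status": "error", "error_code": "unknown_capability"}
        elif name in self._policy.needs_approval and not self._policy.approve(name, args):
            outcome = {"status": "denied", "error_code": "approval_rejected"}
        else:
            try:
                outcome = {"status": "ok", "value": self._registry[name](**args)}
            except Exception as exc:
                outcome = {"status": "error", "error_code": type(exc).__name__}
        self.audit.append({"capability": name, "args": args, "status": outcome["status"]})
        return outcome
```

Returning the same `unknown_capability` error for omitted and unregistered names is a deliberate choice in this sketch: it keeps code in the sandbox from probing which capabilities exist but were withheld for this run.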

### Use cases

- Coding worker embedded inside a larger system
- Research worker that needs access to host indexes/APIs
- Approval-gated flows where sensitive operations are host-mediated
- Systems that want to preserve sandbox security while still being useful

### Acceptance criteria

- host can register named capabilities with schema + handler
- runtime can invoke those capabilities from inside the recursive execution loop
- capability results are structured
- capability failures are structured
- capability calls appear in structured runtime traces/events
- host can deny or omit capabilities safely
- docs include one embedded-system example

