kb-spike: multi-message-per-Pi-process — gates the prompt-routing L4 dimension (incl. /kb load-and-cite)

## Context

The `kb-command` L4 matrix (`experiments/kb-command/EXPERIMENT.md`) has one row — `load-and-cite` — that cannot be tested with the current harness:

> Operator types `/kb <area-id>`; a *subsequent* prompt asks for a fact in that area → entries injected; LLM cites the marker.

`/kb` is a Pi **slash command**, not a prompt: it triggers no LLM turn (the handler calls `pi.sendMessage(...)` to inject the area's entries into the session, then returns). Observing "the LLM cited it" therefore requires a **second prompt in the same Pi process**, which the current harness cannot provide, because:

- `step` (in `scripts/spike/lib/step.sh`) runs `vfa run --prompt "<one prompt>"` — one prompt per invocation.
- `vfa run` accepts only a single `--prompt`/`--prompt-file` (no message array, no `--continue`).
- Each `step` spins up a fresh Pi container with no Pi-level session continuity. It does thread `KB_SESSION_ID`, but that only carries the *kb extension's* file-backed `SessionState` (loaded areas, signals), not the Pi conversation transcript.

So running `step "load" --prompt "/kb foo"` followed by `step "cite" --prompt "..."` does not help: step 2's Pi never sees the message that step 1 injected.
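
The isolation described above can be illustrated with a toy model. This is only a sketch: `step_model` is a hypothetical stand-in for `step`, the `custom:`/`user:` strings are invented labels, and nothing here calls `vfa` or Pi. It shows that loaded-area state survives across invocations (keyed by `KB_SESSION_ID`) while the transcript starts empty each time.

```shell
#!/usr/bin/env bash
# Toy model only: step_model stands in for the real `step`.
# Nothing here talks to vfa or Pi; names and formats are illustrative assumptions.
set -euo pipefail

STATE_DIR="$(mktemp -d)"        # stands in for the kb extension's .sessions/ directory
KB_SESSION_ID="demo-session"

# One invocation models one fresh Pi process: the transcript is empty at the start,
# while loaded-area state lives in a file keyed by KB_SESSION_ID.
step_model() {
  local prompt="$1"
  local transcript=""           # fresh on every invocation
  local state_file="$STATE_DIR/$KB_SESSION_ID.loaded"

  if [[ "$prompt" == "/kb "* ]]; then
    local area="${prompt#/kb }"
    transcript="custom:entries-for-$area"   # injected into THIS process only
    echo "$area" >> "$state_file"           # persisted across invocations
  else
    transcript="user:$prompt"
  fi

  local loaded="none"
  if [[ -s "$state_file" ]]; then
    loaded="$(paste -sd, "$state_file")"
  fi
  echo "loaded=$loaded transcript=$transcript"
}

step_model "/kb foo"                      # area is loaded and entries are injected
step_model "What does foo say about X?"   # area still loaded, injected entries gone
```

In the second call the model still reports `foo` as loaded, but its transcript contains only the user prompt. That is the gap the `load-and-cite` row runs into.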

## What we have

Three single-step scenarios do work and anchor the dispatch path: `load-marks-area-loaded`, `usage-on-empty-args`, and `unknown-area-no-load`. They assert on `vfa logs --raw <run_id>` (the `mykb-loaded-areas` custom message in the event stream) and on `loadedAreas` in the persisted `.sessions/<id>.json`. Combined with the L1 test (`tests/extension/kb-command.test.ts`, which checks the `pi.sendMessage` content), that gives solid coverage of *dispatch + injection-call + loaded-state*. The only gap is the *LLM-sees-the-injected-message* end-to-end leg.

## Options

1. **`step --command "/kb foo" --then-prompt "..."`** — needs vfa to accept two messages in one run (Pi's print mode already loops over a `messages` array internally; vfa would need to surface it, e.g. `--prompt` repeatable or `--messages-file`). Cleanest. Probably an upstream vf-agents change (cf. the `--append-system-prompt-file` upstreaming in [vf-agents PR #10]).
2. **`vfa session` continuity** — `step` switches from `vfa run` to `vfa session` so step N+1 continues step N's Pi conversation. Bigger harness change; also changes the semantics of every other multi-step scenario (they currently rely on `KB_SESSION_ID` state, not conversation carry-over — that's deliberate isolation).
3. **Accept the gap** — keep `load-and-cite` scaffolded, document it (done). The L1 + the 3 single-step L4 scenarios cover everything except the LLM-reads-it leg, which is the same `role:'custom'` → `convertToLlm` path the per-turn `<mykb-context>` injection uses and which `area-scoring/scoring-isolated.sh` already exercises end-to-end.
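
For option 1, the harness side could look roughly like the sketch below. It is hypothetical throughout: `build_vfa_args` is an invented helper, and a repeatable `--prompt` is the proposed vfa change, not something vfa accepts today. The function only maps `--command`/`--then-prompt` to an ordered argument list and prints it; it does not invoke `vfa`.

```shell
#!/usr/bin/env bash
# Sketch only: build_vfa_args is a hypothetical helper, and a repeatable
# --prompt is the PROPOSED vfa behaviour, not a flag shape vfa supports today.
set -euo pipefail

build_vfa_args() {
  local messages=()
  local count=0
  while (($# > 0)); do
    case "$1" in
      --command|--then-prompt)
        if (($# < 2)); then
          echo "missing value for $1" >&2
          return 2
        fi
        messages+=("$2")        # keep messages in the order they were given
        count=$((count + 1))
        shift 2
        ;;
      *)
        echo "unknown flag: $1" >&2
        return 2
        ;;
    esac
  done

  if ((count == 0)); then
    echo "no messages given" >&2
    return 2
  fi

  # Emit one argument per line so a caller could read them into an array.
  printf '%s\n' run
  local m
  for m in "${messages[@]}"; do
    printf '%s\n' --prompt "$m"
  done
}

build_vfa_args --command "/kb foo" --then-prompt "What does foo say about X?"
```

Keeping the slash command and the follow-up prompt in a single ordered list is the point of the design: both would reach the same Pi process, so the second message is sent after the first has injected the area's entries. One-argument-per-line output assumes prompts contain no newlines; a `--messages-file` variant would avoid that restriction.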

## Cross-refs

- Surfaced while fixing the `/kb` bug (`registerCommand` shape + `ctx.inject`) — see `docs/findings-log.md` row `QbRwqLxJ` and kb gotcha `QbRwqLxJ` on the `mykb` area.
- `experiments/kb-command/EXPERIMENT.md` — behavior matrix + the blocked row.
- `docs/experiment-coverage.md` — kb-command listed with the `load-and-cite` sub-gap.
