kb-spike: multi-message-per-Pi-process — gates the prompt-routing L4 dimension (incl. /kb load-and-cite) #7

@vilosource

Context

The kb-command L4 matrix (experiments/kb-command/EXPERIMENT.md) has one row — load-and-cite — that cannot be tested with the current harness:

Operator types /kb <area-id>; a subsequent prompt asks for a fact in that area → entries injected; LLM cites the marker.

/kb is a Pi slash command, not a prompt — it triggers no LLM turn (the handler just pi.sendMessage(...)s the area's entries into the session and returns). To observe "the LLM cited it" you need a second prompt in the same Pi process, because:

  • step (in scripts/spike/lib/step.sh) runs vfa run --prompt "<one prompt>" — one prompt per invocation.
  • vfa run accepts only a single --prompt/--prompt-file (no message array, no --continue).
  • Each step spins up a fresh Pi container with no Pi-level session continuity (it threads KB_SESSION_ID for the kb extension's file-backed SessionState — loaded areas, signals — not the Pi conversation transcript).

So step "load" --prompt "/kb foo" then step "cite" --prompt "..." → step 2's Pi never saw step 1's injected message.
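The gap can be reproduced in a toy model. This is only a sketch: `FakePiProcess`, `sessionStore`, and this `step` function are hypothetical stand-ins, not the real Pi, vfa, or kb extension APIs. It assumes only what is described above: each step is a fresh process handling one prompt, and only the KB_SESSION_ID-keyed state survives between steps.

```typescript
// Toy model of the harness; every name here is hypothetical, not the real Pi or vfa API.
type Message = { role: "user" | "custom"; content: string };

// Stand-in for the kb extension's file-backed SessionState, keyed by KB_SESSION_ID.
const sessionStore = new Map<string, { loadedAreas: string[] }>();

class FakePiProcess {
  // The conversation transcript lives only inside this process.
  readonly transcript: Message[] = [];

  constructor(private readonly kbSessionId: string) {}

  handle(input: string): void {
    if (input.startsWith("/kb ")) {
      const area = input.slice(4).trim();
      // Slash command: inject the area's entries and record the load; no LLM turn.
      this.transcript.push({ role: "custom", content: `entries for ${area}` });
      const state = sessionStore.get(this.kbSessionId) ?? { loadedAreas: [] };
      state.loadedAreas.push(area);
      sessionStore.set(this.kbSessionId, state);
    } else {
      this.transcript.push({ role: "user", content: input });
    }
  }
}

// One step = one fresh process handling exactly one prompt.
function step(kbSessionId: string, prompt: string): Message[] {
  const pi = new FakePiProcess(kbSessionId);
  pi.handle(prompt);
  return pi.transcript;
}

const loadTranscript = step("s1", "/kb foo");
const citeTranscript = step("s1", "What does foo say about X?");

// The second process never sees the injected message, but the loaded-area state persists.
const citeSawInjection = citeTranscript.some((m) => m.role === "custom");
const persistedAreas = sessionStore.get("s1")?.loadedAreas ?? [];
console.log({ loadSteps: loadTranscript.length, citeSawInjection, persistedAreas });
```

In this model `citeSawInjection` is false while `persistedAreas` still contains `foo`, which mirrors why the single-step scenarios can assert on loaded state but not on citation.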

What we have

3 single-step scenarios that DO work and anchor the dispatch path (load-marks-area-loaded, usage-on-empty-args, unknown-area-no-load) — they assert on vfa logs --raw <run_id> (the mykb-loaded-areas custom message in the event stream) + the persisted .sessions/<id>.json loadedAreas. Combined with the L1 test (tests/extension/kb-command.test.ts, which checks pi.sendMessage content) that's solid coverage of dispatch + injection-call + loaded-state. The gap is purely the LLM-sees-the-injected-message end-to-end leg.
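The two assertions used by those scenarios could be expressed roughly as follows. This is a sketch under stated assumptions: it assumes the raw log is JSON lines and that the custom message carries a `customType` field, neither of which is confirmed here; only the `mykb-loaded-areas` name and the `loadedAreas` key come from the description above. The helper names are hypothetical.

```typescript
// Assumed shapes: the real event stream and session file may differ.
type LogEvent = { customType?: string };
type SessionFile = { loadedAreas?: string[] };

// True if any line of the raw log is an event tagged mykb-loaded-areas.
function sawLoadedAreasMessage(rawLog: string): boolean {
  return rawLog
    .split("\n")
    .filter((line) => line.trim() !== "")
    .some((line) => {
      try {
        const event = JSON.parse(line) as LogEvent;
        return event.customType === "mykb-loaded-areas";
      } catch {
        return false; // tolerate non-JSON lines in the raw log
      }
    });
}

// True if the persisted session state lists the area as loaded.
function areaIsLoaded(sessionJson: string, area: string): boolean {
  const session = JSON.parse(sessionJson) as SessionFile;
  return (session.loadedAreas ?? []).includes(area);
}

// Made-up sample data, for illustration only.
const sampleLog = [
  "starting run",
  JSON.stringify({ customType: "mykb-loaded-areas" }),
].join("\n");
const sampleSession = JSON.stringify({ loadedAreas: ["foo"] });

console.log(sawLoadedAreasMessage(sampleLog), areaIsLoaded(sampleSession, "foo"));
```

Neither check involves an LLM turn, which is why they cover dispatch and loaded state but not the load-and-cite leg.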

Options

  1. step --command "/kb foo" --then-prompt "...": needs vfa to accept two messages in one run (Pi's print mode already loops over a messages array internally; vfa would need to surface it, e.g. a repeatable --prompt or a --messages-file). Cleanest. Probably an upstream vf-agents change (cf. the --append-system-prompt-file upstreaming in vf-agents PR #10).
  2. vfa session continuity: step switches from vfa run to vfa session so that step N+1 continues step N's Pi conversation. Bigger harness change; it also changes the semantics of every other multi-step scenario (they currently rely on KB_SESSION_ID state, not conversation carry-over, which is deliberate isolation).
  3. Accept the gap: keep load-and-cite scaffolded and documented (done). The L1 test plus the 3 single-step L4 scenarios cover everything except the LLM-reads-it leg, which is the same role:'custom' convertToLlm path that the per-turn <mykb-context> injection uses and that area-scoring/scoring-isolated.sh already exercises end-to-end.
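What option 1 would make observable can be sketched as a single-process message loop. This is not Pi's actual print-mode loop or a vfa interface: `runMessages` and `fakeLlm` are hypothetical, and the fake model simply echoes any injected marker it can see in the transcript.

```typescript
type Message = { role: "user" | "custom" | "assistant"; content: string };

// Hypothetical stand-in for the model: it "cites" any marker visible in the transcript.
function fakeLlm(transcript: Message[]): string {
  const injected = transcript.find((m) => m.role === "custom");
  return injected ? `cited ${injected.content}` : "no source available";
}

// Process several messages inside one process so they share one transcript.
function runMessages(messages: string[]): Message[] {
  const transcript: Message[] = [];
  for (const input of messages) {
    if (input.startsWith("/kb ")) {
      // Slash command: inject a marker for the area, no LLM turn.
      transcript.push({ role: "custom", content: `[kb:${input.slice(4).trim()}]` });
      continue;
    }
    // Ordinary prompt: triggers a turn that can see everything injected so far.
    transcript.push({ role: "user", content: input });
    transcript.push({ role: "assistant", content: fakeLlm(transcript) });
  }
  return transcript;
}

const twoMessageRun = runMessages(["/kb foo", "What does foo say about X?"]);
const promptOnlyRun = runMessages(["What does foo say about X?"]);

const lastReply = (t: Message[]): string => t[t.length - 1]?.content ?? "";
console.log(lastReply(twoMessageRun), "|", lastReply(promptOnlyRun));
```

Only the two-message run ends with a reply that cites the marker, which is the behaviour the load-and-cite row needs to assert on.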

Cross-refs

  • Surfaced while fixing the /kb bug (registerCommand shape + ctx.inject) — see docs/findings-log.md row QbRwqLxJ and kb gotcha QbRwqLxJ on the mykb area.
  • experiments/kb-command/EXPERIMENT.md — behavior matrix + the blocked row.
  • docs/experiment-coverage.md — kb-command listed with the load-and-cite sub-gap.
