Context
The kb-command L4 matrix (experiments/kb-command/EXPERIMENT.md) has one row, load-and-cite, that cannot be tested with the current harness:
Operator types /kb <area-id>; a subsequent prompt asks for a fact in that area → entries injected; LLM cites the marker.
/kb is a Pi slash command, not a prompt, so it triggers no LLM turn: the handler calls pi.sendMessage(...) to inject the area's entries into the session and returns. Observing "the LLM cited it" therefore requires a second prompt in the same Pi process, which the current harness cannot issue because:
step (in scripts/spike/lib/step.sh) runs vfa run --prompt "<one prompt>" — one prompt per invocation.
vfa run accepts only a single --prompt/--prompt-file (no message array, no --continue).
Each step spins up a fresh Pi container with no Pi-level session continuity (it threads KB_SESSION_ID for the kb extension's file-backed SessionState — loaded areas, signals — not the Pi conversation transcript).
So step "load" --prompt "/kb foo" then step "cite" --prompt "..." → step 2's Pi never saw step 1's injected message.
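The failure mode above can be sketched with a toy model. Everything here is hypothetical: `runOnce`, `sessionStore`, and the `Message` type are illustrative stand-ins, not the real Pi, vfa, or kb extension APIs. The sketch assumes only what the issue states: each run starts with an empty transcript, while the kb state keyed by KB_SESSION_ID persists.

```typescript
// Toy model only: none of these names are real Pi or vfa APIs.
type Message = { role: "user" | "custom" | "assistant"; content: string };

// Stand-in for the kb extension's file-backed SessionState, keyed by KB_SESSION_ID.
const sessionStore = new Map<string, { loadedAreas: string[] }>();

// One call models one `vfa run --prompt "<one prompt>"`: a fresh process
// whose conversation transcript starts empty every time.
function runOnce(kbSessionId: string, prompt: string): Message[] {
  const transcript: Message[] = [];
  const state = sessionStore.get(kbSessionId) ?? { loadedAreas: [] };

  if (prompt.startsWith("/kb ")) {
    // Slash command: inject the entries and record the area, with no LLM turn.
    const area = prompt.slice("/kb ".length).trim();
    transcript.push({ role: "custom", content: `entries for ${area}` });
    if (!state.loadedAreas.includes(area)) state.loadedAreas.push(area);
  } else {
    // Ordinary prompt: the model sees only what this process's transcript holds.
    transcript.push({ role: "user", content: prompt });
    const sawEntries = transcript.some((m) => m.role === "custom");
    transcript.push({
      role: "assistant",
      content: sawEntries ? "cites marker" : "no marker available",
    });
  }

  sessionStore.set(kbSessionId, state);
  return transcript;
}

const load = runOnce("s1", "/kb foo");
const cite = runOnce("s1", "What does foo say?");

console.log(load.map((m) => m.role)); // only the injected custom message
console.log(sessionStore.get("s1")?.loadedAreas); // kb state survived the second run
console.log(cite.some((m) => m.role === "custom")); // false: the injection did not carry over
```

In this model the second run still sees `foo` in `loadedAreas`, but its transcript holds no custom message, which mirrors why the cite step cannot observe the load step's injection.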
What we have
Three single-step scenarios do work and anchor the dispatch path (load-marks-area-loaded, usage-on-empty-args, unknown-area-no-load). They assert on vfa logs --raw <run_id> (the mykb-loaded-areas custom message in the event stream) and on loadedAreas in the persisted .sessions/<id>.json. Combined with the L1 test (tests/extension/kb-command.test.ts, which checks the pi.sendMessage content), that gives solid coverage of dispatch, the injection call, and loaded state. The gap is purely the end-to-end leg in which the LLM sees the injected message.
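The two assertions described above could look roughly like the following sketch. The event shape is an assumption: it treats the raw log as JSON Lines with `role` and `customType` fields, which the issue does not confirm, and the helper names (`parseEvents`, `sawLoadedAreasMessage`, `areaIsLoaded`) are hypothetical. Only the `mykb-loaded-areas` message name and the `loadedAreas` key come from the text.

```typescript
// Assumed event shape; the real `vfa logs --raw` output may differ.
type RawEvent = { role?: string; customType?: string };

// Parse JSON Lines, skipping blank lines and anything that is not a JSON object.
function parseEvents(raw: string): RawEvent[] {
  const events: RawEvent[] = [];
  for (const line of raw.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (typeof parsed === "object" && parsed !== null) {
        events.push(parsed as RawEvent);
      }
    } catch {
      // Ignore non-JSON log noise.
    }
  }
  return events;
}

// True if the event stream contains the mykb-loaded-areas custom message.
function sawLoadedAreasMessage(raw: string): boolean {
  return parseEvents(raw).some(
    (e) => e.role === "custom" && e.customType === "mykb-loaded-areas",
  );
}

// True if the persisted session JSON lists the area under loadedAreas.
function areaIsLoaded(sessionJson: string, areaId: string): boolean {
  const state = JSON.parse(sessionJson) as { loadedAreas?: unknown };
  return Array.isArray(state.loadedAreas) && state.loadedAreas.includes(areaId);
}

// Hand-written sample data, not captured from a real run.
const sampleLog = [
  '{"role":"custom","customType":"mykb-loaded-areas"}',
  "not json",
].join("\n");
const sampleState = '{"loadedAreas":["foo"]}';

console.log(sawLoadedAreasMessage(sampleLog));
console.log(areaIsLoaded(sampleState, "foo"));
```

Neither check says anything about what the LLM read; both only confirm that the command dispatched and that state was persisted.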
Options
step --command "/kb foo" --then-prompt "...": needs vfa to accept two messages in one run. Pi's print mode already loops over a messages array internally, so vfa would need to surface it, e.g. by making --prompt repeatable or adding --messages-file. This is the cleanest option, but it probably requires an upstream vf-agents change (cf. the --append-system-prompt-file upstreaming in vf-agents PR #10).
vfa session continuity — step switches from vfa run to vfa session so step N+1 continues step N's Pi conversation. Bigger harness change; also changes the semantics of every other multi-step scenario (they currently rely on KB_SESSION_ID state, not conversation carry-over — that's deliberate isolation).
Accept the gap — keep load-and-cite scaffolded, document it (done). The L1 + the 3 single-step L4 scenarios cover everything except the LLM-reads-it leg, which is the same role:'custom' → convertToLlm path the per-turn <mykb-context> injection uses and which area-scoring/scoring-isolated.sh already exercises end-to-end.
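To make the first option concrete, here is a toy model of a single run that processes several messages in order. It is a sketch under stated assumptions, not Pi's actual print-mode loop or any existing vfa interface: `runMessages`, `Msg`, and the marker format are hypothetical names chosen for illustration.

```typescript
// Toy model of the multi-message option; not Pi's or vfa's actual code.
type Msg = { role: "user" | "custom" | "assistant"; content: string };

function runMessages(messages: string[]): { transcript: Msg[]; llmTurns: number } {
  const transcript: Msg[] = []; // shared by every message in this one run
  let llmTurns = 0;

  for (const message of messages) {
    if (message.startsWith("/kb ")) {
      // Slash command: inject the area's entries and move on, with no LLM turn.
      const area = message.slice("/kb ".length).trim();
      transcript.push({ role: "custom", content: `[marker:${area}] entries` });
      continue;
    }

    // Ordinary prompt: one LLM turn that can see earlier injected messages.
    transcript.push({ role: "user", content: message });
    const injected = transcript.filter((m) => m.role === "custom");
    llmTurns += 1;
    transcript.push({
      role: "assistant",
      content:
        injected.length > 0
          ? `answer citing ${injected.length} injected message(s)`
          : "answer without injected context",
    });
  }

  return { transcript, llmTurns };
}

const combined = runMessages(["/kb foo", "What does foo say?"]);
console.log(combined.llmTurns); // 1: only the prompt triggered a turn
console.log(combined.transcript.map((m) => m.role)); // custom, user, assistant
```

Because both messages share one transcript, the prompt's turn can see the injected entries. That property is what the load-and-cite row needs and what per-step fresh containers cannot provide.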
Cross-refs
Surfaced while fixing the /kb bug (registerCommand shape + ctx.inject) — see docs/findings-log.md row QbRwqLxJ and kb gotcha QbRwqLxJ on the mykb area.
experiments/kb-command/EXPERIMENT.md — behavior matrix + the blocked row.
docs/experiment-coverage.md — kb-command listed with the load-and-cite sub-gap.