Validate 0.530 gains on more LoCoMo samples (currently sample-0 only) #20

Every result from the 2026-06-18 session (graph 0.286→**0.530**) is on **sample 0 (conv-26) only**, so the gains may be overfit to that one conversation.

**Task:** run the best config on samples 1–9 (substitute the sample index for `N`) and confirm the gains hold.
```
bash -ic '.venv/bin/python src/run_locomo.py --sample N --modes graph,baseline --typed --workers 8 --out results/typed_sN.json'
```
- Check that graph still beats baseline per category, especially cat2 (temporal) and cat3 (inference), where the sample-0 gains were largest.
- Run a 50-QA subset (`--max-qa 50`) first for a fast trend, then the full QA set.
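
A minimal sketch of how the sweep and the per-category check could be scripted. It only prints the commands, it does not run them. Assumptions not in this issue: the `bash -ic` wrapper is dropped, the 50-QA pass writes to a `_qa50`-suffixed file so it does not overwrite the full run, and `regressions` takes plain `{category: score}` dicts because the layout of the results JSON is not specified here. The helper names are hypothetical.

```python
import shlex

SAMPLES = range(1, 10)  # samples 1-9; sample 0 is already covered


def build_command(sample, max_qa=None):
    """Build the run_locomo.py command from this issue for one sample."""
    # Hypothetical naming: keep the fast-pass output separate from the full run.
    suffix = f"_qa{max_qa}" if max_qa is not None else ""
    args = [
        ".venv/bin/python", "src/run_locomo.py",
        "--sample", str(sample),
        "--modes", "graph,baseline",
        "--typed",
        "--workers", "8",
    ]
    if max_qa is not None:
        args += ["--max-qa", str(max_qa)]
    args += ["--out", f"results/typed_s{sample}{suffix}.json"]
    return shlex.join(args)


def regressions(graph, baseline):
    """Return the sorted categories where graph does not beat baseline.

    Both arguments are {category: score} dicts (assumed shape); a category
    missing from `graph` counts as a regression.
    """
    return sorted(c for c in baseline if graph.get(c, 0.0) <= baseline[c])


if __name__ == "__main__":
    # Fast 50-QA pass over all samples first, then the full pass.
    for max_qa in (50, None):
        for sample in SAMPLES:
            print(build_command(sample, max_qa))
```

Once the per-category scores for a sample are loaded into two dicts, `regressions(graph_scores, baseline_scores)` returns the categories to look at first; an empty list means graph beat baseline everywhere on that sample.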

This gates trusting the 0.530 headline. See memory dgm-handoff-2026-06-18. Refs #18.
