Observation
The default template.document-prompt shipped with TrustGraph (TG) is four lines:
Study the following context. Use only the information provided in the context in your response.
Do not speculate if the answer is not found in the provided set of knowledge statements.
Here is the context: {{documents}}
Use only the provided knowledge statements to respond to the following: {{query}}
This is excellent for epistemic safety (don't hallucinate), but it gives the model no formatting or elaboration guidance, so answers lose points on any rubric that rewards elaboration. Common rubrics such as 4-axis judging (data_anchoring, specificity, off-topic, refusal) reward:
- Bold named entities
- Numbered lists with per-entity descriptions
- Lead-with-answer (no "Based on the provided information" hedging)
- Named citations to specific products / numbers / dates
The default template.document-prompt instructs none of these. The resulting answers list entities flatly (Microsoft, Google, AWS) instead of elaborating (1. **Microsoft** — Competes in cloud services...), and the judge penalizes both data_anchoring and specificity.
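To make the flat-versus-elaborated difference concrete, here is a minimal sketch of a surface-format check. It is not the 4-axis judge and does not compute data_anchoring or specificity; the function name `format_features`, the regexes, and the two sample answers are all illustrative assumptions. It only detects three of the features listed above (bold entities, numbered lists, no hedging opener) and ignores named citations.

```python
import re

# Hypothetical list of hedging openers; only the phrase quoted above is included.
HEDGE_OPENERS = ("based on the provided information",)


def format_features(answer: str) -> dict[str, bool]:
    """Report which surface-format features an answer exhibits (illustrative only)."""
    text = answer.strip()
    return {
        # At least one **bold** span on a single line.
        "bold_entities": bool(re.search(r"\*\*[^*\n]+\*\*", text)),
        # At least one line that starts with "<digits>. " followed by text.
        "numbered_list": bool(re.search(r"^\s*\d+\.\s+\S", text, flags=re.MULTILINE)),
        # The answer does not open with a known hedging phrase.
        "leads_with_answer": not text.lower().startswith(HEDGE_OPENERS),
    }


# Made-up sample answers in the two styles described above.
flat = "Based on the provided information, the competitors are Microsoft, Google, AWS."
rich = "1. **Microsoft**: competes in cloud services.\n2. **Google**: competes in cloud services."

print(format_features(flat))
print(format_features(rich))
```

A check like this is only useful as a quick regression signal while iterating on a template; the rubric score itself still has to come from the judge.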
Measured impact
In a 30-sample bake-off (5 probes × 3 companies × 2 systems, POC and TG), TG with the default template.document-prompt scored 65.5% on the 4-axis rubric. After overriding the template with a richer Sizzl-tuned version (1,250 bytes, still epistemically disciplined), TG scored 83.0%, beating the POC baseline of 82.3%.
That's a +17.5 percentage-point lift from one config change.
Full evidence + diagnosis chain
Proposal
Ship a richer default template.document-prompt that retains the epistemic discipline but also instructs formatting and per-entity elaboration. Reference shape:
- Format rules (mandatory): bold named entities, numbered lists for enumerations, paragraph structure for analytical questions
- Grounding rules (mandatory): use only the context, lead with the answer, weave multiple pieces of evidence when present
- Anti-hedge rule: never start with "Based on the provided information"
- Sparse-knowledge rule: when context is thin, say so explicitly (e.g. "The provided knowledge is sparse on X; what we do know is Y") rather than issuing a generic refusal
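As a rough illustration of the reference shape above, the sketch below writes the four rule groups into a template that keeps the default's `{{documents}}` and `{{query}}` placeholders. This is not the contents of sizzl-document-prompt.json and not TrustGraph's templating code: the wording of `RICHER_TEMPLATE` and the `render` helper are assumptions made up for this example, and `render` is a plain regex substitution standing in for whatever TG does internally.

```python
import re

# Hypothetical richer template; the rule wording is illustrative, not the Sizzl-tuned text.
RICHER_TEMPLATE = """\
Study the following context. Use only the information provided in the context in your response.

Format rules (mandatory):
- Bold every named entity.
- Use a numbered list for enumerations, with a short description per entity.
- Use paragraphs for analytical questions.

Grounding rules (mandatory):
- Lead with the answer.
- When several pieces of evidence are present, weave them together.
- Never start with "Based on the provided information".
- When the context is thin, say so explicitly and state what is known instead of refusing generically.

Here is the context: {{documents}}

Use only the provided knowledge statements to respond to the following: {{query}}
"""


def render(template: str, documents: str, query: str) -> str:
    """Fill {{documents}} and {{query}} in one pass (stand-in for TG's own templating)."""
    values = {"documents": documents, "query": query}
    # A single pass means placeholder-like text inside the documents is not re-expanded.
    return re.sub(r"\{\{(\w+)\}\}", lambda m: values[m.group(1)], template)


prompt = render(RICHER_TEMPLATE, "<retrieved chunks>", "Who are the main competitors?")
print(prompt)
```

Keeping the first and last lines close to the shipped default is deliberate in this sketch: the epistemic discipline stays as is, and the formatting and grounding rules are purely additive.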
Our Sizzl-tuned template (open-sourced under Apache 2.0 in our repo) is a starting point: sizzl-document-prompt.json.
Why this matters upstream
Every TG deployment that benchmarks against a non-RAG LLM baseline will underperform on standard rubrics until its operators discover this themselves. We spent half a day chasing substrate, retrieval volume, and model swaps before reading the default template carefully. A better default would have saved that time.
Stack
TrustGraph 2.3.21, document-rag (not graph-rag), OpenAI gpt-4.1-mini synthesis.