Default template.document-prompt is too sparse — measurable rubric loss on executive synthesis questions #907

@aviyashchin

Observation

The default template.document-prompt shipped with TrustGraph (TG) is 4 lines:

Study the following context. Use only the information provided in the context in your response.
Do not speculate if the answer is not found in the provided set of knowledge statements.
Here is the context: {{documents}}
Use only the provided knowledge statements to respond to the following: {{query}}

This is excellent for epistemic safety (don't hallucinate), but it gives the model no formatting guidance, so answers lose points on any rubric that rewards elaboration. Common rubrics, such as 4-axis judging (data_anchoring, specificity, off-topic, refusal), reward:

  • Bold named entities
  • Numbered lists with per-entity descriptions
  • Lead-with-answer (no "Based on the provided information" hedging)
  • Named citations to specific products / numbers / dates

The default template.document-prompt instructs none of these. The resulting answers list entities flatly (Microsoft, Google, AWS) instead of elaborating (1. **Microsoft** — Competes in cloud services...), and the judge penalizes both data_anchoring and specificity.
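
To make the flat-versus-elaborated difference concrete, here is a rough sketch of how those format traits could be spot-checked in an answer. The function name `format_signals` and its regex heuristics are hypothetical; this is not the judge used in the bake-off, only an illustration of the traits listed above.

```python
import re

def format_signals(answer: str) -> dict:
    """Hypothetical heuristics for three of the format traits the rubric rewards."""
    text = answer.strip()
    return {
        # At least one **bold** span, e.g. a named entity
        "bold_entities": bool(re.search(r"\*\*[^*\n]+\*\*", text)),
        # At least one line that starts like "1. something"
        "numbered_list": bool(re.search(r"^\s*\d+\.\s+\S", text, flags=re.MULTILINE)),
        # Answer does not open with the hedging phrase
        "leads_with_answer": not text.lower().startswith("based on the provided information"),
    }

# Illustrative strings only, modeled on the examples in this issue
flat = "Based on the provided information, competitors include Microsoft, Google, AWS."
rich = "1. **Microsoft** - Competes in cloud services..."

print(format_signals(flat))
print(format_signals(rich))
```

A check like this is only a cheap proxy for inspecting outputs; it does not replace the rubric judge.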

Measured impact

In a 30-sample bake-off (5 probes × 3 companies × 2 systems, POC and TG), TG with the default template.document-prompt scored 65.5% on the 4-axis rubric. After overriding the template with a richer Sizzl-tuned version (1,250 bytes, still epistemically disciplined), TG scored 83.0%, beating the POC baseline of 82.3%.

That's a 17.5-percentage-point lift on the rubric from a single config change.
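
For anyone checking the numbers, the arithmetic behind the sample count and the lift works out as follows. The variable names are just for illustration; the figures are the ones reported above.

```python
# Sample count: 5 probes x 3 companies x 2 systems (POC and TG)
probes, companies, systems = 5, 3, 2
samples = probes * companies * systems

# Rubric scores reported above, in percent
default_score, tuned_score, poc_score = 65.5, 83.0, 82.3

lift = round(tuned_score - default_score, 1)        # tuned template vs default template
margin_over_poc = round(tuned_score - poc_score, 1)  # tuned template vs POC baseline

print(samples, lift, margin_over_poc)
```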

Full evidence + diagnosis chain

Proposal

Ship a richer default template.document-prompt that retains the epistemic discipline but also instructs the model on formatting and per-entity elaboration. Reference shape:

  • Format rules (mandatory): bold named entities, numbered lists for enumerations, paragraph structure for analytical questions
  • Grounding rules (mandatory): use only the context, lead with the answer, weave multiple pieces of evidence when present
  • Anti-hedge rule: never start with "Based on the provided information"
  • Sparse-knowledge rule: when context is thin, say so explicitly (e.g. "The provided knowledge is sparse on X; what we do know is Y") rather than generic refusal
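
As a rough illustration of the reference shape above, here is one way such a template could be worded. This is a sketch, not the Sizzl-tuned template: the wording, the name `RICH_DOCUMENT_PROMPT`, and the `render` helper are hypothetical, and it assumes the `{{documents}}` and `{{query}}` placeholders from the current default carry over unchanged. The plain string substitution is only there to show the filled-in result and is not TrustGraph's actual renderer.

```python
# Hypothetical wording that follows the four rule groups in the proposal
RICH_DOCUMENT_PROMPT = "\n".join([
    "Study the following context. Use only the information provided in the context in your response.",
    "Format rules (mandatory): bold named entities; use numbered lists for enumerations; "
    "use paragraphs for analytical questions.",
    "Grounding rules (mandatory): lead with the answer; weave together multiple pieces of evidence when present.",
    'Never start with "Based on the provided information".',
    'When the context is thin, say so explicitly (e.g. "The provided knowledge is sparse on X; '
    'what we do know is Y") rather than giving a generic refusal.',
    "Here is the context: {{documents}}",
    "Use only the provided knowledge statements to respond to the following: {{query}}",
])

def render(template: str, documents: str, query: str) -> str:
    """Illustrative placeholder substitution (assumption: not TrustGraph's renderer)."""
    return template.replace("{{documents}}", documents).replace("{{query}}", query)

print(render(RICH_DOCUMENT_PROMPT, "<retrieved chunks>", "Who are the main competitors?"))
```

The first and last two lines are taken from the current default, so the grounding instructions stay intact and only the middle rules are new.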

Our Sizzl-tuned template (open-sourced under Apache 2.0 in our repo) is a starting point: sizzl-document-prompt.json.

Why this matters upstream

Every TG deployment that benchmarks against a non-RAG LLM baseline will underperform on standard rubrics until its operators discover this themselves. We spent half a day chasing substrate, retrieval volume, and model swaps before reading the default template carefully. A better default would have saved that time.

Stack

TrustGraph 2.3.21, document-rag (not graph-rag), OpenAI gpt-4.1-mini synthesis.
