
graph-rag latency dominated by sequential Pulsar round-trips inside traversal loop (~50s on small substrate) #922

@beca-oc

Description


Problem

tg-invoke-graph-rag takes 30–60 seconds on a small substrate (~14k triples, single corp-notion-so collection) even though every individual layer is fast:

| Layer | Latency (warm) | Source |
|---|---|---|
| Embeddings call | 33ms | tg-invoke-embeddings |
| Graph-embeddings query | 22ms | tg-invoke-graph-embeddings |
| Single triples-query round-trip | 18ms | tg-invoke-triples-query |
| Direct Memgraph Cypher (same data, same question shape) | 50–200ms | Bolt protocol |
| Full tg-invoke-graph-rag | 30–60 seconds | observed wall-clock time |

So the bottleneck isn't the triple store (already on Memgraph in our deployment via triples-query-memgraph) and isn't any single component. It's the traversal loop inside trustgraph.retrieval.graph_rag.Processor doing 100–200 sequential Pulsar-mediated triples-query calls per graph-rag invocation.

Evidence from a single graph-rag call against the substrate:

exploration:
  Seed entities: 23
  Edges explored: 136
producer stats (rag service → tg/request/triples):
  totalMsgsSent_ = 191

Each round-trip pays:

  • rag service publishes to tg/request/triples:<flow>
  • triples-query consumer (triples-query-memgraph) reads (Pulsar poll cycle ~50–300ms)
  • triples-query → Memgraph Bolt (~5–20ms)
  • triples-query publishes to tg/response/triples:<flow>
  • rag service consumer reads response (Pulsar poll cycle ~50–300ms)
  • Net: ~100–700ms per round-trip

191 round-trips × ~200ms average ≈ 38s of traversal; adding LLM synthesis brings the total to the ~50s observed.
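The round-trip arithmetic can be reproduced with a small back-of-envelope model. The per-round-trip costs are the rough estimates quoted in this issue, not new measurements, and the LLM share passed to the hypothetical implied_round_trip_ms helper is an assumption, since synthesis time isn't broken out separately.

```python
# Back-of-envelope model of graph-rag wall time. The per-round-trip costs are
# the rough estimates quoted in this issue, not new measurements.

def traversal_seconds(round_trips: int, per_round_trip_ms: float) -> float:
    """Time spent in traversal when round-trips run strictly one after another."""
    return round_trips * per_round_trip_ms / 1000.0


def implied_round_trip_ms(observed_s: float, llm_s: float, round_trips: int) -> float:
    """Average per-round-trip cost implied by an observed total and an assumed LLM share."""
    return (observed_s - llm_s) * 1000.0 / round_trips


SEQUENTIAL_TRIPS = 191   # totalMsgsSent_ from the producer stats
BATCHED_TRIPS = (5, 10)  # Option B's estimated round-trip count

for per_rt_ms in (100, 200, 700):  # low end, assumed average, high end
    secs = traversal_seconds(SEQUENTIAL_TRIPS, per_rt_ms)
    print(f"sequential: {SEQUENTIAL_TRIPS} x {per_rt_ms} ms = {secs:.1f} s")

for trips in BATCHED_TRIPS:
    secs = traversal_seconds(trips, 200)
    print(f"batched:    {trips} x 200 ms = {secs:.1f} s")
```

With an assumed ~12s of synthesis, implied_round_trip_ms(50, 12, 191) comes out just under 200ms, consistent with the average used above.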

Why this matters

  • Workbench's chat UI exposes Graph RAG and Agent as verbs. Agent's ReAct loop internally calls graph-rag (confirmed in rag-1 logs: ReactPattern iteration 1 → graph-rag.Handling input), so the Agent verb inherits graph-rag's latency and times out at the gateway's 30s socket budget.
  • The only working chat verbs in the current Workbench are Basic LLM (no substrate, no provenance) and Document RAG (vector-only retrieval); neither uses the graph the platform spent effort building.
  • Direct Memgraph Bolt queries against the same data return in 166ms warm for the same question shape, demonstrating the substrate is fast — it's the access pattern that's slow.

Proposed fix

Two fixes plus one workaround (not mutually exclusive):

Option A — in-process triples-query inside graph-rag

graph_rag.Processor currently uses TriplesClientSpec, which routes every traversal call through Pulsar. When --triples-store is memgraph (or neo4j/falkordb), graph-rag could open a direct Bolt connection alongside the Pulsar client and use it for the traversal loop. The Pulsar path remains as the fallback and for inter-flow access.

This eliminates the ~200 sequential Pulsar round-trips per call. Expected latency: ~1–3s warm for retrieval (one direct Bolt query expanding from the N seed entities), plus 3–10s of LLM synthesis.
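A minimal sketch of Option A's single expansion query. It assumes the Memgraph layout implied by the Reproduction query: the Literal label with value/collection properties and Rel relationships carrying collection come from that query, while the Node label and the uri properties on nodes and relationships are assumptions. expand_seeds is a hypothetical helper, not existing TrustGraph code; it accepts any object with a neo4j-style run(query, parameters) method.

```python
from typing import Iterable, List, Protocol, Tuple

# Assumed schema: (:Node {uri, collection}) and (:Literal {value, collection})
# joined by [:Rel {uri, collection}]. Node and the uri properties are assumptions.
EXPAND_QUERY = """
UNWIND $seeds AS seed
MATCH (s:Node {uri: seed})-[r:Rel]->(o)
WHERE r.collection = $collection
RETURN s.uri AS s, r.uri AS p, coalesce(o.uri, o.value) AS o
"""

Triple = Tuple[str, str, str]


class CypherSession(Protocol):
    """Anything with a neo4j-style run(query, parameters) method."""

    def run(self, query: str, parameters: dict) -> Iterable: ...


def expand_seeds(session: CypherSession, seeds: List[str], collection: str,
                 limit: int = 500) -> List[Triple]:
    """Fetch the one-hop outgoing neighbourhood of every seed in one Bolt round-trip."""
    # LIMIT is appended as a validated integer rather than passed as a parameter.
    query = EXPAND_QUERY + f"LIMIT {int(limit)}"
    records = session.run(query, {"seeds": seeds, "collection": collection})
    # Each record exposes the RETURN aliases by key, as neo4j Record objects do.
    return [(rec["s"], rec["p"], rec["o"]) for rec in records]
```

With the official neo4j Python driver, which also speaks Bolt to Memgraph, the session would come from GraphDatabase.driver("bolt://memgraph:7687", auth=("", "")).session(). For max_path_length 2, the returned object URIs could be fed back in as the next seed list (two round-trips in total), or the query could use a variable-length pattern instead.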

Option B — batched triples-query API

Add a TriplesQueryBatch schema/service that accepts an array of triple patterns and returns an array of result sets in one Pulsar round-trip. graph-rag's traversal loop changes from for entity in seeds: triples_query(entity) to triples_query_batch([entities]).

Expected reduction: 100–200 round-trips → 5–10 round-trips. Latency: ~5–10s warm.
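A sketch of how the traversal loop could consume a batched API, sending one request per hop instead of one per entity. TriplePattern, BatchQuery and traverse_batched are hypothetical names for illustration; the real TriplesQueryBatch schema would be defined by the PR. This version queries each frontier entity in both the subject and object positions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class TriplePattern:
    """One pattern in a hypothetical TriplesQueryBatch request; None is a wildcard."""
    s: Optional[str] = None
    p: Optional[str] = None
    o: Optional[str] = None
    limit: int = 30


# A batch call takes N patterns and returns N result sets in one round-trip.
BatchQuery = Callable[[List[TriplePattern]], List[List[Triple]]]


def traverse_batched(seeds: List[str], query_batch: BatchQuery,
                     max_path_length: int = 2) -> Set[Triple]:
    """Breadth-first expansion that sends one batch per hop, not one query per entity."""
    seen_entities: Set[str] = set(seeds)
    frontier = list(seeds)
    edges: Set[Triple] = set()
    for _ in range(max_path_length):
        if not frontier:
            break
        # Outgoing and incoming edges for every frontier entity, in one request.
        patterns = ([TriplePattern(s=e) for e in frontier]
                    + [TriplePattern(o=e) for e in frontier])
        next_frontier: List[str] = []
        for result in query_batch(patterns):
            for triple in result:
                edges.add(triple)
                # Newly reached subjects/objects become the next hop's frontier.
                for node in (triple[0], triple[2]):
                    if node not in seen_entities:
                        seen_entities.add(node)
                        next_frontier.append(node)
        frontier = next_frontier
    return edges
```

Because each hop issues a single batch, max_path_length=2 costs two round-trips regardless of how many seeds there are; capping the batch size to bound message size would push that toward the 5–10 estimated above.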

Option C — tighter graph-rag defaults (workaround, not a fix)

Default entity_limit (50) and max_path_length (2) generate too many edges on substrates of any nontrivial size. Cutting to entity_limit=10, max_path_length=1 reduces round-trips ~10× — but trades coverage for latency. This is a config knob, not an architecture fix.

Acceptance criteria

For Option A or B:

  • tg-invoke-graph-rag returns a substrate-cited answer within 5 seconds of wall-clock time on a 15k-triple substrate (warm path)
  • The Memgraph backend handles the inner traversal, with no Pulsar round-trip per edge expansion
  • The existing graph-rag external API is unchanged (backwards compatible)
  • urn:graph:retrieval traces are still written
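A rough way to check the 5-second criterion from the host, reusing the command from the Reproduction section. The helper names are hypothetical, and timing through docker exec adds some process-startup overhead on top of graph-rag itself.

```python
import shlex
import subprocess
import time
from typing import List, Tuple

# Command from the Reproduction section; adjust container, flow and collection.
GRAPH_RAG_CMD = [
    "docker", "exec", "deploy-control-1", "tg-invoke-graph-rag",
    "-u", "http://api-gateway:8088", "-f", "default",
    "-C", "corp-notion-so", "-q", "Who works at Notion?",
]
BUDGET_S = 5.0  # acceptance criterion: answer within 5 s wall, warm path


def timed_run(cmd: List[str], timeout_s: float = 120.0) -> Tuple[float, str]:
    """Run cmd once and return (wall-clock seconds, stdout); raises on non-zero exit."""
    start = time.perf_counter()
    proc = subprocess.run(cmd, capture_output=True, text=True,
                          timeout=timeout_s, check=True)
    return time.perf_counter() - start, proc.stdout


def check_warm_latency(cmd: List[str], budget_s: float = BUDGET_S,
                       warmups: int = 1) -> bool:
    """Discard warm-up runs, then report whether one warm run fits the budget."""
    for _ in range(warmups):
        timed_run(cmd)
    elapsed, answer = timed_run(cmd)
    print(f"{shlex.join(cmd)}: {elapsed:.2f} s")
    print(answer.strip())
    return elapsed <= budget_s
```

check_warm_latency(GRAPH_RAG_CMD) runs one discarded warm-up call, then returns True when the timed call finishes within BUDGET_S.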

Reproduction

# Set up: any TG deployment with triples-query-memgraph + ~15k triples in a collection
docker exec deploy-control-1 tg-invoke-graph-rag \
  -u http://api-gateway:8088 \
  -f default \
  -C corp-notion-so \
  -q "Who works at Notion?"
# Expect: 30-60s wall, sparse-substrate answer
# Direct Memgraph for comparison (-i forwards the heredoc to mgconsole's stdin):
docker exec -i deploy-memgraph-1 mgconsole --port 7687 <<'CYPHER'
MATCH (anchor:Literal) WHERE anchor.collection = 'corp-notion-so'
  AND toLower(anchor.value) CONTAINS 'notion'
WITH anchor LIMIT 10
MATCH (s)-[r:Rel]->(anchor) WHERE r.collection = 'corp-notion-so'
RETURN anchor.value, s LIMIT 20;
CYPHER
# Expect: ~150ms wall, real data

Environment

  • TG trustgraph/trustgraph-flow:2.3.21
  • Compose: Cassandra + Qdrant + Memgraph + Pulsar (standard)
  • triples-query service: triples-query-memgraph --pulsar-host pulsar://pulsar:6650 -g bolt://memgraph:7687
  • graph_rag processor concurrency: 4 (raised from the default of 1; this doesn't help, because the traversal inside each request is still sequential)

Happy to open a PR for Option A if there's interest. It would touch trustgraph/retrieval/graph_rag/graph_rag.py and trustgraph/retrieval/graph_rag/rag.py to detect the configured triples backend and instantiate a direct Bolt client for the traversal calls.

Metadata

Assignees: No one assigned
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
