Problem
tg-invoke-graph-rag takes 30–60 seconds on a small substrate (~14k triples, single corp-notion-so collection) even though every individual layer is fast:
| Layer |
Latency (warm) |
Source |
| Embeddings call |
33ms |
tg-invoke-embeddings |
| Graph-embeddings query |
22ms |
tg-invoke-graph-embeddings |
| Single triples-query round-trip |
18ms |
tg-invoke-triples-query |
| Direct Memgraph Cypher (same data, same question shape) |
50–200ms |
Bolt protocol |
Full tg-invoke-graph-rag |
30–60 seconds |
observed wall |
So the bottleneck isn't the triple store (already on Memgraph in our deployment via triples-query-memgraph) and isn't any single component. It's the traversal loop inside trustgraph.retrieval.graph_rag.Processor doing 100–200 sequential Pulsar-mediated triples-query calls per graph-rag invocation.
Evidence from a single graph-rag call against the substrate:
exploration:
Seed entities: 23
Edges explored: 136
producer stats (rag service → tg/request/triples):
totalMsgsSent_ = 191
Each round-trip pays:
- rag service publishes to
tg/request/triples:<flow>
- triples-query consumer (
triples-query-memgraph) reads (Pulsar poll cycle ~50–300ms)
- triples-query → Memgraph Bolt (~5–20ms)
- triples-query publishes to
tg/response/triples:<flow>
- rag service consumer reads response (Pulsar poll cycle ~50–300ms)
- Net: ~100–700ms per round-trip
191 round-trips × ~200ms avg = 38s traversal + LLM synthesis = ~50s observed.
Why this matters
- Workbench's chat UI exposes
Graph RAG and Agent as verbs. Agent's ReAct loop internally calls graph-rag (confirmed in rag-1 logs: ReactPattern iteration 1 → graph-rag.Handling input), so the Agent verb inherits graph-rag's latency and times out at the gateway's 30s socket budget.
- The only working chat verbs in current Workbench are
Basic LLM (no substrate, no provenance) and Document RAG (vector-only retrieval) — neither uses the graph the platform spent effort building.
- Direct Memgraph Bolt queries against the same data return in 166ms warm for the same question shape, demonstrating the substrate is fast — it's the access pattern that's slow.
Proposed fix
Two options (not mutually exclusive):
Option A — in-process triples-query inside graph-rag
graph_rag.Processor currently uses TriplesClientSpec which routes every traversal call through Pulsar. When --triples-store is memgraph (or neo4j/falkordb), graph-rag could open a direct Bolt connection alongside the Pulsar client and use it for the traversal loop. The Pulsar path stays as the fallback / for inter-flow access.
This eliminates ~200 sequential Pulsar round-trips per call. Expected latency: ~1–3s warm (one direct Bolt query expanding from N seed entities + 3–10s LLM synthesis).
Option B — batched triples-query API
Add a TriplesQueryBatch schema/service that accepts an array of triple patterns and returns an array of result sets in one Pulsar round-trip. graph-rag's traversal loop changes from for entity in seeds: triples_query(entity) to triples_query_batch([entities]).
Expected reduction: 100 round-trips → 5–10 round-trips. Latency: ~5–10s warm.
Option C — tighter graph-rag defaults (workaround, not a fix)
Default entity_limit (50) and max_path_length (2) generate too many edges on substrates of any nontrivial size. Cutting to entity_limit=10, max_path_length=1 reduces round-trips ~10× — but trades coverage for latency. This is a config knob, not an architecture fix.
Acceptance criteria
For Option A or B:
tg-invoke-graph-rag returns substrate-cited answer within 5 seconds wall on a 15k-triple substrate, warm path
- Memgraph backend handles the inner traversal; no Pulsar round-trip per edge expansion
- Existing graph-rag external API unchanged — backwards compatible
urn:graph:retrieval traces still written
Reproduction
# Set up: any TG deployment with triples-query-memgraph + ~15k triples in a collection
docker exec deploy-control-1 tg-invoke-graph-rag \
-u http://api-gateway:8088 \
-f default \
-C corp-notion-so \
-q "Who works at Notion?"
# Expect: 30-60s wall, sparse-substrate answer
# Direct Memgraph for comparison:
docker exec deploy-memgraph-1 mgconsole --port 7687 <<'CYPHER'
MATCH (anchor:Literal) WHERE anchor.collection = 'corp-notion-so'
AND toLower(anchor.value) CONTAINS 'notion'
WITH anchor LIMIT 10
MATCH (s)-[r:Rel]->(anchor) WHERE r.collection = 'corp-notion-so'
RETURN anchor.value, s LIMIT 20;
CYPHER
# Expect: ~150ms wall, real data
Environment
- TG
trustgraph/trustgraph-flow:2.3.21
- Compose: Cassandra + Qdrant + Memgraph + Pulsar (standard)
- triples-query service:
triples-query-memgraph --pulsar-host pulsar://pulsar:6650 -g bolt://memgraph:7687
- graph_rag processor concurrency: 4 (bumped from default 1; doesn't help the inner loop)
Happy to PR Option A if there's interest — would touch trustgraph/retrieval/graph_rag/graph_rag.py and trustgraph/retrieval/graph_rag/rag.py to detect the configured triples backend and instantiate a direct Bolt client for the traversal calls.
Problem
tg-invoke-graph-ragtakes 30–60 seconds on a small substrate (~14k triples, singlecorp-notion-socollection) even though every individual layer is fast:tg-invoke-embeddingstg-invoke-graph-embeddingstg-invoke-triples-querytg-invoke-graph-ragSo the bottleneck isn't the triple store (already on Memgraph in our deployment via
triples-query-memgraph) and isn't any single component. It's the traversal loop insidetrustgraph.retrieval.graph_rag.Processordoing 100–200 sequential Pulsar-mediated triples-query calls per graph-rag invocation.Evidence from a single graph-rag call against the substrate:
Each round-trip pays:
tg/request/triples:<flow>triples-query-memgraph) reads (Pulsar poll cycle ~50–300ms)tg/response/triples:<flow>191 round-trips × ~200ms avg = 38straversal + LLM synthesis = ~50s observed.Why this matters
Graph RAGandAgentas verbs. Agent's ReAct loop internally calls graph-rag (confirmed in rag-1 logs:ReactPattern iteration 1 → graph-rag.Handling input), so the Agent verb inherits graph-rag's latency and times out at the gateway's 30s socket budget.Basic LLM(no substrate, no provenance) andDocument RAG(vector-only retrieval) — neither uses the graph the platform spent effort building.Proposed fix
Two options (not mutually exclusive):
Option A — in-process triples-query inside graph-rag
graph_rag.Processorcurrently usesTriplesClientSpecwhich routes every traversal call through Pulsar. When--triples-storeismemgraph(orneo4j/falkordb), graph-rag could open a direct Bolt connection alongside the Pulsar client and use it for the traversal loop. The Pulsar path stays as the fallback / for inter-flow access.This eliminates ~200 sequential Pulsar round-trips per call. Expected latency: ~1–3s warm (one direct Bolt query expanding from N seed entities + 3–10s LLM synthesis).
Option B — batched triples-query API
Add a
TriplesQueryBatchschema/service that accepts an array of triple patterns and returns an array of result sets in one Pulsar round-trip. graph-rag's traversal loop changes fromfor entity in seeds: triples_query(entity)totriples_query_batch([entities]).Expected reduction: 100 round-trips → 5–10 round-trips. Latency: ~5–10s warm.
Option C — tighter graph-rag defaults (workaround, not a fix)
Default
entity_limit(50) andmax_path_length(2) generate too many edges on substrates of any nontrivial size. Cutting toentity_limit=10,max_path_length=1reduces round-trips ~10× — but trades coverage for latency. This is a config knob, not an architecture fix.Acceptance criteria
For Option A or B:
tg-invoke-graph-ragreturns substrate-cited answer within 5 seconds wall on a 15k-triple substrate, warm pathurn:graph:retrievaltraces still writtenReproduction
Environment
trustgraph/trustgraph-flow:2.3.21triples-query-memgraph --pulsar-host pulsar://pulsar:6650 -g bolt://memgraph:7687Happy to PR Option A if there's interest — would touch
trustgraph/retrieval/graph_rag/graph_rag.pyandtrustgraph/retrieval/graph_rag/rag.pyto detect the configured triples backend and instantiate a direct Bolt client for the traversal calls.