graph-rag latency dominated by sequential Pulsar round-trips inside traversal loop (~50s on small substrate)

## Problem

`tg-invoke-graph-rag` takes **30–60 seconds** on a small substrate (~14k triples, single `corp-notion-so` collection) even though every individual layer is fast:

| Layer | Latency (warm) | Source |
|---|---|---|
| Embeddings call | 33ms | `tg-invoke-embeddings` |
| Graph-embeddings query | 22ms | `tg-invoke-graph-embeddings` |
| Single triples-query round-trip | 18ms | `tg-invoke-triples-query` |
| Direct Memgraph Cypher (same data, same question shape) | 50–200ms | Bolt protocol |
| **Full `tg-invoke-graph-rag`** | **30–60 seconds** | observed wall |

So the bottleneck isn't the triple store (already on Memgraph in our deployment via `triples-query-memgraph`) and isn't any single component. It's the **traversal loop inside `trustgraph.retrieval.graph_rag.Processor`** doing **100–200 sequential Pulsar-mediated triples-query calls per graph-rag invocation**.

Evidence from a single graph-rag call against the substrate:

```
exploration:
  Seed entities: 23
  Edges explored: 136
producer stats (rag service → tg/request/triples):
  totalMsgsSent_ = 191
```

Each round-trip pays:
- rag service publishes to `tg/request/triples:<flow>`
- triples-query consumer (`triples-query-memgraph`) reads (Pulsar poll cycle ~50–300ms)
- triples-query → Memgraph Bolt (~5–20ms)
- triples-query publishes to `tg/response/triples:<flow>`
- rag service consumer reads response (Pulsar poll cycle ~50–300ms)
- **Net: ~100–700ms per round-trip**

`191 round-trips × ~200ms avg = 38s` traversal + LLM synthesis = ~50s observed.

## Why this matters

- Workbench's chat UI exposes `Graph RAG` and `Agent` as verbs. Agent's ReAct loop internally calls graph-rag (confirmed in rag-1 logs: `ReactPattern iteration 1 → graph-rag.Handling input`), so the Agent verb inherits graph-rag's latency and **times out at the gateway's 30s socket budget**.
- The only working chat verbs in current Workbench are `Basic LLM` (no substrate, no provenance) and `Document RAG` (vector-only retrieval) — neither uses the graph the platform spent effort building.
- Direct Memgraph Bolt queries against the same data return in **166ms warm** for the same question shape, demonstrating the substrate is fast — it's the access pattern that's slow.

## Proposed fix

Two options (not mutually exclusive):

### Option A — in-process triples-query inside graph-rag

`graph_rag.Processor` currently uses `TriplesClientSpec` which routes every traversal call through Pulsar. When `--triples-store` is `memgraph` (or `neo4j`/`falkordb`), graph-rag could open a direct Bolt connection alongside the Pulsar client and use it for the traversal loop. The Pulsar path stays as the fallback / for inter-flow access.

This eliminates ~200 sequential Pulsar round-trips per call. Expected latency: ~1–3s warm (one direct Bolt query expanding from N seed entities + 3–10s LLM synthesis).

### Option B — batched triples-query API

Add a `TriplesQueryBatch` schema/service that accepts an array of triple patterns and returns an array of result sets in one Pulsar round-trip. graph-rag's traversal loop changes from `for entity in seeds: triples_query(entity)` to `triples_query_batch([entities])`.

Expected reduction: 100 round-trips → 5–10 round-trips. Latency: ~5–10s warm.

### Option C — tighter graph-rag defaults (workaround, not a fix)

Default `entity_limit` (50) and `max_path_length` (2) generate too many edges on substrates of any nontrivial size. Cutting to `entity_limit=10`, `max_path_length=1` reduces round-trips ~10× — but trades coverage for latency. This is a config knob, not an architecture fix.

## Acceptance criteria

For Option A or B:
- `tg-invoke-graph-rag` returns substrate-cited answer within **5 seconds wall** on a 15k-triple substrate, warm path
- Memgraph backend handles the inner traversal; no Pulsar round-trip per edge expansion
- Existing graph-rag external API unchanged — backwards compatible
- `urn:graph:retrieval` traces still written

## Reproduction

```bash
# Set up: any TG deployment with triples-query-memgraph + ~15k triples in a collection
docker exec deploy-control-1 tg-invoke-graph-rag \
  -u http://api-gateway:8088 \
  -f default \
  -C corp-notion-so \
  -q "Who works at Notion?"
# Expect: 30-60s wall, sparse-substrate answer
# Direct Memgraph for comparison:
docker exec deploy-memgraph-1 mgconsole --port 7687 <<'CYPHER'
MATCH (anchor:Literal) WHERE anchor.collection = 'corp-notion-so'
  AND toLower(anchor.value) CONTAINS 'notion'
WITH anchor LIMIT 10
MATCH (s)-[r:Rel]->(anchor) WHERE r.collection = 'corp-notion-so'
RETURN anchor.value, s LIMIT 20;
CYPHER
# Expect: ~150ms wall, real data
```

## Environment

- TG `trustgraph/trustgraph-flow:2.3.21`
- Compose: Cassandra + Qdrant + Memgraph + Pulsar (standard)
- triples-query service: `triples-query-memgraph --pulsar-host pulsar://pulsar:6650 -g bolt://memgraph:7687`
- graph_rag processor concurrency: 4 (bumped from default 1; doesn't help the inner loop)

Happy to PR Option A if there's interest — would touch `trustgraph/retrieval/graph_rag/graph_rag.py` and `trustgraph/retrieval/graph_rag/rag.py` to detect the configured triples backend and instantiate a direct Bolt client for the traversal calls.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

graph-rag latency dominated by sequential Pulsar round-trips inside traversal loop (~50s on small substrate) #922

Problem

Why this matters

Proposed fix

Option A — in-process triples-query inside graph-rag

Option B — batched triples-query API

Option C — tighter graph-rag defaults (workaround, not a fix)

Acceptance criteria

Reproduction

Environment

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Layer	Latency (warm)	Source
Embeddings call	33ms	`tg-invoke-embeddings`
Graph-embeddings query	22ms	`tg-invoke-graph-embeddings`
Single triples-query round-trip	18ms	`tg-invoke-triples-query`
Direct Memgraph Cypher (same data, same question shape)	50–200ms	Bolt protocol
Full `tg-invoke-graph-rag`	30–60 seconds	observed wall

graph-rag latency dominated by sequential Pulsar round-trips inside traversal loop (~50s on small substrate) #922

Description

Problem

Why this matters

Proposed fix

Option A — in-process triples-query inside graph-rag

Option B — batched triples-query API

Option C — tighter graph-rag defaults (workaround, not a fix)

Acceptance criteria

Reproduction

Environment

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions