
AbstractOperators for agentic patterns#1784

Open
mahikaw wants to merge 13 commits into main from mahikaw/subquery_abstract_operator

Conversation

@mahikaw
Collaborator

@mahikaw mahikaw commented Apr 2, 2026

Description

This PR introduces two new agentic retrieval operators built on top of AbstractOperator:

  • SubQueryGeneratorOperator — expands each query row into multiple sub-query rows via an LLM. Supports three built-in strategies: decompose (split into sub-aspects), hyde (generate hypothetical answer passages),
    and multi_perspective (rephrase from diverse angles). Downstream retrieval operators can then independently retrieve per sub-query, enabling RRF-style aggregation.
  • SelectionAgentOperator — re-ranks a set of retrieved candidate documents using an agentic LLM loop. The LLM is given the query and candidate docs, then calls a log_selected_documents tool to produce a final
    ranked list. Supports a think tool for extended reasoning.
  • Also adds openai>=1.0 as an optional [llm] dependency in pyproject.toml (lazily imported so workers stay serializable).
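The RRF-style aggregation mentioned above can be sketched in a few lines (an illustrative, hypothetical `rrf_fuse` helper, not the PR's pandas-based RRFAggregatorOperator):

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: each document scores sum(1 / (k + rank))
    over every per-sub-query ranking it appears in."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Two sub-queries retrieved overlapping candidates; "d1" ranks first
# in both lists, so it leads the fused ranking.
fused = rrf_fuse([["d1", "d2", "d3"], ["d1", "d3", "d4"]])
```

Because the score only depends on ranks, documents retrieved by several sub-queries are boosted even when their raw retriever scores are not comparable.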

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables, ensure those are mirrored in the Helm values.yaml file.

@copy-pr-bot

copy-pr-bot Bot commented Apr 2, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@mahikaw mahikaw force-pushed the mahikaw/subquery_abstract_operator branch 2 times, most recently from 68db72d to e8d2518 Compare April 3, 2026 18:30
@mahikaw mahikaw changed the title from "[WIP] AbstractOperator for subquery decomposition" to "[WIP] AbstractOperators for agentic patterns" Apr 3, 2026
@mahikaw mahikaw force-pushed the mahikaw/subquery_abstract_operator branch from b7f7371 to 6bec8c0 Compare April 9, 2026 23:29
@mahikaw mahikaw marked this pull request as ready for review April 9, 2026 23:30
@mahikaw mahikaw requested review from a team as code owners April 9, 2026 23:30
@mahikaw mahikaw requested a review from jdye64 April 9, 2026 23:30
@mahikaw mahikaw changed the title from "[WIP] AbstractOperators for agentic patterns" to "AbstractOperators for agentic patterns" Apr 9, 2026
@greptile-apps
Contributor

greptile-apps Bot commented Apr 9, 2026

Greptile Summary

This PR introduces three new agentic retrieval operators (SubQueryGeneratorOperator, ReActAgentOperator, SelectionAgentOperator) and a pure-pandas RRFAggregatorOperator, along with the invoke_chat_completion_step helper in chat_completions.py. The operators are well-documented with complete schemas and examples, but there is a blocking P0 defect: invoke_chat_completion_step calls _parse_invoke_urls and _post_with_retries which live in nim.py but are never imported in chat_completions.py, causing an immediate NameError on any real invocation.

Confidence Score: 3/5

Not safe to merge — the P0 NameError in invoke_chat_completion_step will break both SelectionAgentOperator and ReActAgentOperator in production despite passing all mocked tests.

A P0 NameError (_parse_invoke_urls, _post_with_retries not imported in chat_completions.py) makes the two primary new operators non-functional at runtime. Tests pass only because they mock invoke_chat_completion_step before it executes. Additionally, several P1 issues remain from prior rounds (unguarded LLM calls, broad except without exc_info, bare KeyError on env vars) that reduce overall reliability.

nemo_retriever/src/nemo_retriever/nim/chat_completions.py requires an immediate fix before this can merge; react_agent_operator.py needs exc_info=True on broad catches.

Important Files Changed

| Filename | Overview |
| --- | --- |
| `nemo_retriever/src/nemo_retriever/nim/chat_completions.py` | Adds `invoke_chat_completion_step`. A P0 `NameError` exists because it calls `_parse_invoke_urls` and `_post_with_retries`, which are defined in `nim.py` but never imported here; all real call-sites will fail at runtime. |
| `nemo_retriever/src/nemo_retriever/graph/react_agent_operator.py` | New `ReActAgentOperator` with threaded ReAct loop. Broad `except Exception` blocks at lines 447, 524, and 616 swallow stack traces (missing `exc_info=True`), and there is a bare `KeyError` on env-var lookup at line 636. |
| `nemo_retriever/src/nemo_retriever/graph/selection_agent_operator.py` | New `SelectionAgentOperator` with agentic LLM loop. No logger defined, no error handling around the `invoke_chat_completion_step` call, bare `KeyError` on env-var resolution, and no guard on empty `choices`. |
| `nemo_retriever/src/nemo_retriever/graph/subquery_operator.py` | New `SubQueryGeneratorOperator`. No logger, no error handling around `_generate_one`, bare `KeyError` on env-var resolution, and trailing fence stripping in `_parse_json_list` may corrupt valid JSON. |
| `nemo_retriever/src/nemo_retriever/graph/rrf_aggregator_operator.py` | New `RRFAggregatorOperator`. Clean pure-pandas implementation of Reciprocal Rank Fusion with the correct formula, good schema validation, and no external dependencies. |
| `nemo_retriever/tests/test_agentic_operators.py` | Tests cover `RRFAggregatorOperator`, `SelectionAgentOperator`, `ReActAgentOperator`, and prompt renderers; `SubQueryGeneratorOperator` is entirely absent from coverage. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant SubQuery as SubQueryGeneratorOperator
    participant LLM1 as LLM (subquery)
    participant Retriever
    participant ReAct as ReActAgentOperator
    participant LLM2 as LLM (react)
    participant RRF as RRFAggregatorOperator
    participant Selection as SelectionAgentOperator
    participant LLM3 as LLM (selection)

    User->>SubQuery: query_df (query_id, query_text)
    SubQuery->>LLM1: invoke_chat_completions (decompose/hyde/multi_perspective)
    LLM1-->>SubQuery: JSON array of sub-queries
    SubQuery-->>ReAct: expanded df (query_id, subquery_text xN)

    loop per query row
        ReAct->>Retriever: retriever_fn(query, top_k)
        Retriever-->>ReAct: [{doc_id, text, score}]
        ReAct->>LLM2: invoke_chat_completion_step (think/retrieve/final_results tools)
        loop ReAct steps
            LLM2-->>ReAct: tool_call(retrieve or think or final_results)
            alt retrieve
                ReAct->>Retriever: retriever_fn(subquery, top_k)
                Retriever-->>ReAct: new docs
            else think
                Note over ReAct: log thought, continue
            else final_results
                Note over ReAct: break loop
            end
        end
        ReAct-->>RRF: rows (query_id, step_idx, doc_id, rank)
    end

    RRF-->>Selection: fused df (query_id, query_text, doc_id, rrf_score, text)

    loop per query_id group
        Selection->>LLM3: invoke_chat_completion_step (think/log_selected_documents tools)
        loop selection steps
            LLM3-->>Selection: tool_call(think or log_selected_documents)
            alt log_selected_documents
                Note over Selection: validate IDs, break loop
            end
        end
    end

    Selection-->>User: result df (query_id, doc_id, rank, message)
```
Prompt To Fix All With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/nim/chat_completions.py
Line: 124-141

Comment:
**`NameError` at runtime: `_parse_invoke_urls` and `_post_with_retries` are not imported**

`invoke_chat_completion_step` calls `_parse_invoke_urls` (line 124) and `_post_with_retries` (line 141), but both of these private helpers are defined in `nemo_retriever/nim/nim.py` and are never imported in `chat_completions.py`. Any real call to this new function will immediately raise `NameError: name '_parse_invoke_urls' is not defined`. The existing tests mask this because they mock `invoke_chat_completion_step` before it executes.

```python
from nemo_retriever.nim.nim import _parse_invoke_urls, _post_with_retries
```

Add the import at the top of the function (or module-level), or factor the helper call through `NIMClient`.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/graph/subquery_operator.py
Line: 249-250

Comment:
**`process()` has no logger — silent failure path has no observability**

`SelectionAgentOperator` and `SubQueryGeneratorOperator` both lack a module-level `logger`. When the already-flagged error-handling gaps are addressed (wrapping `_generate_one` / `_select_documents` in `try/except`), there will be no way to emit warnings or structured diagnostics without a logger. `ReActAgentOperator` sets the correct pattern at its top:

```python
import logging
logger = logging.getLogger(__name__)
```

Add the same two lines near the top of both `subquery_operator.py` and `selection_agent_operator.py`.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/graph/react_agent_operator.py
Line: 444-449

Comment:
**Broad `except Exception` swallows stack trace — `exc_info=True` missing**

Per the `no-bare-except` rule, `except Exception` at error-handler boundaries is only acceptable when the full traceback is captured via `exc_info=True`. The catch at line 447 logs the message string but discards the stack trace, making it impossible to distinguish a transient timeout from a bug in `retriever_fn` or the LLM adapter. The same gap exists at line 524.

```suggestion
                    except Exception as exc:
                        qid, qtxt = futures[future]
                        logger.warning("ReActAgentOperator: query %r failed: %s", qid, exc, exc_info=True)
```

Apply `exc_info=True` at lines 524 and 616 as well.

**Rule Used:** Never use bare 'except:' that silently swallows er... ([source](https://app.greptile.com/review/custom-context?memory=no-bare-except))

How can I resolve this? If you propose a fix, please make it concise.

Reviews (9): Last reviewed commit: "Adding parallel tool call params and oth..."

Comment on lines +218 to +220
```python
if api_key is not None and api_key.strip().startswith("os.environ/"):
    var = api_key.strip().removeprefix("os.environ/")
    api_key = os.environ[var]
```

**P1: Cryptic `KeyError` on missing env variable**

`os.environ[var]` raises a bare `KeyError` (no message about what to do) when the referenced environment variable is not set. Per the actionable-errors rule, replace with `os.environ.get(var)` and raise a `ValueError` with a clear explanation and remediation step when the value is `None`. The identical issue is present in `subquery_operator.py` at line 272.

**Rule Used:** Error messages must tell the user what happened AN... ([source](https://app.greptile.com/review/custom-context?memory=actionable-errors))

Comment on lines +206 to +225
```python
def _ensure_client(self) -> None:
    """Lazily create the OpenAI client (once per instance)."""
    if self._client is not None:
        return
    try:
        from openai import OpenAI
    except ImportError as exc:
        raise ImportError(
            "SelectionAgentOperator requires 'openai'. " "Install it with: pip install 'openai>=1.0'"
        ) from exc

    api_key = self._api_key
    if api_key is not None and api_key.strip().startswith("os.environ/"):
        var = api_key.strip().removeprefix("os.environ/")
        api_key = os.environ[var]

    self._client = OpenAI(
        api_key=api_key,
        **({"base_url": self._base_url} if self._base_url is not None else {}),
    )
```

**P2: Lazy client init is not thread-safe for concurrent Ray actors**

`_ensure_client()` uses a check-then-act pattern (`if self._client is not None: return`) with no lock. If a Ray actor receives concurrent requests, multiple threads can simultaneously pass the null check and each create a separate `OpenAI` client. Per the thread-async-safety rule, protect with a `threading.Lock` or explicitly document as single-threaded only. The same pattern applies to `SubQueryGeneratorOperator._ensure_client()`.

**Rule Used:** Models and stateful components accessed from Ray a... ([source](https://app.greptile.com/review/custom-context?memory=thread-async-safety))

Comment on lines +338 to +342
```python
for _step in range(self._max_steps):
    response = self._client.chat.completions.create(messages=messages, **call_kwargs)
    choice = response.choices[0]
    msg = choice.message
```

**P2: No guard against empty `choices` list**

`response.choices[0]` will raise an `IndexError` if the API returns an empty choices list (e.g., during content filtering or a partial network error). Consider checking `if not response.choices: break` before indexing. `_generate_one()` in `subquery_operator.py` (line 296) has the same issue.

Collaborator


This might be something to check.

Comment on lines +399 to +406
```python
doc_ids: List[str] = fn_args.get("doc_ids", [])
# Filter to only valid candidates
doc_ids = [d for d in doc_ids if d in valid_id_set][:feasible_k]
end_kwargs = {
    "doc_ids": doc_ids,
    "message": fn_args.get("message", ""),
}
should_end = True
```

**P2: Silent empty result when LLM returns invalid doc IDs**

After filtering with `valid_id_set`, `doc_ids` can be empty if the model hallucinated IDs that don't match any candidate. `process()` then silently produces zero output rows for that query, indistinguishable from a legitimately empty result. Consider logging a warning so the caller can detect the hallucination.

Comment on lines +305 to +327
```python
def _parse_json_list(raw: str, *, fallback: str) -> List[str]:
    """Parse a JSON array from *raw*, stripping markdown fences if present.

    Returns *[fallback]* when parsing fails so downstream stages always
    receive at least one sub-query row.
    """
    text = raw
    for fence in ("```json", "```"):
        if text.startswith(fence):
            text = text[len(fence) :]
            break
    if text.endswith("```"):
        text = text[:-3]
    text = text.strip()

    try:
        parsed = json.loads(text)
        if isinstance(parsed, list) and parsed and all(isinstance(s, str) for s in parsed):
            return parsed
    except json.JSONDecodeError:
        pass

    return [fallback]
```

**P2: `_parse_json_list` only strips leading markdown fence**

The fence-stripping logic uses `startswith` for the leading fence but an unconditional `endswith` for the trailing fence. If the model returns a trailing fence without a matching leading fence, the trailing strip could corrupt valid JSON. Anchor the trailing strip to only apply when a leading fence was matched.
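The anchored trailing strip can be sketched as follows (hypothetical `strip_fences` helper; the real `_parse_json_list` additionally parses the JSON and falls back to `[fallback]`):

```python
def strip_fences(raw: str) -> str:
    """Strip a markdown code fence only when a leading fence is present."""
    text = raw.strip()
    for fence in ("```json", "```"):
        if text.startswith(fence):
            text = text[len(fence):]
            # Trailing strip is anchored to the leading match, so bare
            # JSON that merely ends in backticks is left untouched.
            if text.rstrip().endswith("```"):
                text = text.rstrip()[:-3]
            break
    return text.strip()
```

With this shape, a fenced response loses both fences, while an unfenced response passes through unchanged.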

Comment on lines +1 to +3
```python
# SPDX-FileCopyrightText: Copyright (c) 2024-25, NVIDIA CORPORATION & AFFILIATES.
# All rights reserved.
# SPDX-License-Identifier: Apache-2.0
```

**P2: No corresponding tests for new operators**

Both `selection_agent_operator.py` and `subquery_operator.py` add meaningful business logic (agentic LLM loops, JSON parsing, env-var resolution) but the PR includes no test files. Per the test-coverage-new-code and test-mirrors-source-structure rules, at minimum cover: `_parse_json_list` fence variants and malformed JSON, `preprocess` column validation, env-var key resolution, and the max-steps fallback path.

**Rule Used:** New functionality must include corresponding unit ... ([source](https://app.greptile.com/review/custom-context?memory=test-coverage-new-code))


@jperez999 jperez999 left a comment


Great first run. Let's add some unit tests for expected data that might come in (and error edge cases) for each of the operators.


@mahikaw
Collaborator Author

mahikaw commented Apr 15, 2026

@greptileai this pr should aim to use NIM instead of openAI APIs. with that updated understanding, re-review it.

Comment on lines +624 to +628
```python
def _resolve_api_key(self) -> Optional[str]:
    api_key = self._api_key
    if api_key is not None and api_key.strip().startswith("os.environ/"):
        var = api_key.strip().removeprefix("os.environ/")
        return os.environ[var]
```

**P1: Bare `KeyError` on missing env variable**

`os.environ[var]` raises a bare `KeyError` with no message about which variable is unset or how to fix it. Per the actionable-errors rule, use `os.environ.get(var)` and raise an explicit `ValueError` when the value is absent, e.g.:

```python
value = os.environ.get(var)
if value is None:
    raise ValueError(
        f"ReActAgentOperator: environment variable '{var}' is not set. "
        f"Export it before running the pipeline."
    )
return value
```

The identical pattern appears in `selection_agent_operator.py` and `subquery_operator.py` (already flagged in prior review threads); `react_agent_operator.py` was not covered there.

**Rule Used:** Error messages must tell the user what happened AN... ([source](https://app.greptile.com/review/custom-context?memory=actionable-errors))

Comment on lines +295 to +296
```python
response = self._client.chat.completions.create(**call_kwargs)
raw = response.choices[0].message.content.strip()
```

**P1: Unguarded LLM call aborts entire batch on failure**

`_generate_one()` has no `try/except` around the OpenAI call. Any transient API error, rate-limit response, or network timeout propagates directly through `process()` and kills the entire batch — all remaining queries are dropped silently. `ReActAgentOperator._run_single_query` demonstrates the intended pattern (lines 509–521): wrap in `try/except`, log a per-query warning, and return a safe fallback. `SelectionAgentOperator._select_documents` (line 339) has the same gap.

```suggestion
        try:
            response = self._client.chat.completions.create(**call_kwargs)
        except Exception as exc:
            logger.warning(
                "SubQueryGeneratorOperator: LLM call failed for query %r: %s",
                query,
                exc,
                exc_info=True,
            )
            return [query]
        raw = response.choices[0].message.content or ""
        raw = raw.strip()
```

Comment on lines +338 to +340
```python
for _step in range(self._max_steps):
    response = self._client.chat.completions.create(messages=messages, **call_kwargs)
    choice = response.choices[0]
```

**P1: Unguarded LLM call aborts entire batch on failure**

The OpenAI call inside `_select_documents` is not wrapped in a `try/except`. A transient API failure for one query group propagates through `process()` and aborts all remaining query groups. The fix is consistent with the pattern in `ReActAgentOperator` — catch, log with `exc_info=True`, and return a safe fallback (e.g., `{"doc_ids": [], "message": "LLM error: " + str(exc)}`).

```suggestion
        for _step in range(self._max_steps):
            try:
                response = self._client.chat.completions.create(messages=messages, **call_kwargs)
            except Exception as exc:
                import logging as _log
                _log.getLogger(__name__).warning(
                    "SelectionAgentOperator: LLM call failed on step %d: %s", _step, exc, exc_info=True
                )
                break
            if not response.choices:
                break
            choice = response.choices[0]
```

Comment on lines +445 to +447
```python
except Exception as exc:
    qid, qtxt = futures[future]
    logger.warning("ReActAgentOperator: query %r failed: %s", qid, exc)
```

**P1: Broad `except Exception` missing `exc_info=True`**

The `no-bare-except` rule permits `except Exception` at pipeline error-handler boundaries only when the full traceback is captured via `exc_info=True`. All three broad catches in this file — here, line 519, and line 607 — log the error message but silently swallow the stack trace, making it impossible to distinguish a transient timeout from a bug in `retriever_fn` or the LLM adapter without attaching a debugger.

```suggestion
                    except Exception as exc:
                        qid, qtxt = futures[future]
                        logger.warning("ReActAgentOperator: query %r failed: %s", qid, exc, exc_info=True)
```

Apply the same fix at lines 519–521 and 607–609.

Comment on lines +1 to +10
```python
# SPDX-FileCopyrightText: Copyright (c) 2024-25, NVIDIA CORPORATION & AFFILIATES.
# All rights reserved.
# SPDX-License-Identifier: Apache-2.0

"""Smoke tests for the agentic retrieval operators.

Run with:
cd nemo_retriever && uv run pytest tests/test_agentic_operators.py -v
"""
```


**P1: `SubQueryGeneratorOperator` has no tests in the new test file**

The test file covers `RRFAggregatorOperator`, `SelectionAgentOperator`, `ReActAgentOperator`, and their prompt renderers, but `SubQueryGeneratorOperator` is completely absent. Per the `test-coverage-new-code` and `test-mirrors-source-structure` rules, its core code paths still need coverage:

- `preprocess` with `list[str]`, `list[tuple]`, and invalid inputs
- `_build_system_prompt` for each of the three built-in strategies
- `_generate_one` happy path (mocking `invoke_chat_completions`)
- `_parse_json_list` with a leading-fence response, a bare JSON response, and malformed JSON falling back to the original query

**Rule Used:** New functionality must include corresponding unit ... ([source](https://app.greptile.com/review/custom-context?memory=test-coverage-new-code))

@mahikaw mahikaw force-pushed the mahikaw/subquery_abstract_operator branch from 0116107 to b635f58 Compare April 21, 2026 18:29
Comment on lines +124 to +141
```python
invoke_urls = _parse_invoke_urls(invoke_url)
endpoint_url = invoke_urls[0]

payload: Dict[str, Any] = {
    "messages": messages,
    "temperature": temperature,
}
if model:
    payload["model"] = model
if max_tokens is not None:
    payload["max_tokens"] = max_tokens
if tools:
    payload["tools"] = tools
    payload["tool_choice"] = tool_choice
if extra_body:
    payload.update(extra_body)

return _post_with_retries(
```

**P0: `NameError` at runtime: `_parse_invoke_urls` and `_post_with_retries` are not imported**

`invoke_chat_completion_step` calls `_parse_invoke_urls` (line 124) and `_post_with_retries` (line 141), but both of these private helpers are defined in `nemo_retriever/nim/nim.py` and are never imported in `chat_completions.py`. Any real call to this new function will immediately raise `NameError: name '_parse_invoke_urls' is not defined`. The existing tests mask this because they mock `invoke_chat_completion_step` before it executes.

```python
from nemo_retriever.nim.nim import _parse_invoke_urls, _post_with_retries
```

Add the import at the top of the function (or module-level), or factor the helper call through `NIMClient`.
