
fix: cypher prefix matching in embeddings query #301

Open
portwebdesign wants to merge 2 commits into vitali87:main from portwebdesign:fix/embeddings-query-prefix

Conversation

@portwebdesign

This change avoids string concatenation inside the Cypher query by passing the fully-formed prefix from Python.
It prevents type errors like Invalid types: bool and string for '+' when m.qualified_name contains unexpected values.
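
A minimal sketch of the caller-side contract described above. The helper `build_module_prefix` is hypothetical and not taken from the repository; it assumes the project name should always be a non-empty string and that the query parameter should end in exactly one dot.

```python
def build_module_prefix(project_name: str) -> str:
    """Return the dotted prefix to bind to the query parameter.

    Hypothetical helper for illustration; not part of the repository.
    """
    # Reject non-string values early instead of letting the database
    # report a type error from inside the query.
    if not isinstance(project_name, str) or not project_name:
        raise TypeError("project_name must be a non-empty string")
    # Append exactly one separator, even if the caller already added one.
    return project_name.rstrip(".") + "."


# The query then compares against this value without any '+' of its own.
params = {"project_name": build_module_prefix("my_project")}
print(params)
```

Building the prefix in one place keeps the `+` out of the Cypher text, which is the approach this change takes.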

Copilot AI review requested due to automatic review settings February 7, 2026 00:31
@gemini-code-assist
Contributor

Summary of Changes

Hello @portwebdesign, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a bug in the Cypher query responsible for retrieving embeddings by adjusting how project name prefixes are handled. By eliminating an internal string concatenation within the Cypher query and instead relying on the Python application to provide the complete prefix, the change significantly improves the query's stability and prevents specific type-related errors, ensuring more dependable data retrieval.

Highlights

  • Cypher Query Refinement: The CYPHER_QUERY_EMBEDDINGS in codebase_rag/constants.py has been modified to remove the + '.' concatenation from the m.qualified_name STARTS WITH $project_name clause. The full prefix is now expected to be passed directly from Python.
  • Error Prevention: This change prevents potential type errors, such as 'Invalid types: bool and string for '+'', which could arise from string concatenation within the Cypher query when m.qualified_name contains unexpected values.
Changelog
  • codebase_rag/constants.py
    • Removed the + '.' concatenation from the m.qualified_name STARTS WITH $project_name clause within the CYPHER_QUERY_EMBEDDINGS constant.
Activity
  • No specific activity (comments, reviews, or progress updates) has been recorded for this pull request yet.

@greptile-apps
Contributor

greptile-apps bot commented Feb 7, 2026

Greptile Overview

Greptile Summary

  • Adjusts the Cypher embeddings query to avoid string concatenation in STARTS WITH by relying on a fully-formed $project_name prefix parameter.
  • Intended to prevent Cypher type errors when m.qualified_name contains unexpected values.
  • This query is consumed by GraphUpdater._generate_semantic_embeddings() to fetch functions/methods for embedding generation and storage.

Confidence Score: 2/5

  • Not safe to merge as-is due to a likely behavioral mismatch that can stop embeddings generation.
  • The Cypher query change alters how the module prefix is matched, but the only known caller still appends a '.' to the parameter; this inconsistent contract can cause the query to return no rows in normal operation. Aside from that, the change is small and localized.
  • Pay close attention to codebase_rag/constants.py, and verify that the call site in codebase_rag/graph_updater.py uses the intended prefix format.

Important Files Changed

Filename | Overview
codebase_rag/constants.py | Updates CYPHER_QUERY_EMBEDDINGS to use $project_name directly for STARTS WITH, but the only caller still passes project_name + '.', so the semantics shift and the query likely matches nothing.

Sequence Diagram

sequenceDiagram
  participant GU as GraphUpdater
  participant CS as constants.py
  participant DB as Neo4j/Ingestor

  GU->>CS: uses CYPHER_QUERY_EMBEDDINGS
  GU->>DB: fetch_all(query, {project_name: project_name + "."})
  DB-->>DB: MATCH Module->DEFINES->Function/Method
  DB-->>DB: WHERE m.qualified_name STARTS WITH $project_name
  DB-->>GU: results rows
  GU-->>GU: embed_code() + store_embedding() for each row

Contributor

@greptile-apps greptile-apps bot left a comment

1 file reviewed, 1 comment

Comment on lines 420 to 423
CYPHER_QUERY_EMBEDDINGS = """
MATCH (m:Module)-[:DEFINES]->(n)
WHERE (n:Function OR n:Method)
- AND m.qualified_name STARTS WITH $project_name + '.'
+ AND m.qualified_name STARTS WITH $project_name

Prefix semantics mismatch

CYPHER_QUERY_EMBEDDINGS now uses m.qualified_name STARTS WITH $project_name, but graph_updater.py still calls this query with {"project_name": self.project_name + "."} (see codebase_rag/graph_updater.py:371-373). That changes the meaning from "module starts with <project>." to "module starts with <project>. (including dot)"; if qualified_name is stored without a trailing dot separator (common), this will return zero rows and embeddings will never generate. Either update the caller to pass the exact prefix you want (with/without dot) consistently, or restore the query-side concatenation in a type-safe way.
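
The contract question can be illustrated without a database. The sketch below models `STARTS WITH` using Python's `str.startswith`. The function names and the sample qualified names are assumptions for illustration, not data from the repository, and the sample assumes dot-separated qualified names.

```python
def query_appends_dot(qualified_name: str, param: str) -> bool:
    # Models: m.qualified_name STARTS WITH ($project_name + '.')
    return qualified_name.startswith(param + ".")


def query_uses_param_as_is(qualified_name: str, param: str) -> bool:
    # Models: m.qualified_name STARTS WITH $project_name
    return qualified_name.startswith(param)


qn = "my_project.utils.helpers"  # hypothetical module qualified name

# Dot appended on both sides: the effective prefix is "my_project..".
print(query_appends_dot(qn, "my_project."))       # False
# Dot appended only by the caller.
print(query_uses_param_as_is(qn, "my_project."))  # True
# Dot appended only by the query.
print(query_appends_dot(qn, "my_project"))        # True
# Dot appended by neither side: sibling projects match as well.
print(query_uses_param_as_is("my_project_v2.utils", "my_project"))  # True
```

Under these assumptions the separator has to be added on exactly one side of the query boundary, whichever side is chosen.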


Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request aims to fix a type error in a Cypher query by removing string concatenation. While the intention is good, the change as-is could lead to incorrect query results by matching unintended projects. I've suggested a more robust query logic that handles edge cases correctly and prevents ambiguity. This proposed change relies on fixing the root cause of the type error in the calling Python code, which is the recommended approach.

MATCH (m:Module)-[:DEFINES]->(n)
WHERE (n:Function OR n:Method)
- AND m.qualified_name STARTS WITH $project_name + '.'
+ AND m.qualified_name STARTS WITH $project_name

Severity: high

This change avoids the Cypher type error, but it can cause incorrect matches. For example, a search for project foo would also match foobar.

A more robust query should match either the exact project name (for root modules) or a dotted prefix. This prevents ambiguity and correctly includes root modules.

The original type error suggests $project_name is not always a string. This should be fixed in the calling Python code. Assuming that fix, the query can be improved as follows:

Suggested change
- AND m.qualified_name STARTS WITH $project_name
+ AND (m.qualified_name = $project_name OR m.qualified_name STARTS WITH $project_name + '.')
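
The suggested predicate can be mirrored in plain Python to see which names it accepts. This is a sketch only: `belongs_to_project` and the sample names are hypothetical, and `$project_name` is assumed to be the raw project name without a trailing dot.

```python
def belongs_to_project(qualified_name: str, project_name: str) -> bool:
    # Mirrors: m.qualified_name = $project_name
    #          OR m.qualified_name STARTS WITH $project_name + '.'
    return qualified_name == project_name or qualified_name.startswith(
        project_name + "."
    )


names = ["foo", "foo.core", "foobar", "foobar.core"]

# Exact match or dotted prefix: the root module and its children only.
print([qn for qn in names if belongs_to_project(qn, "foo")])
# ['foo', 'foo.core']

# Bare prefix match: "foobar" modules are included as well.
print([qn for qn in names if qn.startswith("foo")])
# ['foo', 'foo.core', 'foobar', 'foobar.core']
```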

Copilot AI left a comment

Pull request overview

Updates the embeddings Cypher query to avoid string concatenation inside STARTS WITH, preventing Cypher type/precedence errors and relying on Python to pass the complete module-qualified-name prefix.

Changes:

  • Remove Cypher-side concatenation ($project_name + '.') from CYPHER_QUERY_EMBEDDINGS.
  • Make STARTS WITH compare directly against the provided parameter value.


MATCH (m:Module)-[:DEFINES]->(n)
WHERE (n:Function OR n:Method)
- AND m.qualified_name STARTS WITH $project_name + '.'
+ AND m.qualified_name STARTS WITH $project_name
Copilot AI Feb 7, 2026

The Cypher parameter is now expected to be a fully-formed qualified-name prefix (e.g. "<project>."), but it’s still named $project_name. Renaming the parameter to something like $project_prefix/$module_qn_prefix (and updating the caller) would make the query contract clearer and reduce the risk of future callers passing the raw project name and accidentally matching similarly-prefixed projects.

Suggested change
- AND m.qualified_name STARTS WITH $project_name
+ AND m.qualified_name STARTS WITH $project_prefix
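
One way to keep a renamed parameter and its caller in step is a small check that the placeholders in the query text match the keys the caller binds. The sketch below uses only the WHERE fragment from the suggestion. `embeddings_query_params` and `referenced_params` are hypothetical helpers, not code from the repository.

```python
import re

# Fragment of the WHERE clause as proposed in the suggestion;
# the rest of the query is omitted here.
WHERE_FRAGMENT = "AND m.qualified_name STARTS WITH $project_prefix"


def embeddings_query_params(project_name: str) -> dict[str, str]:
    # Hypothetical helper: the key matches the renamed Cypher parameter,
    # and the value is the fully formed dotted prefix.
    return {"project_prefix": f"{project_name}."}


def referenced_params(cypher: str) -> set[str]:
    # Collect $name placeholders so they can be compared to the dict keys.
    return set(re.findall(r"\$(\w+)", cypher))


params = embeddings_query_params("my_project")
# Fails loudly if the query and the caller disagree on parameter names.
assert referenced_params(WHERE_FRAGMENT) == set(params)
print(params)
```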

@portwebdesign portwebdesign force-pushed the fix/embeddings-query-prefix branch from 0281c40 to 0ba3b65 Compare February 7, 2026 00:51
@portwebdesign portwebdesign force-pushed the fix/embeddings-query-prefix branch from 9b88a0b to 8370545 Compare February 8, 2026 10:50