fix: improve Cypher query generation accuracy by hupeky · Pull Request #294 · vitali87/code-graph-rag

hupeky · 2026-02-02T08:49:02Z

Summary

Improve LLM prompt to teach correct schema patterns
Add example for DEFINES_METHOD relationship
Fix response cleaner to handle markdown formatting

Problem

The LLM sometimes generates incorrect Cypher queries because it doesn't understand:

When to use name vs qualified_name properties
The correct relationship types (DEFINES_METHOD, DEFINES)
Proper Cypher patterns for this specific graph schema

Changes

cypher_queries.py: Added CYPHER_EXAMPLE_CLASS_METHODS example
prompts.py: Added VALUE PATTERN RULES explaining name matching
llm.py: Improved _clean_cypher_response() to handle markdown code blocks

Testing

Tested with codellama model - queries now generate correctly for class method lookups and other common patterns.

🤖 Generated with Claude Code

gemini-code-assist · 2026-02-02T08:49:14Z

Summary of Changes

Hello @hupeky, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enhancing the accuracy of Cypher query generation by the Language Model (LLM). It achieves this by providing more explicit guidance within the LLM prompts regarding schema interpretation, particularly for property usage and relationship types. Additionally, it bolsters the system's ability to parse and extract valid Cypher queries from diverse LLM output formats, making the overall process more reliable.

Highlights

Improved LLM Prompting for Cypher Queries: New VALUE PATTERN RULES have been added to the LLM prompt to guide the model on correctly using name versus qualified_name properties and the DEFINES_METHOD and DEFINES relationships, addressing common misunderstandings in query generation.
Enhanced Cypher Query Examples: A new example, CYPHER_EXAMPLE_CLASS_METHODS, has been introduced to demonstrate how to query for methods of a class, further refining the LLM's understanding of the graph schema.
Robust Cypher Response Cleaning: The _clean_cypher_response function has been significantly improved to handle various markdown formats (e.g., triple backticks, bold headers) that LLMs might use when returning Cypher queries, ensuring accurate extraction of the query string.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request effectively improves the accuracy of Cypher query generation by enhancing the prompts with clearer rules and examples, and by making the response cleaning logic more robust. The changes in prompts.py and cypher_queries.py are well-implemented and align with the PR's objectives. I have a few suggestions for the _clean_cypher_response function in codebase_rag/services/llm.py to align it with project standards and further improve its robustness.

gemini-code-assist · 2026-02-02T08:51:43Z

codebase_rag/services/llm.py

+    else:
+        # Remove markdown bold/headers (e.g., **Cypher Query:**)
+        query = re.sub(r"\*\*[^*]+\*\*:?\s*", "", query)
+        # Remove single backticks
+        query = query.replace(cs.CYPHER_BACKTICK, "")
+        # Remove "cypher" prefix if present
+        if query.lower().startswith(cs.CYPHER_PREFIX):
+            query = query[len(cs.CYPHER_PREFIX):].strip()


The current logic in the else block may not correctly handle all cases with leading whitespace, and the cleaning steps could be ordered for better robustness. For example, a response like **Cypher Query:** MATCH (n) would result in a query with a leading space: MATCH (n);, which could cause execution to fail. This refactoring handles whitespace more consistently and correctly identifies and removes the cypher prefix even if it has leading spaces.

Suggested change

-    else:
-        # Remove markdown bold/headers (e.g., **Cypher Query:**)
-        query = re.sub(r"\*\*[^*]+\*\*:?\s*", "", query)
-        # Remove single backticks
-        query = query.replace(cs.CYPHER_BACKTICK, "")
-        # Remove "cypher" prefix if present
-        if query.lower().startswith(cs.CYPHER_PREFIX):
-            query = query[len(cs.CYPHER_PREFIX):].strip()
+    else:
+        # Remove markdown bold/headers (e.g., **Cypher Query:**)
+        query = re.sub(r"\*\*[^*]+\*\*:?\s*", "", query)
+        # Remove "cypher" prefix if present
+        if query.lower().strip().startswith(cs.CYPHER_PREFIX):
+            query = query[query.lower().find(cs.CYPHER_PREFIX) + len(cs.CYPHER_PREFIX):]
+        # Remove single backticks and strip any remaining whitespace
+        query = query.replace(cs.CYPHER_BACKTICK, "").strip()
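
The effect of step ordering in this branch can be checked with a small standalone sketch. The constants below are stand-ins for `cs.CYPHER_PREFIX` and `cs.CYPHER_BACKTICK`, whose actual values are not shown in this thread (assumed here to be `"cypher"` and a single backtick), and `clean_inline_response` is a hypothetical helper, not the project's `_clean_cypher_response`. It strips backticks and surrounding whitespace before testing for the prefix, so a prefix wrapped in backticks or preceded by spaces should still be detected. Note that the `\s*` at the end of the bold-label pattern already consumes the space that follows the label.

```python
import re

# Assumed stand-ins for cs.CYPHER_PREFIX / cs.CYPHER_BACKTICK (values not shown in the PR).
CYPHER_PREFIX = "cypher"
CYPHER_BACKTICK = "`"

# Matches markdown bold labels such as "**Cypher Query:**" plus trailing whitespace.
BOLD_LABEL = re.compile(r"\*\*[^*]+\*\*:?\s*")


def clean_inline_response(response: str) -> str:
    # Hypothetical helper: cleans a response that has no fenced code block.
    query = BOLD_LABEL.sub("", response)
    # Strip backticks and whitespace first so "`cypher ..." still starts with the prefix.
    query = query.replace(CYPHER_BACKTICK, "").strip()
    if query.lower().startswith(CYPHER_PREFIX):
        query = query[len(CYPHER_PREFIX):].strip()
    return query


cleaned = clean_inline_response("  `cypher MATCH (n) RETURN n`  ")
```

One trade-off of this approach: removing every backtick would also remove backticks that Cypher uses to quote identifiers, so it may not suit queries that rely on quoted names.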

gemini-code-assist · 2026-02-02T08:51:43Z

codebase_rag/services/llm.py

+    """Clean LLM response to extract pure Cypher query.
+
+    Handles markdown formatting that models sometimes output:
+    - Triple backticks (```cypher ... ```)
+    - Bold text (**Cypher Query:**)
+    - Headers and other markdown
+    """


According to the project's general rules, docstrings are not allowed. Please remove this docstring to adhere to the project's coding standards.

References

Docstrings are not allowed in this project, as enforced by a pre-commit hook.

gemini-code-assist · 2026-02-02T08:51:43Z

codebase_rag/services/llm.py

+    - Bold text (**Cypher Query:**)
+    - Headers and other markdown
+    """
+    import re


Per PEP 8, imports should be at the top of the file. Please remove this import from here and add import re to the top-level imports section of the file.

References

PEP 8: Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.

This PR addresses issues where the LLM generates incorrect Cypher queries due to misunderstanding the graph schema.

Changes:
- Add CYPHER_EXAMPLE_CLASS_METHODS to demonstrate DEFINES_METHOD pattern
- Add VALUE PATTERN RULES to prompts explaining name vs qualified_name usage
- Improve _clean_cypher_response() to handle markdown formatting in LLM output

The prompt improvements teach the LLM to:
- Use `name` property for short class/function names (not qualified_name)
- Use correct relationships (DEFINES_METHOD, DEFINES)
- Follow proper Cypher patterns for this schema

The response cleaner now handles:
- Triple backtick code blocks (```cypher ... ```)
- Bold markdown headers (**Cypher Query:**)
- Mixed formatting in LLM responses

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

greptile-apps · 2026-02-02T08:52:09Z

Greptile Overview

Greptile Summary

This PR improves Cypher query generation accuracy by teaching the LLM to correctly use the graph schema properties and relationships.

Key Changes:

Added CYPHER_EXAMPLE_CLASS_METHODS example demonstrating the DEFINES_METHOD relationship pattern for querying class methods
Added VALUE PATTERN RULES section to prompts explaining when to use name vs qualified_name properties (critical for short name matching)
Enhanced _clean_cypher_response() function to handle markdown formatting in LLM outputs (triple backtick code blocks, bold headers)

Impact:
The prompt improvements address a core issue where LLMs would incorrectly use qualified_name for short class/function names (e.g., WHERE c.qualified_name = 'UserService' instead of the correct WHERE c.name = 'UserService'). The qualified_name property contains full paths like 'Project.folder.subfolder.ClassName', so matching against short names would always fail.

The enhanced response cleaner now correctly extracts Cypher queries from markdown-formatted LLM responses, improving robustness across different LLM providers and output formats.
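
To make the name-matching failure concrete, here is a minimal Python sketch that mimics the two `WHERE` clauses against an in-memory node. The node labels (`Class`, `Method`), the sample qualified name, and the query strings are illustrative assumptions; the actual text of `CYPHER_EXAMPLE_CLASS_METHODS` is not reproduced in this thread.

```python
# Illustrative node; the property values are made up for this sketch.
user_service = {
    "name": "UserService",
    "qualified_name": "Project.folder.subfolder.UserService",
}

# Hypothetical queries showing the two patterns discussed above.
WRONG_QUERY = (
    "MATCH (c:Class)-[:DEFINES_METHOD]->(m:Method) "
    "WHERE c.qualified_name = 'UserService' RETURN m.name;"
)
RIGHT_QUERY = (
    "MATCH (c:Class)-[:DEFINES_METHOD]->(m:Method) "
    "WHERE c.name = 'UserService' RETURN m.name;"
)


def matches(node: dict[str, str], prop: str, value: str) -> bool:
    # Exact equality, like the `=` operator in a Cypher WHERE clause.
    return node.get(prop) == value


short_name = "UserService"
matches_by_qualified_name = matches(user_service, "qualified_name", short_name)
matches_by_name = matches(user_service, "name", short_name)
```

Under these assumptions, the short name only matches the `name` property; the comparison against `qualified_name` fails because that property holds the full dotted path.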

Confidence Score: 4/5

This PR is safe to merge with minor style improvements recommended
The changes are well-targeted improvements to prompt engineering and response parsing. The logic is sound and addresses a real issue with name matching in Cypher queries. Existing tests cover the core functionality of _clean_cypher_response(). The only concerns are minor style issues (import placement and case handling consistency) that don't affect correctness.
codebase_rag/services/llm.py could benefit from moving the re import to module-level, but this is a minor style issue

Important Files Changed

Filename	Overview
codebase_rag/services/llm.py	Enhanced Cypher response cleaning with markdown handling - import statement placement could be improved
codebase_rag/prompts.py	Added VALUE PATTERN RULES and class methods example to improve LLM query generation accuracy
codebase_rag/cypher_queries.py	Added CYPHER_EXAMPLE_CLASS_METHODS to demonstrate DEFINES_METHOD pattern usage

Sequence Diagram

sequenceDiagram
    participant User
    participant CypherGenerator
    participant Agent
    participant LLM
    participant CleanFunction as _clean_cypher_response

    User->>CypherGenerator: "generate('What methods does UserService have?')"
    CypherGenerator->>Agent: "run(natural_language_query)"
    Agent->>LLM: "Send prompt with system prompt and examples"
    Note over LLM: Uses VALUE PATTERN RULES<br/>Match by name property<br/>Use DEFINES_METHOD relationship
    LLM-->>Agent: "Response with markdown formatting"
    Agent-->>CypherGenerator: "result.output"
    CypherGenerator->>CleanFunction: "Clean markdown formatting"
    Note over CleanFunction: Extract from code blocks<br/>Remove bold headers<br/>Add semicolon
    CleanFunction-->>CypherGenerator: "Clean Cypher query"
    CypherGenerator-->>User: "Valid Cypher query"

greptile-apps

3 files reviewed, 2 comments

greptile-apps · 2026-02-02T08:52:12Z

codebase_rag/services/llm.py

+    - Bold text (**Cypher Query:**)
+    - Headers and other markdown
+    """
+    import re


Move import re to top-level imports (after line 1). Module-level imports belong with stdlib imports at the file top, not inside functions.

Suggested change

-    import re
+    """Clean LLM response to extract pure Cypher query.
+
+    Handles markdown formatting that models sometimes output:
+    - Triple backticks (```cypher ... ```)
+    - Bold text (**Cypher Query:**)
+    - Headers and other markdown
+    """

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

greptile-apps · 2026-02-02T08:52:13Z

codebase_rag/services/llm.py

+        if query.lower().startswith(cs.CYPHER_PREFIX):
+            query = query[len(cs.CYPHER_PREFIX):].strip()


Case mismatch: query.lower().startswith() but using original cs.CYPHER_PREFIX length. If cs.CYPHER_PREFIX = "cypher" and query is "CYPHER MATCH...", slicing by len("cypher") (6 chars) from "CYPHER MATCH..." works correctly. However, for safety and clarity, use consistent casing.

Suggested change

-        if query.lower().startswith(cs.CYPHER_PREFIX):
-            query = query[len(cs.CYPHER_PREFIX):].strip()
+        if query.lower().startswith(cs.CYPHER_PREFIX.lower()):
+            query = query[len(cs.CYPHER_PREFIX):].strip()
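
A quick sketch of why slicing by the prefix length is safe for an ASCII prefix, and one way to make the case-insensitive intent explicit. `strip_prefix_ci` is a hypothetical helper and `CYPHER_PREFIX` is an assumed stand-in for `cs.CYPHER_PREFIX`; neither comes from the PR.

```python
CYPHER_PREFIX = "cypher"  # assumed value of cs.CYPHER_PREFIX


def strip_prefix_ci(text: str, prefix: str) -> str:
    # Compare a same-length slice of the text with the prefix, ignoring case.
    head = text[: len(prefix)]
    if head.lower() == prefix.lower():
        # Lowercasing ASCII text never changes its length, so this slice
        # removes exactly the matched prefix.
        return text[len(prefix):].strip()
    return text


upper_case = strip_prefix_ci("CYPHER MATCH (n)", CYPHER_PREFIX)
no_prefix = strip_prefix_ci("MATCH (n)", CYPHER_PREFIX)
```

Lowercasing both sides means the check behaves the same whether the constant is stored as `"cypher"` or `"CYPHER"`, which is the consistency the comment above asks for.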

github-project-automation bot added this to @vitali87's graph code Feb 2, 2026

gemini-code-assist bot reviewed Feb 2, 2026

greptile-apps bot reviewed Feb 2, 2026

hupeky force-pushed the fix/improve-cypher-generation branch from c24af15 to 232fee8 on February 2, 2026 08:52

hupeky wants to merge 1 commit into vitali87:main from hupeky:fix/improve-cypher-generation
