fix(cuga_lite): recover code from unclosed ```python fence#252
fix(cuga_lite): recover code from unclosed ```python fence#252haroldship wants to merge 2 commits into
Conversation
`extract_and_combine_codeblocks` relied on a strict `\`\`\`python ... \`\`\`` regex with a non-optional closing fence. When the LLM emitted a Python code block whose closing fence was missing or truncated, the regex returned no matches and `call_model` fell through into the "no-code" branch — terminating the subgraph with the raw response as `final_answer`. `sdk_callback_node` then routed the unexecuted source straight to FinalAnswerAgent, and the user received Python source as the chat reply (issue #204). Add a recovery path that searches for an unclosed `\`\`\`python` fence, strips any partial trailing fence, and only returns the candidate when it compiles. Guarding on `compile()` keeps prose that happens to live after a stray opening fence from being mis-classified as code. Closes #204
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
📝 WalkthroughWalkthroughThe extractor gains a fallback path for responses with an opening ```python fence but no closing fence: it extracts the candidate text after the opening fence, strips trailing/partial backticks, removes ChangesIncomplete Fence Recovery with Validation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_extract_codeblocks.py (1)
122-150: ⚡ Quick winAdd regression coverage for partially truncated closing fences.
Nice additions. One gap remains: no test for a candidate ending with partial fence tokens (for example
python\nprint("x")\nwhere only two backticks are present). That case is central to truncation recovery behavior.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_extract_codeblocks.py` around lines 122 - 150, Add a regression test to cover truncated/partial closing fence tokens (e.g. candidate ending with fewer backticks than opening) by adding a new test function in the same test file that calls extract_and_combine_codeblocks with an input like "```python\nprint('x')\n``" or similar partial fence and asserts the expected recovery behavior (either returns the code when syntactically valid or '' when invalid); reference the existing test naming style (e.g., test_incomplete_markdown_block_*) and use extract_and_combine_codeblocks to validate the truncation recovery path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 734-739: The current recovery logic only strips a full trailing
``` fence before attempting to compile candidate, so partially truncated fences
(e.g., ending with `` or `) cause valid code to fail compilation; update the
re.sub call that produces candidate to remove 1–3 trailing backticks instead
(change the regex from r'\n?```\s*$' to r'\n?`{1,3}\s*$') so the candidate
variable is cleaned of full or partial Markdown fences before the compile call
in the same block (the code that sets candidate and then calls compile(...,
'<string>', 'exec')).
---
Nitpick comments:
In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_extract_codeblocks.py`:
- Around line 122-150: Add a regression test to cover truncated/partial closing
fence tokens (e.g. candidate ending with fewer backticks than opening) by adding
a new test function in the same test file that calls
extract_and_combine_codeblocks with an input like "```python\nprint('x')\n``" or
similar partial fence and asserts the expected recovery behavior (either returns
the code when syntactically valid or '' when invalid); reference the existing
test naming style (e.g., test_incomplete_markdown_block_*) and use
extract_and_combine_codeblocks to validate the truncation recovery path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5b825470-51b7-4a57-a20b-5eb280f24834
📒 Files selected for processing (2)
src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_extract_codeblocks.py
|
On which model did this happen |
| if code_blocks: | ||
| return "\n\n".join(block.strip() for block in code_blocks) | ||
|
|
||
| # Recovery path for an unclosed ```python fence: without this the |
There was a problem hiding this comment.
after refactor i am consildating this function it has multiple implementations across the code
There was a problem hiding this comment.
Noted — happy to coordinate. This PR keeps the change local to cuga_lite's copy (issue #204 is the immediate symptom there), so the consolidation can fold this in without needing to re-derive the recovery logic.
There was a problem hiding this comment.
let's hold. this fix after refactor PR merge
The recovery path stripped only a full \`\`\` (three-backtick) trailing fence, so streamed responses truncated mid-fence (one or two trailing backticks) still failed `compile()` and dropped otherwise-valid code. Match 1–3 trailing backticks instead. Addresses CodeRabbit feedback on PR #252.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py (1)
734-734: 💤 Low valueOptional: Fix EN DASH character in comment.
The comment contains an EN DASH (–) instead of a HYPHEN-MINUS (-) in "1–3 backticks". While this doesn't affect functionality, the linter flags it for consistency. As per coding guidelines, static analysis tool Ruff suggests using the standard hyphen character.
✏️ Suggested fix
- # Strip a trailing full OR partial markdown fence (1–3 backticks). + # Strip a trailing full OR partial markdown fence (1-3 backticks).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py` at line 734, Replace the EN DASH in the comment "Strip a trailing full OR partial markdown fence (1–3 backticks)." with a standard hyphen-minus so it reads "(1-3 backticks)"; locate this comment in cuga_lite_graph.py (search for the exact phrase "Strip a trailing full OR partial markdown fence") and update the character to '-' for linter consistency.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Line 734: Replace the EN DASH in the comment "Strip a trailing full OR partial
markdown fence (1–3 backticks)." with a standard hyphen-minus so it reads "(1-3
backticks)"; locate this comment in cuga_lite_graph.py (search for the exact
phrase "Strip a trailing full OR partial markdown fence") and update the
character to '-' for linter consistency.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9849e98b-9e6d-4947-b076-1953755efaf8
📒 Files selected for processing (2)
src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_extract_codeblocks.py
On |
Bug fix
Fixes #204
Summary
When the LLM emits a Python code block whose closing fence is missing or truncated,
extract_and_combine_codeblocksreturned no matches because the regex (\``python(.*?)```) required a closing fence.call_modelthen fell through into the "no-code" branch and terminated the subgraph with the raw response stored asfinal_answer.sdk_callback_noderouted the unexecuted Python source straight toFinalAnswerAgent, and the user received rawimport json, re` (etc.) as the chat reply — exactly the behavior reported in #204.The fix adds a small recovery path after the strict regex fails: search for an unclosed
\``pythonfence, strip any partial trailing fence, and **only return the candidate when it compiles**. Thecompile()` guard keeps prose that happens to live after a stray opening fence from being mis-classified as code.Why guard with compile()
A naive recovery (return everything after
\``python\n`) would swallow follow-up narration the LLM might tack on after a malformed block. The compile check anchors the recovery to "this is actually Python", which matches the existing pattern in the same function for bare-code responses.Testing
Branch is a single commit (
8a2b6b4f) on top ofmain(69b7e40a).Bug-first: tests written before the fix
Following test-first practice, the new regression tests were added to
src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_extract_codeblocks.pybefore any change toextract_and_combine_codeblocks. With only the tests in place (no fix):test_incomplete_markdown_block_with_valid_code_recovered— failed (returns''instead of'print("test")')test_incomplete_markdown_block_recovers_multiline_code— failed (returns''instead of the recovered multi-line code from the bug report)test_incomplete_markdown_block_with_invalid_code_returns_empty— passed (safety test for the compile guard; this property happens to hold both before and after, by construction)After applying the fix, all 48 tests in the file pass.
Intentional replacement of an existing test
The pre-existing
test_incomplete_markdown_blockasserted the buggy behavior — that\``python\nprint("test")should return''. That test was replaced bytest_incomplete_markdown_block_with_valid_code_recovered`, which asserts the fixed behavior. This is an intentional behavior change required by issue #204: the old test was codifying the bug.New regression tests
test_incomplete_markdown_block_with_valid_code_recovered—\``python\nprint("test")(no closing fence) now returnsprint("test")instead of''`.test_incomplete_markdown_block_recovers_multiline_code— the exact shape from the bug report (prose + unclosed fence + multi-lineimport+print) is recovered.test_incomplete_markdown_block_with_invalid_code_returns_empty— non-Python text after an open fence ($$$ …) is not treated as code (safety guard for the compile gate).Test results (against this branch on
main)src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_extract_codeblocks.pysrc/cuga/backend/tools_env/registry/tests/src/cuga/backend/cuga_graph/nodes/api/variables_manager/tests/tests/unit/test_cuga_lite_bind_tools.pytests/unit/test_task_analyzer_app_matching.pyruff check(touched files)ruff format --check(touched files)Aggregate: 155 unit tests pass across the subset relevant to this change.
Testing
Summary by CodeRabbit