Skip to content

fix(cuga_lite): recover code from unclosed ```python fence#252

Open
haroldship wants to merge 2 commits into
mainfrom
fix/204-sdk-callback-unexecuted-code
Open

fix(cuga_lite): recover code from unclosed ```python fence#252
haroldship wants to merge 2 commits into
mainfrom
fix/204-sdk-callback-unexecuted-code

Conversation

@haroldship
Copy link
Copy Markdown
Collaborator

@haroldship haroldship commented May 20, 2026

Bug fix

Fixes #204

Summary

When the LLM emits a Python code block whose closing fence is missing or truncated, extract_and_combine_codeblocks returned no matches because the regex (\``python(.*?)```) required a closing fence. call_modelthen fell through into the "no-code" branch and terminated the subgraph with the raw response stored asfinal_answer. sdk_callback_noderouted the unexecuted Python source straight toFinalAnswerAgent, and the user received raw import json, re` (etc.) as the chat reply — exactly the behavior reported in #204.

The fix adds a small recovery path after the strict regex fails: search for an unclosed \``pythonfence, strip any partial trailing fence, and **only return the candidate when it compiles**. Thecompile()` guard keeps prose that happens to live after a stray opening fence from being mis-classified as code.

Why guard with compile()

A naive recovery (return everything after \``python\n`) would swallow follow-up narration the LLM might tack on after a malformed block. The compile check anchors the recovery to "this is actually Python", which matches the existing pattern in the same function for bare-code responses.

Testing

Branch is a single commit (8a2b6b4f) on top of main (69b7e40a).

Bug-first: tests written before the fix

Following test-first practice, the new regression tests were added to src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_extract_codeblocks.py before any change to extract_and_combine_codeblocks. With only the tests in place (no fix):

  • test_incomplete_markdown_block_with_valid_code_recoveredfailed (returns '' instead of 'print("test")')
  • test_incomplete_markdown_block_recovers_multiline_codefailed (returns '' instead of the recovered multi-line code from the bug report)
  • test_incomplete_markdown_block_with_invalid_code_returns_empty — passed (safety test for the compile guard; this property happens to hold both before and after, by construction)

After applying the fix, all 48 tests in the file pass.

Intentional replacement of an existing test

The pre-existing test_incomplete_markdown_block asserted the buggy behavior — that \``python\nprint("test")should return''. That test was replaced by test_incomplete_markdown_block_with_valid_code_recovered`, which asserts the fixed behavior. This is an intentional behavior change required by issue #204: the old test was codifying the bug.

New regression tests

  • test_incomplete_markdown_block_with_valid_code_recovered\``python\nprint("test")(no closing fence) now returnsprint("test")instead of''`.
  • test_incomplete_markdown_block_recovers_multiline_code — the exact shape from the bug report (prose + unclosed fence + multi-line import + print) is recovered.
  • test_incomplete_markdown_block_with_invalid_code_returns_empty — non-Python text after an open fence ($$$ …) is not treated as code (safety guard for the compile gate).

Test results (against this branch on main)

Suite Result
src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_extract_codeblocks.py ✅ 48 / 48 passed (45 pre-existing + 3 new)
src/cuga/backend/tools_env/registry/tests/ ✅ all passed
src/cuga/backend/cuga_graph/nodes/api/variables_manager/tests/ ✅ all passed
tests/unit/test_cuga_lite_bind_tools.py ✅ all passed
tests/unit/test_task_analyzer_app_matching.py ✅ all passed
ruff check (touched files) ✅ clean
ruff format --check (touched files) ✅ clean

Aggregate: 155 unit tests pass across the subset relevant to this change.

Testing

  • Verified fix locally; tests pass
  • New unit tests written against current code first — confirmed failing before the fix, passing after

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced code block extraction to recover valid Python code from incomplete Markdown fences, handling cases where closing fence markers are missing or truncated while maintaining syntax validation.

Review Change Stack

`extract_and_combine_codeblocks` relied on a strict `\`\`\`python ... \`\`\``
regex with a non-optional closing fence. When the LLM emitted a Python
code block whose closing fence was missing or truncated, the regex
returned no matches and `call_model` fell through into the
"no-code" branch — terminating the subgraph with the raw response as
`final_answer`. `sdk_callback_node` then routed the unexecuted source
straight to FinalAnswerAgent, and the user received Python source as
the chat reply (issue #204).

Add a recovery path that searches for an unclosed `\`\`\`python` fence,
strips any partial trailing fence, and only returns the candidate when
it compiles. Guarding on `compile()` keeps prose that happens to live
after a stray opening fence from being mis-classified as code.

Closes #204
@haroldship
Copy link
Copy Markdown
Collaborator Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 20, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 20, 2026

📝 Walkthrough

Walkthrough

The extractor gains a fallback path for responses with an opening ```python fence but no closing fence: it extracts the candidate text after the opening fence, strips trailing/partial backticks, removes await tokens for validation, and returns the candidate only if `compile()` succeeds. Four regression tests cover single-line recovery, multi-line recovery with preceding prose, rejection of invalid content, and stripping of partial trailing backticks.

Changes

Incomplete Fence Recovery with Validation

Layer / File(s) Summary
Fallback recovery and regression tests
src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py, src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_extract_codeblocks.py
Adds a fallback branch in extract_and_combine_codeblocks that extracts text after an unclosed ```python fence, strips trailing/backtick fragments, and returns the candidate only when compile() (after removing `await `) succeeds. Adds four regression tests: recover valid single-line code, recover multi-line code with preceding prose, reject non-compilable content, and strip partial trailing backticks so valid code still compiles.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I nibble at stray backticks late,
Recovering code from half a gate.
If it compiles, I hop with glee,
Else leave the prose where it should be.
🐰✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: adding recovery logic for unclosed ```python fences in the cuga_lite code extraction function.
Linked Issues check ✅ Passed The PR directly addresses issue #204 by implementing recovery logic to extract code from unclosed ```python fences, ensuring code blocks are detected and routed to execution instead of being treated as prose.
Out of Scope Changes check ✅ Passed All changes are directly within scope: modifications to extract_and_combine_codeblocks function and corresponding test updates to validate the new recovery behavior for unclosed code fences.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/204-sdk-callback-unexecuted-code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_extract_codeblocks.py (1)

122-150: ⚡ Quick win

Add regression coverage for partially truncated closing fences.

Nice additions. One gap remains: no test for a candidate ending with partial fence tokens (for example python\nprint("x")\n where only two backticks are present). That case is central to truncation recovery behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_extract_codeblocks.py`
around lines 122 - 150, Add a regression test to cover truncated/partial closing
fence tokens (e.g. candidate ending with fewer backticks than opening) by adding
a new test function in the same test file that calls
extract_and_combine_codeblocks with an input like "```python\nprint('x')\n``" or
similar partial fence and asserts the expected recovery behavior (either returns
the code when syntactically valid or '' when invalid); reference the existing
test naming style (e.g., test_incomplete_markdown_block_*) and use
extract_and_combine_codeblocks to validate the truncation recovery path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 734-739: The current recovery logic only strips a full trailing
``` fence before attempting to compile candidate, so partially truncated fences
(e.g., ending with `` or `) cause valid code to fail compilation; update the
re.sub call that produces candidate to remove 1–3 trailing backticks instead
(change the regex from r'\n?```\s*$' to r'\n?`{1,3}\s*$') so the candidate
variable is cleaned of full or partial Markdown fences before the compile call
in the same block (the code that sets candidate and then calls compile(...,
'<string>', 'exec')).

---

Nitpick comments:
In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_extract_codeblocks.py`:
- Around line 122-150: Add a regression test to cover truncated/partial closing
fence tokens (e.g. candidate ending with fewer backticks than opening) by adding
a new test function in the same test file that calls
extract_and_combine_codeblocks with an input like "```python\nprint('x')\n``" or
similar partial fence and asserts the expected recovery behavior (either returns
the code when syntactically valid or '' when invalid); reference the existing
test naming style (e.g., test_incomplete_markdown_block_*) and use
extract_and_combine_codeblocks to validate the truncation recovery path.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5b825470-51b7-4a57-a20b-5eb280f24834

📥 Commits

Reviewing files that changed from the base of the PR and between 69b7e40 and 8a2b6b4.

📒 Files selected for processing (2)
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_extract_codeblocks.py

Comment thread src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py Outdated
@sami-marreed
Copy link
Copy Markdown
Contributor

On which model did this happen

if code_blocks:
return "\n\n".join(block.strip() for block in code_blocks)

# Recovery path for an unclosed ```python fence: without this the
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after refactor i am consildating this function it has multiple implementations across the code

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted — happy to coordinate. This PR keeps the change local to cuga_lite's copy (issue #204 is the immediate symptom there), so the consolidation can fold this in without needing to re-derive the recovery logic.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's hold. this fix after refactor PR merge

The recovery path stripped only a full \`\`\` (three-backtick) trailing
fence, so streamed responses truncated mid-fence (one or two trailing
backticks) still failed `compile()` and dropped otherwise-valid code.
Match 1–3 trailing backticks instead.

Addresses CodeRabbit feedback on PR #252.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py (1)

734-734: 💤 Low value

Optional: Fix EN DASH character in comment.

The comment contains an EN DASH (–) instead of a HYPHEN-MINUS (-) in "1–3 backticks". While this doesn't affect functionality, the linter flags it for consistency. As per coding guidelines, static analysis tool Ruff suggests using the standard hyphen character.

✏️ Suggested fix
-        # Strip a trailing full OR partial markdown fence (1–3 backticks).
+        # Strip a trailing full OR partial markdown fence (1-3 backticks).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py` at line 734,
Replace the EN DASH in the comment "Strip a trailing full OR partial markdown
fence (1–3 backticks)." with a standard hyphen-minus so it reads "(1-3
backticks)"; locate this comment in cuga_lite_graph.py (search for the exact
phrase "Strip a trailing full OR partial markdown fence") and update the
character to '-' for linter consistency.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Line 734: Replace the EN DASH in the comment "Strip a trailing full OR partial
markdown fence (1–3 backticks)." with a standard hyphen-minus so it reads "(1-3
backticks)"; locate this comment in cuga_lite_graph.py (search for the exact
phrase "Strip a trailing full OR partial markdown fence") and update the
character to '-' for linter consistency.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9849e98b-9e6d-4947-b076-1953755efaf8

📥 Commits

Reviewing files that changed from the base of the PR and between 8a2b6b4 and 6197cf1.

📒 Files selected for processing (2)
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_extract_codeblocks.py

@haroldship
Copy link
Copy Markdown
Collaborator Author

On which model did this happen

On azure/gpt-5.5 as per Issue #204

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: sdk_callback_node routes unexecuted code block as final answer

2 participants