fix(cuga_lite): recover code from unclosed ```python fence by haroldship · Pull Request #252 · cuga-project/cuga-agent

haroldship · 2026-05-20T16:21:37Z

Bug fix

Fixes #204

Summary

When the LLM emits a Python code block whose closing fence is missing or truncated, extract_and_combine_codeblocks returned no matches because the regex (\``python(.*?)```) required a closing fence. call_modelthen fell through into the "no-code" branch and terminated the subgraph with the raw response stored asfinal_answer. sdk_callback_noderouted the unexecuted Python source straight toFinalAnswerAgent, and the user received raw import json, re` (etc.) as the chat reply — exactly the behavior reported in #204.

The fix adds a small recovery path after the strict regex fails: search for an unclosed \``pythonfence, strip any partial trailing fence, and **only return the candidate when it compiles**. Thecompile()` guard keeps prose that happens to live after a stray opening fence from being mis-classified as code.

Why guard with compile()

A naive recovery (return everything after \``python\n`) would swallow follow-up narration the LLM might tack on after a malformed block. The compile check anchors the recovery to "this is actually Python", which matches the existing pattern in the same function for bare-code responses.

Testing

Branch is a single commit (8a2b6b4f) on top of main (69b7e40a).

Bug-first: tests written before the fix

Following test-first practice, the new regression tests were added to src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_extract_codeblocks.py before any change to extract_and_combine_codeblocks. With only the tests in place (no fix):

❌ test_incomplete_markdown_block_with_valid_code_recovered — failed (returns '' instead of 'print("test")')
❌ test_incomplete_markdown_block_recovers_multiline_code — failed (returns '' instead of the recovered multi-line code from the bug report)
✅ test_incomplete_markdown_block_with_invalid_code_returns_empty — passed (safety test for the compile guard; this property happens to hold both before and after, by construction)

After applying the fix, all 48 tests in the file pass.

Intentional replacement of an existing test

The pre-existing test_incomplete_markdown_block asserted the buggy behavior — that \``python\nprint("test")should return''. That test was replaced by test_incomplete_markdown_block_with_valid_code_recovered`, which asserts the fixed behavior. This is an intentional behavior change required by issue #204: the old test was codifying the bug.

New regression tests

test_incomplete_markdown_block_with_valid_code_recovered — \``python\nprint("test")(no closing fence) now returnsprint("test")instead of''`.
test_incomplete_markdown_block_recovers_multiline_code — the exact shape from the bug report (prose + unclosed fence + multi-line import + print) is recovered.
test_incomplete_markdown_block_with_invalid_code_returns_empty — non-Python text after an open fence ($$$ …) is not treated as code (safety guard for the compile gate).

Test results (against this branch on `main`)

Suite	Result
`src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_extract_codeblocks.py`	✅ 48 / 48 passed (45 pre-existing + 3 new)
`src/cuga/backend/tools_env/registry/tests/`	✅ all passed
`src/cuga/backend/cuga_graph/nodes/api/variables_manager/tests/`	✅ all passed
`tests/unit/test_cuga_lite_bind_tools.py`	✅ all passed
`tests/unit/test_task_analyzer_app_matching.py`	✅ all passed
`ruff check` (touched files)	✅ clean
`ruff format --check` (touched files)	✅ clean

Aggregate: 155 unit tests pass across the subset relevant to this change.

Testing

Verified fix locally; tests pass
New unit tests written against current code first — confirmed failing before the fix, passing after

Summary by CodeRabbit

Bug Fixes
- Enhanced code block extraction to recover valid Python code from incomplete Markdown fences, handling cases where closing fence markers are missing or truncated while maintaining syntax validation.

`extract_and_combine_codeblocks` relied on a strict `\`\`\`python ... \`\`\`` regex with a non-optional closing fence. When the LLM emitted a Python code block whose closing fence was missing or truncated, the regex returned no matches and `call_model` fell through into the "no-code" branch — terminating the subgraph with the raw response as `final_answer`. `sdk_callback_node` then routed the unexecuted source straight to FinalAnswerAgent, and the user received Python source as the chat reply (issue #204). Add a recovery path that searches for an unclosed `\`\`\`python` fence, strips any partial trailing fence, and only returns the candidate when it compiles. Guarding on `compile()` keeps prose that happens to live after a stray opening fence from being mis-classified as code. Closes #204

haroldship · 2026-05-20T16:21:43Z

@coderabbitai review

coderabbitai · 2026-05-20T16:21:48Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2026-05-20T16:21:49Z

📝 Walkthrough

Walkthrough

The extractor gains a fallback path for responses with an opening ```python fence but no closing fence: it extracts the candidate text after the opening fence, strips trailing/partial backticks, removes await tokens for validation, and returns the candidate only if `compile()` succeeds. Four regression tests cover single-line recovery, multi-line recovery with preceding prose, rejection of invalid content, and stripping of partial trailing backticks.

Changes

Incomplete Fence Recovery with Validation

Layer / File(s)	Summary
Fallback recovery and regression tests `src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`, `src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_extract_codeblocks.py`	Adds a fallback branch in `extract_and_combine_codeblocks` that extracts text after an unclosed ```python fence, strips trailing/backtick fragments, and returns the candidate only when `compile()` (after removing `await `) succeeds. Adds four regression tests: recover valid single-line code, recover multi-line code with preceding prose, reject non-compilable content, and strip partial trailing backticks so valid code still compiles.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I nibble at stray backticks late,
Recovering code from half a gate.
If it compiles, I hop with glee,
Else leave the prose where it should be.
🐰✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main change: adding recovery logic for unclosed ```python fences in the cuga_lite code extraction function.
Linked Issues check	✅ Passed	The PR directly addresses issue `#204` by implementing recovery logic to extract code from unclosed ```python fences, ensuring code blocks are detected and routed to execution instead of being treated as prose.
Out of Scope Changes check	✅ Passed	All changes are directly within scope: modifications to extract_and_combine_codeblocks function and corresponding test updates to validate the new recovery behavior for unclosed code fences.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/204-sdk-callback-unexecuted-code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_extract_codeblocks.py (1)
122-150: ⚡ Quick win

Add regression coverage for partially truncated closing fences.

Nice additions. One gap remains: no test for a candidate ending with partial fence tokens (for example python\nprint("x")\n where only two backticks are present). That case is central to truncation recovery behavior.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_extract_codeblocks.py`
around lines 122 - 150, Add a regression test to cover truncated/partial closing
fence tokens (e.g. candidate ending with fewer backticks than opening) by adding
a new test function in the same test file that calls
extract_and_combine_codeblocks with an input like "```python\nprint('x')\n``" or
similar partial fence and asserts the expected recovery behavior (either returns
the code when syntactically valid or '' when invalid); reference the existing
test naming style (e.g., test_incomplete_markdown_block_*) and use
extract_and_combine_codeblocks to validate the truncation recovery path.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 734-739: The current recovery logic only strips a full trailing
``` fence before attempting to compile candidate, so partially truncated fences
(e.g., ending with `` or `) cause valid code to fail compilation; update the
re.sub call that produces candidate to remove 1–3 trailing backticks instead
(change the regex from r'\n?```\s*$' to r'\n?`{1,3}\s*$') so the candidate
variable is cleaned of full or partial Markdown fences before the compile call
in the same block (the code that sets candidate and then calls compile(...,
'<string>', 'exec')).

---

Nitpick comments:
In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_extract_codeblocks.py`:
- Around line 122-150: Add a regression test to cover truncated/partial closing
fence tokens (e.g. candidate ending with fewer backticks than opening) by adding
a new test function in the same test file that calls
extract_and_combine_codeblocks with an input like "```python\nprint('x')\n``" or
similar partial fence and asserts the expected recovery behavior (either returns
the code when syntactically valid or '' when invalid); reference the existing
test naming style (e.g., test_incomplete_markdown_block_*) and use
extract_and_combine_codeblocks to validate the truncation recovery path.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5b825470-51b7-4a57-a20b-5eb280f24834

📥 Commits

Reviewing files that changed from the base of the PR and between 69b7e40 and 8a2b6b4.

📒 Files selected for processing (2)

src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_extract_codeblocks.py

sami-marreed · 2026-05-20T16:23:49Z

On which model did this happen

sami-marreed · 2026-05-20T16:24:24Z

    if code_blocks:
        return "\n\n".join(block.strip() for block in code_blocks)

+    # Recovery path for an unclosed ```python fence: without this the


after refactor i am consildating this function it has multiple implementations across the code

Noted — happy to coordinate. This PR keeps the change local to cuga_lite's copy (issue #204 is the immediate symptom there), so the consolidation can fold this in without needing to re-derive the recovery logic.

let's hold. this fix after refactor PR merge

The recovery path stripped only a full \`\`\` (three-backtick) trailing fence, so streamed responses truncated mid-fence (one or two trailing backticks) still failed `compile()` and dropped otherwise-valid code. Match 1–3 trailing backticks instead. Addresses CodeRabbit feedback on PR #252.

coderabbitai

🧹 Nitpick comments (1)

src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py (1)
734-734: 💤 Low value

Optional: Fix EN DASH character in comment.

The comment contains an EN DASH (–) instead of a HYPHEN-MINUS (-) in "1–3 backticks". While this doesn't affect functionality, the linter flags it for consistency. As per coding guidelines, static analysis tool Ruff suggests using the standard hyphen character.
✏️ Suggested fix
-        # Strip a trailing full OR partial markdown fence (1–3 backticks).
+        # Strip a trailing full OR partial markdown fence (1-3 backticks).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py` at line 734,
Replace the EN DASH in the comment "Strip a trailing full OR partial markdown
fence (1–3 backticks)." with a standard hyphen-minus so it reads "(1-3
backticks)"; locate this comment in cuga_lite_graph.py (search for the exact
phrase "Strip a trailing full OR partial markdown fence") and update the
character to '-' for linter consistency.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Line 734: Replace the EN DASH in the comment "Strip a trailing full OR partial
markdown fence (1–3 backticks)." with a standard hyphen-minus so it reads "(1-3
backticks)"; locate this comment in cuga_lite_graph.py (search for the exact
phrase "Strip a trailing full OR partial markdown fence") and update the
character to '-' for linter consistency.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9849e98b-9e6d-4947-b076-1953755efaf8

📥 Commits

Reviewing files that changed from the base of the PR and between 8a2b6b4 and 6197cf1.

📒 Files selected for processing (2)

src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_extract_codeblocks.py

haroldship · 2026-05-20T16:29:54Z

On which model did this happen

On azure/gpt-5.5 as per Issue #204

coderabbitai Bot reviewed May 20, 2026

View reviewed changes

Comment thread src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py Outdated

sami-marreed reviewed May 20, 2026

View reviewed changes

coderabbitai Bot reviewed May 20, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(cuga_lite): recover code from unclosed ```python fence#252

fix(cuga_lite): recover code from unclosed ```python fence#252
haroldship wants to merge 2 commits into
mainfrom
fix/204-sdk-callback-unexecuted-code

haroldship commented May 20, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

haroldship commented May 20, 2026

Uh oh!

coderabbitai Bot commented May 20, 2026

Uh oh!

coderabbitai Bot commented May 20, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

sami-marreed commented May 20, 2026

Uh oh!

sami-marreed May 20, 2026

Uh oh!

haroldship May 20, 2026

Uh oh!

sami-marreed May 20, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

haroldship commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

haroldship commented May 20, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Bug fix

Summary

Why guard with compile()

Testing

Bug-first: tests written before the fix

Intentional replacement of an existing test

New regression tests

Test results (against this branch on main)

Testing

Summary by CodeRabbit

Uh oh!

haroldship commented May 20, 2026

Uh oh!

coderabbitai Bot commented May 20, 2026

Uh oh!

coderabbitai Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

sami-marreed commented May 20, 2026

Uh oh!

sami-marreed May 20, 2026

Choose a reason for hiding this comment

Uh oh!

haroldship May 20, 2026

Choose a reason for hiding this comment

Uh oh!

sami-marreed May 20, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

haroldship commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

haroldship commented May 20, 2026 •

edited by coderabbitai Bot

Loading

Test results (against this branch on `main`)

coderabbitai Bot commented May 20, 2026 •

edited

Loading