Skip to content

block structured text payloads from user replies#209

Merged
jamiepine merged 1 commit intomainfrom
fix/block-structured-channel-output
Feb 25, 2026
Merged

block structured text payloads from user replies#209
jamiepine merged 1 commit intomainfrom
fix/block-structured-channel-output

Conversation

@jamiepine
Copy link
Member

@jamiepine jamiepine commented Feb 25, 2026

Summary

  • Add a shared should_block_user_visible_text guard to detect tool-like/structured payloads (JSON, bracket tool syntax, wrapper tags) before anything user-visible is sent.
  • Update channel turn handling to retry with a correction prompt up to 2 times when blocked output appears, nudging the model to call reply with plain text or skip.
  • Enforce the same guard inside the reply tool and expand the correction prompt wording so malformed payloads are blocked consistently across fallback and tool paths.

Example

if crate::tools::should_block_user_visible_text(text) {
    tracing::warn!(channel_id = %self.id, "blocked fallback output containing structured or tool syntax");
}

Testing

  • cargo test blocks_json_bracket_and_tool_syntax_output
  • cargo test allows_normal_plaintext_output

Note

This PR prevents LLM-generated structured payloads (JSON, bracket syntax, XML-like tags) from reaching end users as plain text. The solution introduces a shared validation function should_block_user_visible_text that filters output across three paths: the channel recovery loop (up to 2 retries), the fallback handler when retrigger produces text, and the reply tool directly. The correction prompt is now more explicit about what constitutes invalid output. All changes are in agent loop and tool enforcement with comprehensive test coverage.
Written by Tembo for commit 1d35d12.

@coderabbitai
Copy link

coderabbitai bot commented Feb 25, 2026

Walkthrough

The changes implement safeguards to prevent LLMs from outputting tool syntax or structured payloads to end users. Updates include a revised system prompt, a new detection function, retry logic for recovery attempts, and a runtime guard in the reply tool to block problematic content.

Changes

Cohort / File(s) Summary
System Prompt Update
prompts/en/fragments/system/tool_syntax_correction.md.j2
Updated system message to explicitly disallow tool call syntax, structured payloads, and JSON/XML-like output, requiring plain natural-language replies only.
Content Detection & Blocking
src/tools.rs, src/tools/reply.rs
Added should_block_user_visible_text() function to detect structured/tool payloads with unit tests. Integrated a runtime guard in ReplyTool::call to block and log detected problematic content before sending.
Tool Syntax Recovery Mechanism
src/agent/channel.rs
Implemented retry loop (up to 2 attempts) for recovering from malformed tool syntax output. Added conditional branches to differentiate blocked vs. safe content, improved logging with channel_id and attempt counters, and adjusted fallback send paths accordingly.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'block structured text payloads from user replies' directly and clearly summarizes the main objective of the PR, which is to prevent LLM-generated structured payloads from reaching end users.
Description check ✅ Passed The description is directly related to the changeset, providing a clear summary of the changes, examples of implementation, testing instructions, and detailed notes about how the solution works across three code paths.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/block-structured-channel-output

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@jamiepine jamiepine marked this pull request as ready for review February 25, 2026 05:40
@jamiepine jamiepine merged commit ee8b6dd into main Feb 25, 2026
3 of 4 checks passed
Comment on lines +204 to +218
let trimmed = value.trim_start();
if trimmed.is_empty() {
return false;
}

if trimmed.starts_with('{') || trimmed.starts_with('[') {
return true;
}

let lower = trimmed.to_ascii_lowercase();
if TOOL_PREFIXES.iter().any(|prefix| lower.starts_with(prefix)) {
return true;
}

lower.starts_with("<system-reminder>") || lower.starts_with("<path>")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should_block_user_visible_text currently blocks any output starting with [ (so it’ll reject legit plaintext like [FYI] ...), and it also makes the "[reply]"/etc entries in TOOL_PREFIXES unreachable because the early starts_with('[') returns first. Suggestion: only block [ when it matches known tool prefixes, or when it parses as JSON (so bracketed prose is allowed).

Suggested change
let trimmed = value.trim_start();
if trimmed.is_empty() {
return false;
}
if trimmed.starts_with('{') || trimmed.starts_with('[') {
return true;
}
let lower = trimmed.to_ascii_lowercase();
if TOOL_PREFIXES.iter().any(|prefix| lower.starts_with(prefix)) {
return true;
}
lower.starts_with("<system-reminder>") || lower.starts_with("<path>")
let trimmed = value.trim_start();
if trimmed.is_empty() {
return false;
}
if trimmed.starts_with('{') {
return true;
}
let lower = trimmed.to_ascii_lowercase();
if TOOL_PREFIXES.iter().any(|prefix| lower.starts_with(prefix)) {
return true;
}
if trimmed.starts_with('[') && serde_json::from_str::<serde_json::Value>(trimmed).is_ok() {
return true;
}
lower.starts_with("<system-reminder>") || lower.starts_with("<path>")

Might also be worth adding a small regression test like assert!(!should_block_user_visible_text("[FYI] hello")); so this doesn’t accidentally get re-introduced.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/agent/channel.rs (1)

1421-1455: Consolidate repeated fallback-send logic to prevent policy drift.

The same block-check → extract/normalize → log/send flow appears in three branches. Centralizing this into one helper would make future guard changes consistent across retrigger and non-retrigger paths.

♻️ Refactor sketch
- if crate::tools::should_block_user_visible_text(text) {
-     ...
- } else {
-     let extracted = extract_reply_from_tool_syntax(text);
-     ...
-     if !final_text.is_empty() {
-         ...
-         self.response_tx.send(OutboundResponse::Text(final_text)).await
-     }
- }
+ self.send_safe_fallback_text(text, is_retrigger, skipped).await;
async fn send_safe_fallback_text(&self, text: &str, is_retrigger: bool, skipped: bool) {
    // single place for block check, extraction, mention normalization, logging, and send
}

Also applies to: 1474-1506, 1520-1552

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/channel.rs` around lines 1421 - 1455, There are three duplicated
flows (block check via crate::tools::should_block_user_visible_text,
extract_reply_from_tool_syntax,
crate::tools::reply::normalize_discord_mention_tokens,
conversation_logger.log_bot_message, and sending via
response_tx.send(OutboundResponse::Text)) scattered across retrigger and
non-retrigger branches; consolidate them into a single async helper method
(e.g., send_safe_fallback_text(&self, text: &str, is_retrigger: bool, skipped:
bool)) that runs the block check, performs extraction with
extract_reply_from_tool_syntax, normalizes mentions with
normalize_discord_mention_tokens (using the same source lookup from
self.conversation_id), logs warnings when extraction occurred, logs the final
bot message via self.state.conversation_logger.log_bot_message, and sends via
self.response_tx.send, preserving the existing tracing.warn/error messages and
returned behavior; replace the three duplicated blocks with calls to this helper
so future guard/policy changes are centralized.
src/tools.rs (1)

534-550: Add regression tests for bracket-prefixed plaintext and generic wrapper tags.

Current tests validate core cases, but they don’t protect against the Line 209 false-positive class or generic wrapper-tag detection gaps.

🧪 Suggested test additions
     #[test]
     fn allows_normal_plaintext_output() {
         assert!(!should_block_user_visible_text("hello team"));
         assert!(!should_block_user_visible_text("- first\n- second"));
+        assert!(!should_block_user_visible_text("[FYI] deployment finished"));
     }
+
+    #[test]
+    fn blocks_generic_wrapper_tag_output() {
+        assert!(should_block_user_visible_text(
+            "<file name=\"notes.txt\">hello</file>"
+        ));
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools.rs` around lines 534 - 550, Add regression tests to strengthen
should_block_user_visible_text: include negative assertions that plaintext
starting with bracket-prefixed labels (e.g., "[note] hello" or "[reply] hello"
where the content is plain text, not JSON/array) should NOT be blocked, and
include negative assertions for generic wrapper-like tags (e.g.,
"<wrapper>visible</wrapper>" or common tags like "<div>visible</div>") to ensure
the generic tag heuristic doesn't produce false positives; update the existing
test module (the functions blocks_json_bracket_and_tool_syntax_output and
allows_normal_plaintext_output) by adding these specific
assert!(!should_block_user_visible_text(...)) cases referencing
should_block_user_visible_text so future changes that reintroduce the Line 209
false-positive class or wrapper-tag detection gaps will be caught.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/tools.rs`:
- Around line 209-219: The current matcher over-blocks any text starting with
'[' and only allows two hardcoded XML tags, causing false positives/negatives;
update the helper that uses trimmed and TOOL_PREFIXES so it: (1) only treats
leading '{' as a tool if the text is valid JSON/object (attempt a quick
serde_json::from_str check), (2) treat leading '[' as a tool only when it
matches a tool invocation pattern like '^\[\s*[A-Za-z0-9_-]+.*\]' (use a small
regex or manual parse to ensure the bracketed token looks like a tool name, not
a generic bracketed heading), and (3) replace the hardcoded tag checks (the
current lower.starts_with("<system-reminder>") || lower.starts_with("<path>"))
with a generic XML-wrapper detection that matches a single opening tag like
'^<\s*[A-Za-z0-9_-]+>' (use a case-insensitive regex on trimmed or lower) so
arbitrary XML-like wrapper tags are caught; keep the existing TOOL_PREFIXES
check intact. Ensure to reference TOOL_PREFIXES and the trimmed variable when
editing this helper.

---

Nitpick comments:
In `@src/agent/channel.rs`:
- Around line 1421-1455: There are three duplicated flows (block check via
crate::tools::should_block_user_visible_text, extract_reply_from_tool_syntax,
crate::tools::reply::normalize_discord_mention_tokens,
conversation_logger.log_bot_message, and sending via
response_tx.send(OutboundResponse::Text)) scattered across retrigger and
non-retrigger branches; consolidate them into a single async helper method
(e.g., send_safe_fallback_text(&self, text: &str, is_retrigger: bool, skipped:
bool)) that runs the block check, performs extraction with
extract_reply_from_tool_syntax, normalizes mentions with
normalize_discord_mention_tokens (using the same source lookup from
self.conversation_id), logs warnings when extraction occurred, logs the final
bot message via self.state.conversation_logger.log_bot_message, and sends via
self.response_tx.send, preserving the existing tracing.warn/error messages and
returned behavior; replace the three duplicated blocks with calls to this helper
so future guard/policy changes are centralized.

In `@src/tools.rs`:
- Around line 534-550: Add regression tests to strengthen
should_block_user_visible_text: include negative assertions that plaintext
starting with bracket-prefixed labels (e.g., "[note] hello" or "[reply] hello"
where the content is plain text, not JSON/array) should NOT be blocked, and
include negative assertions for generic wrapper-like tags (e.g.,
"<wrapper>visible</wrapper>" or common tags like "<div>visible</div>") to ensure
the generic tag heuristic doesn't produce false positives; update the existing
test module (the functions blocks_json_bracket_and_tool_syntax_output and
allows_normal_plaintext_output) by adding these specific
assert!(!should_block_user_visible_text(...)) cases referencing
should_block_user_visible_text so future changes that reintroduce the Line 209
false-positive class or wrapper-tag detection gaps will be caught.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between faf5a9f and 1d35d12.

📒 Files selected for processing (4)
  • prompts/en/fragments/system/tool_syntax_correction.md.j2
  • src/agent/channel.rs
  • src/tools.rs
  • src/tools/reply.rs

Comment on lines +209 to +219
if trimmed.starts_with('{') || trimmed.starts_with('[') {
return true;
}

let lower = trimmed.to_ascii_lowercase();
if TOOL_PREFIXES.iter().any(|prefix| lower.starts_with(prefix)) {
return true;
}

lower.starts_with("<system-reminder>") || lower.starts_with("<path>")
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Refine matcher scope to avoid both over-blocking and wrapper-tag bypasses.

Line 209 blocks any leading [ text, so valid replies like bracketed headings can be suppressed. Lines 218-219 only check two hardcoded tags, so other XML-like wrappers can still pass. This helper is shared by fallback + reply, so these mismatches affect all user-visible paths.

🔧 Proposed fix
-    if trimmed.starts_with('{') || trimmed.starts_with('[') {
+    if (trimmed.starts_with('{') || trimmed.starts_with('['))
+        && serde_json::from_str::<serde_json::Value>(trimmed).is_ok()
+    {
         return true;
     }
@@
-    lower.starts_with("<system-reminder>") || lower.starts_with("<path>")
+    if lower.starts_with("<system-reminder>") || lower.starts_with("<path>") {
+        return true;
+    }
+
+    // Block generic XML-like wrapper tags at start of message, e.g. <file ...>, <path>, <system-reminder>.
+    if let Some(rest) = trimmed.strip_prefix('<') {
+        if let Some(end_index) = rest.find('>') {
+            let tag_name = rest[..end_index]
+                .split_whitespace()
+                .next()
+                .unwrap_or("");
+            if !tag_name.is_empty()
+                && tag_name
+                    .chars()
+                    .all(|character| character.is_ascii_alphanumeric() || character == '_' || character == '-')
+            {
+                return true;
+            }
+        }
+    }
+
+    false
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools.rs` around lines 209 - 219, The current matcher over-blocks any
text starting with '[' and only allows two hardcoded XML tags, causing false
positives/negatives; update the helper that uses trimmed and TOOL_PREFIXES so
it: (1) only treats leading '{' as a tool if the text is valid JSON/object
(attempt a quick serde_json::from_str check), (2) treat leading '[' as a tool
only when it matches a tool invocation pattern like '^\[\s*[A-Za-z0-9_-]+.*\]'
(use a small regex or manual parse to ensure the bracketed token looks like a
tool name, not a generic bracketed heading), and (3) replace the hardcoded tag
checks (the current lower.starts_with("<system-reminder>") ||
lower.starts_with("<path>")) with a generic XML-wrapper detection that matches a
single opening tag like '^<\s*[A-Za-z0-9_-]+>' (use a case-insensitive regex on
trimmed or lower) so arbitrary XML-like wrapper tags are caught; keep the
existing TOOL_PREFIXES check intact. Ensure to reference TOOL_PREFIXES and the
trimmed variable when editing this helper.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant