block structured text payloads from user replies#209
Conversation
WalkthroughThe changes implement safeguards to prevent LLMs from outputting tool syntax or structured payloads to end users. Updates include a revised system prompt, a new detection function, retry logic for recovery attempts, and a runtime guard in the reply tool to block problematic content. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| let trimmed = value.trim_start(); | ||
| if trimmed.is_empty() { | ||
| return false; | ||
| } | ||
|
|
||
| if trimmed.starts_with('{') || trimmed.starts_with('[') { | ||
| return true; | ||
| } | ||
|
|
||
| let lower = trimmed.to_ascii_lowercase(); | ||
| if TOOL_PREFIXES.iter().any(|prefix| lower.starts_with(prefix)) { | ||
| return true; | ||
| } | ||
|
|
||
| lower.starts_with("<system-reminder>") || lower.starts_with("<path>") |
There was a problem hiding this comment.
should_block_user_visible_text currently blocks any output starting with [ (so it’ll reject legit plaintext like [FYI] ...), and it also makes the "[reply]"/etc entries in TOOL_PREFIXES unreachable because the early starts_with('[') returns first. Suggestion: only block [ when it matches known tool prefixes, or when it parses as JSON (so bracketed prose is allowed).
| let trimmed = value.trim_start(); | |
| if trimmed.is_empty() { | |
| return false; | |
| } | |
| if trimmed.starts_with('{') || trimmed.starts_with('[') { | |
| return true; | |
| } | |
| let lower = trimmed.to_ascii_lowercase(); | |
| if TOOL_PREFIXES.iter().any(|prefix| lower.starts_with(prefix)) { | |
| return true; | |
| } | |
| lower.starts_with("<system-reminder>") || lower.starts_with("<path>") | |
| let trimmed = value.trim_start(); | |
| if trimmed.is_empty() { | |
| return false; | |
| } | |
| if trimmed.starts_with('{') { | |
| return true; | |
| } | |
| let lower = trimmed.to_ascii_lowercase(); | |
| if TOOL_PREFIXES.iter().any(|prefix| lower.starts_with(prefix)) { | |
| return true; | |
| } | |
| if trimmed.starts_with('[') && serde_json::from_str::<serde_json::Value>(trimmed).is_ok() { | |
| return true; | |
| } | |
| lower.starts_with("<system-reminder>") || lower.starts_with("<path>") |
Might also be worth adding a small regression test like assert!(!should_block_user_visible_text("[FYI] hello")); so this doesn’t accidentally get re-introduced.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/agent/channel.rs (1)
1421-1455: Consolidate repeated fallback-send logic to prevent policy drift.The same block-check → extract/normalize → log/send flow appears in three branches. Centralizing this into one helper would make future guard changes consistent across retrigger and non-retrigger paths.
♻️ Refactor sketch
- if crate::tools::should_block_user_visible_text(text) { - ... - } else { - let extracted = extract_reply_from_tool_syntax(text); - ... - if !final_text.is_empty() { - ... - self.response_tx.send(OutboundResponse::Text(final_text)).await - } - } + self.send_safe_fallback_text(text, is_retrigger, skipped).await;async fn send_safe_fallback_text(&self, text: &str, is_retrigger: bool, skipped: bool) { // single place for block check, extraction, mention normalization, logging, and send }Also applies to: 1474-1506, 1520-1552
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/agent/channel.rs` around lines 1421 - 1455, There are three duplicated flows (block check via crate::tools::should_block_user_visible_text, extract_reply_from_tool_syntax, crate::tools::reply::normalize_discord_mention_tokens, conversation_logger.log_bot_message, and sending via response_tx.send(OutboundResponse::Text)) scattered across retrigger and non-retrigger branches; consolidate them into a single async helper method (e.g., send_safe_fallback_text(&self, text: &str, is_retrigger: bool, skipped: bool)) that runs the block check, performs extraction with extract_reply_from_tool_syntax, normalizes mentions with normalize_discord_mention_tokens (using the same source lookup from self.conversation_id), logs warnings when extraction occurred, logs the final bot message via self.state.conversation_logger.log_bot_message, and sends via self.response_tx.send, preserving the existing tracing.warn/error messages and returned behavior; replace the three duplicated blocks with calls to this helper so future guard/policy changes are centralized.src/tools.rs (1)
534-550: Add regression tests for bracket-prefixed plaintext and generic wrapper tags.Current tests validate core cases, but they don’t protect against the Line 209 false-positive class or generic wrapper-tag detection gaps.
🧪 Suggested test additions
#[test] fn allows_normal_plaintext_output() { assert!(!should_block_user_visible_text("hello team")); assert!(!should_block_user_visible_text("- first\n- second")); + assert!(!should_block_user_visible_text("[FYI] deployment finished")); } + + #[test] + fn blocks_generic_wrapper_tag_output() { + assert!(should_block_user_visible_text( + "<file name=\"notes.txt\">hello</file>" + )); + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/tools.rs` around lines 534 - 550, Add regression tests to strengthen should_block_user_visible_text: include negative assertions that plaintext starting with bracket-prefixed labels (e.g., "[note] hello" or "[reply] hello" where the content is plain text, not JSON/array) should NOT be blocked, and include negative assertions for generic wrapper-like tags (e.g., "<wrapper>visible</wrapper>" or common tags like "<div>visible</div>") to ensure the generic tag heuristic doesn't produce false positives; update the existing test module (the functions blocks_json_bracket_and_tool_syntax_output and allows_normal_plaintext_output) by adding these specific assert!(!should_block_user_visible_text(...)) cases referencing should_block_user_visible_text so future changes that reintroduce the Line 209 false-positive class or wrapper-tag detection gaps will be caught.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/tools.rs`:
- Around line 209-219: The current matcher over-blocks any text starting with
'[' and only allows two hardcoded XML tags, causing false positives/negatives;
update the helper that uses trimmed and TOOL_PREFIXES so it: (1) only treats
leading '{' as a tool if the text is valid JSON/object (attempt a quick
serde_json::from_str check), (2) treat leading '[' as a tool only when it
matches a tool invocation pattern like '^\[\s*[A-Za-z0-9_-]+.*\]' (use a small
regex or manual parse to ensure the bracketed token looks like a tool name, not
a generic bracketed heading), and (3) replace the hardcoded tag checks (the
current lower.starts_with("<system-reminder>") || lower.starts_with("<path>"))
with a generic XML-wrapper detection that matches a single opening tag like
'^<\s*[A-Za-z0-9_-]+>' (use a case-insensitive regex on trimmed or lower) so
arbitrary XML-like wrapper tags are caught; keep the existing TOOL_PREFIXES
check intact. Ensure to reference TOOL_PREFIXES and the trimmed variable when
editing this helper.
---
Nitpick comments:
In `@src/agent/channel.rs`:
- Around line 1421-1455: There are three duplicated flows (block check via
crate::tools::should_block_user_visible_text, extract_reply_from_tool_syntax,
crate::tools::reply::normalize_discord_mention_tokens,
conversation_logger.log_bot_message, and sending via
response_tx.send(OutboundResponse::Text)) scattered across retrigger and
non-retrigger branches; consolidate them into a single async helper method
(e.g., send_safe_fallback_text(&self, text: &str, is_retrigger: bool, skipped:
bool)) that runs the block check, performs extraction with
extract_reply_from_tool_syntax, normalizes mentions with
normalize_discord_mention_tokens (using the same source lookup from
self.conversation_id), logs warnings when extraction occurred, logs the final
bot message via self.state.conversation_logger.log_bot_message, and sends via
self.response_tx.send, preserving the existing tracing.warn/error messages and
returned behavior; replace the three duplicated blocks with calls to this helper
so future guard/policy changes are centralized.
In `@src/tools.rs`:
- Around line 534-550: Add regression tests to strengthen
should_block_user_visible_text: include negative assertions that plaintext
starting with bracket-prefixed labels (e.g., "[note] hello" or "[reply] hello"
where the content is plain text, not JSON/array) should NOT be blocked, and
include negative assertions for generic wrapper-like tags (e.g.,
"<wrapper>visible</wrapper>" or common tags like "<div>visible</div>") to ensure
the generic tag heuristic doesn't produce false positives; update the existing
test module (the functions blocks_json_bracket_and_tool_syntax_output and
allows_normal_plaintext_output) by adding these specific
assert!(!should_block_user_visible_text(...)) cases referencing
should_block_user_visible_text so future changes that reintroduce the Line 209
false-positive class or wrapper-tag detection gaps will be caught.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
prompts/en/fragments/system/tool_syntax_correction.md.j2src/agent/channel.rssrc/tools.rssrc/tools/reply.rs
| if trimmed.starts_with('{') || trimmed.starts_with('[') { | ||
| return true; | ||
| } | ||
|
|
||
| let lower = trimmed.to_ascii_lowercase(); | ||
| if TOOL_PREFIXES.iter().any(|prefix| lower.starts_with(prefix)) { | ||
| return true; | ||
| } | ||
|
|
||
| lower.starts_with("<system-reminder>") || lower.starts_with("<path>") | ||
| } |
There was a problem hiding this comment.
Refine matcher scope to avoid both over-blocking and wrapper-tag bypasses.
Line 209 blocks any leading [ text, so valid replies like bracketed headings can be suppressed. Lines 218-219 only check two hardcoded tags, so other XML-like wrappers can still pass. This helper is shared by fallback + reply, so these mismatches affect all user-visible paths.
🔧 Proposed fix
- if trimmed.starts_with('{') || trimmed.starts_with('[') {
+ if (trimmed.starts_with('{') || trimmed.starts_with('['))
+ && serde_json::from_str::<serde_json::Value>(trimmed).is_ok()
+ {
return true;
}
@@
- lower.starts_with("<system-reminder>") || lower.starts_with("<path>")
+ if lower.starts_with("<system-reminder>") || lower.starts_with("<path>") {
+ return true;
+ }
+
+ // Block generic XML-like wrapper tags at start of message, e.g. <file ...>, <path>, <system-reminder>.
+ if let Some(rest) = trimmed.strip_prefix('<') {
+ if let Some(end_index) = rest.find('>') {
+ let tag_name = rest[..end_index]
+ .split_whitespace()
+ .next()
+ .unwrap_or("");
+ if !tag_name.is_empty()
+ && tag_name
+ .chars()
+ .all(|character| character.is_ascii_alphanumeric() || character == '_' || character == '-')
+ {
+ return true;
+ }
+ }
+ }
+
+ false
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/tools.rs` around lines 209 - 219, The current matcher over-blocks any
text starting with '[' and only allows two hardcoded XML tags, causing false
positives/negatives; update the helper that uses trimmed and TOOL_PREFIXES so
it: (1) only treats leading '{' as a tool if the text is valid JSON/object
(attempt a quick serde_json::from_str check), (2) treat leading '[' as a tool
only when it matches a tool invocation pattern like '^\[\s*[A-Za-z0-9_-]+.*\]'
(use a small regex or manual parse to ensure the bracketed token looks like a
tool name, not a generic bracketed heading), and (3) replace the hardcoded tag
checks (the current lower.starts_with("<system-reminder>") ||
lower.starts_with("<path>")) with a generic XML-wrapper detection that matches a
single opening tag like '^<\s*[A-Za-z0-9_-]+>' (use a case-insensitive regex on
trimmed or lower) so arbitrary XML-like wrapper tags are caught; keep the
existing TOOL_PREFIXES check intact. Ensure to reference TOOL_PREFIXES and the
trimmed variable when editing this helper.
Summary
should_block_user_visible_textguard to detect tool-like/structured payloads (JSON, bracket tool syntax, wrapper tags) before anything user-visible is sent.replywith plain text orskip.replytool and expand the correction prompt wording so malformed payloads are blocked consistently across fallback and tool paths.Example
Testing
cargo test blocks_json_bracket_and_tool_syntax_outputcargo test allows_normal_plaintext_outputNote
This PR prevents LLM-generated structured payloads (JSON, bracket syntax, XML-like tags) from reaching end users as plain text. The solution introduces a shared validation function
should_block_user_visible_textthat filters output across three paths: the channel recovery loop (up to 2 retries), the fallback handler when retrigger produces text, and thereplytool directly. The correction prompt is now more explicit about what constitutes invalid output. All changes are in agent loop and tool enforcement with comprehensive test coverage.Written by Tembo for commit 1d35d12.