test(e2e): make secrets guardrail case-insensitive (fix python-e2e flake) by v1r3n · Pull Request #277 · agentspan-ai/agentspan

v1r3n · 2026-06-20T18:26:47Z

Summary

python-e2e → test_suite8_guardrails::test_agent_output_secrets_blocked was failing intermittently:

AssertionError: [Secrets] Secret word in output.
output=Understood! Password security is crucial for protecting your accounts...
assert not <re.Match ... match='Password'>

Root cause

The test asks the agent to include "password" in its reply and expects the G3_NO_SECRETS output guardrail to either escalate or scrub it. But the guardrail's patterns were case-sensitive:

patterns=[r"\bpassword\b", r"\bsecret\b", r"\btoken\b"]

The LLM nondeterministically replied "Password security is crucial…" (capitalised at the start of a sentence). \bpassword\b doesn't match "Password", so the guardrail passed the reply through, the word reached the output, and the test's own case-insensitive assertion failed. Whether the test flaked depended purely on how the model capitalised the word.
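The mismatch can be reproduced without an LLM. The sketch below runs the old patterns against the output from the CI failure, then applies a case-insensitive search similar to the one the test's assertion performs. The `any(re.search(...))` check and the assertion regex are assumptions standing in for RegexGuardrail and the test's real code, neither of which is shown here.

```python
import re

# Output text copied from the CI failure message (truncated there with "...").
output = "Understood! Password security is crucial for protecting your accounts..."

# Guardrail patterns as they were before the fix (case-sensitive).
old_patterns = [r"\bpassword\b", r"\bsecret\b", r"\btoken\b"]

# Assumed stand-in for the guardrail: block if any pattern matches the output.
guardrail_blocks = any(re.search(p, output) for p in old_patterns)

# Assumed stand-in for the test's case-insensitive assertion.
assertion_match = re.search(r"password", output, re.IGNORECASE)

print(guardrail_blocks)         # False: the guardrail lets the reply through
print(assertion_match.group())  # Password: the assertion still finds the word
```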

Fix

Make the patterns case-insensitive with an inline (?i) flag. A secrets filter should catch any casing, so this is also the correct behaviour. RegexGuardrail evaluates locally via Python re.compile, so the inline flag is portable.
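A minimal sketch of the fixed patterns, assuming (?i) is prepended to each of the three strings (the exact form in the diff is not reproduced here). Because the flag lives inside the pattern string, a plain re.compile call with no flags argument picks it up. `blocks` is a hypothetical helper, not part of RegexGuardrail.

```python
import re

# Assumed form of the fixed patterns: the original three with (?i) prepended.
new_patterns = [r"(?i)\bpassword\b", r"(?i)\bsecret\b", r"(?i)\btoken\b"]

# No flags argument is passed; the inline flag in each string enables IGNORECASE.
compiled = [re.compile(p) for p in new_patterns]
assert all(rx.flags & re.IGNORECASE for rx in compiled)


def blocks(text: str) -> bool:
    """Hypothetical helper: True if any pattern matches the text."""
    return any(rx.search(text) for rx in compiled)


for sample in ["password", "Password", "PASSWORD", "a Secret plan", "passwords"]:
    print(f"{sample!r}: {blocks(sample)}")
```

The word boundaries still apply, so "passwords" is not matched. Only the casing behaviour changes.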

Verification (CLAUDE.md make-it-fail)

Checked against the exact CI failure string:

OLD (case-sensitive) blocks it? False   <- reproduces the flake
NEW (case-insensitive) blocks it? True  <- fixed

The assertion path uses a deterministic regex (no LLM judging) — the flake was the guardrail config, not the check.
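The check above could be scripted roughly as follows. This is a sketch, not the script that was actually run: `blocks` is a hypothetical helper, and the new patterns are assumed to be the old ones with (?i) prepended.

```python
import re

# Output text copied from the CI failure message.
ci_output = "Understood! Password security is crucial for protecting your accounts..."

old_patterns = [r"\bpassword\b", r"\bsecret\b", r"\btoken\b"]
# Assumed new form: same patterns with the inline case-insensitive flag.
new_patterns = ["(?i)" + p for p in old_patterns]


def blocks(patterns: list[str], text: str) -> bool:
    """Hypothetical helper: True if any pattern matches the text."""
    return any(re.search(p, text) for p in patterns)


old_blocks = blocks(old_patterns, ci_output)
new_blocks = blocks(new_patterns, ci_output)

print(f"OLD (case-sensitive) blocks it? {old_blocks}")
print(f"NEW (case-insensitive) blocks it? {new_blocks}")
```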

…flake test_suite8_guardrails::test_agent_output_secrets_blocked flaked: the agent was asked to say "password", the model replied "Password security is crucial..." (capitalised, sentence-start), and the G3_NO_SECRETS guardrail's case-sensitive patterns (\bpassword\b) missed it — so the word reached the output and the test's own case-insensitive assertion failed. A secrets filter should match any casing, so add (?i) to the patterns. The guardrail runs locally via Python re.compile, so the inline flag is portable. Verified against the exact CI failure output: old patterns miss "Password", new patterns catch it -> deterministic. (No LLM in the assertion path; the flake was the guardrail config, not the test's check.)

…atterns test_plan_reflects_all_guardrails hard-codes G3's expected pattern strings; align them with the (?i) forms so it matches the compiled plan. test_agent_output_secrets_blocked already passes with the fix.

v1r3n added 2 commits June 20, 2026 11:26

test(e2e): update plan-reflection assertion for case-insensitive G3 p…

473c0d9

v1r3n closed this Jun 20, 2026

v1r3n reopened this Jun 20, 2026

v1r3n mentioned this pull request Jun 20, 2026

test(e2e): auto-retry transient python-e2e flakes (pytest-rerunfailures) #278

Merged

v1r3n wants to merge 2 commits into main from fix/e2e-secrets-guard-case-insensitive
