test(challenges): match flags config to production so both iframes surface#107
test(challenges): match flags config to production so both iframes surface#107Rinse12 wants to merge 3 commits into
Conversation
…nges in one ChallengeMessage Reproduces the scenario where two interactive iframe challenges (spam-blocker and flags) must share a single ChallengeMessage. The orchestrator currently surfaces only one interactive challenge per request and silently drops the other, so a comment can publish with one challenge never verified. These tests assert the correct behaviour and are expected to fail until the orchestrator is fixed.
…orrect.bso config
Expand flags-two-iframe.test.ts beyond the single regular-user/two-iframe path
to assert every user path the config's comment block describes:
- mods/admins/owners run no challenges (passes today)
- .bso authors run only flags (passes today)
- whitelisted authors run only flags (passes today)
- regular author, AI allow -> publishes, no pending approval (red until orchestrator fix)
- regular author, AI review -> pending approval (red until orchestrator fix)
- spam-blocker failed -> AI moderation must not run, token-cost guard (red until orchestrator fix)
Parameterize the existing mocks (spamBlockerShouldFail, aiVerdict) and mirror
@bitsocial/ai-moderation-challenge: one verdict ("allow" | "review") drives the
allow/review branch entries, with pendingApproval on the review entry.
The 6 red tests pin the same deferred-interactive-challenge defect the file
already gates, and show it also breaks AI allow/pending routing and the
spam-blocker-fail short-circuit for regular users.
…rface
Restore the production politically-incorrect.bso flags exclude rule
{ challenges: [0, 1] } to FLAGS_EXCLUDE. That rule holds the flags
challenge back until publication-match (0) and whitelist (1) are decided,
which lines its readiness up with the spam-blocker so both interactive
iframes flip to pending in the same round and both are presented.
The earlier config dropped that rule, making flags ready in round 1 and
pending alone; the orchestrator then deferred the spam-blocker and
silently dropped it at verify time. Rewrite the header and FLAGS_EXCLUDE
comments to explain why the tests are now green and how the underlying
drop is still latent (tracked in #106).
📝 WalkthroughWalkthroughThis PR adds a new test suite for the politically-incorrect.bso challenge orchestration, validating the case where two iframe challenges (spam-blocker and flags) must execute together in a single ChallengeMessage. The suite includes mock challenge factories, call-count tracking, per-role behavior validation, and AI moderation deferral verification. ChangesTwo-iframe challenge orchestration validation
🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/challenges/flags-two-iframe.test.ts (1)
39-39: ⚡ Quick winUse deterministic test addresses to avoid flaky collisions.
Math.random()in tests makes failures harder to reproduce and can (rarely) collide in role-mapped cases. Prefer a monotonic deterministic generator.Suggested change
-const getRandomAddress = () => String(Math.random()); +let addressSeq = 0; +const getRandomAddress = () => `test-author-${addressSeq++}`;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/challenges/flags-two-iframe.test.ts` at line 39, Replace the non-deterministic Math.random() in getRandomAddress with a deterministic monotonic generator to avoid flaky collisions: implement getRandomAddress as a closure-scoped counter (e.g., let addrCounter = 1; return () => String(addrCounter++)) or a seeded PRNG if needed, and update any test expectations that assumed the previous format; locate and replace the getRandomAddress function in the test to return incrementing unique addresses instead of String(Math.random()).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@test/challenges/flags-two-iframe.test.ts`:
- Line 39: Replace the non-deterministic Math.random() in getRandomAddress with
a deterministic monotonic generator to avoid flaky collisions: implement
getRandomAddress as a closure-scoped counter (e.g., let addrCounter = 1; return
() => String(addrCounter++)) or a seeded PRNG if needed, and update any test
expectations that assumed the previous format; locate and replace the
getRandomAddress function in the test to return incrementing unique addresses
instead of String(Math.random()).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 356a434f-0ff7-4dfc-a14c-1cde5b0bc984
📒 Files selected for processing (1)
test/challenges/flags-two-iframe.test.ts
What
The
politically-incorrect.bsoconfig has two interactive iframe challenges, spam-blocker (C2) and flags (C5), and in production both are presented together so a regular author must solve both. The test suite intest/challenges/flags-two-iframe.test.tswas using a flags config that diverged from production by dropping the{ challenges: [0, 1] }exclude rule, which made the tests red for the wrong reason.This PR restores
FLAGS_EXCLUDEto match production and rewrites the now-inaccurate comments.Why this turns the tests green
The orchestrator defers every remaining undecided challenge the moment any challenge goes pending. So whether both iframes surface depends on them becoming "ready" in the same orchestrator round. The production flags exclude
{ challenges: [0, 1] }(skip flags only if BOTH publication-match and whitelist succeeded) holds the flags challenge back until C0 and C1 are decided. For a regular author both fail, and on that same round the spam-blocker (whose excludes also depend on 0 and 1) becomes ready too, so both flip to pending together and both are shown.The earlier config dropped that rule, making flags ready in round 1 and pending alone; the orchestrator then deferred the spam-blocker and silently dropped it at verify time. The original header comment had this causality backwards (it framed the index excludes as the cause of the deferral when they are the workaround that avoids it), so the comments are corrected too.
Note on the underlying bug
This is a config that happens to dodge a real orchestrator limitation: a second interactive challenge that is deferred is silently dropped at verify time. That latent bug is tracked separately in #106. This PR only aligns the test with production behaviour and documents the mechanism.
Test
All 10 tests pass.
Summary by CodeRabbit
Tests