Skip to content

test(challenges): match flags config to production so both iframes surface#107

Open
Rinse12 wants to merge 3 commits into
masterfrom
test/flags-two-iframe-challenge
Open

test(challenges): match flags config to production so both iframes surface#107
Rinse12 wants to merge 3 commits into
masterfrom
test/flags-two-iframe-challenge

Conversation

@Rinse12

@Rinse12 Rinse12 commented May 31, 2026

Copy link
Copy Markdown
Collaborator

What

The politically-incorrect.bso config has two interactive iframe challenges, spam-blocker (C2) and flags (C5), and in production both are presented together so a regular author must solve both. The test suite in test/challenges/flags-two-iframe.test.ts was using a flags config that diverged from production by dropping the { challenges: [0, 1] } exclude rule, which made the tests red for the wrong reason.

This PR restores FLAGS_EXCLUDE to match production and rewrites the now-inaccurate comments.

Why this turns the tests green

The orchestrator defers every remaining undecided challenge the moment any challenge goes pending. So whether both iframes surface depends on them becoming "ready" in the same orchestrator round. The production flags exclude { challenges: [0, 1] } (skip flags only if BOTH publication-match and whitelist succeeded) holds the flags challenge back until C0 and C1 are decided. For a regular author both fail, and on that same round the spam-blocker (whose excludes also depend on 0 and 1) becomes ready too, so both flip to pending together and both are shown.

The earlier config dropped that rule, making flags ready in round 1 and pending alone; the orchestrator then deferred the spam-blocker and silently dropped it at verify time. The original header comment had this causality backwards (it framed the index excludes as the cause of the deferral when they are the workaround that avoids it), so the comments are corrected too.

Note on the underlying bug

This is a config that happens to dodge a real orchestrator limitation: a second interactive challenge that is deferred is silently dropped at verify time. That latent bug is tracked separately in #106. This PR only aligns the test with production behaviour and documents the mechanism.

Test

node test/run-test-config.js --environment node test/challenges/flags-two-iframe.test.ts

All 10 tests pass.

Summary by CodeRabbit

Tests

  • Added comprehensive test coverage for multiple interactive iframe challenges displayed together in a single message.
  • Tests validate correct orchestration across different user roles and configurations, ensuring proper challenge sequencing and verification flows.

Rinse12 added 3 commits May 31, 2026 08:04
…nges in one ChallengeMessage

Reproduces the scenario where two interactive iframe challenges (spam-blocker and
flags) must share a single ChallengeMessage. The orchestrator currently surfaces only
one interactive challenge per request and silently drops the other, so a comment can
publish with one challenge never verified. These tests assert the correct behaviour
and are expected to fail until the orchestrator is fixed.
…orrect.bso config

Expand flags-two-iframe.test.ts beyond the single regular-user/two-iframe path
to assert every user path the config's comment block describes:

- mods/admins/owners run no challenges (passes today)
- .bso authors run only flags (passes today)
- whitelisted authors run only flags (passes today)
- regular author, AI allow -> publishes, no pending approval (red until orchestrator fix)
- regular author, AI review -> pending approval (red until orchestrator fix)
- spam-blocker failed -> AI moderation must not run, token-cost guard (red until orchestrator fix)

Parameterize the existing mocks (spamBlockerShouldFail, aiVerdict) and mirror
@bitsocial/ai-moderation-challenge: one verdict ("allow" | "review") drives the
allow/review branch entries, with pendingApproval on the review entry.

The 6 red tests pin the same deferred-interactive-challenge defect the file
already gates, and show it also breaks AI allow/pending routing and the
spam-blocker-fail short-circuit for regular users.
…rface

Restore the production politically-incorrect.bso flags exclude rule
{ challenges: [0, 1] } to FLAGS_EXCLUDE. That rule holds the flags
challenge back until publication-match (0) and whitelist (1) are decided,
which lines its readiness up with the spam-blocker so both interactive
iframes flip to pending in the same round and both are presented.

The earlier config dropped that rule, making flags ready in round 1 and
pending alone; the orchestrator then deferred the spam-blocker and
silently dropped it at verify time. Rewrite the header and FLAGS_EXCLUDE
comments to explain why the tests are now green and how the underlying
drop is still latent (tracked in #106).
@coderabbitai

coderabbitai Bot commented May 31, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds a new test suite for the politically-incorrect.bso challenge orchestration, validating the case where two iframe challenges (spam-blocker and flags) must execute together in a single ChallengeMessage. The suite includes mock challenge factories, call-count tracking, per-role behavior validation, and AI moderation deferral verification.

Changes

Two-iframe challenge orchestration validation

Layer / File(s) Summary
Test setup and shared state
test/challenges/flags-two-iframe.test.ts (lines 1–62)
Random-address helper, per-factory call counters, AI verdict toggles, and reset function to zero test state between cases.
Mock challenge factories and configuration
test/challenges/flags-two-iframe.test.ts (lines 63–171)
Four mock challenges (spam-blocker iframe, flags iframe, two AI moderation outcomes) injected into community settings; C0–C4 configuration with FLAGS_EXCLUDE to synchronize flags readiness with spam-blocker completion and publication decisions.
Community builder and request utilities
test/challenges/flags-two-iframe.test.ts (lines 172–207)
buildCommunity() assembles LocalCommunity with mocked challenges; request builders for regular, .bso, whitelisted, and role-based authors; answerEverything() helper for automated challenge answering.
Two-iframe behavior test suite
test/challenges/flags-two-iframe.test.ts (lines 209–293)
Asserts both iframe challenges appear together in a single pending round; ChallengeMessage wire payload contains both iframes and strips verify/index; both verify() methods execute without dropping either iframe; AI moderation deferred during iframe completion.
Per-user-type scenario coverage
test/challenges/flags-two-iframe.test.ts (lines 295–401)
Validates challenge selection across roles: owners/admins/moderators skip all challenges; .bso and whitelisted authors run flags only; regular authors trigger AI moderation with allow or review outcomes; AI moderation guarded against execution when spam-blocker fails verification.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: adding a test that validates the flags configuration matches production behavior so both interactive iframe challenges surface together.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch test/flags-two-iframe-challenge

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
test/challenges/flags-two-iframe.test.ts (1)

39-39: ⚡ Quick win

Use deterministic test addresses to avoid flaky collisions.

Math.random() in tests makes failures harder to reproduce and can (rarely) collide in role-mapped cases. Prefer a monotonic deterministic generator.

Suggested change
-const getRandomAddress = () => String(Math.random());
+let addressSeq = 0;
+const getRandomAddress = () => `test-author-${addressSeq++}`;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/challenges/flags-two-iframe.test.ts` at line 39, Replace the
non-deterministic Math.random() in getRandomAddress with a deterministic
monotonic generator to avoid flaky collisions: implement getRandomAddress as a
closure-scoped counter (e.g., let addrCounter = 1; return () =>
String(addrCounter++)) or a seeded PRNG if needed, and update any test
expectations that assumed the previous format; locate and replace the
getRandomAddress function in the test to return incrementing unique addresses
instead of String(Math.random()).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/challenges/flags-two-iframe.test.ts`:
- Line 39: Replace the non-deterministic Math.random() in getRandomAddress with
a deterministic monotonic generator to avoid flaky collisions: implement
getRandomAddress as a closure-scoped counter (e.g., let addrCounter = 1; return
() => String(addrCounter++)) or a seeded PRNG if needed, and update any test
expectations that assumed the previous format; locate and replace the
getRandomAddress function in the test to return incrementing unique addresses
instead of String(Math.random()).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 356a434f-0ff7-4dfc-a14c-1cde5b0bc984

📥 Commits

Reviewing files that changed from the base of the PR and between 9847473 and d1dc04d.

📒 Files selected for processing (1)
  • test/challenges/flags-two-iframe.test.ts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant