ci: Move realm server tests into shards#4269
Conversation
# Conflicts: # .github/workflows/ci.yaml # README.md # packages/realm-server/tests/index.ts
Preview deployments |
Realm Server Test Results 1 files 1 suites 11m 44s ⏱️ Results for commit eace2d9. ♻️ This comment has been updated with latest results. |
lukemelia
left a comment
There was a problem hiding this comment.
Does AGENTS.md need updating?
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 48365eaac5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
I missed that thanks, it now has an extra line about how to run multiple test files at once. |
There was a problem hiding this comment.
Pull request overview
This PR restructures realm-server CI to run tests in 6 shards (instead of one job per test module) to reduce repeated setup overhead, and adds JUnit reporting to improve visibility into failures.
Changes:
- Replace per-test-module CI matrix with a 6-shard matrix and compute module lists per shard automatically.
- Add support for
TEST_MODULESinpackages/realm-server/tests/index.tsto filter the test suite to a subset of modules. - Add per-shard JUnit output and a workflow job to merge + publish the combined test report.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| README.md | Updates realm-server test-running instructions, including TEST_MODULES. |
| AGENTS.md | Documents running multiple modules via TEST_MODULES. |
| packages/realm-server/tests/scripts/run-qunit-with-test-pg.sh | Optionally attaches a JUnit reporter when JUNIT_OUTPUT_FILE is set. |
| packages/realm-server/tests/indexing-test.ts | Tweaks a test to wait for coalesced indexing URLs before asserting. |
| packages/realm-server/tests/index.ts | Implements TEST_MODULES parsing and QUnit module filtering. |
| packages/realm-server/scripts/shard-test-modules.js | Discovers and round-robins *-test.ts modules into shards for CI. |
| packages/realm-server/scripts/run-test-modules.js | Utility runner that applies a QUnit filter derived from TEST_MODULES. |
| packages/realm-server/scripts/lint-test-shards.ts | Removes the previous CI lint that required enumerating every test module in the workflow. |
| packages/realm-server/scripts/junit-reporter.js | Adds a QUnit event-based JUnit XML reporter used in CI. |
| packages/realm-server/package.json | Updates test:wait-for-servers to run test (filtered by TEST_MODULES) and removes lint:test-shards. |
| .github/workflows/ci.yaml | Converts realm-server tests to a 6-shard matrix, uploads per-shard JUnit, merges & publishes results. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| - Run a single module: | ||
| `TEST_MODULE=card-endpoints-test.ts pnpm test-module` | ||
| - Run a list of modules: | ||
| `TEST_MODULES=card-endpoints-test.ts|another-module-test.ts pnpm test` |
There was a problem hiding this comment.
The TEST_MODULES example needs quoting (or escaping) because | is a shell pipe. As written, TEST_MODULES=card-endpoints-test.ts|another-module-test.ts pnpm test will try to pipe to another-module-test.ts as a command instead of setting the env var. Use TEST_MODULES="card-endpoints-test.ts|another-module-test.ts" pnpm test (or single quotes).
| `TEST_MODULES=card-endpoints-test.ts|another-module-test.ts pnpm test` | |
| `TEST_MODULES="card-endpoints-test.ts|another-module-test.ts" pnpm test` |
| if (rows.length !== 1) { | ||
| return undefined; | ||
| } | ||
| let urls = rows[0].args.changes | ||
| .map((change) => change.url) | ||
| .sort(); | ||
| return urls.length === expectedUrls.length ? rows[0] : undefined; |
There was a problem hiding this comment.
The waitUntil predicate only checks that the job contains the expected count of URLs, not that the set of URLs matches expectedUrls. That can cause the wait to terminate early (and then fail the later assertion) instead of waiting until the coalesced URLs are actually present, which makes the test more flaky. Consider comparing the sorted URL list to the expected sorted list inside the predicate and only returning the row when they match.
| cp -a .test-web-assets-artifact/. ./ | ||
| - name: Compute shard test modules | ||
| id: shard_modules | ||
| run: echo "modules=$(node scripts/shard-test-modules.js ${{ matrix.shardIndex }} 6)" >> "$GITHUB_OUTPUT" |
There was a problem hiding this comment.
This step hardcodes the total shard count as 6 even though the matrix already defines shardTotal. To avoid drift if the shard count changes later, pass ${{ matrix.shardTotal }} to shard-test-modules.js instead of a constant.
| run: echo "modules=$(node scripts/shard-test-modules.js ${{ matrix.shardIndex }} 6)" >> "$GITHUB_OUTPUT" | |
| run: echo "modules=$(node scripts/shard-test-modules.js ${{ matrix.shardIndex }} ${{ matrix.shardTotal }})" >> "$GITHUB_OUTPUT" |
| JUNIT_REPORTER_ARGS="" | ||
| if [ -n "${JUNIT_OUTPUT_FILE-}" ]; then | ||
| JUNIT_REPORTER_ARGS="--require ${SCRIPT_DIR}/../../scripts/junit-reporter.js" | ||
| fi | ||
|
|
||
| LOG_LEVELS="$EFFECTIVE_LOG_LEVELS" \ | ||
| NODE_NO_WARNINGS=1 \ | ||
| PGPORT=55436 \ | ||
| STRIPE_WEBHOOK_SECRET=stripe-webhook-secret \ | ||
| STRIPE_API_KEY=stripe-api-key \ | ||
| qunit --require ts-node/register/transpile-only "$@" tests/index.ts | ||
| qunit --require ts-node/register/transpile-only $JUNIT_REPORTER_ARGS "$@" tests/index.ts |
There was a problem hiding this comment.
$JUNIT_REPORTER_ARGS is expanded unquoted, and it's built as a single string that contains both the flag and the path. This relies on word-splitting and can break if the workspace path contains spaces (or other IFS characters). Consider using a Bash array for optional args (e.g., JUNIT_REPORTER_ARGS=(--require "...")) and expanding it as "${JUNIT_REPORTER_ARGS[@]}".
This is #3831, continued. It groups realm server tests into six shards, because having one job per test file was becoming untenable. See below for further detail on the timing metrics, but high-level: we’re wasting hours of CI time per run with repetitive setup for each test file.
The grouping is via naïve round robin, which means some shards will take longer than others. We could rig up some kind of load balancing using shard history from CI, but that seemed excessive at this point.
We no longer need lint confirmation that every test file is manually listed in the CI workflow, as the grouping finds every file automatically.
I added a test report akin to the host tests, which makes it easier to see failures at a high level instead of having to scroll through Actions job logs:
Metrics for the existing approach
From the last 20 runs on
main:realm-endpoints-test.ts)~4 minutes of every shard is pure setup overhead (checkout, init, download artifacts, start services, register users). The actual test execution in most modules is only ~1 minute. Running 71 shards means paying that 4-minute overhead 71 times = ~284 minutes of wasted setup per run.