Skip to content

ci: run the iOS test target on a simulator and fix the rot it surfaced#635

Merged
turnipdabeets merged 10 commits into
mainfrom
fix/ios-tests-in-ci
Jun 9, 2026
Merged

ci: run the iOS test target on a simulator and fix the rot it surfaced#635
turnipdabeets merged 10 commits into
mainfrom
fix/ios-tests-in-ci

Conversation

@turnipdabeets

Copy link
Copy Markdown
Contributor

💡 Motivation and Context

The iOS test target was never running in CI. make test runs swift test on macOS, which compiles out every #if os(iOS) block, so the session replay, autocapture, tracing-header, and crash-reporting suites never executed. They silently rotted: broken/outdated tests "passed" because they weren't running.

This adds a test-ios-simulator CI job that runs the suite on a real simulator, then fixes everything that turned red once the tests actually ran. No SDK source changes — pure test/CI infrastructure.

What the run surfaced and we fixed:

  • Crash-processor tests were a no-op (compiled out because the SDK imports PHPLCrashReporter @_implementationOnly). Wired the module into the test target so all 13 now run.
  • Replay/sampling tests still read the old .sessionReplay storage slice that fix: keep session replay active across identify/reset #630 removed. Migrated them to the whole-config .remoteConfig model.
  • Cross-suite leak: a React Native test set the global postHogSdkName and never restored it, dropping every later test into RN mode (which no-ops sessions) and breaking the replay-trigger + tracing-header suites. Now restored on teardown.
  • Missing survey fixture in the Xcode project caused a fatal crash under xcodebuild. Added it.
  • Flaky queue test (flush respecting flushAt) had a real async race against the background flush. Deflaked it.
  • Plus a few async-timing fixes and a missing testAPIKey test constant.
  • Made testOniOSSimulator pick an available simulator dynamically (was pinned to iPhone 15).

💚 How did you test it?

  • Full suite green locally: 564 tests under swift test (macOS), 557 Swift Testing + 161 XCTest on the iOS simulator.
  • Crash-processor suite (13 tests) now runs on both SPM and simulator (was compiled out).
  • Ran the previously-flaky queue test 8× consecutively on the simulator — all green.

📝 Checklist

  • I reviewed the submitted code.
  • I added tests to verify the changes.
  • I updated the docs if needed.
  • No breaking change or entry added to the changelog.

If releasing new changes

  • Ran pnpm changeset to generate a changeset file

@turnipdabeets turnipdabeets force-pushed the fix/ios-tests-in-ci branch 10 times, most recently from c639ca3 to ea22b4f Compare June 5, 2026 16:36
The iOS test target (#if os(iOS)) was never run in CI: `make test` runs
`swift test` on macOS, which compiles out every `#if os(iOS)` block, so the
replay, autocapture, tracing, crash-reporting and surveys suites never ran and
had silently rotted. Add a `test-ios-simulator` job that runs
`make testOniOSSimulator` (now picks an available simulator dynamically) and fix
everything that running the full suite surfaced. No SDK source changes.

- TestPostHog: define the missing `testAPIKey`... then realized two tests only
  used the deprecated PostHogConfig(apiKey:) init incidentally (they test the
  tracingHeaders default / are a tracing makeSut helper, not the deprecated API).
  Migrated both to projectToken and removed testAPIKey. The deliberate
  init(apiKey:) regression tests in PostHogConfigTest stay (they use projectToken).
- PostHogCrashReportProcessor tests now actually run: the test target takes its
  own direct dependency on the vendored PHPLCrashReporter module (Package.swift
  for SPM; SWIFT_INCLUDE_PATHS on the xcodeproj test target so the modulemap
  resolves) and imports it directly, since the SDK imports it @_implementationOnly
  and @testable import PostHog doesn't re-export it. 13 tests, previously a no-op
  TODO that compiled out.
- project.pbxproj: add fixture_survey_conditions_event_property_filters.json to
  the PostHogTests resources (on disk but not bundled, so Bundle.test couldn't
  find it under xcodebuild -> fatal crash; it passed under SPM).
- PostHogSDKTest feature-flag-event tests: deflake on slow CI. waitFlagsRequest
  only proves the /flags request arrived, not that the SDK stored the response, so
  the immediate flag asserts could race ahead (read nil flags, or an empty
  request_id). New waitForFeatureFlagsLoaded polls remoteConfig.lastRequestId,
  which is set atomically with the flags + v4 metadata once the response lands.
- Nimble polling timeout + queue async assertions: shared CI runners run several times slower
  than local (one job was ~6x: 174s for 166 tests vs ~28s), starving background work so the
  queue cap-halving / 413-retry toEventually assertions time out. A QuickConfiguration raises
  PollingDefaults.timeout to 30s for the whole suite (toEventually still returns early on pass),
  PostHogQueueTest's post-getBatchedEvents depth check polls instead of asserting synchronously,
  and a too-tight explicit 5s override was dropped to inherit the global ceiling.
- PostHogQueueTest "flush respecting flushAt": deflake a real async race. flush()
  takes the batch asynchronously via peek(), so adding a third event right after
  the one that crosses flushAt could race into the in-flight peek() and join the
  batch (count 3 / depth 0 instead of 2 / 1). Drain the flush before adding more.
- ApplicationViewLayoutPublisherTest: poll for the async effect instead of a fixed
  sleep for the should-fire assertions (a fixed sleep can lose to a slow runner);
  keep a small settle for the throttle-suppression checks, which must let each
  async block read the mocked clock before the clock advances.
- Wipe persisted state before every Quick test. storage.reset() deliberately keeps the on-disk
  event queue (events are disk-backed from add()), so a prior test's unsent event leaks into
  the next test's batch and inflates counts (e.g. opt-out expected 0 got 1). The global
  QuickConfiguration.beforeEach now deletes the app-support dir so each test starts clean.
- Close leaked SDK instances. Tests create a PostHogSDK per case but many never call close(),
  so the SDK's queues, timers, URLSession, reachability notifier and integration observers
  stay live; across ~40 instances per run they pile up and starve the background thread pool,
  stalling async work (a flag load timed out at 30s on a loaded CI runner, and the process
  hung past teardown). PostHogSDKTest tracks every getSut() SDK and closes them in afterEach;
  PostHogFeatureFlagsTest does the same via its BaseTestClass deinit. close() is idempotent.
- Reset the global `now` clock before every test. Several suites mock `now` to a fixed date
  and never restore it (ApplicationViewLayoutPublisherTest, PostHogMulticastCallbackTest, some
  PostHogSDKTest cases); a leaked fixed clock makes the timestamp-keyed event queue collide
  records so batches are lost — surfacing as order-dependent "requests never arrived" /
  flag-not-loaded failures that retries can't fix (same process keeps the leak). A
  QuickConfiguration.beforeEach resets it for every Quick test; the Swift Testing leakers
  restore it in a defer.
- PostHogRemoteConfigTest / PostHogSamplingTest: migrate the session-replay tests
  to the whole-config model (read recording config from .remoteConfig, not the
  removed .sessionReplay slice; drive it via reloadRemoteConfig; expect the
  /config endpoint /s/). Clear .remoteConfig in setup since it survives reset().
- MockPostHogServer: /config now emits sessionRecording.linkedFlag (parity /flags).
- PostHogSessionManagerTest.ReactNativeTests: was a struct that set the global
  postHogSdkName = "posthog-react-native" and never restored it, leaking React
  Native mode into every later serialized suite and no-opping their session
  managers (no session) — which broke the session-dependent replay event-trigger
  and tracing-header tests. Made it a class that restores postHogSdkName / now /
  the lifecycle publisher in deinit.
- PostHogTracingHeadersTest: scope the backgrounded-lifecycle mock to the one
  session-ended test (it was injected for the whole suite, breaking the others).
- PostHogSessionReplayEventTriggersTest: seed recording config in .remoteConfig and
  reset the replay integration's static install flag so each test installs fresh.

- testOniOSSimulator: -retry-tests-on-failure -test-iterations 3. After the leak/contamination
  fixes a small tail remains: autocapture debounce/flush tests assert real-time windows that
  can't be made deterministic and occasionally slip on slow CI runners. Rerun a failed test up
  to 3x so transient misses don't fail the job; a broken test fails all 3 and stays red. (The
  earlier retry-induced post-test hang was the resource leak above — fixed — not the retries.)

Full suite passes locally: 564 tests under `swift test` (macOS), 557 + 161 on the
simulator.
@turnipdabeets turnipdabeets force-pushed the fix/ios-tests-in-ci branch from ea22b4f to 0e9df4b Compare June 5, 2026 16:53
@turnipdabeets turnipdabeets marked this pull request as ready for review June 5, 2026 17:19
@turnipdabeets turnipdabeets requested a review from a team as a code owner June 5, 2026 17:19
@greptile-apps

greptile-apps Bot commented Jun 5, 2026

Copy link
Copy Markdown
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
PostHogTests/PostHogRemoteConfigTest.swift:537-544
**Repeated two-step reload sequence (OnceAndOnlyOnce)**

The `await reloadRemoteConfig` + `await loadFeatureFlags(distinctId:anonymousId:groups:callback:)` block appears verbatim ~12 times across the `TestSessionReplayFlags` and `TestLinkedFlags` suites (lines 537–544, 563–570, 589–596, 617–624, 645–652, 671–678, 704–711, 739–746, 769–776, 795–802, 861–876, 891–906 …). Extracting it into a single helper — e.g. `func reloadConfigThenFlags(_ sut: PostHogRemoteConfig) async` — would honour the team's OnceAndOnlyOnce rule and make each test's intent clearer.

### Issue 2 of 2
Makefile:82-84
If `xcrun simctl list devices available` returns no matching lines (e.g. on a runner where simulators haven't been downloaded yet), `device` will be empty and `xcodebuild` will fail with a generic "Unable to find a destination" error that's hard to diagnose. An early check produces a clear message.

```suggestion
	@device="$$(xcrun simctl list devices available | grep -E '^[[:space:]]*iPhone' | head -1 | sed -E 's/^[[:space:]]*//; s/ \(.*//')"; \
	[ -n "$$device" ] || { echo "No available iPhone simulator found; install one via Xcode or 'xcrun simctl create'."; exit 1; }; \
	echo "Testing on simulator: $$device"; \
	set -o pipefail && xcrun xcodebuild test -scheme PostHog -destination "platform=iOS Simulator,name=$$device" -retry-tests-on-failure -test-iterations 3 | xcpretty
```

Reviews (1): Last reviewed commit: "ci: run the iOS test target on a simulat..." | Re-trigger Greptile

Comment thread PostHogTests/PostHogRemoteConfigTest.swift Outdated
Comment thread Makefile Outdated

@dustinbyrne dustinbyrne left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile comments look legit

Comment thread PostHogTests/PostHogSessionReplayEventTriggersTest.swift
Comment thread PostHogTests/PostHogSessionManagerTest.swift
Comment thread PostHogTests/TestUtils/TestPostHog.swift Outdated
turnipdabeets and others added 3 commits June 8, 2026 18:30
- Extract reload/flags bridge helpers in PostHogRemoteConfigTest (dedup ~24 inline sites)
- Makefile: fail with a clear message when no iOS simulator is available
- Drop the dead waitForRemoteConfig busy-wait (remote config is disabled, seeded synchronously)
- Close the suite-held SDK in the RN session-management teardown
- Wait on didReceiveFeatureFlags instead of the can-be-stale lastRequestId non-nil check

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The iOS simulator job flaked on the feature-flag suite: under a slow/contended
CI runner the /flags (and batch) request didn't reach the mock server within the
hardcoded 15s XCTWaiter timeout, so waitFlagsRequest/getBatchedEvents failed with
"the expected requests never arrived" and the flag getters then read empty state.

A re-run of the same commit passed, confirming it's slow-CI flakiness, not a
regression. PollingDefaults.timeout was already bumped to 30s for the Nimble
poll path; the XCTWaiter request-arrival helpers were left at 15s. Unify them on
a single 30s constant so the fast path still returns immediately and only a
contended runner pays the longer ceiling.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The remote-config and sampling suites waited on async load callbacks with
`while !flag, Date() < timeout {}` — a CPU-spinning busy-wait that blocks a
thread for the full timeout and, on the Swift cooperative pool, can starve the
very callback it's waiting on (the same anti-pattern we already removed from the
session-replay trigger tests).

Replace all 16 with AsyncLatch: a small thread-safe counting latch that resumes
the instant the awaited callback(s) fire. The timeout becomes a safety net only
(so a regression fails fast instead of hanging), and an optional `settle`
preserves the few small post-callback delays those tests relied on.

Verified on the iOS simulator (4x, deterministic) and macOS (make test).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@turnipdabeets turnipdabeets requested a review from ioannisj June 9, 2026 00:24
turnipdabeets and others added 2 commits June 8, 2026 20:33
Follow-ups on the latch refactor and the logs-queue flake:

- AsyncLatch: 6 of the 7 preserved settle delays were unnecessary — the awaited
  callback already runs after the asserted state lands. Removed them and the
  settle param entirely, so the happy path is now purely event-driven (the only
  remaining Task.sleep is the timeout watchdog, which is cancelled the instant
  the latch opens and never completes in a passing run).
- The one site that genuinely needed it (cache cleared asynchronously after
  onRemoteConfigLoaded, with no signal to await) now polls the real end state via
  a shared waitUntil helper — returns the instant it clears instead of a blind delay.
- PostHogLogsQueueTest reachability test: construct Reachability with
  notificationQueue: nil so the initial check notifies synchronously during
  start(), instead of dispatching an async onReachable to main that lands after
  our manual onUnreachable and wipes the paused flag. Deflakes it deterministically.

Verified: RemoteConfig 5x, LogsQueue 3x (macOS) and all three suites on the iOS simulator.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…flake visibility

Hardening the test system after the flakes this branch surfaced:

- #1 Central isolation: Swift Testing has no global beforeEach (only the Quick
  side did, via TestPollingConfiguration). Add a ResetGlobalStateTrait
  (.resetsGlobalState) that resets the mocked 'now' clock and the RN
  'postHogSdkName' before and after every test, and apply it to the suites that
  mutate those globals (session manager, logs capture/queue, multicast,
  app-view-layout). Stops a value one test leaves behind from bleeding into the next.
- #4 Scoped clock: add withMockedNow / withMockedClock so a 'now' override
  auto-restores via the call scope instead of a hand-written defer. Migrated the
  logs-queue rate-cap test to it as the first adopter.
- #2 CI flake visibility: the iOS-sim job retries failed tests 3x, which can mask
  flakiness. Tee the raw xcodebuild log and add a best-effort step that surfaces
  any test which only passed after a retry to the job summary (handles both the
  Swift Testing and XCTest line formats); never fails the job. Documented the
  intentional macOS-strict / iOS-retry asymmetry.
- #3 Reachability: confirmed the remaining real-Reachability tests
  (context, reachability-multicast) never arm the system notifier, so they have no
  async-callback race to fix.

Verified: full macOS suite (564 tests / 115 suites) green; trait-applied suites
green on the iOS simulator; flake-detection parser validated against both output formats.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@turnipdabeets turnipdabeets force-pushed the fix/ios-tests-in-ci branch from fba87ff to b8b2882 Compare June 9, 2026 01:25
turnipdabeets and others added 4 commits June 8, 2026 22:16
Addressing an independent testing review:

- waitForFeatureFlagsLoaded observed the global didReceiveFeatureFlags notification
  (posted object: nil for every load), which a prior, still-draining SUT could
  satisfy. Observe this instance's onFeatureFlagsLoaded instead, subscribed before
  the request so it can't be missed — ties the wait to this test's /flags response.
- Document AsyncLatch's single-waiter contract and that its timeout watchdog
  deliberately does not fail the test (some callers wait on a signal that may
  legitimately never arrive, then assert regardless — surfacing those timeouts as
  failures would require reworking those tests, tracked separately).

Verified: full macOS suite (564/115) green; PostHogSDKTest (77) and remote config
suites green on the iOS simulator.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 'should not clear flags if hasFeatureFlags key is missing' test waited on
onFeatureFlagsLoaded, but when the key is absent the SDK takes the 'leave flags
alone' branch and never fires it (PostHogRemoteConfig.swift:118-126) — so the latch
silently ran to its 10s timeout every run and the test passed only because its
assertion holds regardless. Wait on onRemoteConfigLoaded, which fires after the
clear-or-keep decision. Test now completes in ~5ms instead of 10s (and the suite
drops from ~19s to ~9s).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tle at 2s

The previous 'wait on onRemoteConfigLoaded' change was faster but not provably
equivalent: onRemoteConfigLoaded is posted to main (PostHogRemoteConfig.swift:180)
*before* the clear-or-keep decision runs in the reload callback (line 190) on the
API queue, so a regression that wrongly cleared flags on a missing key could slip
past the assertion. Restore the original onFeatureFlagsLoaded signal.

That signal never actually fires for this path (reloadFeatureFlags early-returns
because canReloadFlagsForTesting is false), so the wait is inherently a bounded
settle — documented as such, and capped at 2s (the pre-existing value; the latch
conversion had inflated it to the 10s default). Test asserts the cache survives
after the async decision has run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@turnipdabeets turnipdabeets merged commit 853e716 into main Jun 9, 2026
36 checks passed
@turnipdabeets turnipdabeets deleted the fix/ios-tests-in-ci branch June 9, 2026 14:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants