ci: run the iOS test target on a simulator and fix the rot it surfaced by turnipdabeets · Pull Request #635 · PostHog/posthog-ios

turnipdabeets · 2026-06-05T01:04:01Z

💡 Motivation and Context

The iOS test target was never running in CI. make test runs swift test on macOS, which compiles out every #if os(iOS) block, so the session replay, autocapture, tracing-header, and crash-reporting suites never executed. They silently rotted: broken/outdated tests "passed" because they weren't running.

This adds a test-ios-simulator CI job that runs the suite on a real simulator, then fixes everything that turned red once the tests actually ran. No SDK source changes — pure test/CI infrastructure.

What the run surfaced and we fixed:

Crash-processor tests were a no-op (compiled out because the SDK imports PHPLCrashReporter @_implementationOnly). Wired the module into the test target so all 13 now run.
Replay/sampling tests still read the old .sessionReplay storage slice that fix: keep session replay active across identify/reset #630 removed. Migrated them to the whole-config .remoteConfig model.
Cross-suite leak: a React Native test set the global postHogSdkName and never restored it, dropping every later test into RN mode (which no-ops sessions) and breaking the replay-trigger + tracing-header suites. Now restored on teardown.
Missing survey fixture in the Xcode project caused a fatal crash under xcodebuild. Added it.
Flaky queue test (flush respecting flushAt) had a real async race against the background flush. Deflaked it.
Plus a few async-timing fixes and a missing testAPIKey test constant.
Made testOniOSSimulator pick an available simulator dynamically (was pinned to iPhone 15).

💚 How did you test it?

Full suite green locally: 564 tests under swift test (macOS), 557 Swift Testing + 161 XCTest on the iOS simulator.
Crash-processor suite (13 tests) now runs on both SPM and simulator (was compiled out).
Ran the previously-flaky queue test 8× consecutively on the simulator — all green.

📝 Checklist

I reviewed the submitted code.
I added tests to verify the changes.
I updated the docs if needed.
No breaking change or entry added to the changelog.

If releasing new changes

Ran pnpm changeset to generate a changeset file

The iOS test target (#if os(iOS)) was never run in CI: `make test` runs `swift test` on macOS, which compiles out every `#if os(iOS)` block, so the replay, autocapture, tracing, crash-reporting and surveys suites never ran and had silently rotted. Add a `test-ios-simulator` job that runs `make testOniOSSimulator` (now picks an available simulator dynamically) and fix everything that running the full suite surfaced. No SDK source changes. - TestPostHog: define the missing `testAPIKey`... then realized two tests only used the deprecated PostHogConfig(apiKey:) init incidentally (they test the tracingHeaders default / are a tracing makeSut helper, not the deprecated API). Migrated both to projectToken and removed testAPIKey. The deliberate init(apiKey:) regression tests in PostHogConfigTest stay (they use projectToken). - PostHogCrashReportProcessor tests now actually run: the test target takes its own direct dependency on the vendored PHPLCrashReporter module (Package.swift for SPM; SWIFT_INCLUDE_PATHS on the xcodeproj test target so the modulemap resolves) and imports it directly, since the SDK imports it @_implementationOnly and @testable import PostHog doesn't re-export it. 13 tests, previously a no-op TODO that compiled out. - project.pbxproj: add fixture_survey_conditions_event_property_filters.json to the PostHogTests resources (on disk but not bundled, so Bundle.test couldn't find it under xcodebuild -> fatal crash; it passed under SPM). - PostHogSDKTest feature-flag-event tests: deflake on slow CI. waitFlagsRequest only proves the /flags request arrived, not that the SDK stored the response, so the immediate flag asserts could race ahead (read nil flags, or an empty request_id). New waitForFeatureFlagsLoaded polls remoteConfig.lastRequestId, which is set atomically with the flags + v4 metadata once the response lands. - Nimble polling timeout + queue async assertions: shared CI runners run several times slower than local (one job was ~6x: 174s for 166 tests vs ~28s), starving background work so the queue cap-halving / 413-retry toEventually assertions time out. A QuickConfiguration raises PollingDefaults.timeout to 30s for the whole suite (toEventually still returns early on pass), PostHogQueueTest's post-getBatchedEvents depth check polls instead of asserting synchronously, and a too-tight explicit 5s override was dropped to inherit the global ceiling. - PostHogQueueTest "flush respecting flushAt": deflake a real async race. flush() takes the batch asynchronously via peek(), so adding a third event right after the one that crosses flushAt could race into the in-flight peek() and join the batch (count 3 / depth 0 instead of 2 / 1). Drain the flush before adding more. - ApplicationViewLayoutPublisherTest: poll for the async effect instead of a fixed sleep for the should-fire assertions (a fixed sleep can lose to a slow runner); keep a small settle for the throttle-suppression checks, which must let each async block read the mocked clock before the clock advances. - Wipe persisted state before every Quick test. storage.reset() deliberately keeps the on-disk event queue (events are disk-backed from add()), so a prior test's unsent event leaks into the next test's batch and inflates counts (e.g. opt-out expected 0 got 1). The global QuickConfiguration.beforeEach now deletes the app-support dir so each test starts clean. - Close leaked SDK instances. Tests create a PostHogSDK per case but many never call close(), so the SDK's queues, timers, URLSession, reachability notifier and integration observers stay live; across ~40 instances per run they pile up and starve the background thread pool, stalling async work (a flag load timed out at 30s on a loaded CI runner, and the process hung past teardown). PostHogSDKTest tracks every getSut() SDK and closes them in afterEach; PostHogFeatureFlagsTest does the same via its BaseTestClass deinit. close() is idempotent. - Reset the global `now` clock before every test. Several suites mock `now` to a fixed date and never restore it (ApplicationViewLayoutPublisherTest, PostHogMulticastCallbackTest, some PostHogSDKTest cases); a leaked fixed clock makes the timestamp-keyed event queue collide records so batches are lost — surfacing as order-dependent "requests never arrived" / flag-not-loaded failures that retries can't fix (same process keeps the leak). A QuickConfiguration.beforeEach resets it for every Quick test; the Swift Testing leakers restore it in a defer. - PostHogRemoteConfigTest / PostHogSamplingTest: migrate the session-replay tests to the whole-config model (read recording config from .remoteConfig, not the removed .sessionReplay slice; drive it via reloadRemoteConfig; expect the /config endpoint /s/). Clear .remoteConfig in setup since it survives reset(). - MockPostHogServer: /config now emits sessionRecording.linkedFlag (parity /flags). - PostHogSessionManagerTest.ReactNativeTests: was a struct that set the global postHogSdkName = "posthog-react-native" and never restored it, leaking React Native mode into every later serialized suite and no-opping their session managers (no session) — which broke the session-dependent replay event-trigger and tracing-header tests. Made it a class that restores postHogSdkName / now / the lifecycle publisher in deinit. - PostHogTracingHeadersTest: scope the backgrounded-lifecycle mock to the one session-ended test (it was injected for the whole suite, breaking the others). - PostHogSessionReplayEventTriggersTest: seed recording config in .remoteConfig and reset the replay integration's static install flag so each test installs fresh. - testOniOSSimulator: -retry-tests-on-failure -test-iterations 3. After the leak/contamination fixes a small tail remains: autocapture debounce/flush tests assert real-time windows that can't be made deterministic and occasionally slip on slow CI runners. Rerun a failed test up to 3x so transient misses don't fail the job; a broken test fails all 3 and stays red. (The earlier retry-induced post-test hang was the resource leak above — fixed — not the retries.) Full suite passes locally: 564 tests under `swift test` (macOS), 557 + 161 on the simulator.

greptile-apps · 2026-06-05T17:24:33Z

Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
PostHogTests/PostHogRemoteConfigTest.swift:537-544
**Repeated two-step reload sequence (OnceAndOnlyOnce)**

The `await reloadRemoteConfig` + `await loadFeatureFlags(distinctId:anonymousId:groups:callback:)` block appears verbatim ~12 times across the `TestSessionReplayFlags` and `TestLinkedFlags` suites (lines 537–544, 563–570, 589–596, 617–624, 645–652, 671–678, 704–711, 739–746, 769–776, 795–802, 861–876, 891–906 …). Extracting it into a single helper — e.g. `func reloadConfigThenFlags(_ sut: PostHogRemoteConfig) async` — would honour the team's OnceAndOnlyOnce rule and make each test's intent clearer.

### Issue 2 of 2
Makefile:82-84
If `xcrun simctl list devices available` returns no matching lines (e.g. on a runner where simulators haven't been downloaded yet), `device` will be empty and `xcodebuild` will fail with a generic "Unable to find a destination" error that's hard to diagnose. An early check produces a clear message.

```suggestion
	@device="$$(xcrun simctl list devices available | grep -E '^[[:space:]]*iPhone' | head -1 | sed -E 's/^[[:space:]]*//; s/ \(.*//')"; \
	[ -n "$$device" ] || { echo "No available iPhone simulator found; install one via Xcode or 'xcrun simctl create'."; exit 1; }; \
	echo "Testing on simulator: $$device"; \
	set -o pipefail && xcrun xcodebuild test -scheme PostHog -destination "platform=iOS Simulator,name=$$device" -retry-tests-on-failure -test-iterations 3 | xcpretty
```

_{Reviews (1): Last reviewed commit: "ci: run the iOS test target on a simulat..." | Re-trigger Greptile}

dustinbyrne

Greptile comments look legit

- Extract reload/flags bridge helpers in PostHogRemoteConfigTest (dedup ~24 inline sites) - Makefile: fail with a clear message when no iOS simulator is available - Drop the dead waitForRemoteConfig busy-wait (remote config is disabled, seeded synchronously) - Close the suite-held SDK in the RN session-management teardown - Wait on didReceiveFeatureFlags instead of the can-be-stale lastRequestId non-nil check Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The iOS simulator job flaked on the feature-flag suite: under a slow/contended CI runner the /flags (and batch) request didn't reach the mock server within the hardcoded 15s XCTWaiter timeout, so waitFlagsRequest/getBatchedEvents failed with "the expected requests never arrived" and the flag getters then read empty state. A re-run of the same commit passed, confirming it's slow-CI flakiness, not a regression. PollingDefaults.timeout was already bumped to 30s for the Nimble poll path; the XCTWaiter request-arrival helpers were left at 15s. Unify them on a single 30s constant so the fast path still returns immediately and only a contended runner pays the longer ceiling. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The remote-config and sampling suites waited on async load callbacks with `while !flag, Date() < timeout {}` — a CPU-spinning busy-wait that blocks a thread for the full timeout and, on the Swift cooperative pool, can starve the very callback it's waiting on (the same anti-pattern we already removed from the session-replay trigger tests). Replace all 16 with AsyncLatch: a small thread-safe counting latch that resumes the instant the awaited callback(s) fire. The timeout becomes a safety net only (so a regression fails fast instead of hanging), and an optional `settle` preserves the few small post-callback delays those tests relied on. Verified on the iOS simulator (4x, deterministic) and macOS (make test). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Follow-ups on the latch refactor and the logs-queue flake: - AsyncLatch: 6 of the 7 preserved settle delays were unnecessary — the awaited callback already runs after the asserted state lands. Removed them and the settle param entirely, so the happy path is now purely event-driven (the only remaining Task.sleep is the timeout watchdog, which is cancelled the instant the latch opens and never completes in a passing run). - The one site that genuinely needed it (cache cleared asynchronously after onRemoteConfigLoaded, with no signal to await) now polls the real end state via a shared waitUntil helper — returns the instant it clears instead of a blind delay. - PostHogLogsQueueTest reachability test: construct Reachability with notificationQueue: nil so the initial check notifies synchronously during start(), instead of dispatching an async onReachable to main that lands after our manual onUnreachable and wipes the paused flag. Deflakes it deterministically. Verified: RemoteConfig 5x, LogsQueue 3x (macOS) and all three suites on the iOS simulator. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…flake visibility Hardening the test system after the flakes this branch surfaced: - #1 Central isolation: Swift Testing has no global beforeEach (only the Quick side did, via TestPollingConfiguration). Add a ResetGlobalStateTrait (.resetsGlobalState) that resets the mocked 'now' clock and the RN 'postHogSdkName' before and after every test, and apply it to the suites that mutate those globals (session manager, logs capture/queue, multicast, app-view-layout). Stops a value one test leaves behind from bleeding into the next. - #4 Scoped clock: add withMockedNow / withMockedClock so a 'now' override auto-restores via the call scope instead of a hand-written defer. Migrated the logs-queue rate-cap test to it as the first adopter. - #2 CI flake visibility: the iOS-sim job retries failed tests 3x, which can mask flakiness. Tee the raw xcodebuild log and add a best-effort step that surfaces any test which only passed after a retry to the job summary (handles both the Swift Testing and XCTest line formats); never fails the job. Documented the intentional macOS-strict / iOS-retry asymmetry. - #3 Reachability: confirmed the remaining real-Reachability tests (context, reachability-multicast) never arm the system notifier, so they have no async-callback race to fix. Verified: full macOS suite (564 tests / 115 suites) green; trait-applied suites green on the iOS simulator; flake-detection parser validated against both output formats. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Addressing an independent testing review: - waitForFeatureFlagsLoaded observed the global didReceiveFeatureFlags notification (posted object: nil for every load), which a prior, still-draining SUT could satisfy. Observe this instance's onFeatureFlagsLoaded instead, subscribed before the request so it can't be missed — ties the wait to this test's /flags response. - Document AsyncLatch's single-waiter contract and that its timeout watchdog deliberately does not fail the test (some callers wait on a signal that may legitimately never arrive, then assert regardless — surfacing those timeouts as failures would require reworking those tests, tracked separately). Verified: full macOS suite (564/115) green; PostHogSDKTest (77) and remote config suites green on the iOS simulator. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The 'should not clear flags if hasFeatureFlags key is missing' test waited on onFeatureFlagsLoaded, but when the key is absent the SDK takes the 'leave flags alone' branch and never fires it (PostHogRemoteConfig.swift:118-126) — so the latch silently ran to its 10s timeout every run and the test passed only because its assertion holds regardless. Wait on onRemoteConfigLoaded, which fires after the clear-or-keep decision. Test now completes in ~5ms instead of 10s (and the suite drops from ~19s to ~9s). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…tle at 2s The previous 'wait on onRemoteConfigLoaded' change was faster but not provably equivalent: onRemoteConfigLoaded is posted to main (PostHogRemoteConfig.swift:180) *before* the clear-or-keep decision runs in the reload callback (line 190) on the API queue, so a regression that wrongly cleared flags on a missing key could slip past the assertion. Restore the original onFeatureFlagsLoaded signal. That signal never actually fires for this path (reloadFeatureFlags early-returns because canReloadFlagsForTesting is false), so the wait is inherently a bounded settle — documented as such, and capped at 2s (the pre-existing value; the latch conversion had inflated it to the 10s default). Test asserts the cache survives after the async decision has run. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

turnipdabeets force-pushed the fix/ios-tests-in-ci branch 10 times, most recently from c639ca3 to ea22b4f Compare June 5, 2026 16:36

turnipdabeets force-pushed the fix/ios-tests-in-ci branch from ea22b4f to 0e9df4b Compare June 5, 2026 16:53

turnipdabeets marked this pull request as ready for review June 5, 2026 17:19

turnipdabeets requested a review from a team as a code owner June 5, 2026 17:19

greptile-apps Bot reviewed Jun 5, 2026

View reviewed changes

Comment thread PostHogTests/PostHogRemoteConfigTest.swift Outdated

Comment thread Makefile Outdated

dustinbyrne approved these changes Jun 8, 2026

View reviewed changes

Comment thread PostHogTests/PostHogSessionReplayEventTriggersTest.swift

Comment thread PostHogTests/PostHogSessionManagerTest.swift

Comment thread PostHogTests/TestUtils/TestPostHog.swift Outdated

turnipdabeets and others added 3 commits June 8, 2026 18:30

turnipdabeets requested a review from ioannisj June 9, 2026 00:24

turnipdabeets and others added 2 commits June 8, 2026 20:33

turnipdabeets force-pushed the fix/ios-tests-in-ci branch from fba87ff to b8b2882 Compare June 9, 2026 01:25

turnipdabeets and others added 4 commits June 8, 2026 22:16

Merge remote-tracking branch 'origin/main' into fix/ios-tests-in-ci

2726b09

turnipdabeets merged commit 853e716 into main Jun 9, 2026
36 checks passed

turnipdabeets deleted the fix/ios-tests-in-ci branch June 9, 2026 14:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: run the iOS test target on a simulator and fix the rot it surfaced#635

ci: run the iOS test target on a simulator and fix the rot it surfaced#635
turnipdabeets merged 10 commits into
mainfrom
fix/ios-tests-in-ci

turnipdabeets commented Jun 5, 2026

Uh oh!

greptile-apps Bot commented Jun 5, 2026

Uh oh!

Uh oh!

Uh oh!

dustinbyrne left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

turnipdabeets commented Jun 5, 2026

💡 Motivation and Context

💚 How did you test it?

📝 Checklist

If releasing new changes

Uh oh!

greptile-apps Bot commented Jun 5, 2026

Uh oh!

Uh oh!

Uh oh!

dustinbyrne left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants