ci: run the iOS test target on a simulator and fix the rot it surfaced#635
Merged
Conversation
c639ca3 to
ea22b4f
Compare
The iOS test target (#if os(iOS)) was never run in CI: `make test` runs `swift test` on macOS, which compiles out every `#if os(iOS)` block, so the replay, autocapture, tracing, crash-reporting and surveys suites never ran and had silently rotted. Add a `test-ios-simulator` job that runs `make testOniOSSimulator` (now picks an available simulator dynamically) and fix everything that running the full suite surfaced. No SDK source changes. - TestPostHog: define the missing `testAPIKey`... then realized two tests only used the deprecated PostHogConfig(apiKey:) init incidentally (they test the tracingHeaders default / are a tracing makeSut helper, not the deprecated API). Migrated both to projectToken and removed testAPIKey. The deliberate init(apiKey:) regression tests in PostHogConfigTest stay (they use projectToken). - PostHogCrashReportProcessor tests now actually run: the test target takes its own direct dependency on the vendored PHPLCrashReporter module (Package.swift for SPM; SWIFT_INCLUDE_PATHS on the xcodeproj test target so the modulemap resolves) and imports it directly, since the SDK imports it @_implementationOnly and @testable import PostHog doesn't re-export it. 13 tests, previously a no-op TODO that compiled out. - project.pbxproj: add fixture_survey_conditions_event_property_filters.json to the PostHogTests resources (on disk but not bundled, so Bundle.test couldn't find it under xcodebuild -> fatal crash; it passed under SPM). - PostHogSDKTest feature-flag-event tests: deflake on slow CI. waitFlagsRequest only proves the /flags request arrived, not that the SDK stored the response, so the immediate flag asserts could race ahead (read nil flags, or an empty request_id). New waitForFeatureFlagsLoaded polls remoteConfig.lastRequestId, which is set atomically with the flags + v4 metadata once the response lands. - Nimble polling timeout + queue async assertions: shared CI runners run several times slower than local (one job was ~6x: 174s for 166 tests vs ~28s), starving background work so the queue cap-halving / 413-retry toEventually assertions time out. A QuickConfiguration raises PollingDefaults.timeout to 30s for the whole suite (toEventually still returns early on pass), PostHogQueueTest's post-getBatchedEvents depth check polls instead of asserting synchronously, and a too-tight explicit 5s override was dropped to inherit the global ceiling. - PostHogQueueTest "flush respecting flushAt": deflake a real async race. flush() takes the batch asynchronously via peek(), so adding a third event right after the one that crosses flushAt could race into the in-flight peek() and join the batch (count 3 / depth 0 instead of 2 / 1). Drain the flush before adding more. - ApplicationViewLayoutPublisherTest: poll for the async effect instead of a fixed sleep for the should-fire assertions (a fixed sleep can lose to a slow runner); keep a small settle for the throttle-suppression checks, which must let each async block read the mocked clock before the clock advances. - Wipe persisted state before every Quick test. storage.reset() deliberately keeps the on-disk event queue (events are disk-backed from add()), so a prior test's unsent event leaks into the next test's batch and inflates counts (e.g. opt-out expected 0 got 1). The global QuickConfiguration.beforeEach now deletes the app-support dir so each test starts clean. - Close leaked SDK instances. Tests create a PostHogSDK per case but many never call close(), so the SDK's queues, timers, URLSession, reachability notifier and integration observers stay live; across ~40 instances per run they pile up and starve the background thread pool, stalling async work (a flag load timed out at 30s on a loaded CI runner, and the process hung past teardown). PostHogSDKTest tracks every getSut() SDK and closes them in afterEach; PostHogFeatureFlagsTest does the same via its BaseTestClass deinit. close() is idempotent. - Reset the global `now` clock before every test. Several suites mock `now` to a fixed date and never restore it (ApplicationViewLayoutPublisherTest, PostHogMulticastCallbackTest, some PostHogSDKTest cases); a leaked fixed clock makes the timestamp-keyed event queue collide records so batches are lost — surfacing as order-dependent "requests never arrived" / flag-not-loaded failures that retries can't fix (same process keeps the leak). A QuickConfiguration.beforeEach resets it for every Quick test; the Swift Testing leakers restore it in a defer. - PostHogRemoteConfigTest / PostHogSamplingTest: migrate the session-replay tests to the whole-config model (read recording config from .remoteConfig, not the removed .sessionReplay slice; drive it via reloadRemoteConfig; expect the /config endpoint /s/). Clear .remoteConfig in setup since it survives reset(). - MockPostHogServer: /config now emits sessionRecording.linkedFlag (parity /flags). - PostHogSessionManagerTest.ReactNativeTests: was a struct that set the global postHogSdkName = "posthog-react-native" and never restored it, leaking React Native mode into every later serialized suite and no-opping their session managers (no session) — which broke the session-dependent replay event-trigger and tracing-header tests. Made it a class that restores postHogSdkName / now / the lifecycle publisher in deinit. - PostHogTracingHeadersTest: scope the backgrounded-lifecycle mock to the one session-ended test (it was injected for the whole suite, breaking the others). - PostHogSessionReplayEventTriggersTest: seed recording config in .remoteConfig and reset the replay integration's static install flag so each test installs fresh. - testOniOSSimulator: -retry-tests-on-failure -test-iterations 3. After the leak/contamination fixes a small tail remains: autocapture debounce/flush tests assert real-time windows that can't be made deterministic and occasionally slip on slow CI runners. Rerun a failed test up to 3x so transient misses don't fail the job; a broken test fails all 3 and stays red. (The earlier retry-induced post-test hang was the resource leak above — fixed — not the retries.) Full suite passes locally: 564 tests under `swift test` (macOS), 557 + 161 on the simulator.
ea22b4f to
0e9df4b
Compare
Prompt To Fix All With AIFix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
PostHogTests/PostHogRemoteConfigTest.swift:537-544
**Repeated two-step reload sequence (OnceAndOnlyOnce)**
The `await reloadRemoteConfig` + `await loadFeatureFlags(distinctId:anonymousId:groups:callback:)` block appears verbatim ~12 times across the `TestSessionReplayFlags` and `TestLinkedFlags` suites (lines 537–544, 563–570, 589–596, 617–624, 645–652, 671–678, 704–711, 739–746, 769–776, 795–802, 861–876, 891–906 …). Extracting it into a single helper — e.g. `func reloadConfigThenFlags(_ sut: PostHogRemoteConfig) async` — would honour the team's OnceAndOnlyOnce rule and make each test's intent clearer.
### Issue 2 of 2
Makefile:82-84
If `xcrun simctl list devices available` returns no matching lines (e.g. on a runner where simulators haven't been downloaded yet), `device` will be empty and `xcodebuild` will fail with a generic "Unable to find a destination" error that's hard to diagnose. An early check produces a clear message.
```suggestion
@device="$$(xcrun simctl list devices available | grep -E '^[[:space:]]*iPhone' | head -1 | sed -E 's/^[[:space:]]*//; s/ \(.*//')"; \
[ -n "$$device" ] || { echo "No available iPhone simulator found; install one via Xcode or 'xcrun simctl create'."; exit 1; }; \
echo "Testing on simulator: $$device"; \
set -o pipefail && xcrun xcodebuild test -scheme PostHog -destination "platform=iOS Simulator,name=$$device" -retry-tests-on-failure -test-iterations 3 | xcpretty
```
Reviews (1): Last reviewed commit: "ci: run the iOS test target on a simulat..." | Re-trigger Greptile |
dustinbyrne
approved these changes
Jun 8, 2026
dustinbyrne
left a comment
Contributor
There was a problem hiding this comment.
Greptile comments look legit
- Extract reload/flags bridge helpers in PostHogRemoteConfigTest (dedup ~24 inline sites) - Makefile: fail with a clear message when no iOS simulator is available - Drop the dead waitForRemoteConfig busy-wait (remote config is disabled, seeded synchronously) - Close the suite-held SDK in the RN session-management teardown - Wait on didReceiveFeatureFlags instead of the can-be-stale lastRequestId non-nil check Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The iOS simulator job flaked on the feature-flag suite: under a slow/contended CI runner the /flags (and batch) request didn't reach the mock server within the hardcoded 15s XCTWaiter timeout, so waitFlagsRequest/getBatchedEvents failed with "the expected requests never arrived" and the flag getters then read empty state. A re-run of the same commit passed, confirming it's slow-CI flakiness, not a regression. PollingDefaults.timeout was already bumped to 30s for the Nimble poll path; the XCTWaiter request-arrival helpers were left at 15s. Unify them on a single 30s constant so the fast path still returns immediately and only a contended runner pays the longer ceiling. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The remote-config and sampling suites waited on async load callbacks with
`while !flag, Date() < timeout {}` — a CPU-spinning busy-wait that blocks a
thread for the full timeout and, on the Swift cooperative pool, can starve the
very callback it's waiting on (the same anti-pattern we already removed from the
session-replay trigger tests).
Replace all 16 with AsyncLatch: a small thread-safe counting latch that resumes
the instant the awaited callback(s) fire. The timeout becomes a safety net only
(so a regression fails fast instead of hanging), and an optional `settle`
preserves the few small post-callback delays those tests relied on.
Verified on the iOS simulator (4x, deterministic) and macOS (make test).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Follow-ups on the latch refactor and the logs-queue flake: - AsyncLatch: 6 of the 7 preserved settle delays were unnecessary — the awaited callback already runs after the asserted state lands. Removed them and the settle param entirely, so the happy path is now purely event-driven (the only remaining Task.sleep is the timeout watchdog, which is cancelled the instant the latch opens and never completes in a passing run). - The one site that genuinely needed it (cache cleared asynchronously after onRemoteConfigLoaded, with no signal to await) now polls the real end state via a shared waitUntil helper — returns the instant it clears instead of a blind delay. - PostHogLogsQueueTest reachability test: construct Reachability with notificationQueue: nil so the initial check notifies synchronously during start(), instead of dispatching an async onReachable to main that lands after our manual onUnreachable and wipes the paused flag. Deflakes it deterministically. Verified: RemoteConfig 5x, LogsQueue 3x (macOS) and all three suites on the iOS simulator. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…flake visibility Hardening the test system after the flakes this branch surfaced: - #1 Central isolation: Swift Testing has no global beforeEach (only the Quick side did, via TestPollingConfiguration). Add a ResetGlobalStateTrait (.resetsGlobalState) that resets the mocked 'now' clock and the RN 'postHogSdkName' before and after every test, and apply it to the suites that mutate those globals (session manager, logs capture/queue, multicast, app-view-layout). Stops a value one test leaves behind from bleeding into the next. - #4 Scoped clock: add withMockedNow / withMockedClock so a 'now' override auto-restores via the call scope instead of a hand-written defer. Migrated the logs-queue rate-cap test to it as the first adopter. - #2 CI flake visibility: the iOS-sim job retries failed tests 3x, which can mask flakiness. Tee the raw xcodebuild log and add a best-effort step that surfaces any test which only passed after a retry to the job summary (handles both the Swift Testing and XCTest line formats); never fails the job. Documented the intentional macOS-strict / iOS-retry asymmetry. - #3 Reachability: confirmed the remaining real-Reachability tests (context, reachability-multicast) never arm the system notifier, so they have no async-callback race to fix. Verified: full macOS suite (564 tests / 115 suites) green; trait-applied suites green on the iOS simulator; flake-detection parser validated against both output formats. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
fba87ff to
b8b2882
Compare
Addressing an independent testing review: - waitForFeatureFlagsLoaded observed the global didReceiveFeatureFlags notification (posted object: nil for every load), which a prior, still-draining SUT could satisfy. Observe this instance's onFeatureFlagsLoaded instead, subscribed before the request so it can't be missed — ties the wait to this test's /flags response. - Document AsyncLatch's single-waiter contract and that its timeout watchdog deliberately does not fail the test (some callers wait on a signal that may legitimately never arrive, then assert regardless — surfacing those timeouts as failures would require reworking those tests, tracked separately). Verified: full macOS suite (564/115) green; PostHogSDKTest (77) and remote config suites green on the iOS simulator. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 'should not clear flags if hasFeatureFlags key is missing' test waited on onFeatureFlagsLoaded, but when the key is absent the SDK takes the 'leave flags alone' branch and never fires it (PostHogRemoteConfig.swift:118-126) — so the latch silently ran to its 10s timeout every run and the test passed only because its assertion holds regardless. Wait on onRemoteConfigLoaded, which fires after the clear-or-keep decision. Test now completes in ~5ms instead of 10s (and the suite drops from ~19s to ~9s). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tle at 2s The previous 'wait on onRemoteConfigLoaded' change was faster but not provably equivalent: onRemoteConfigLoaded is posted to main (PostHogRemoteConfig.swift:180) *before* the clear-or-keep decision runs in the reload callback (line 190) on the API queue, so a regression that wrongly cleared flags on a missing key could slip past the assertion. Restore the original onFeatureFlagsLoaded signal. That signal never actually fires for this path (reloadFeatureFlags early-returns because canReloadFlagsForTesting is false), so the wait is inherently a bounded settle — documented as such, and capped at 2s (the pre-existing value; the latch conversion had inflated it to the 10s default). Test asserts the cache survives after the async decision has run. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
💡 Motivation and Context
The iOS test target was never running in CI.
make testrunsswift teston macOS, which compiles out every#if os(iOS)block, so the session replay, autocapture, tracing-header, and crash-reporting suites never executed. They silently rotted: broken/outdated tests "passed" because they weren't running.This adds a
test-ios-simulatorCI job that runs the suite on a real simulator, then fixes everything that turned red once the tests actually ran. No SDK source changes — pure test/CI infrastructure.What the run surfaced and we fixed:
PHPLCrashReporter@_implementationOnly). Wired the module into the test target so all 13 now run..sessionReplaystorage slice that fix: keep session replay active across identify/reset #630 removed. Migrated them to the whole-config.remoteConfigmodel.postHogSdkNameand never restored it, dropping every later test into RN mode (which no-ops sessions) and breaking the replay-trigger + tracing-header suites. Now restored on teardown.xcodebuild. Added it.flush respecting flushAt) had a real async race against the background flush. Deflaked it.testAPIKeytest constant.testOniOSSimulatorpick an available simulator dynamically (was pinned toiPhone 15).💚 How did you test it?
swift test(macOS), 557 Swift Testing + 161 XCTest on the iOS simulator.📝 Checklist
If releasing new changes
pnpm changesetto generate a changeset file