Skip to content

fix: [SDK-4336] add circuit breaker and defer appId commit on wedge#1469

Closed
sherwinski wants to merge 2 commits into
mainfrom
sherwin/sdk-4336-followup
Closed

fix: [SDK-4336] add circuit breaker and defer appId commit on wedge#1469
sherwinski wants to merge 2 commits into
mainfrom
sherwin/sdk-4336-followup

Conversation

@sherwinski

Copy link
Copy Markdown
Contributor

Summary

Two follow-up fixes layered on top of 7f5df216, surfaced during review of the (now-closed) #1468:

  • Circuit breaker — once any Options write times out, trip a page-scoped flag so the remaining ~7 Options writes during a wedged init short-circuit immediately. Cuts wedged-init time from ~14s (7 × 2s timeout) to ~2s (single timeout, rest skip).
  • Defer the new-appId commit on wedgeIds.appId is unguarded, so on a wedged page during an app-ID switch the Ids.appId write would succeed while the previous app's Options.{isPushEnabled,lastPushId,lastPushToken,lastOptedIn} stayed put. The previousAppId !== appId gate would then keep the reset branch from re-entering on later loads — making the cross-app contamination permanent. With the new isOptionsWriteWedged() accessor, initSaveState now bails early on a tripped breaker so a future non-wedged load can finish the reset cleanly.

Also fixes a minor leak — Promise.race now uses .finally(() => clearTimeout(timer)) so healthy pages don't accumulate dangling 2s timers across the dozens of Options writes init issues.

Adds a single Log._warn on timeout so support can correlate the breaker tripping with init slowness in console logs.

Test plan

  • vp test --run — 512/512 passing.
  • vp check — clean (5 unrelated pre-existing warnings).
  • vp run build:prod — passes; bundle deltas covered by a +80 B / +120 B size-limit bump.
  • On-device sanity check on iOS 26 PWA (the original repro path), confirming wedged init now resolves in ~2s and the db.Options timed out warning fires once.

Why a follow-up rather than amending #1468

#1468 had been chasing the same ticket and was effectively superseded by 7f5df21 landing on main. Closing it as superseded and re-uploading just the additive hardening on top of main keeps this diff minimal and reviewable (3 files, +26/-6).

fadi-george and others added 2 commits May 29, 2026 15:23
Follow-up hardening on top of 7f5df21, addressing two gaps surfaced
during review of #1468:

1. Add a page-scoped circuit breaker (`optionsWriteWedged`) that trips
   on the first Options-write timeout. Without it, `OneSignal.init()`
   pays the full 2s timeout on each of ~7 Options writes during a
   wedged init (~14s total). With it, the first wedge trips the breaker
   and the rest short-circuit immediately (~2s total).

2. Defer the new-appId commit in `initSaveState` when the Options reset
   got circuit-broken. The `Ids.appId` write is unguarded (the guard is
   Options-only), so without this gate it would succeed while the
   previous app's `isPushEnabled`/`lastPush*`/`lastOptedIn` stayed put,
   and the `previousAppId !== appId` branch would never re-enter on
   later loads — leaving cross-app contamination permanent. Skipping
   the commit lets a future non-wedged load complete the reset.

Also fixes a minor leak: the timer is now cleared via `.finally()`
when `op` settles before the timeout, so healthy pages doing many
Options writes don't accumulate dangling 2s timers. Adds a single
`Log._warn` on timeout so support can correlate the breaker tripping
with init slowness in console logs.

Bumps the size-limit budget by ~80 B (page) / ~120 B (sw) to fit.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens IndexedDB Options writes against iOS Safari PWA “wedge” behavior by adding a circuit breaker (to avoid repeated timeouts) and by preventing Ids.appId from being committed during an app-id switch when Options writes have wedged, avoiding permanent cross-app state contamination.

Changes:

  • Add an Options-write circuit breaker that trips on the first timeout and short-circuits subsequent Options writes.
  • Defer Ids.appId commit during app-id migration if the Options reset was circuit-broken.
  • Clear the setTimeout timer via .finally() to avoid accumulating dangling timers; bump size limits accordingly.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

File Description
src/shared/helpers/init.ts Bails out of app-id migration before committing the new appId when Options writes are wedged.
src/shared/database/client.ts Implements the Options write circuit breaker and exports isOptionsWriteWedged().
package.json Adjusts size-limit thresholds to account for small bundle growth.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +109 to +124
function guardOptionsWrite<T>(
storeName: IDBStoreName,
op: () => Promise<T>,
): Promise<T | undefined> {
if (storeName !== 'Options') return op();
if (optionsWriteWedged) return Promise.resolve(undefined);
let timer: ReturnType<typeof setTimeout> | undefined;
const timeout = new Promise<undefined>((resolve) => {
timer = setTimeout(() => {
optionsWriteWedged = true;
Log._warn(`db.${storeName} timed out`);
resolve(undefined);
}, 2000);
});
return Promise.race([op(), timeout]).finally(() => clearTimeout(timer));
}
Comment on lines +101 to +105
// on stall; the dropped values are session metadata the SW reads with sensible
// fallbacks. Once any write times out we trip a page-scoped circuit breaker
// so the remaining ~7 Options writes during init short-circuit instead of
// each paying the full timeout. Remove if WebKit ever fixes the underlying
// bug: https://bugs.webkit.org/show_bug.cgi?id=315804
Comment on lines +356 to +360
// appId now would strand the previous app's metadata under it, and the
// `previousAppId !== appId` gate above would keep us out of this branch
// on later loads — leaving the stale values permanent. Skipping the
// appId commit instead lets a future non-wedged load complete the reset.
if (isOptionsWriteWedged()) return;
Comment on lines +109 to +124
function guardOptionsWrite<T>(
storeName: IDBStoreName,
op: () => Promise<T>,
): Promise<T | undefined> {
if (storeName !== 'Options') return op();
if (optionsWriteWedged) return Promise.resolve(undefined);
let timer: ReturnType<typeof setTimeout> | undefined;
const timeout = new Promise<undefined>((resolve) => {
timer = setTimeout(() => {
optionsWriteWedged = true;
Log._warn(`db.${storeName} timed out`);
resolve(undefined);
}, 2000);
});
return Promise.race([op(), timeout]).finally(() => clearTimeout(timer));
}
Comment on lines +101 to +105
// on stall; the dropped values are session metadata the SW reads with sensible
// fallbacks. Once any write times out we trip a page-scoped circuit breaker
// so the remaining ~7 Options writes during init short-circuit instead of
// each paying the full timeout. Remove if WebKit ever fixes the underlying
// bug: https://bugs.webkit.org/show_bug.cgi?id=315804
Comment on lines +356 to +360
// appId now would strand the previous app's metadata under it, and the
// `previousAppId !== appId` gate above would keep us out of this branch
// on later loads — leaving the stale values permanent. Skipping the
// appId commit instead lets a future non-wedged load complete the reset.
if (isOptionsWriteWedged()) return;
Comment on lines +109 to +124
function guardOptionsWrite<T>(
storeName: IDBStoreName,
op: () => Promise<T>,
): Promise<T | undefined> {
if (storeName !== 'Options') return op();
if (optionsWriteWedged) return Promise.resolve(undefined);
let timer: ReturnType<typeof setTimeout> | undefined;
const timeout = new Promise<undefined>((resolve) => {
timer = setTimeout(() => {
optionsWriteWedged = true;
Log._warn(`db.${storeName} timed out`);
resolve(undefined);
}, 2000);
});
return Promise.race([op(), timeout]).finally(() => clearTimeout(timer));
}
Comment on lines +101 to +105
// on stall; the dropped values are session metadata the SW reads with sensible
// fallbacks. Once any write times out we trip a page-scoped circuit breaker
// so the remaining ~7 Options writes during init short-circuit instead of
// each paying the full timeout. Remove if WebKit ever fixes the underlying
// bug: https://bugs.webkit.org/show_bug.cgi?id=315804
Comment on lines +356 to +360
// appId now would strand the previous app's metadata under it, and the
// `previousAppId !== appId` gate above would keep us out of this branch
// on later loads — leaving the stale values permanent. Skipping the
// appId commit instead lets a future non-wedged load complete the reset.
if (isOptionsWriteWedged()) return;
@fadi-george

Copy link
Copy Markdown
Contributor

just remake your branch

@fadi-george fadi-george closed this Jun 1, 2026
Comment on lines +358 to 363
// on later loads — leaving the stale values permanent. Skipping the
// appId commit instead lets a future non-wedged load complete the reset.
if (isOptionsWriteWedged()) return;
await db.put('Ids', { type: 'registrationId', id: null });
await db.put('Ids', { type: 'userId', id: null });
OneSignal._coreDirector._subscriptionModelStore._clear(ModelChangeTags._Hydrate);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The early-return on isOptionsWriteWedged() at init.ts:360 sits between the four Options resets and the Ids.{registrationId,userId} + _subscriptionModelStore._clear(_Hydrate) calls, so on a wedged app-ID switch the in-memory subscription store (and Ids.userId) keeps the previous app's identity for the rest of the page session. Downstream _sendOnSessionUpdate and outcome requests then serialize previous-app subscription IDs against the new app's config — the exact cross-app contamination this PR exists to prevent, just bounded to the wedged session. Moving _subscriptionModelStore._clear(ModelChangeTags._Hydrate) above line 360 closes the gap; the clear is synchronous in memory and its fire-and-forget db.delete targets the subscriptions store, not the wedged Options store.

Extended reasoning...

What the bug is\n\nThe new bail on line 360 (if (isOptionsWriteWedged()) return;) is placed between the four Options resets and three subsequent statements that are also part of the reset:\n\nts\nawait db.put('Options', { key: 'lastOptedIn', value: null });\nif (isOptionsWriteWedged()) return; // ← new bail\nawait db.put('Ids', { type: 'registrationId', id: null });\nawait db.put('Ids', { type: 'userId', id: null });\nOneSignal._coreDirector._subscriptionModelStore._clear(ModelChangeTags._Hydrate);\n\n\nThe PR comment above the bail only justifies skipping the Ids.appId commit further down ("Committing the new appId now would strand the previous app's metadata under it"). It does not address that the bail also drops the in-memory _subscriptionModelStore._clear() — a call that was unconditional pre-PR and that operates on a different IDB store entirely (subscriptions, not Options), so it cannot itself wedge.\n\n## Why this is a regression (and the concrete harm path)\n\nPre-PR, the wedged app-switch path was slow but functionally correct: the four Options writes each timed out (~8s), execution continued, and _subscriptionModelStore._clear(_Hydrate) ran. The in-memory store was emptied, so _getPushSubscriptionModel() returned undefined and UpdateManager._sendOnSessionUpdate short-circuited via the subscriptionModel?._notification_types !== Subscribed guard.\n\nPost-PR, after the first 2s timeout trips the breaker, the bail skips _clear(). The SubscriptionModelStore was populated at CoreModule construction via SimpleModelStore._load() (SimpleModelStore.ts:22), so the previous app's hydrated pushSubscription model survives in memory. The chain is then:\n\n1. Ids.userId is not nulled (skipped by the same bail), so getSubscription().deviceId (subscription.ts) and _isAlreadyRegisteredWithOneSignal() (base.ts:114-117) still return the previous app's identity.\n2. onSdkInitialized_sendOnSessionUpdate reads _getPushSubscriptionModel() and gets the previous app's model.\n3. The on_session / outcome requests serialize subscription: { id: pushSubscriptionModel.id, ... } against the new app's getAppId() endpoints.\n\nThat is the same cross-app contamination this PR exists to fix, just bounded to the wedged session rather than persisted.\n\n## Step-by-step proof\n\n1. iOS Safari PWA wedges Options writes (the documented WebKit bug 315804).\n2. User switches appId from A → B and reloads. previousAppId in Ids is still A, so we enter the previousAppId !== appId branch at init.ts:351.\n3. First db.put('Options', { key: 'isPushEnabled', value: null }) waits 2s, hits the timeout — optionsWriteWedged flips true. The remaining three Options writes short-circuit via if (optionsWriteWedged) return Promise.resolve(undefined).\n4. Line 360 returns. Ids.userId/Ids.registrationId are still A's; _subscriptionModelStore still holds A's pushSubscription rehydrated from IDB during new CoreModule().\n5. Init continues. onSdkInitialized_isAlreadyRegisteredWithOneSignal() reads A's Ids.userId and returns true.\n6. _sendOnSessionUpdate_getPushSubscriptionModel returns A's model. With notification_types === Subscribed, _upsertSession(SessionOrigin._UserNewSession) fires with OneSignal.config holding B's appId but A's subscription id in the body.\n7. SDK_INITIALIZED_PUBLIC consumers reading OneSignal.User.PushSubscription see A's id on a page running as B.\n\n## Addressing the "intentional design" refutations\n\nThe refutations argue this is an intentional "defer everything on wedge" policy that keeps the wedged session in a coherent "previous-app" state. Three reasons that framing doesn't hold:\n\n- The PR comment only documents the appId-commit deferral. It is silent on the _clear() and the two Ids resets, which are the statements actually being collateral-damaged. The comment scope is precisely "don't strand previous-app Options metadata under new appId" — moving _clear() up does not violate that.\n- The "partial-reset would create a different inconsistency" objection cuts the other way. The wedged session is already inconsistent: OneSignal.config is the new app's, but in-memory _subscriptionModelStore and Ids.userId are the old app's. Letting _sendOnSessionUpdate fire with a populated old-app subscription model is precisely the failure mode this PR is supposed to prevent. Letting _clear() run does not affect Options rows at all (it targets the subscriptions IDB store) and ModelStoreListener early-returns for any tag other than _Normal (ModelStoreListener.ts:55-58), so no spurious operations get enqueued.\n- The "could trigger duplicate player#create" worry depends on handleAutoResubscribe being entered with cleared state. But that path already gates on autoResubscribe && currentPermission === 'granted', and _isAlreadyRegisteredWithOneSignal still reflects A's Ids.userId (which is also skipped by the same bail — moving _clear() doesn't change that). If anything, the bigger and more principled fix is to move all three skipped statements (Ids.registrationId, Ids.userId, and _clear()) above the bail, since none of them are Options writes and none can wedge.\n\n## Suggested fix\n\nMove the in-memory _clear() (and ideally the two Ids resets, which are also unguarded by guardOptionsWrite) above the early-return:\n\nts\nawait db.put('Options', { key: 'lastOptedIn', value: null });\nawait db.put('Ids', { type: 'registrationId', id: null });\nawait db.put('Ids', { type: 'userId', id: null });\nOneSignal._coreDirector._subscriptionModelStore._clear(ModelChangeTags._Hydrate);\nif (isOptionsWriteWedged()) return;\n\n\nThis preserves the PR's stated intent (don't commit Ids.appId on wedge, so a future non-wedged load completes the reset) while preventing the wedged session from emitting on_session / outcome traffic that mixes previous-app subscription identity with new-app endpoints.

@fadi-george fadi-george deleted the sherwin/sdk-4336-followup branch June 1, 2026 19:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants