fix: guard event indexer poll cycle with Postgres advisory lock#31
fix: guard event indexer poll cycle with Postgres advisory lock#31nonsobethel0-dev wants to merge 2 commits into
Conversation
Closes LabsCrypt#5 Two concurrent indexer instances could both read the same cursor from indexer_state, fetch the same ledger range, and fire webhooks/notifications before the ON CONFLICT DO NOTHING result was known — wasting RPC calls and risking duplicate side-effect dispatch. Changes: - pollOnce() now acquires pg_try_advisory_lock(738154291) on a dedicated connection at the start of each cycle. A second instance that calls pg_try_advisory_lock while the first holds it gets false immediately and skips the cycle rather than blocking. - The lock is released in a finally block on the same connection that acquired it; the connection is always returned to the pool regardless of whether the cycle succeeds or throws. - Side effects (webhook dispatch, notifications) were already gated on INSERT ... RETURNING rowCount > 0, so only the instance that wins the insert fires them. Added a test to document and protect this guarantee. - Added mockLockClient default in src/__tests__/eventIndexer.test.ts so existing integration-style tests that call pollOnce() directly continue to work without changes. - Three new tests in services/__tests__/eventIndexer.test.ts cover: 1. Lock-not-acquired path: no DB writes, no dispatches, client released 2. Lock acquired but processing throws: unlock + client release still run 3. Two instances same event: only the inserting instance dispatches
- Add debug log line when advisory lock is successfully acquired so operators can confirm which instance is active in multi-pod deployments - Add mockLogger.debug to src/__tests__/eventIndexer.test.ts to match - Add test: restarted instance reads persisted cursor (last_indexed_ledger=500) and resumes from ledger 501, not from 0 — covers acceptance criterion 3
ogazboiz
left a comment
There was a problem hiding this comment.
the advisory-lock logic itself is solid, this is the right approach. pollOnce uses a dedicated session client, pg_try_advisory_lock (non-blocking, so a held lock just skips the cycle with no deadlock), unlocks in an inner finally that covers the early-return and error paths, releases the client in an outer finally, and the no-acquire path correctly skips the unlock. the new tests assert exactly those semantics. a few things block it though:
-
mock-completeness regression breaks an untouched test. adding
getClientto eventIndexer's import of connection.js makessrc/__tests__/loanDispute.test.tsfail withSyntaxError: ... does not provide an export named 'getClient'(5 tests), because that test's connection.js mock (lines 11-15) only exports query/default/withTransaction. addgetClient: jest.fn()to that mock. three other files mock connection.js without getClient and are latent risks if their import chains shift, worth adding it there too: src/tests/notificationCleanup.test.ts, src/tests/auditLog.test.ts, src/tests/remittanceService.test.ts. -
build fails, 6 tsc errors in the new test files (this repo has no ci so
npm run buildis the gate):- src/tests/eventIndexer.test.ts:83 the getClient mock infers
never, type itjest.fn<() => Promise<typeof mockLockClient>>(). - src/services/tests/eventIndexer.test.ts:404,440
.filterhas no matching overload on the calls array, annotate it (e.g..mock.calls as [string][]). - :431 mockRejectedValueOnce on a
nevermock, type the query mock generically. - :466 mockImplementation not assignable to UnknownFunction, cast to a typed signature.
- :503
queriedRanges[0].frompossibly undefined under noUncheckedIndexedAccess, use?.or assert non-null.
- src/tests/eventIndexer.test.ts:83 the getClient mock infers
-
prettier fails on src/services/tests/eventIndexer.test.ts. run
npx prettier --writeon it.
fix the mock and the test-file types so npm run build and npm test are green and the lock work is good to go.
if you want to keep contributing, join us on Telegram: https://t.me/+DOylgFv1jyJlNzM0
Closes #5
What changed
src/services/eventIndexer.tspollOnce()now acquirespg_try_advisory_lock(738154291)on a dedicated pool connection at the start of every poll cycle:pg_try_advisory_lockreturnsfalseimmediately — the current instance skips the cycle with adebuglog and returns without touching the cursor or firing any side effects.debuglog is emitted so operators can see which pod/process is active in multi-instance deployments.pg_advisory_unlockin afinallyblock on the same connection that acquired it (advisory locks are session-scoped in Postgres, so releasing on a different connection would be a no-op).finally, so neither the lock nor the connection leaks if processing throws.src/services/__tests__/eventIndexer.test.ts— 4 new testspg_advisory_unlock+release()still runlast_indexed_ledger = 500from DB and fetches from ledger 501rowCount > 0dispatches webhooks and notificationssrc/__tests__/eventIndexer.test.tsdebug: jest.fn()tomockLogger(required by the new observability log).mockLockClientthat returnsacquired: trueforpg_try_advisory_lock, so all existing tests that callpollOnce()directly continue to pass without modification.Acceptance criteria checklist
pg_try_advisory_lockensures only one instance advances the cursor per cycle; others skip immediately.INSERT ... RETURNING rowCount > 0, which was already correct and is now explicitly tested.getLastIndexedLedger()readslast_indexed_ledgerfrom the DB on every poll; restart is transparent. Covered by the new resume test.How to test