fix(workflow-executor): harden OAuth reauth and token write-back [PRD-692]#1724
Conversation
|
Coverage Impact This PR will not change total coverage. Modified Files with Diff Coverage (6) 🛟 Help
|
Three correctness fixes that gate the OAuth2 MCP executor flag: - clear the step idempotency marker on a re-auth pause so a resumed step is no longer rejected as interrupted - record a re-auth pause as a non-failure in the activity log instead of a spurious failure - re-check credential existence before the rotated-token write-back so a concurrent disconnect is not silently resurrected Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Assert the activity entry is closed as succeeded on a re-auth pause, not just that it was never failed. Drop the stale TDD/iteration framing and ticket references from the re-auth pause test block. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the get-then-upsert existence re-check with an atomic updateIfPresent (UPDATE ... WHERE) on the credentials store, closing the read-to-write window where a concurrent disconnect could be resurrected. upsert stays for the consent deposit path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Revert the isNonFailure audit special-casing on the re-auth pause path. An MCP tool call that 401s has genuinely failed, so it should surface as a failed audit entry (useful for observability) rather than be recorded as completed. The step still pauses (awaiting-input) and the resumed run logs its own entry. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
26e0158 to
3afe056
Compare
…cope Addresses review findings on the re-auth pause and refresh write-back: - preserve a confirmation-flow step's approved pendingData on a re-auth pause (clear only the marker) so resume replays that exact call; delete only when there is no pendingData (FullyAutomated), which would otherwise mis-route the resumed step into the confirmation flow - make the pause cleanup best-effort so a store error still returns awaiting-input instead of a hard failure - key updateIfPresent on the row id so a disconnect + re-authorize is not clobbered by a stale in-flight write-back (a re-created row has a new id) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Keep the non-obvious why (preserve-vs-delete on re-auth pause, id-scoped write-back); drop what the method names and code already state. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Swallowing a cleanup failure left the 'executing' marker in place, so the pause returned awaiting-input but could never resume (checkIdempotency rejects the stale marker). Let the store error propagate to an ordinary step error instead — consistent with the executor's other store-failure paths. Supersedes the earlier best-effort try/catch. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Scra3
left a comment
There was a problem hiding this comment.
praise: the clearReauthPauseState fix correctly clears the write-ahead 'executing' marker on the tool-call re-auth-pause path (delete for FullyAutomated, keep pendingData for confirmation flows), with resume-to-success and cleanup-failure coverage; the updateIfPresent id-scoped write-back also correctly stops a concurrent disconnect from resurrecting a deleted credential. No issues in this PR's diff. Heads-up: the SSRF blocker for this feature lives in the base branch (#1665, oauth/refresh-grant.ts — fetch follows redirects), and the evict/set race is also base-branch code.

What
Two correctness fixes on the executor OAuth2 MCP runtime, surfaced in the PR2 (#1665) review and deliberately deferred to land before the feature is enabled (PRD-627). Both touch the store-port surface, so they're bundled.
idempotencyPhase: 'executing'(HIGH) — a reauth pause left the write-ahead marker set, so the resumed step was rejected with "Step execution was interrupted". AddeddeleteStepExecution(runId, stepIndex)to theRunStoreport (both store impls) and clear the marker on theOAuthReauthRequiredErrorpath.updateIfPresent(UPDATE … WHERE) on the credentials store instead of get-then-upsert, so a concurrent disconnect (DELETE) cannot be resurrected.upsertstays for the consent deposit path.Audit-log behavior (intentional)
A re-auth pause's MCP tool call genuinely 401'd, so it surfaces as a
failedactivity-log entry. The audit records the call (which failed); the run history records the step (which pauses withneeds-oauth-reauthand resumes). PR2's original "spurious failure" finding was reconsidered and dropped: the failure is real and worth surfacing for observability, and the resumed run logs its own entry — so no audit special-casing. (The earlierpaused-status follow-up, PRD-693, is cancelled for the same reason.)Stacking
Stacked on PR2 (#1665) — this PR targets
feat/prd-367-pr2-executor-oauth-runtime, notmain, because the OAuth runtime code it fixes only exists on that branch. Once PR2 merges, rebase and repoint this PR tomain.Tests
failedwhile the step pauses, and the write-back race (a concurrently-deleted credential is not resurrected).deleteStepExecutioncovered in both run stores;updateIfPresentcovered in both credential stores (updates in place / does not insert when absent).tsc+ eslint clean. Postgres*.pg.test.tsleft to CI.fixes PRD-692
🤖 Generated with Claude Code
Note
Harden OAuth re-auth pause state and rotated refresh token write-back in
McpStepExecutorMcpStepExecutor.doExecutenow calls a newclearReauthPauseState()helper before returningawaiting-input, preventing later resumes from being rejected as interrupted due to a staleexecutingwrite-ahead marker.clearReauthPauseStateclears only theidempotencyPhasewhenpendingDataexists (preserving confirmation-flow state), or deletes the step execution record entirely when there is nopendingData.OAuthTokenService.persistRotatedRefreshTokennow callsstore.updateIfPresent(id, ...)instead ofstore.upsert(...), making rotated token write-backs a no-op if the credential row was deleted or replaced since the token was read.McpOAuthCredentialsStoreandRunStoreinterfaces gain new methods (updateIfPresentanddeleteStepExecution) implemented across the database and in-memory stores.Macroscope summarized 50b58c5.