Skip to content

fix(workflow-executor): harden OAuth reauth and token write-back [PRD-692]#1724

Open
hercemer42 wants to merge 8 commits into
feat/prd-367-pr2-executor-oauth-runtimefrom
feature/prd-692-harden-executor-oauth-runtime-clear-idempotency-phase-on
Open

fix(workflow-executor): harden OAuth reauth and token write-back [PRD-692]#1724
hercemer42 wants to merge 8 commits into
feat/prd-367-pr2-executor-oauth-runtimefrom
feature/prd-692-harden-executor-oauth-runtime-clear-idempotency-phase-on

Conversation

@hercemer42

@hercemer42 hercemer42 commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

What

Two correctness fixes on the executor OAuth2 MCP runtime, surfaced in the PR2 (#1665) review and deliberately deferred to land before the feature is enabled (PRD-627). Both touch the store-port surface, so they're bundled.

  • Reauth pause stranding idempotencyPhase: 'executing' (HIGH) — a reauth pause left the write-ahead marker set, so the resumed step was rejected with "Step execution was interrupted". Added deleteStepExecution(runId, stepIndex) to the RunStore port (both store impls) and clear the marker on the OAuthReauthRequiredError path.
  • Refresh write-back resurrecting a deleted credential (MED) — the rotated-token write-back now uses an atomic updateIfPresent (UPDATE … WHERE) on the credentials store instead of get-then-upsert, so a concurrent disconnect (DELETE) cannot be resurrected. upsert stays for the consent deposit path.

Audit-log behavior (intentional)

A re-auth pause's MCP tool call genuinely 401'd, so it surfaces as a failed activity-log entry. The audit records the call (which failed); the run history records the step (which pauses with needs-oauth-reauth and resumes). PR2's original "spurious failure" finding was reconsidered and dropped: the failure is real and worth surfacing for observability, and the resumed run logs its own entry — so no audit special-casing. (The earlier paused-status follow-up, PRD-693, is cancelled for the same reason.)

Stacking

Stacked on PR2 (#1665) — this PR targets feat/prd-367-pr2-executor-oauth-runtime, not main, because the OAuth runtime code it fixes only exists on that branch. Once PR2 merges, rebase and repoint this PR to main.

Tests

  • Behavioral tests: resume-after-reauth (idempotency marker cleared), a re-auth pause audited as failed while the step pauses, and the write-back race (a concurrently-deleted credential is not resurrected).
  • deleteStepExecution covered in both run stores; updateIfPresent covered in both credential stores (updates in place / does not insert when absent).
  • Impacted suites green; tsc + eslint clean. Postgres *.pg.test.ts left to CI.

fixes PRD-692

🤖 Generated with Claude Code

Note

Harden OAuth re-auth pause state and rotated refresh token write-back in McpStepExecutor

  • When an OAuth re-auth pause is triggered, McpStepExecutor.doExecute now calls a new clearReauthPauseState() helper before returning awaiting-input, preventing later resumes from being rejected as interrupted due to a stale executing write-ahead marker.
  • clearReauthPauseState clears only the idempotencyPhase when pendingData exists (preserving confirmation-flow state), or deletes the step execution record entirely when there is no pendingData.
  • OAuthTokenService.persistRotatedRefreshToken now calls store.updateIfPresent(id, ...) instead of store.upsert(...), making rotated token write-backs a no-op if the credential row was deleted or replaced since the token was read.
  • Both McpOAuthCredentialsStore and RunStore interfaces gain new methods (updateIfPresent and deleteStepExecution) implemented across the database and in-memory stores.
  • Risk: resumes that previously succeeded despite a stale write-ahead marker will now require the OAuth re-auth flow to complete before proceeding.

Macroscope summarized 50b58c5.

@linear-code

linear-code Bot commented Jun 29, 2026

Copy link
Copy Markdown

PRD-692

@qltysh

qltysh Bot commented Jun 29, 2026

Copy link
Copy Markdown

Qlty


Coverage Impact

This PR will not change total coverage.

Modified Files with Diff Coverage (6)

RatingFile% DiffUncovered Line #s
Coverage rating: A Coverage rating: A
...ow-executor/src/stores/database-mcp-oauth-credentials-store.ts100.0%
Coverage rating: A Coverage rating: A
packages/workflow-executor/src/oauth/token-service.ts100.0%
Coverage rating: A Coverage rating: A
packages/workflow-executor/src/stores/in-memory-store.ts100.0%
Coverage rating: A Coverage rating: A
...w-executor/src/stores/in-memory-mcp-oauth-credentials-store.ts100.0%
Coverage rating: A Coverage rating: A
packages/workflow-executor/src/executors/mcp-step-executor.ts100.0%
Coverage rating: A Coverage rating: A
packages/workflow-executor/src/stores/database-store.ts100.0%
Total100.0%
🚦 See full report on Qlty Cloud »

🛟 Help
  • Diff Coverage: Coverage for added or modified lines of code (excludes deleted files). Learn more.

  • Total Coverage: Coverage for the whole repository, calculated as the sum of all File Coverage. Learn more.

  • File Coverage: Covered Lines divided by Covered Lines plus Missed Lines. (Excludes non-executable lines including blank lines and comments.)

    • Indirect Changes: Changes to File Coverage for files that were not modified in this PR. Learn more.

Comment thread packages/workflow-executor/test/executors/mcp-step-executor.test.ts Outdated
Comment thread packages/workflow-executor/test/executors/mcp-step-executor.test.ts Outdated
Comment thread packages/workflow-executor/src/oauth/token-service.ts Outdated
Comment thread packages/workflow-executor/src/executors/mcp-step-executor.ts
hercemer42 and others added 4 commits June 30, 2026 10:44
Three correctness fixes that gate the OAuth2 MCP executor flag:
- clear the step idempotency marker on a re-auth pause so a resumed step
  is no longer rejected as interrupted
- record a re-auth pause as a non-failure in the activity log instead of
  a spurious failure
- re-check credential existence before the rotated-token write-back so a
  concurrent disconnect is not silently resurrected

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Assert the activity entry is closed as succeeded on a re-auth pause, not
just that it was never failed. Drop the stale TDD/iteration framing and
ticket references from the re-auth pause test block.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the get-then-upsert existence re-check with an atomic
updateIfPresent (UPDATE ... WHERE) on the credentials store, closing the
read-to-write window where a concurrent disconnect could be resurrected.
upsert stays for the consent deposit path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Revert the isNonFailure audit special-casing on the re-auth pause path.
An MCP tool call that 401s has genuinely failed, so it should surface as
a failed audit entry (useful for observability) rather than be recorded
as completed. The step still pauses (awaiting-input) and the resumed run
logs its own entry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@hercemer42 hercemer42 force-pushed the feature/prd-692-harden-executor-oauth-runtime-clear-idempotency-phase-on branch from 26e0158 to 3afe056 Compare June 30, 2026 08:46
Comment thread packages/workflow-executor/src/executors/mcp-step-executor.ts Outdated
Comment thread packages/workflow-executor/src/stores/database-mcp-oauth-credentials-store.ts Outdated
hercemer42 and others added 2 commits June 30, 2026 11:03
…cope

Addresses review findings on the re-auth pause and refresh write-back:
- preserve a confirmation-flow step's approved pendingData on a re-auth
  pause (clear only the marker) so resume replays that exact call;
  delete only when there is no pendingData (FullyAutomated), which would
  otherwise mis-route the resumed step into the confirmation flow
- make the pause cleanup best-effort so a store error still returns
  awaiting-input instead of a hard failure
- key updateIfPresent on the row id so a disconnect + re-authorize is not
  clobbered by a stale in-flight write-back (a re-created row has a new id)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Keep the non-obvious why (preserve-vs-delete on re-auth pause, id-scoped write-back); drop what the method names and code already state.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread packages/workflow-executor/src/executors/mcp-step-executor.ts
hercemer42 and others added 2 commits June 30, 2026 11:22
Swallowing a cleanup failure left the 'executing' marker in place, so the pause returned awaiting-input but could never resume (checkIdempotency rejects the stale marker). Let the store error propagate to an ordinary step error instead — consistent with the executor's other store-failure paths. Supersedes the earlier best-effort try/catch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@Scra3 Scra3 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: the clearReauthPauseState fix correctly clears the write-ahead 'executing' marker on the tool-call re-auth-pause path (delete for FullyAutomated, keep pendingData for confirmation flows), with resume-to-success and cleanup-failure coverage; the updateIfPresent id-scoped write-back also correctly stops a concurrent disconnect from resurrecting a deleted credential. No issues in this PR's diff. Heads-up: the SSRF blocker for this feature lives in the base branch (#1665, oauth/refresh-grant.tsfetch follows redirects), and the evict/set race is also base-branch code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants