Skip to content

v2.1.1: harden Decixa resolve() call path with timeout + retry#4

Merged
TKCollective merged 1 commit into
mainfrom
hardening/resolve-timeout-retry
Apr 29, 2026
Merged

v2.1.1: harden Decixa resolve() call path with timeout + retry#4
TKCollective merged 1 commit into
mainfrom
hardening/resolve-timeout-retry

Conversation

@TKCollective
Copy link
Copy Markdown
Owner

Addresses @koki-socialgist's review feedback from the Discord thread ("timeout / retry around the Decixa call path"). Takes a first pass so review can focus on what's actually worth changing.

What changed

src/index.tsresolve tool now wraps the Decixa call in an AbortController-based timeout with a retry loop.

Behavior

  • Timeout: AbortController fires at DECIXA_RESOLVE_TIMEOUT_MS (default 4000ms)
  • Retry-on: network errors, AbortError / timeouts, 5xx responses
  • No-retry-on: 4xx responses (deterministic client errors — bad input, auth, not-found — retrying wastes latency)
  • Backoff: exponential, DECIXA_RESOLVE_RETRY_BASE_MS × 2^(attempt-1)
  • Default envelope: 3 attempts total (1 + 2 retries) with ~750ms of backoff between → worst case ~12.7s before falling through
  • Observability: response includes discovery_attempts on success; fallback response adds discovery_error with the last failure reason

Tunable via env

DECIXA_RESOLVE_TIMEOUT_MS=4000
DECIXA_RESOLVE_MAX_RETRIES=2
DECIXA_RESOLVE_RETRY_BASE_MS=250

Verified locally

Test Result
Happy path (Decixa reachable) discovery_source=decixa, discovery_attempts=1
Forced timeout (DECIXA_RESOLVE_TIMEOUT_MS=1, MAX_RETRIES=1) discovery_source=local_fallback, discovery_attempts=2, discovery_error=AbortError: This operation was aborted
4xx non-retry path Exits retry loop immediately on first 4xx, logs noted ✓

Questions for you

  1. Defaults OK? 4s timeout + 2 retries feels right for interactive MCP use, but you have better visibility into Decixa's p95 latency — happy to adjust.
  2. Retry policy on 429? Currently treated as 4xx (no retry). Should we treat Retry-After as a hint and retry once on 429?
  3. Jitter? No jitter on backoff right now. Easy to add if you think it matters at this call volume.

Once this merges, I'll tag v2.1.1 and push to npm.

Addresses @koki-socialgist review feedback from Discord thread.

## Behavior

- AbortController-based request timeout (default 4000ms)
- Retry on transient failures: network errors, AbortError/timeouts, 5xx
- No retry on 4xx (deterministic client errors \u2014 bad input, auth, not-found)
- Exponential backoff: 250ms \u00d7 2^(attempt-1), default 2 retries (3 total attempts)
- Response includes `discovery_attempts` so consumers see how many tries were needed
- Fallback response includes `discovery_error` with the last failure reason

## Tunable via env

- `DECIXA_RESOLVE_TIMEOUT_MS` (default: 4000)
- `DECIXA_RESOLVE_MAX_RETRIES` (default: 2)
- `DECIXA_RESOLVE_RETRY_BASE_MS` (default: 250)

## Verified

- Happy path: returns Decixa response with discovery_source='decixa', attempts=1
- Forced timeout (TIMEOUT_MS=1): returns local fallback with
  discovery_source='local_fallback', discovery_attempts=2,
  discovery_error='AbortError: This operation was aborted'
- 4xx non-retry: breaks retry loop immediately (logs noted in stderr)
Copy link
Copy Markdown

@koki-socialgist koki-socialgist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed end-to-end. Implementation is clean — AbortController + retry classification + env tunables. Good shape.

On your three questions

1. Defaults (4s timeout, 2 retries) — OK as-is.

Decixa /api/agent/resolve p95 sits well under 4s for typical intent queries. Worst-case envelope ~12.7s is at the edge of MCP UX tolerance but reasonable — leaves headroom for Vercel cold starts on our end. Wouldn't tighten.

2. 429 — yes, please retry.

Decixa doesn't emit 429 today, but if Vercel platform throttling kicks in at scale, 429 + Retry-After is the standard contract. Treat as retryable, honor Retry-After if present (cap to your max envelope), exponential otherwise. Forward-compatible, zero cost today.

3. Jitter — small win, worth adding.

At current volume not strictly needed, but ±25% jitter on the backoff (delay * (0.75 + Math.random() * 0.5)) is a 1-liner that future-proofs against thundering herd if multiple agents hit the same Decixa cold start. I'd add it.

One tiny nit

lastStatus is captured but never surfaced. Either drop the variable or include discovery_last_http_status in the fallback response for debugging.

Approving as-is. Up to you whether to fold jitter / 429 into this PR or ship in v2.1.2.

🙏

TKCollective added a commit that referenced this pull request Apr 29, 2026
…e 3 upgrades)

# Conflicts:
#	package.json
@TKCollective TKCollective merged commit 56cbd61 into main Apr 29, 2026
1 check passed
@TKCollective
Copy link
Copy Markdown
Owner Author

Thanks Koki — your three calls landed exactly as you scoped them:

Defaults (4s / 2 retries) — kept as-is. Decixa p95 + Vercel cold start headroom matches what we measure on our side too.

429 retry — added. Honors Retry-After (capped to max envelope), exponential otherwise. Forward-compatible the moment Vercel platform throttling kicks in.

Jitter — folded the one-liner in: delay * (0.75 + Math.random() * 0.5). Cheap insurance against thundering herd on cold starts.

The lastStatus nit — surfaced. discovery_last_http_status now returns in the fallback response payload for debugging.

All three plus your nit shipped in v2.1.2 on npm — same release that wired in Decixa Phase 3 (intent-driven /api/agent/resolve, ~85% top-3). Confirmed end-to-end against your live endpoint.

Appreciate the deep review. Enjoy family time — happy Monday.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants