fix(safeFetch): exponential backoff, HTTP 429 awareness, Retry-After support #121

Open
kawacukennedy wants to merge 1 commit into calesthio:master from kawacukennedy:fix/safefetch-429-backoff

  • I have read CONTRIBUTING.md
  • My code follows the project's style (pure ESM, async/await, no new dependencies)
  • I have tested locally (63/63 unit tests pass, diag.mjs verifies all imports)
  • My change is backward compatible — no existing callers modified
  • I have added 18 new unit tests
  • No documentation changes needed (no new env vars, no behavior visible to users)

Problem

safeFetch() in apis/utils/fetch.mjs has five issues affecting all 27+ source modules:

  1. No HTTP 429 awareness — Rate-limited responses (e.g. OpenSky, GDELT) are treated as generic errors. The README explicitly documents this as a limitation: "OpenSky can also return HTTP 429 when its public hotspots are queried too aggressively. Crucix does not try to evade that limit."
  2. Fixed backoff (no exponential growth) — Retries sleep 2000 × (i+1) ms regardless of error type. If a server sends Retry-After: 30, the header is ignored.
  3. All errors retried equally — 400 Bad Request and 404 Not Found are retried just like 503, wasting time and potentially worsening server load.
  4. AbortController timer leak — clearTimeout(timer) is only called on the success path. (Noted in issue #84, "fix: safeFetch timer leak, env quote stripping, source count".)
  5. No POST/body support — Sources that need HTTP POST (BLS, ReliefWeb) must implement their own fetch wrapper.

Root Cause

safeFetch at apis/utils/fetch.mjs:3-28 was a single flat retry loop with no status-code classification, no exponential backoff, no Retry-After parsing, and timer cleanup only on success.
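
The timer leak can be illustrated with a minimal sketch. This is not the project's code: `fetchWithTimeout`, its `timeoutMs` default, and the injectable `fetchImpl` parameter are hypothetical names chosen for illustration. It clears the timer in a `finally` block, which is one common way to guarantee cleanup on every exit path; the PR itself describes clearing the timer in the success and catch paths, which has the same effect.

```javascript
// Hypothetical wrapper, not the project's safeFetch.
// `fetchImpl` is injectable so the sketch can be exercised without a network.
async function fetchWithTimeout(url, { timeoutMs = 15_000, fetchImpl = fetch, ...init } = {}) {
  const controller = new AbortController();
  // Abort the request if it has not settled within timeoutMs.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl(url, { ...init, signal: controller.signal });
  } finally {
    // Runs on success, on a thrown network error, and on abort.
    // Without this, a failed request leaves the timer pending until it fires.
    clearTimeout(timer);
  }
}
```

If `clearTimeout` ran only after a successful response, every failed attempt would leave a pending timer behind, which keeps the Node.js event loop alive until the timeout elapses.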

Solution

apis/utils/fetch.mjs — Rewrite safeFetch

  • Status-code classification: Only retry known-retryable codes (408, 429, 500, 502, 503, 504). 4xx client errors bail immediately.
  • Exponential backoff with jitter: min(100ms × 2^i, 30s) + random(0, 1000ms). Total cumulative backoff capped at 30s.
  • Retry-After header support: When a server sends Retry-After: N, wait N seconds, capped at 60s to prevent pathological waits.
  • Timer leak fix: clearTimeout(timer) in both success and catch paths.
  • POST/body support: New method and body options.
  • Backward compatible: Same function signature. Defaults (retries: 1, method: 'GET') unchanged.
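
The classification, backoff, and Retry-After rules listed above can be sketched as pure helpers. This is an illustration under stated assumptions, not the code in this PR: all function and constant names are hypothetical, the attempt index is assumed to start at 0, and only the per-delay 30s cap is modeled (the cumulative cap mentioned above is not). Retry-After may also carry an HTTP-date; this sketch handles only the seconds form.

```javascript
// Hypothetical helpers; constants mirror the numbers quoted in the PR description.
const RETRYABLE_STATUS = new Set([408, 429, 500, 502, 503, 504]);
const BASE_DELAY_MS = 100;
const MAX_DELAY_MS = 30_000;
const MAX_JITTER_MS = 1_000;
const MAX_RETRY_AFTER_MS = 60_000;

function isRetryableStatus(status) {
  return RETRYABLE_STATUS.has(status);
}

// Delay before retry `attempt` (0-based): min(100ms * 2^attempt, 30s) + jitter.
// `random` is injectable so tests can pin the jitter.
function backoffDelayMs(attempt, random = Math.random) {
  const exponential = Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
  return exponential + Math.floor(random() * MAX_JITTER_MS);
}

// Parses the seconds form of Retry-After and caps it at 60s.
// Returns null when the header is absent or not a non-negative integer.
function parseRetryAfterMs(headerValue) {
  if (headerValue == null) return null;
  const trimmed = String(headerValue).trim();
  if (!/^\d+$/.test(trimmed)) return null;
  return Math.min(Number(trimmed) * 1000, MAX_RETRY_AFTER_MS);
}

// Returns the wait in ms before the next attempt, or null to bail immediately.
function nextDelayMs(status, retryAfterHeader, attempt, random = Math.random) {
  if (!isRetryableStatus(status)) return null;
  const retryAfterMs = parseRetryAfterMs(retryAfterHeader);
  return retryAfterMs ?? backoffDelayMs(attempt, random);
}
```

Keeping these decisions in pure functions means the retry policy can be unit-tested without stubbing the network or waiting on real timers.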

apis/sources/opensky.mjs — Explicit retries

OpenSky is the most rate-limited source (4k credits/day unauthenticated, 10 parallel hotspot queries). Setting retries: 2 on getFlightsInArea gives exponential backoff two chances to succeed before a hotspot returns empty data.
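
As a rough worked example of what retries: 2 costs in added latency, the sketch below bounds the total backoff using the formula from this PR, min(100ms × 2^i, 30s) + random(0, 1000ms). It assumes two backoff waits with i = 0 and i = 1, and it ignores Retry-After and the cumulative cap; `backoffBoundsMs` is a hypothetical helper, not part of the codebase.

```javascript
// Bounds on the total backoff across `waits` consecutive retries.
// Assumes the attempt index starts at 0 and jitter is drawn from [0, 1000) ms.
function backoffBoundsMs(waits) {
  let minMs = 0;
  for (let i = 0; i < waits; i++) {
    minMs += Math.min(100 * 2 ** i, 30_000);
  }
  // Each wait adds strictly less than 1000 ms of jitter, so maxMs is never reached.
  return { minMs, maxMs: minMs + waits * 1_000 };
}

console.log(backoffBoundsMs(2)); // { minMs: 300, maxMs: 2300 }
```

Under these assumptions a rate-limited hotspot query waits between 0.3s and 2.3s in total before giving up, unless the server supplies a longer Retry-After.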

test/safe-fetch.test.mjs — 18 new unit tests

Covers: success, non-JSON, 400/403/404 bail, 429/503/408 retry, Retry-After, network error, timeout abort, POST body, custom headers, max backoff cap.

Testing

  • node --test test/safe-fetch.test.mjs — 18/18 pass
  • node --test (all tests) — 63/63 pass, 0 fail, 1 skipped (needs API key)
  • node diag.mjs — All 12 imports OK, port available, server loads
  • node -e "import('./apis/utils/fetch.mjs')" — Module loads without error

Suggested labels

bug, performance, enhancement

…support

safeFetch() had five issues affecting all 27+ source modules:
1. No HTTP 429 awareness — rate-limited responses treated as generic errors
2. Fixed backoff — 2000*(i+1)ms regardless of error type
3. All errors retried equally — 400/404 retried same as 503
4. AbortController timer leak — clearTimeout only on success path
5. No POST/body support — BLS, ReliefWeb had to implement their own fetch

Rewrite with:
- Status-code classification (only retry 408/429/500/502/503/504)
- Exponential backoff with jitter (100ms*2^i + random(0,1000), capped 30s)
- Retry-After header parsing (capped at 60s)
- Timer cleanup in catch path
- POST/body method support

Update OpenSky getFlightsInArea to use retries: 2 for 429 resilience.
Add 18 unit tests covering all new behaviors.

Fixes the known limitation documented in README.md:498.

kawacukennedy requested a review from calesthio as a code owner on June 1, 2026, 21:26