Add robustness improvements: retries, diagnostics, better error reporting #38
Open
idStar-bot wants to merge 1 commit into
…ting

- Increase default connection timeout from 15s to 30s, configurable via the CALLME_CONNECTION_TIMEOUT_MS environment variable
- Add automatic retry logic: CALLME_MAX_RETRIES (default 2) wraps initiateCall so transient failures get up to 3 total attempts with a 2s delay between retries
- Add /diagnostics HTTP endpoint returning server health as JSON
- Add get_diagnostics MCP tool so Claude can self-diagnose call issues
- Add DiagnosticEvent tracking for call lifecycle events (initiated, connected, failed, ended) with a rolling 50-event buffer
- Improve Telnyx error parsing: JSON error bodies are unpacked to human-readable title/detail strings instead of raw JSON blobs
- Add generateFailureDiagnostics: after all retries are exhausted, produce a structured error with attempt summary and remediation hints
- Improve waitForConnection: periodic 5s status logging shows WebSocket and stream readiness state; the timeout error names the specific missing component rather than the generic "WebSocket connection timeout"
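For reviewers, the retry behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `parseMaxRetries`, `withRetries`, and `sleep` are hypothetical names, and only CALLME_MAX_RETRIES, its default of 2, and the 2s delay come from the commit message.

```typescript
// Hypothetical helper names; only CALLME_MAX_RETRIES, its default of 2,
// and the 2s delay between retries come from the commit message.

/** Parse a retry count from an env-style string, falling back to the default. */
function parseMaxRetries(raw: string | undefined, fallback = 2): number {
  const parsed = parseInt(raw ?? "", 10);
  return !isNaN(parsed) && parsed >= 0 ? parsed : fallback;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

/**
 * Run `attempt` up to maxRetries + 1 times, waiting delayMs between tries.
 * Once retries are exhausted, throw an error that summarizes every failure.
 */
async function withRetries<T>(
  attempt: (attemptNumber: number) => Promise<T>,
  maxRetries: number,
  delayMs = 2000
): Promise<T> {
  const failures: string[] = [];
  for (let n = 1; n <= maxRetries + 1; n++) {
    try {
      return await attempt(n);
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`attempt ${n}: ${reason}`);
      // Only sleep when another attempt will follow.
      if (n <= maxRetries) await sleep(delayMs);
    }
  }
  throw new Error(
    `All ${maxRetries + 1} attempts failed\n${failures.join("\n")}`
  );
}
```

A caller would presumably pass `parseMaxRetries(process.env.CALLME_MAX_RETRIES)` as `maxRetries` and wrap the call-initiation function as `attempt`; with the default of 2 this gives the 3 total attempts mentioned above.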
Summary
Hardening from ~5 months of daily call-me use. Calls that previously failed opaquely (network hiccups, slow WebSocket bring-up, provider errors) now retry automatically and explain themselves when they can't recover.
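As a rough illustration of what "explain themselves" can mean for a connection timeout, the sketch below builds a message that names whichever component is still missing. The `ConnectionState` shape and both function names are assumptions made for this example; the PR only states that the timeout reports WebSocket and stream readiness.

```typescript
// Hypothetical shape: field and function names are assumptions for this sketch.
interface ConnectionState {
  websocketOpen: boolean;
  streamReady: boolean;
}

/** Name what is still missing, or return null when everything is ready. */
function describeMissing(state: ConnectionState): string | null {
  const missing: string[] = [];
  if (!state.websocketOpen) missing.push("WebSocket connection");
  if (!state.streamReady) missing.push("media stream");
  return missing.length > 0 ? missing.join(" and ") : null;
}

/** Build the error text for a wait that gave up after timeoutMs. */
function timeoutMessage(state: ConnectionState, timeoutMs: number): string {
  const missing = describeMissing(state) ?? "nothing (connection completed late)";
  return `Connection timed out after ${timeoutMs}ms waiting for: ${missing}`;
}
```

The same `describeMissing` result could also feed the periodic status lines, so the log and the final error agree on what was not ready.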
- Connection timeout raised from 15s to 30s, configurable via CALLME_CONNECTION_TIMEOUT_MS. Slow cellular legs routinely needed more than 15s to establish the media stream.
- Automatic retries (CALLME_MAX_RETRIES) with a short delay between attempts.
- /diagnostics HTTP endpoint: server health, config snapshot, active-call detail, and recent lifecycle events for humans/scripts.
- get_diagnostics MCP tool: lets the agent inspect the same diagnostics mid-conversation instead of guessing why a call failed.
- waitForConnection status logging: periodic progress lines (WebSocket/stream state) and precise timeout reasons.

Notes
- … main (post-v1.0.3: Kokoro TTS, server resilience, plugin fixes #34); re-expressed inside the new startServer() promise structure and async shutdown().
- bun build src/index.ts --target=bun runs clean; server boots and binds normally.

🤖 Generated with Claude Code