Replace panic calls with proper error propagation#530
Replace panic calls with proper error propagation#530swell-agent wants to merge 6 commits intowavesplatform:mainfrom
Conversation
Replace all panic() calls with structured error handling using buffered channels and select-based multiplexing. Errors from goroutines are now propagated to callers instead of crashing the process. Key changes: - Add ServeErr() channel to API and BotAPI for HTTP server error reporting - Convert runMessagingClients/Services to return error channels - Use FanInCtx for composing error sources with proper context cancellation - Fix nil pointer dereference in vault.go when login fails with context.Canceled - Ensure scheduler shutdown runs on all exit paths in bot binaries - Log shutdown reason on service failure in nodemon - Simplify statementLogWrapper.String() and BaseTargetAlert.Message() Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove unused //nolint:mnd directives in discord and telegram bots. Replace slog.NewTextHandler(io.Discard, nil) with slog.DiscardHandler in test files.
pkg/clients/vault.go
Outdated
| if loginErr != nil { | ||
| if !errors.Is(loginErr, context.Canceled) { | ||
| logger.Error("Unable to authenticate to Vault, stopping token renewal", attrs.Error(loginErr)) | ||
| } | ||
| return |
There was a problem hiding this comment.
Do exponential backoff with maximum backoff limit, not return. return here just stops the renew cycle.
pkg/clients/vault.go
Outdated
| logger.Error("Unable to start managing token lifecycle", attrs.Error(tokenErr)) | ||
| panic(tokenErr) | ||
| logger.Error("Unable to start managing token lifecycle, stopping token renewal", attrs.Error(tokenErr)) | ||
| return |
There was a problem hiding this comment.
Do exponential backoff with maximum backoff limit, not return. return here just stops the renew cycle.
Instead of stopping the renewal cycle on transient errors, retry with exponential backoff (1s initial, 5m max). Backoff resets on success.
pkg/clients/vault.go
Outdated
|
|
||
| const ( | ||
| vaultTokenTTLIncrement = 3600 // in seconds | ||
| vaultInitialBackoff = 1 * time.Second |
There was a problem hiding this comment.
We already have cenkalti/backoff/v4 in our dependency tree — let's use it instead of hand-rolling. You'd get jitter for free (no thundering herd) and can drop waitBackoff/nextBackoff.
Also, consider adding backoff.WithMaxElapsedTime(60 * time.Minute) so we fail loudly if Vault is unreachable for an extended period rather than retrying silently forever.
There was a problem hiding this comment.
Also ensure that make lint passes after switching to cenkalti/backoff/v4.
There was a problem hiding this comment.
Also update the brach to the latest main
Replace hand-rolled waitBackoff/nextBackoff with cenkalti/backoff/v4 exponential backoff which provides jitter out of the box. Add MaxElapsedTime of 60 minutes so we fail loudly if Vault is unreachable for an extended period rather than retrying silently forever.
Summary
Closes #409
panic()calls with structured error handling using buffered channels andselect-based multiplexingServeErr()channel toAPIandBotAPIfor HTTP server error reportingrunMessagingClients/runMessagingServicesto return error channelsFanInCtxwith proper context cancellation for composing error sources in nodemonvault.gowhen login fails withcontext.CanceledstatementLogWrapper.String()andBaseTargetAlert.Message()error handlingTest plan
makecompletes without errorspaniccalls remain in the codebaseapi_nopanic_test.go,alerts_nopanic_test.go🤖 Generated with Claude Code