Skip to content

Replace panic calls with proper error propagation#530

Open
swell-agent wants to merge 6 commits intowavesplatform:mainfrom
swell-agent:issue-409
Open

Replace panic calls with proper error propagation#530
swell-agent wants to merge 6 commits intowavesplatform:mainfrom
swell-agent:issue-409

Conversation

@swell-agent
Copy link
Copy Markdown

Summary

Closes #409

  • Replace all panic() calls with structured error handling using buffered channels and select-based multiplexing
  • Add ServeErr() channel to API and BotAPI for HTTP server error reporting
  • Convert runMessagingClients/runMessagingServices to return error channels
  • Use FanInCtx with proper context cancellation for composing error sources in nodemon
  • Fix nil pointer dereference in vault.go when login fails with context.Canceled
  • Ensure scheduler shutdown runs on all exit paths in discord/telegram bot binaries
  • Log shutdown reason on service failure in nodemon
  • Simplify statementLogWrapper.String() and BaseTargetAlert.Message() error handling

Test plan

  • make completes without errors
  • No panic calls remain in the codebase
  • New tests pass: api_nopanic_test.go, alerts_nopanic_test.go
  • Verify graceful shutdown on SIGTERM
  • Verify error propagation when messaging client fails

🤖 Generated with Claude Code

Replace all panic() calls with structured error handling using buffered
channels and select-based multiplexing. Errors from goroutines are now
propagated to callers instead of crashing the process.

Key changes:
- Add ServeErr() channel to API and BotAPI for HTTP server error reporting
- Convert runMessagingClients/Services to return error channels
- Use FanInCtx for composing error sources with proper context cancellation
- Fix nil pointer dereference in vault.go when login fails with context.Canceled
- Ensure scheduler shutdown runs on all exit paths in bot binaries
- Log shutdown reason on service failure in nodemon
- Simplify statementLogWrapper.String() and BaseTargetAlert.Message()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove unused //nolint:mnd directives in discord and telegram bots.
Replace slog.NewTextHandler(io.Discard, nil) with slog.DiscardHandler
in test files.
Comment on lines +61 to +65
if loginErr != nil {
if !errors.Is(loginErr, context.Canceled) {
logger.Error("Unable to authenticate to Vault, stopping token renewal", attrs.Error(loginErr))
}
return
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do exponential backoff with maximum backoff limit, not return. return here just stops the renew cycle.

logger.Error("Unable to start managing token lifecycle", attrs.Error(tokenErr))
panic(tokenErr)
logger.Error("Unable to start managing token lifecycle, stopping token renewal", attrs.Error(tokenErr))
return
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do exponential backoff with maximum backoff limit, not return. return here just stops the renew cycle.

Instead of stopping the renewal cycle on transient errors, retry with
exponential backoff (1s initial, 5m max). Backoff resets on success.

const (
vaultTokenTTLIncrement = 3600 // in seconds
vaultInitialBackoff = 1 * time.Second
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have cenkalti/backoff/v4 in our dependency tree — let's use it instead of hand-rolling. You'd get jitter for free (no thundering herd) and can drop waitBackoff/nextBackoff.

Also, consider adding backoff.WithMaxElapsedTime(60 * time.Minute) so we fail loudly if Vault is unreachable for an extended period rather than retrying silently forever.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also ensure that make lint passes after switching to cenkalti/backoff/v4.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also update the brach to the latest main

Replace hand-rolled waitBackoff/nextBackoff with cenkalti/backoff/v4
exponential backoff which provides jitter out of the box. Add
MaxElapsedTime of 60 minutes so we fail loudly if Vault is unreachable
for an extended period rather than retrying silently forever.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Get rid of panic calls

4 participants