Surface data email send failures as ERROR instead of silent drop#22
Merged
Conversation
_finalize's email send block had two bugs that combined into a silent-fail: 1. `send_step_email` is documented as \"Never raises\" and returns True/False. The caller wrapped it in `try: ... except Exception`, which is dead code — no exception can ever propagate out of the function — and then discarded the False return value. 2. When both Gmail SMTP and SES fallback failed inside `send_step_email`, the function logged at WARNING and returned False. The caller never checked the return, so the run continued as if the email had been sent. Net effect: a successful data collection with a broken emailer would record in S3 correctly but emit no notification, and monitoring had no way to tell the difference between \"ran and emailed\" and \"ran and silently dropped the notification\". ## Fix - Remove the dead try/except (send_step_email never raises) - Check the return value - On False, log ERROR with enough context to diagnose (which env vars/config to check) Not raising on email failure because the data is still valid in S3 and downstream Step Function steps can still consume it — email is a monitoring layer, not a data dependency. If CloudWatch alarms are wired to ERROR-level log events on this Lambda, the missed email will still surface. Audit of the rest of weekly_collector.py for other silent-fail patterns: - Lines 130, 363, 369: S3→Wikipedia fallback for constituents — intentional, downstream `if not tickers: return failed` catches the case where both sources fail - Lines 443, 483: duration formatting — cosmetic silent-fail, acceptable - run_daily status: ok/failed only, no partial; main() hard-fails on non-ok — already correct Tests: 61 passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fix a silent-fail in `weekly_collector._finalize` where a failed `send_step_email` would log WARNING and continue, giving the run no visible indication that monitoring was blind to it.
Context
`_finalize`'s email block:
```python
if not dry_run and only is None:
try:
from emailer import send_step_email
step_name = f"Data Phase {phase}" if phase else "Data Collection"
send_step_email(step_name, results, run_date)
except Exception as exc:
logger.warning("Step email failed (non-fatal): %s", exc)
```
Two problems:
Net effect: a successful data collection with a broken emailer recorded in S3 correctly but emitted no notification. Monitoring couldn't distinguish "ran and emailed" from "ran and silently dropped the notification".
Fix
Not raising on email-send failure because the data itself is valid in S3 and downstream Step Function steps still consume it — email is a monitoring layer, not a data dependency. If CloudWatch alarms are wired to ERROR-level log events on this Lambda, the missed email surfaces naturally.
Audit of other silent-fail patterns in weekly_collector.py
No other silent-fails found in this file worth addressing.
Test plan
🤖 Generated with Claude Code