Skip to content

Surface data email send failures as ERROR instead of silent drop#22

Merged
cipher813 merged 2 commits into
mainfrom
fix/data-finalize-email-silent-fail
Apr 12, 2026
Merged

Surface data email send failures as ERROR instead of silent drop#22
cipher813 merged 2 commits into
mainfrom
fix/data-finalize-email-silent-fail

Conversation

@cipher813
Copy link
Copy Markdown
Owner

Summary

Fix a silent-fail in `weekly_collector._finalize` where a failed `send_step_email` would log WARNING and continue, giving the run no visible indication that monitoring was blind to it.

Context

`_finalize`'s email block:
```python
if not dry_run and only is None:
try:
from emailer import send_step_email
step_name = f"Data Phase {phase}" if phase else "Data Collection"
send_step_email(step_name, results, run_date)
except Exception as exc:
logger.warning("Step email failed (non-fatal): %s", exc)
```

Two problems:

  1. Dead try/except — `send_step_email` is explicitly documented as "Never raises" (see `emailer.py:20`) and returns `True`/`False`. No exception ever propagates out of it.
  2. Discarded return value — when both Gmail SMTP and SES fallback failed inside `send_step_email`, the function returned `False`. The caller ignored it.

Net effect: a successful data collection with a broken emailer recorded in S3 correctly but emitted no notification. Monitoring couldn't distinguish "ran and emailed" from "ran and silently dropped the notification".

Fix

  • Remove the dead `try/except`
  • Check the boolean return value
  • On `False`, log ERROR with enough context to diagnose (which env vars / SES identity to check)

Not raising on email-send failure because the data itself is valid in S3 and downstream Step Function steps still consume it — email is a monitoring layer, not a data dependency. If CloudWatch alarms are wired to ERROR-level log events on this Lambda, the missed email surfaces naturally.

Audit of other silent-fail patterns in weekly_collector.py

  • Lines 130, 363, 369 — S3→Wikipedia fallback for constituents. Intentional. Downstream `if not tickers: return failed` catches the empty case.
  • Lines 443, 483 — duration formatting `try/except: pass`. Cosmetic. Acceptable.
  • run_daily status logic (lines 428-432) — only produces `ok` or `failed`, no `partial`. `main()` hard-fails on anything != `ok`. Already correct.

No other silent-fails found in this file worth addressing.

Test plan

  • 61 existing tests pass locally
  • CI green
  • If a future emailer misconfiguration occurs, the next Saturday DataPhase1 run logs `Step email '...' failed to send` at ERROR level

🤖 Generated with Claude Code

cipher813 and others added 2 commits April 11, 2026 17:22
_finalize's email send block had two bugs that combined into a
silent-fail:

1. `send_step_email` is documented as \"Never raises\" and returns
   True/False. The caller wrapped it in `try: ... except Exception`,
   which is dead code — no exception can ever propagate out of the
   function — and then discarded the False return value.
2. When both Gmail SMTP and SES fallback failed inside
   `send_step_email`, the function logged at WARNING and returned
   False. The caller never checked the return, so the run continued
   as if the email had been sent.

Net effect: a successful data collection with a broken emailer would
record in S3 correctly but emit no notification, and monitoring had
no way to tell the difference between \"ran and emailed\" and \"ran
and silently dropped the notification\".

## Fix

- Remove the dead try/except (send_step_email never raises)
- Check the return value
- On False, log ERROR with enough context to diagnose (which env
  vars/config to check)

Not raising on email failure because the data is still valid in S3
and downstream Step Function steps can still consume it — email is
a monitoring layer, not a data dependency. If CloudWatch alarms are
wired to ERROR-level log events on this Lambda, the missed email
will still surface.

Audit of the rest of weekly_collector.py for other silent-fail
patterns:
- Lines 130, 363, 369: S3→Wikipedia fallback for constituents —
  intentional, downstream `if not tickers: return failed` catches
  the case where both sources fail
- Lines 443, 483: duration formatting — cosmetic silent-fail,
  acceptable
- run_daily status: ok/failed only, no partial; main() hard-fails
  on non-ok — already correct

Tests: 61 passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit bc2c94a into main Apr 12, 2026
1 check passed
@cipher813 cipher813 deleted the fix/data-finalize-email-silent-fail branch April 12, 2026 01:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant