Remove over-eager n_ok==0 guard in daily_append #33

Merged
cipher813 merged 1 commit into main from fix/daily-append-zero-ok-guard on Apr 14, 2026

@cipher813
Owner

Summary

Live run on 2026-04-14 exposed a false positive in my hard-fail guard from #24.

When 900 of 902 tickers already had today's row in ArcticDB (because this morning's Step Function run succeeded), `daily_append` correctly took the "today already exists" skip path on each one → `n_ok=0, n_skip=900, n_err=2`. The 2 errors are newly listed tickers (Q and SOLS) that are not yet backfilled, a 0.22% error rate. My guard raised `RuntimeError` on `n_ok == 0`, treating "nothing to write because all done" as "failed to write anything."

What was over-broad

The `n_ok == 0` guard was meant to catch an ArcticDB-wide auth/connectivity failure, where every ticker read throws and nothing gets written. But that case has been handled correctly since #24 converted per-ticker read exceptions from `n_skip` to `n_err`. A blanket auth failure now registers as `n_err=902` → `err_rate=100%` → the existing 5% threshold fires.
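
A minimal sketch of the counting this relies on, assuming a per-ticker loop shaped roughly like the one below. `count_outcomes`, `read_last_date`, and `append_row` are hypothetical names standing in for the real read and write paths; they are not taken from the `daily_append` source. Counting a failed write as an error is also an assumption.

```python
from datetime import date
from typing import Callable, Iterable, Optional, Tuple


def count_outcomes(
    tickers: Iterable[str],
    today: date,
    read_last_date: Callable[[str], Optional[date]],  # hypothetical reader
    append_row: Callable[[str], None],                # hypothetical writer
) -> Tuple[int, int, int]:
    """Return (n_ok, n_skip, n_err) for one pass over the tickers."""
    n_ok = n_skip = n_err = 0
    for ticker in tickers:
        try:
            last = read_last_date(ticker)
        except Exception:
            # Behavior described for #24: a failed read is an error, not a skip.
            n_err += 1
            continue
        if last == today:
            # Today's row already exists, so there is nothing to write.
            n_skip += 1
            continue
        try:
            append_row(ticker)
        except Exception:
            # Assumption: a failed write is also counted as an error.
            n_err += 1
        else:
            n_ok += 1
    return n_ok, n_skip, n_err
```

With this shape, a reader that always raises yields `n_ok=0` with every ticker in `n_err`, while a reader that always returns today's date yields `n_ok=0` with every ticker in `n_skip`. The two cases differ only in the error count, which is why the threshold on `err_rate` is the signal that separates them.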

So `n_ok == 0` alone doesn't distinguish "all done" from "all failed." Removing the guard and relying on err_rate is correct.

Net behavior

| n_ok | n_skip | n_err | Outcome |
|------|--------|-------|---------|
| 0 | 902 | 0 | Pass (idempotent rerun: everyone already wrote) |
| 900 | 0 | 2 | Pass (normal run with 2 missing tickers) |
| 0 | 0 | 902 | Fail (ArcticDB auth broken, err_rate=100%) |
| 800 | 0 | 102 | Fail (err_rate 11% > 5% threshold) |
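
The four rows above can be reproduced with a small sketch of the remaining check. This assumes the error rate is `n_err` divided by the total number of tickers processed, which matches the 2/902 ≈ 0.22% figure quoted in the summary. `check_run` and `ERR_RATE_THRESHOLD` are illustrative names, and treating an empty run as a 0% error rate is an assumption.

```python
ERR_RATE_THRESHOLD = 0.05  # the 5% threshold described in this PR


def check_run(n_ok: int, n_skip: int, n_err: int) -> float:
    """Return the error rate, raising RuntimeError if it exceeds the threshold."""
    total = n_ok + n_skip + n_err
    # Assumption: an empty run is treated as a 0% error rate.
    err_rate = n_err / total if total else 0.0
    if err_rate > ERR_RATE_THRESHOLD:
        raise RuntimeError(
            f"err_rate {err_rate:.2%} exceeds {ERR_RATE_THRESHOLD:.0%} "
            f"(n_ok={n_ok}, n_skip={n_skip}, n_err={n_err})"
        )
    return err_rate


# Walk the four scenarios from the table.
for counts in [(0, 902, 0), (900, 0, 2), (0, 0, 902), (800, 0, 102)]:
    try:
        rate = check_run(*counts)
        print(counts, f"pass (err_rate={rate:.2%})")
    except RuntimeError as exc:
        print(counts, "fail:", exc)
```

Note that `n_ok` never appears in the condition. The live case from this run, `check_run(0, 900, 2)`, passes for the same reason the second row does.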

Secondary finding from this run

My earlier claim that ArcticDB was "stale since 2026-04-12" was wrong. I was reading S3 `LastModified` on ArcticDB's internal `tdata/` / `vref/` prefixes as a "last append" signal, but ArcticDB batches/compacts — those timestamps don't monotonically increase on every append. The 900 tickers with today's row are ground truth: this morning's Step Function actually did write ArcticDB successfully.
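
A minimal sketch of measuring freshness from the data rather than from S3 object metadata. It assumes the most recent row date per ticker has already been collected into a mapping; how that mapping is populated from ArcticDB is left out here. `freshness_report` and `last_row_dates` are hypothetical names.

```python
from datetime import date
from typing import List, Mapping, Optional, Tuple


def freshness_report(
    last_row_dates: Mapping[str, Optional[date]],
    today: date,
) -> Tuple[int, List[str]]:
    """Return (number of tickers with today's row, sorted tickers without it)."""
    fresh = 0
    stale: List[str] = []
    for ticker, last in last_row_dates.items():
        if last == today:
            fresh += 1
        else:
            # None means the symbol has no rows yet (for example, not backfilled).
            stale.append(ticker)
    return fresh, sorted(stale)
```

A result where the fresh count covers nearly every ticker is the kind of evidence cited above, and it does not depend on when ArcticDB last rewrote its internal objects.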

Test plan

  • pytest 41/41
  • Re-run live command on EC2 after merge — expect pass with `n_ok=0 n_skip=900 n_err=2` (or n_err=0 if the 2 missing tickers got backfilled meanwhile)

🤖 Generated with Claude Code

2026-04-14 live run discovery: the n_ok==0 hard-fail guard added in
PR #24 is a false positive on legitimate no-op reruns. When 900/902
tickers already have today's row in ArcticDB (because this morning's
Step Function write succeeded), the loop correctly takes the "today
already exists" skip path for each — n_ok=0, n_skip=900, n_err=2
(2 newly-listed tickers Q and SOLS not yet backfilled). My guard
raised RuntimeError on that, treating "nothing to write because all
done" as "failed to write anything."

The real silent-fail this guard was meant to catch (ArcticDB-wide
auth/connectivity failure → every read throws) was reclassified to
n_err (not n_skip) as part of PR #24. So the err_rate > 5% threshold
already catches the true failure case, without false positives on
no-op reruns or idempotent retries.

Kept: the err_rate > 5% threshold. If ArcticDB is genuinely broken,
n_err will exceed 45 tickers on a 902-ticker run and this will fire.

Net behavior:
- n_ok=0, n_skip=902, n_err=0  → pass (idempotent rerun — everyone already wrote)
- n_ok=900, n_skip=0, n_err=2  → pass (normal run with 2 missing tickers)
- n_ok=0, n_skip=0, n_err=902  → fail (ArcticDB auth broken)
- n_ok=800, n_skip=0, n_err=102 → fail (err rate > 5%)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cipher813 merged commit bdb520a into main on Apr 14, 2026
1 check passed
@cipher813 deleted the fix/daily-append-zero-ok-guard branch on April 14, 2026 17:26