test: cover flagged edge cases + fix snapshot persistence-health bug by VijitSingh97 · Pull Request #330 · p2pool-starter-stack/pithead

VijitSingh97 · 2026-07-01T01:31:59Z

What

A source-vs-tests audit across Pithead's four testing tiers. It adds coverage — at the lowest honest tier (all tier 1) — for branches, guards, and render paths that had none, and fixes one real bug found along the way. No production behaviour changes except the bug fix.

The bug (found while auditing)

storage_service.save_snapshot's TypeError branch (a snapshot json.dumps can't serialize) called logger.error directly instead of _db_error like every other write path — so db_healthy stayed True and the #131 "DB write failing" badge never fired while snapshots silently stopped persisting. One-line root-cause fix + a regression test asserting is_db_healthy() flips to False.

Tests added (tier 1)

Area	Behaviour now covered
pytest	Unserializable snapshot flips `db_healthy` (guards the fix above)
shell	Firewall `iptables -I` install-failure → warn + roll back, not half-open (#270)
shell	Wallet subaddress / integrated payout → abort apply (#250, silent reward-loss)
shell	`monero.mode=remote` with no host → abort; non-`/24` `network.subnet` → abort (#180)
shell	`ensure_owner` conditional chown + whole-tree recursive scan, no `-maxdepth` (#255)
frontend	`ProxyTotals` reddens rejected only on `reject_level=high`
frontend	Per-worker `api_ok=false` + `reject_flag` badges; XvB/Unknown `PoolBadge` variants
frontend	#278/#313 invariant: Tari ✔ gated on a live gRPC channel, never on active-but-dead
frontend	`Gauge` done→✔ vs syncing→percent

Each test fails if the logic is inverted (e.g. the Tari test asserts check-inline is absent for active-but-dead; the snapshot test asserts db_healthy is False).

Docs

docs/test-inventory.md regenerated: 764 → 770 enumerated cases (drift check green).
docs/testing-strategy.md Known-gaps ledger gains a Coverage-audit follow-ups section (below), so the gaps stay tracked.

Additional gaps found — not covered here (need Docker / real box → tier 3/4)

These are the real completeness risks the audit surfaced. None touch the code changed in this PR, so per "test once at the lowest honest tier" they're follow-ups, not part of this change. The first two are divergence backstops for paths this PR proves only with stubs:

Firewall rollback, real kernel. Enforce Tor-only egress at the network layer (fail-closed) — stop relying on per-app config #270 rollback is proven with stubbed iptables (control flow only). Whether a real kernel strips a partially inserted DOCKER-USER rule and ends fail-closed is never exercised with an induced mid-insert failure → needs a --fault-injection case.
ensure_owner real mixed-ownership tree. The Run containers as a non-root user (drop root inside the container) #255 scenario (user-owned dir, root-owned contents, one chown -R) can only be built as root → needs a --lifecycle step.
Real-container monerod failover in PR CI — only runs on the manual tier-4 box; the mini-stack breaks Tari, not monerod.
Non-blocking-Tari "ignore" path with real containers — unit-only; the path that silently kills yield if it regresses to a reject.
monerod busy / mid-reorg failover — contract test proves the client reads busy as unreachable; no scenario asserts the dashboard rejects workers on a busy-but-alive node.
Double outage, both-must-recover — unit-only; recovery ordering unproven end-to-end.
Partial-start / stop-failure idempotency — control-loop retry is unit-only; no tier-3/4 injects a docker start/stop error.
pithead doctor on a real box — only its exit code is unit-tested; the NTP/clock-drift check is never fault-injected.
Disk-full / ENOSPC verdict — only a headroom warning is checked; a real unhealthy-on-ENOSPC verdict is never forced.
Tor-container-down partial start — no Caddy/Tor in the mini-stack, so SOCKS-unreachable is exercised at no tier below the manual box.
Insecure + main matrix row — dashboard.secure=false only pairs with pool=nano, so insecure+main has no row.

Verification (all local)

make lint → all checks passed
make test → dashboard 641 passed / 94.73% cov (gate 80%), pithead shell 432/0, selftest 97/0, compose ✓, fakes 12 ✓
make test-patch-coverage → 100% on the changed line (gate 90%)
frontend (node --test) → 66 pass / 0 fail
make test-inventory-check → up to date

Fixtures use synthetic values only (4AAA…/8AAA… wallets, RFC-1918 IPs) — respects .gitleaks.toml.

🤖 Generated with Claude Code

Source-vs-tests audit of the four testing tiers. Adds coverage at the lowest honest tier (all tier 1) for branches with no covering test, and fixes one real bug found along the way. Bug fix: - storage_service.save_snapshot: a non-serializable snapshot (TypeError from json.dumps) logged the error but left db_healthy=True, so the #131 "DB write failing" badge never fired while snapshots silently stopped persisting. Route it through _db_error like every other write path. Tests added (tier 1): - pytest: unserializable snapshot flips db_healthy (regression guard for above). - pithead shell: firewall install-failure rollback (#270); wallet subaddress / integrated hard-fail (#250); remote-mode-without-host and non-/24 subnet guards (#180); ensure_owner conditional + whole-tree recursive chown (#255). - frontend: ProxyTotals high-reject styling; per-worker api/reject badges + XvB/Unknown pool badges; the #278/#313 Tari-✔ invariant (never on active-but-dead); Gauge done-vs-syncing. Docs: - test-inventory regenerated (764 -> 770 cases); testing-strategy Known-gaps gains a "Coverage-audit follow-ups" section listing the tier-3/4 backstops still needed (real-kernel firewall rollback, real mixed-ownership ensure_owner, real-container monerod failover, non-blocking-Tari ignore, busy/reorg, double-outage, doctor NTP check, ENOSPC, Tor-down, insecure+main row). make lint + make test green; dashboard 94.73% cov; patch coverage 100%. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…e_owner The two divergence risks the audit flagged — paths PR #330 proves only with stubs — now have live counterparts, folded into the existing opt-in phases (no new flags), following the --auth-fail-closed "live counterpart to a tier-1 assertion" pattern. - --fault-injection: fault_firewall_rollback shadows iptables with a wrapper that fails every `-I`, re-runs apply_tor_egress_firewall, and asserts the box ends fail-closed (no pithead-tor-egress rule left half-installed), then reinstates the real firewall. Belt-and-braces reinstates unconditionally so a mid-phase abort can't leave clearnet egress open. Proves the real-kernel strip the tier-1 stub can't (#270). - --lifecycle: plants a root-owned file under the dashboard data dir and asserts the pool-flip apply (ensure_directories -> ensure_owner) chowns it to uid 1000 — the #255 "scan contents, not just the dir" regression, which needs root to set up so it can't live at tier 1. Both are DESTRUCTIVE-then-restored, local-box only, and run at the release gate (not CI). Help text, testing-strategy Known-gaps, and the test inventory updated. Verified: shellcheck + shfmt clean, bash -n parses, integration self-test 97/0, make lint green, inventory drift-check green. NOT executed against a real box — that happens at the release gate. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

VijitSingh97 · 2026-07-01T01:44:01Z

Follow-up commit eb3b336: the first two "additional gaps" listed above are now implemented as tier-4 live backstops, folded into the existing --fault-injection / --lifecycle phases (no new flags), following the --auth-fail-closed "live counterpart to a tier-1 assertion" pattern.

--fault-injection → fault_firewall_rollback: shadows iptables with a wrapper that fails every -I, re-runs apply_tor_egress_firewall, asserts the box ends fail-closed (no pithead-tor-egress rule left half-installed), then reinstates the real firewall. Belt-and-braces reinstate is unconditional, so a mid-phase abort can't leave clearnet egress open (Enforce Tor-only egress at the network layer (fail-closed) — stop relying on per-app config #270).
--lifecycle: plants a root-owned file under the dashboard data dir and asserts the pool-flip apply (ensure_directories → ensure_owner) chowns it to uid 1000 — the Run containers as a non-root user (drop root inside the container) #255 "scan contents, not just the dir" regression.

Verified: shellcheck + shfmt clean, bash -n parses, integration self-test 97/0, make lint green, inventory drift-check green. Not executed against a real box — both are destructive-then-restored, local-box only, and run at the release gate. The remaining 9 gaps stay tracked in docs/testing-strategy.md.

…iptables Validated the two tier-4 backstops live on a real box (gouda). The ensure_owner migration passed as written. The firewall-rollback check did NOT exercise the rollback: apply_tor_egress_firewall calls `sudo iptables -I`, and sudo's secure_path ignores a PATH-shadowed iptables, so the real insert succeeded and the rollback branch never ran (the rc-0 assertion passed for the wrong reason — a false green). Fix: shadow `sudo` instead of `iptables`. sudo is still resolved via PATH, so a wrapper that fails an `iptables … -I …` insert and execs real sudo for everything else (remove's -D, -N, iptables-save) makes the insert fail exactly as a real mid-insert error would. Re-validated live: with the real -I failing, the partial ruleset is rolled back (0 pithead-tor-egress rules), then reinstated (7). Box restored byte-identical to its pre-test iptables snapshot. shellcheck + shfmt clean, bash -n parses, self-test 97/0, inventory drift green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

VijitSingh97 · 2026-07-01T01:49:59Z

Live-validated on a real box (commit `72fae38`)

Ran the two tier-4 backstops against a real synced box. Validation caught a real bug in the firewall test — exactly the kind of divergence tier-4 exists to find:

ensure_owner (Run containers as a non-root user (drop root inside the container) #255) — passed as written: a root-owned inode planted deep in a tree is migrated to uid 1000; a clean tree is a graceful no-op.
Firewall rollback (Enforce Tor-only egress at the network layer (fail-closed) — stop relying on per-app config #270) — first run failed: apply_tor_egress_firewall calls sudo iptables -I, and sudo's secure_path ignores a PATH-shadowed iptables, so the real insert succeeded and the rollback branch never ran (the rc-0 assertion was a false green). Fixed by shadowing sudo instead (still resolved via PATH; fails only the iptables … -I insert, execs real sudo otherwise). Re-validated: with the real -I failing, the partial ruleset rolls back to 0 pithead-tor-egress rules, then reinstates to 7.

The box was restored byte-identical to its pre-test iptables-save snapshot; deployed pithead untouched (sha unchanged), stack healthy, all artifacts removed.

Note: the box runs release v1.0.3 (which predates these functions), so validation used the develop-branch pithead at a temp path against throwaway targets — it exercises the real functions on the real kernel, not the full run.sh harness end-to-end. The full harness run happens at the release gate once develop is deployed there.

VijitSingh97 and others added 2 commits June 30, 2026 20:31

VijitSingh97 merged commit f1e6db2 into develop Jul 1, 2026
16 checks passed

VijitSingh97 deleted the claude/gallant-hodgkin-0cfe2c branch July 1, 2026 11:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

test: cover flagged edge cases + fix snapshot persistence-health bug#330

test: cover flagged edge cases + fix snapshot persistence-health bug#330
VijitSingh97 merged 3 commits into
developfrom
claude/gallant-hodgkin-0cfe2c

VijitSingh97 commented Jul 1, 2026

Uh oh!

VijitSingh97 commented Jul 1, 2026

Uh oh!

VijitSingh97 commented Jul 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

VijitSingh97 commented Jul 1, 2026

What

The bug (found while auditing)

Tests added (tier 1)

Docs

Additional gaps found — not covered here (need Docker / real box → tier 3/4)

Verification (all local)

Uh oh!

VijitSingh97 commented Jul 1, 2026

Uh oh!

VijitSingh97 commented Jul 1, 2026

Live-validated on a real box (commit 72fae38)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Live-validated on a real box (commit `72fae38`)