fix(network): close multi-network gaps before #297 expansion #510

Merged
silent-cipher merged 4 commits into refactor/add-network-to-schema from pr-463-fixes
May 22, 2026
Conversation

@SgtPooki SgtPooki commented May 6, 2026

Collaborator

Builds on top of #463. Tested live in a kind cluster against PR #463's HEAD; surfaced three gaps that bite the multi-network expansion path. None affect today's calibration-only deployment, but all three need to land before anyone flips on a second network. Filing as a friendly stack on top of #463 so it's easy to review the delta.

What changed

  1. NETWORK env is now Joi.required(). The migration's "fail-fast if NETWORK missing" check was bypassed because Joi.default("calibration") writes back to process.env before TypeORM runs migrations. A mainnet operator who forgets to set NETWORK would silently backfill every row as calibration.
  2. DealService and DevToolsService set deal.network from blockchain config on every insert. The migration's per-column DEFAULT '<backfill>' is dropped at the end of up(), so future writes that forget to set network fail with a NOT NULL violation instead of inheriting a frozen default.
  3. network is threaded through ScheduleRow and the pg-boss payload types (SpJobData, ProvidersRefreshJobData, DataRetentionJobData). Workers receive network in the job data instead of relying on whatever NETWORK env the running pod was started with.
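
To illustrate item 1, here is a minimal, dependency-free sketch of the failure mode. It is a model only, not Joi's or NestJS's API: `validateEnv`, `Rule`, and `migrationNetwork` are hypothetical names, and the write-back step stands in for the behavior described in this PR, where resolved defaults end up in process.env before migrations run.

```typescript
// Dependency-free model of the failure mode; NOT Joi's or NestJS's real API.
type Env = Record<string, string | undefined>;

interface Rule {
  required: boolean;
  defaultValue?: string;
}

// Hypothetical validator that copies resolved defaults back into the env
// object it validated, mirroring the write-back described in this PR.
function validateEnv(env: Env, rules: Record<string, Rule>): void {
  for (const key of Object.keys(rules)) {
    const rule = rules[key];
    if (env[key] !== undefined) continue;
    if (rule.required) {
      throw new Error(`config validation failed: "${key}" is required`);
    }
    if (rule.defaultValue !== undefined) {
      env[key] = rule.defaultValue; // the write-back that hides the missing value
    }
  }
}

// Stand-in for the migration's fail-fast check (assumed shape).
function migrationNetwork(env: Env): string {
  const network = env.NETWORK;
  if (!network) throw new Error("NETWORK must be set before running migrations");
  return network;
}

// Before: the default masks the missing variable, so the guard never fires.
const withDefault: Env = {};
validateEnv(withDefault, { NETWORK: { required: false, defaultValue: "calibration" } });
const backfilledAs = migrationNetwork(withDefault);

// After: a required rule stops the app at config validation instead.
let bootError = "";
try {
  validateEnv({}, { NETWORK: { required: true } });
} catch (err) {
  bootError = (err as Error).message;
}
```

With the default in place, `backfilledAs` resolves to the default even though the operator never set NETWORK. With the required rule, validation throws before any migration code is reached.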

How we found these

Spun up the existing kind cluster on main, applied #463's image, then ran a small validation matrix:

  • Phase 1 (fail-fast): unset NETWORK from the configmap, restarted dealbot. Migration ran cleanly with 'calibration' backfill anyway. kubectl exec ... node -e "console.log(process.env.NETWORK)" showed undefined in a fresh node process. NestJS ConfigModule mutates process.env from Joi defaults during validation, so by the time TypeORM hits the migration, process.env.NETWORK = "calibration" is set even when the operator never provided it.
  • Phase 3 (write threading): triggered a deal via /api/dev/deal with the column DEFAULT temporarily set to 'mainnet'. Got FK_deals_storage_providers violation since (spAddress, mainnet) doesn't exist in storage_providers. Dropped the DEFAULT entirely → every insert crashed with null value in column "network" violates not-null constraint. Confirms DealService never sets deal.network; it relied on the DB DEFAULT.
  • Phase 4 (worker dispatch): inserted a synthetic mainnet row for an SP that already exists on calibration, plus a dual-network schedule. After one scheduler tick, pgboss.job had two sp.work jobs with identical singleton_key and identical data payload {"jobType":"deal","spAddress":"...","intervalSeconds":...}. No network field. A worker pinned to one network has no way to know which schedule originated the job.
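
The Phase 4 observation can be reproduced in isolation. The sketch below assumes simplified shapes for ScheduleRow and SpJobData (the real types likely carry more fields); `legacyPayload`, `toJobData`, and `shouldHandle` are hypothetical helpers, and the SP address is a placeholder.

```typescript
type Network = "calibration" | "mainnet";

// Simplified, assumed shapes; the real types likely have more fields.
interface ScheduleRow {
  jobType: "deal";
  spAddress: string;
  intervalSeconds: number;
  network: Network;
}

interface SpJobData {
  jobType: "deal";
  spAddress: string;
  intervalSeconds: number;
  network: Network;
}

// Payload as observed before this PR: no network field.
function legacyPayload(row: ScheduleRow) {
  return {
    jobType: row.jobType,
    spAddress: row.spAddress,
    intervalSeconds: row.intervalSeconds,
  };
}

// Payload with network threaded through from the schedule row.
function toJobData(row: ScheduleRow): SpJobData {
  return { ...legacyPayload(row), network: row.network };
}

// Hypothetical worker-side routing: only handle jobs for this worker's network.
function shouldHandle(data: SpJobData, workerNetwork: Network): boolean {
  return data.network === workerNetwork;
}

const sp = "0xSP_PLACEHOLDER"; // placeholder address, not a real provider
const rows: ScheduleRow[] = [
  { jobType: "deal", spAddress: sp, intervalSeconds: 60, network: "calibration" },
  { jobType: "deal", spAddress: sp, intervalSeconds: 60, network: "mainnet" },
];

// Without network, the two schedules serialize to the same payload.
const legacyJson = rows.map((r) => JSON.stringify(legacyPayload(r)));

// With network, each job carries enough information to be routed.
const threaded = rows.map(toJobData);
const mainnetJobs = threaded.filter((d) => shouldHandle(d, "mainnet"));
```

Routing on the payload rather than on the pod's NETWORK env means a job still identifies its originating schedule regardless of which worker picks it up.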

Schema migration up + down both work cleanly when NETWORK is set. Today's single-network deployments keep working unchanged because the DB DEFAULT happens to match config.
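
For readers who have not opened the diff, the backfill-then-drop pattern looks roughly like the sketch below. It is an illustration under stated assumptions, not the migration in this PR: `QueryRunnerLike` is a minimal stand-in for TypeORM's QueryRunner (only `query` is used), the table list and the `text` column type are guesses, and `buildUpStatements` is a hypothetical helper.

```typescript
// Minimal structural stand-in for TypeORM's QueryRunner; only query() is used.
interface QueryRunnerLike {
  query(sql: string): Promise<unknown>;
}

const ALLOWED_NETWORKS: readonly string[] = ["calibration", "mainnet"];
// Assumption: the real migration may touch more (or different) tables.
const TABLES: readonly string[] = ["storage_providers", "deals"];

// Builds the SQL for up(): add the column with a DEFAULT so existing rows are
// backfilled, then drop every DEFAULT at the end so later inserts that omit
// network hit the NOT NULL constraint.
function buildUpStatements(network: string | undefined): string[] {
  // Fail fast; the whitelist also makes the string interpolation below safe.
  if (!network || ALLOWED_NETWORKS.indexOf(network) === -1) {
    throw new Error("NETWORK must be set to a known network before migrating");
  }
  const add = TABLES.map(
    (t) => `ALTER TABLE "${t}" ADD COLUMN "network" text NOT NULL DEFAULT '${network}'`,
  );
  const drop = TABLES.map(
    (t) => `ALTER TABLE "${t}" ALTER COLUMN "network" DROP DEFAULT`,
  );
  return [...add, ...drop];
}

// The env object is passed in explicitly so the sketch stays self-contained.
async function up(
  runner: QueryRunnerLike,
  env: Record<string, string | undefined>,
): Promise<void> {
  for (const sql of buildUpStatements(env.NETWORK)) {
    await runner.query(sql);
  }
}

const statements = buildUpStatements("calibration");
```

Keeping the SQL generation in a pure function makes the statement order easy to assert in a unit test without a database.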

How to verify

pnpm --filter dealbot-backend typecheck
pnpm test

For a deeper check: spin up the stack with make up, then restart with NETWORK unset → app should now fail to boot at config validation instead of silently running. Set NETWORK=calibration, redeploy → migration runs and DROP DEFAULT clauses fire at the end of up(). Verify \d deals shows no DEFAULT on network.

Notes

  • Diff is ~57/-27 across 8 files. Test fixtures bulk-updated to add network: "calibration" where the new types now require it.
  • The DROP DEFAULT change is in the same migration as the backfill, so operators upgrading from a deployment that already ran #463's migration (feat: add network column to schema for multi-network support) won't see the DEFAULT removed without an additional migration. Worth flagging if the staging deploy already migrated; happy to split into a follow-up migration if that's preferable.
  • Doesn't fix every network-scoping gap (retrieval/piece-cleanup reads still query by address only). Those are lower-impact for the first step of #297 (a single dealbot should be able to talk to both mainnet & calibration), so filing them separately makes sense.

@FilOzzy FilOzzy added this to FOC May 6, 2026
@github-project-automation github-project-automation Bot moved this to 📌 Triage in FOC May 6, 2026
@SgtPooki SgtPooki requested a review from silent-cipher May 6, 2026 15:44
@SgtPooki SgtPooki self-assigned this May 6, 2026
@SgtPooki SgtPooki requested a review from Copilot May 6, 2026 15:45
- NETWORK env now required (Joi). Migration's fail-fast was bypassed by
  Joi default writing back to process.env.
- DealService + dev-tools set deal.network from config on every insert.
  Drop DB DEFAULT clauses post-backfill so missed write paths fail loudly.
- Thread network through ScheduleRow and pg-boss payloads (SpJobData,
  ProvidersRefreshJobData, DataRetentionJobData) so workers can route
  by network instead of relying on the running pod's NETWORK env.
@BigLep BigLep moved this from 📌 Triage to 🔎 Awaiting review in FOC May 6, 2026
@BigLep BigLep moved this from 🔎 Awaiting review to 🐱 Todo in FOC May 12, 2026
@BigLep BigLep assigned silent-cipher and unassigned SgtPooki May 12, 2026
@SgtPooki SgtPooki moved this from 🐱 Todo to 🔎 Awaiting review in FOC May 18, 2026
@BigLep BigLep added this to the M4.5: GA Fast Follows milestone May 19, 2026

@silent-cipher silent-cipher left a comment

Collaborator

I resolved the merge conflicts here and closed a few more gaps in these two commits - 8e512b6, d1e342c.
It looks good to me now!

@github-project-automation github-project-automation Bot moved this from 🔎 Awaiting review to ✔️ Approved by reviewer in FOC May 22, 2026
@silent-cipher silent-cipher merged commit 0968395 into refactor/add-network-to-schema May 22, 2026
9 checks passed
@silent-cipher silent-cipher deleted the pr-463-fixes branch May 22, 2026 07:55
@github-project-automation github-project-automation Bot moved this from ✔️ Approved by reviewer to 🎉 Done in FOC May 22, 2026
4 participants