Skip to content

refactor(config): per-network shared-value inheritance + resolve #570 review findings#621

Open
SgtPooki wants to merge 5 commits into
refactor/multi-network-configfrom
feat/network-config-inheritance
Open

refactor(config): per-network shared-value inheritance + resolve #570 review findings#621
SgtPooki wants to merge 5 commits into
refactor/multi-network-configfrom
feat/network-config-inheritance

Conversation

@SgtPooki

@SgtPooki SgtPooki commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

Built on top of #570.

What changed

Per-network env vars now resolve <NETWORK>_<VAR> (override) → unprefixed <VAR> (shared) → built-in default. So operators set shared tuning once and override per network only where it differs, instead of duplicating every value across CALIBRATION_* and MAINNET_*. A single source of truth (config/network-fields.ts) classifies the chain-specific vars that never inherit (credentials, RPC/subgraph endpoints, blocklists, dataset version, and the lifecycle-check flag whose default is network-dependent) and drives both the loader and the Joi schema.

Shared unprefixed values are validated against the same rules as their prefixed overrides. Per-network Joi .default()s were removed (the loader owns all per-network defaults) because NestJS assigns Joi defaults back into process.env, where an injected prefixed default would shadow a shared value.

This PR also resolves the review findings on #570:

  • TARGET < MAX is asserted post-resolution in the loader (the old Joi check read the wrong unprefixed key and never fired)
  • renamed legacy vars (DEALBOT_MAINTENANCE_WINDOWS_UTC, DEALBOT_MAINTENANCE_WINDOW_MINUTES, JOB_PIECE_CLEANUP_PER_SP_PER_HOUR) are promoted to their current names instead of silently reverting to defaults
  • empty/whitespace NETWORKS is rejected instead of booting an idle process
  • getBooleanEnv no longer treats "0"/"no"/"False" as true
  • tryGetSynapse() replaces the dead getSynapse() ?? createSynapseInstance() fallback (getSynapse threw before the fallback could run)
  • retrieval provider lookup filters by (address, network) to stop cross-network row leaks
  • shutdown drain timeout includes maxPieceCleanupRuntimeSeconds
  • dropped the import-time process.env.NETWORK default Prometheus label

How to verify

pnpm test

92 config specs + 538 backend tests pass; typecheck and biome clean. The new config-pipeline.spec.ts exercises the real NestJS boot order (validate → assign → load), including a .env-file-sourced shared value and the legacy-rename path.

Notes / risks

  • Backward compatible: existing prefixed ConfigMaps keep working unchanged; inheritance only adds the option to set shared vars once. No deployment-config migration required.
  • Behavior changes to confirm on dashboards: process/default Prometheus metrics no longer carry a global network label (per-metric network labels are unchanged).
  • DATASET_LIFECYCLE_CHECK_ENABLED is chain-specific: a shared value will not enable the canary on mainnet (its default stays off there per Dealbot can't auto-terminate datasets when payer is a Safe multisig #546). Set CALIBRATION_*/MAINNET_* explicitly to diverge.

Reviewed via cross-agent peer review (gemini, codex, cursor, claude) before opening.

@FilOzzy FilOzzy added this to FOC Jun 24, 2026
@github-project-automation github-project-automation Bot moved this to 📌 Triage in FOC Jun 24, 2026
@SgtPooki SgtPooki requested a review from silent-cipher June 24, 2026 18:17
@SgtPooki SgtPooki self-assigned this Jun 24, 2026
SgtPooki added 2 commits June 24, 2026 14:19
…iew findings

Per-network env vars now resolve `<NETWORK>_<VAR>` -> unprefixed `<VAR>` ->
default, so shared tuning is set once and overridden per network only where it
differs. A single source of truth (network-fields.ts) classifies chain-specific
vars (credentials, endpoints, blocklists) that never inherit, and drives both
the loader resolver and the Joi schema.

Validation: shared unprefixed keys are validated against the same rules as their
prefixed overrides (not waved through by allowUnknown). Per-network defaults
moved out of Joi into the loader so NestJS-injected prefixed defaults can't
shadow a shared value.

Also resolves prior-PR review findings on this branch:
- TARGET<MAX is asserted post-resolution in the loader (the Joi .custom read the
  wrong, unprefixed key and never fired)
- legacy var renames (DEALBOT_MAINTENANCE_WINDOWS_UTC, DEALBOT_MAINTENANCE_WINDOW_MINUTES,
  JOB_PIECE_CLEANUP_PER_SP_PER_HOUR) are promoted to their current names
- empty/degenerate NETWORKS is rejected instead of booting idle
- getBooleanEnv no longer treats "0"/"no"/"False" as true
- tryGetSynapse() replaces the dead getSynapse() ?? createSynapseInstance fallback
- retrieval provider lookup filters by (address, network)
- shutdown drain timeout includes maxPieceCleanupRuntimeSeconds
- dropped the import-time process.env.NETWORK default Prometheus label
…sification

Peer review surfaced that `.optional().strip()` on shared keys dropped them from
the validated object NestJS assigns back into process.env, so a shared value
sourced from a `.env` file (or a renamed legacy var promoted on NestJS's env
copy) never reached the loader. Remove `.strip()`: with per-network Joi defaults
already gone, returning present shared keys can't reshadow anything, and it lets
file-sourced shared values reach process.env.

- DATASET_LIFECYCLE_CHECK_ENABLED is now chain-specific: its default is
  network-dependent (off on mainnet, dealbot#546), so a shared `=true` must not
  silently enable the canary on mainnet.
- DATA_RETENTION_POLL_INTERVAL_SECONDS / PROVIDERS_REFRESH_INTERVAL_SECONDS gain
  .integer().min(1) so a 0/negative interval can't busy-loop.
- Add Nest-copy-faithful pipeline tests (file-sourced shared value, multi-network
  rename promotion, no prefixed-default injection) + lifecycle-safety and
  shared-only TARGET<MAX cases.
- Drop stale <NET>_METRICS_PER_HOUR from the env docs (not in the catalog).
@SgtPooki SgtPooki force-pushed the feat/network-config-inheritance branch from df02e03 to 89fe230 Compare June 24, 2026 18:19
@SgtPooki SgtPooki requested a review from Copilot June 24, 2026 18:22
@SgtPooki SgtPooki moved this from 📌 Triage to 🔎 Awaiting review in FOC Jun 24, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors backend configuration to support per-network env var inheritance (prefixed override → unprefixed shared → default) while centralizing the “chain-specific vs inheritable” classification in a single catalog. It also incorporates several follow-up fixes from the review of #570, including safer Synapse initialization fallback behavior, cross-network correctness fixes, and shutdown/metrics improvements.

Changes:

  • Introduces a per-network env-var catalog (network-fields.ts) and uses it to drive both env resolution in the loader and shared/unprefixed Joi validation.
  • Updates config validation + legacy compat behavior (renamed legacy vars promotion, NETWORKS empty rejection, post-resolution TARGET < MAX assertion, boolean coercion changes).
  • Fixes/adjusts multi-network runtime behavior (retrieval provider lookup keyed by (address, network), pg-boss drain timeout includes piece cleanup runtime, removes global Prometheus network default label, adds tryGetSynapse and uses it for on-demand Synapse creation).

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
docs/environment-variables.md Documents per-network inheritance rules and updates the env-var reference (incl. removed section).
apps/backend/.env.example Updates example env file to explain shared overrides + chain-specific requirements.
apps/backend/src/config/network-fields.ts Adds single source of truth for per-network env var classification (inheritable vs chain-specific).
apps/backend/src/config/network-fields.spec.ts Tests catalog partitioning and key classification.
apps/backend/src/config/env.schema.ts Removes per-network Joi defaults, adds shared/unprefixed validation for inheritable keys, tightens NETWORKS validation.
apps/backend/src/config/env.schema.spec.ts Adds tests for shared override validation and stricter NETWORKS handling.
apps/backend/src/config/loader.ts Implements per-network resolution precedence and post-resolution cross-field invariants.
apps/backend/src/config/loader.spec.ts Adds inheritance, post-resolution invariant, and boolean parsing regression tests.
apps/backend/src/config/legacy-env-compat.ts Sources legacy-per-network list from catalog and promotes renamed legacy vars to current names.
apps/backend/src/config/legacy-env-compat.spec.ts Adds tests for renamed legacy var promotion behavior.
apps/backend/src/config/env.helpers.ts Introduces coerce* helpers for loader-driven resolution and updates boolean parsing behavior.
apps/backend/src/config/env.helpers.spec.ts Adds unit tests for the new coercion helpers.
apps/backend/src/config/config-pipeline.spec.ts Adds end-to-end tests for NestJS validate→assign→load behavior, including .env-file-only shared values.
apps/backend/src/wallet-sdk/wallet-sdk.service.ts Adds tryGetSynapse() (non-throwing access for optional state).
apps/backend/src/piece-cleanup/piece-cleanup.service.ts Uses tryGetSynapse() to allow on-demand Synapse creation when not initialized.
apps/backend/src/piece-cleanup/piece-cleanup.service.spec.ts Updates mocks/tests to use tryGetSynapse().
apps/backend/src/deal/deal.service.ts Switches to tryGetSynapse() to allow fallback creation paths.
apps/backend/src/deal/deal.service.spec.ts Updates mocks/tests to use tryGetSynapse().
apps/backend/src/retrieval/retrieval.service.ts Fixes storage provider lookup to filter by (address, network) to avoid cross-network row leakage.
apps/backend/src/jobs/jobs.service.ts Includes maxPieceCleanupRuntimeSeconds in pg-boss drain timeout calculation.
apps/backend/src/metrics-prometheus/metrics-prometheus.module.ts Removes import-time global network default label for Prometheus metrics.

Comment thread docs/environment-variables.md
Comment thread docs/environment-variables.md Outdated
Comment thread apps/backend/.env.example
Comment thread apps/backend/src/config/env.helpers.ts Outdated
Comment thread apps/backend/src/retrieval/retrieval.service.ts

@silent-cipher silent-cipher left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. However, I have one concern regarding the web ui:

#570 updates the /api/config response, which will break the existing ui. We should roll out #576 alongside these change.

@github-project-automation github-project-automation Bot moved this from 🔎 Awaiting review to ✔️ Approved by reviewer in FOC Jun 26, 2026
@BigLep

BigLep commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

@silent-cipher : I assume you'll take on adding any extra commits and/or merging this to the target branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: ✔️ Approved by reviewer

Development

Successfully merging this pull request may close these issues.

5 participants