Skip to content

CI with non std functional test#417

Open
Naviabheeman wants to merge 4 commits into
chaintope:masterfrom
Naviabheeman:CIwithNonStdFunctionalTest
Open

CI with non std functional test#417
Naviabheeman wants to merge 4 commits into
chaintope:masterfrom
Naviabheeman:CIwithNonStdFunctionalTest

Conversation

@Naviabheeman

Copy link
Copy Markdown
Contributor

Move the functional tests in debug mode to extended tests as debug mode CI does not run the tests at all. So some errors like debug lock order were missed in the previous testing of v0.7.2 changes. Non-standard scripts used in the debug mode are changed to other standard scripts.
A new CI runs every week to verify the long running tests like feature_pruning.

@azuchi azuchi left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of PR #417 — CI with non-std functional test

Verified scope: PR #417 head 575bd1a56, base master. Branched from 1f0759daa (PR #409 merge). 3 commits actually touch only 10 CI/test files (+472 / -163), but the branch base is before PR #411–414 were merged.


✅ Strategic intent — strongly endorsed

The PR description correctly identifies a real testing gap: debug-mode functional tests were never executed in CI, which is precisely why the cs_inventory → cs_main lock-order bug slipped through pre-merge testing of PR #414 and only surfaced in functional testing post-merge. This PR materially raises CI quality.

Content-level changes look sensible:

  • feature_block.py: replace bare OP_TRUE outputs with P2SH(OP_TRUE) so tests run under default policy — correct standardization
  • feature_cltv.py: drop -acceptnonstdtxn=1, adapt testmempoolaccept assertion — consistent with the policy-vs-consensus split
  • feature_csv_activation.py: remove BIP9-activation scaffolding and nFeatures=2 paths — correct (Tapyrus does not use BIP9 nor accept nFeatures=2)
  • feature_pruning.py + blocktools.py::create_tx_with_large_script: replace ~10 KB push-only non-standard scriptPubKey with OP_RETURN + null data — the existing test was silently producing tiny blocks and never exercising the prune path; the OP_RETURN form is standard, doesn't create UTXOs, and the -datacarriersize=10000 plumbing is correct
  • feature_nonstd_txn.py (new): two-node fixture (-acceptnonstdtxn=0 vs =1) verifying that mempool policy diverges while consensus accepts non-standard txns in blocks — exactly the right shape
  • heavy-functional-tests.yml (new): weekly schedule with 480-minute timeout, separate from daily matrix — good fit for feature_pruning.py (1.5–4 h, 4 GB disk)
  • daily-test.yml: invoke test_runner.py ... --debugscripts in the Debug build_type before early exit so debug-only tests actually run

🚨 Blocker — stale base, MUST rebase

The merge base of this PR is 1f0759daa ("Improve CI stability (#409)", merged before PR #411–414). All four security PRs are absent from this branch:

Missing on PR #417 head:
  src/softforkmanager.h
  src/issuedcolorids.{cpp,h}
  test/functional/feature_cp2sh_softfork.py
  (and all the consensus/script/wallet/RPC changes from #411–414)

This is why git diff master..pr417 --stat reports 65 files / +847 / -2414 — the deletions are stale-base artifacts, not the PR's intended changes (git diff $(git merge-base master pr417)..pr417 --stat shows the real scope: 10 files, +472 / -163).

Without a rebase, two real risks at merge time:

B1. After rebase, feature_cp2sh_softfork.py could silently disappear from CI

Master's test_runner.py:185 (added by PR #414) lists:

'feature_cp2sh_softfork.py',

in BASE_SCRIPTS. PR #417's test_runner.py does NOT list it (it was written before PR #414 existed). The two branches will conflict on test_runner.py. The rebase MUST preserve master's feature_cp2sh_softfork.py entry; resolving the conflict by taking PR #417's version wholesale will silently drop the entire CP2SH softfork test from CI.

Suggested resolution: keep the new BASE_SCRIPTS ordering from PR #417 AND re-insert feature_cp2sh_softfork.py in the appropriate group.

B2. Debug build_type CI run must have CP2SH_ACTIVATION_TEST_HEIGHT=120 defined

PR #417's daily-test.yml change adds, in the Debug build branch:

if [ "${{ matrix.config.build_type }}" == "Debug" ]; then
    python3 test/functional/test_runner.py ... --debugscripts
    ...
fi

--debugscripts does test_list += DEBUG_MODE_SCRIPTS (line 346–347), so the actual list = BASE_SCRIPTS + DEBUG_MODE_SCRIPTS. After rebase, BASE_SCRIPTS will include feature_cp2sh_softfork.py, which requires the binary to be built with -DCP2SH_ACTIVATION_TEST_HEIGHT=120 (otherwise the softfork activates at the production height of 693367 and the test will hang / timeout / falsely pass).

Verify that the Debug build_type matrix entry in daily-test.yml also injects -DCP2SH_ACTIVATION_TEST_HEIGHT=120 into CMAKE_OPTS. If it currently only injects it for the native build path (master's daily-test.yml:547), this needs to be extended.


⚠️ Minor observations

O1. p2p_compactblocks.py moved from DEBUG_MODE_SCRIPTS to EXTENDED_SCRIPTS

The move is justified (compact blocks are standard and should run unconditionally), but EXTENDED tests do not run on every daily build. Given the recently-confirmed M2 finding on master (MarkBlockAsReceived leaks mapBlocksInFlight entries when MAX_CMPCTBLOCKS_INFLIGHT_PER_BLOCK is reached), please verify EXTENDED is run at least nightly so this regression class is exercised on a regular cadence.

O2. heavy-functional-tests.yml weekly cadence — confirm Sunday matches release rhythm

schedule: cron runs every Sunday. If release tagging happens Monday–Wednesday, the most recent heavy-test run will always be 0–2 days stale. Acceptable, but worth confirming.

O3. feature_csv_activation.py CSV-active-from-genesis claim should be cross-checked

The PR removes the BIP9-activation paths because "CSV is always active from genesis in Tapyrus." This is true (Tapyrus never used nVersion-bit signaling), but please verify the residual CSV tests still cover both pre/post activation regimes via timing rather than version-bit signaling. The diff removes a substantial body of bip113, bip68, bip112 test paths — confirm what remains still exercises CSV correctness.


Recommended actions

Required before merge:

  1. Rebase onto current master
  2. Resolve test_runner.py conflict by preserving master's feature_cp2sh_softfork.py entry alongside PR #417's reorganization (B1)
  3. Verify the Debug build_type CI entry in daily-test.yml passes -DCP2SH_ACTIVATION_TEST_HEIGHT=120 to CMake (B2)
  4. After rebase, post a fresh diff so reviewers can verify the security PRs (#411–414) are unaffected

Recommended:
5. Address O1 (compact-block test cadence)
6. Address O3 (residual CSV test coverage)


Verification: merge base computed via git merge-base master pr417 = 1f0759daa (PR #409). Real PR scope via git diff $(git merge-base master pr417)..pr417 --stat = 10 files. Conflict candidates via set intersection of git diff $(merge-base)..pr417 --name-only vs git diff $(merge-base)..master --name-only = daily-test.yml, test_runner.py.

Naviabheeman and others added 3 commits June 2, 2026 22:00
…sts; add debug-mode test and CI integration

- feature_block.py: replace bare OP_TRUE outputs with P2SH(OP_TRUE); fix txout_b3
  uniqueness (value=2 vs b2.vtx[1] value=1 to ensure different malfixsha256); remove
  -acceptnonstdtxn=1 from extra_args; move to BASE_SCRIPTS
- feature_cltv.py: remove -acceptnonstdtxn=1; adapt testmempoolaccept assertion; move
  to BASE_SCRIPTS
- feature_csv_activation.py: remove nFeatures=2 txns (only features=1 is valid in
  Tapyrus); fix bip68timetxs (time txs pass at block 581); remove activation code
  (CSV is always active from genesis in Tapyrus)
- p2p_compactblocks.py: remove -acceptnonstdtxn=1; move to EXTENDED_SCRIPTS
- feature_nonstd_txn.py: new test with two nodes (standard vs -acceptnonstdtxn=1)
  verifying mempool policy differs while consensus accepts non-standard txns in blocks;
  added to DEBUG_MODE_SCRIPTS
- test_runner.py: migrate all former DEBUG_MODE_SCRIPTS to BASE/EXTENDED_SCRIPTS;
  rebuild DEBUG_MODE_SCRIPTS with only feature_nonstd_txn.py
- daily-test.yml: run BASE_SCRIPTS + DEBUG_MODE_SCRIPTS (--debugscripts) for debug
  builds before early exit, so -acceptnonstdtxn tests execute in debug CI

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ETURN

create_tx_with_large_script previously built a ~10 KB push-only scriptPubKey that is
non-standard, causing sendrawtransaction to reject the transaction without
-acceptnonstdtxn. As a result, mine_large_block silently produced tiny blocks and
pruning never triggered ("does not invoke the intended test path").

Fix: replace the non-standard output with an OP_RETURN carrying MAX_SCRIPT_SIZE-10 bytes
of null data. OP_RETURN is a standard null-data script (no UTXO created, no UTXO-set
bloat across 9000 blocks). Add -datacarriersize=10000 to all pruning-test node args so
the ~10 KB OP_RETURN passes mempool standardness checks. Re-enable feature_pruning.py
in EXTENDED_SCRIPTS.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
feature_pruning.py takes 1.5–4 hours and uses 4 GB disk — too long for
the daily CI matrix. Move it from EXTENDED_SCRIPTS into a new HEAVY_SCRIPTS
category and create a dedicated weekly workflow (heavy-functional-tests.yml)
that runs every Sunday on Linux x86_64 RelWithDebInfo with GUI+USDT+Wallet.

The workflow shares ccache and depends caches with the daily-test job so
rebuilds are fast. A commented-out USDT section is scaffolded for future use.

Update the missed_tests checker to cover HEAVY_SCRIPTS and USDT_SCRIPTS so
no test file goes untracked.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Naviabheeman Naviabheeman force-pushed the CIwithNonStdFunctionalTest branch from 575bd1a to 90eb83b Compare June 2, 2026 16:31
@Naviabheeman Naviabheeman requested a review from azuchi June 2, 2026 16:48

@azuchi azuchi left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review after rebase (head 90eb83ba8)

✅ Prior blockers resolved

  • B1: test_runner.py:191 preserves feature_cp2sh_softfork.py in BASE_SCRIPTS
  • B2: daily-test.yml:551 injects -DCP2SH_ACTIVATION_TEST_HEIGHT=120 for all native builds; :698 runs --debugscripts in the Debug branch ✓
  • Rebase clean (merge base now 54a27e153, diff scope 10 files / +472 / -163)

🚨 New blocker — Smoke Test #600 fails on 4/6 jobs

Commit 2f66ee7ae rewrote test_framework/blocktools.py::create_tx_with_large_script to emit OP_RETURN [MAX_SCRIPT_SIZE - 10] (≈10 KB). MAX_OP_RETURN_RELAY defaults to 83 bytes (src/script/standard.h:42), so the tx is rejected by mempool standardness unless -datacarriersize=10000 is set.

Only one of three callers was updated:

Caller Updated?
test/functional/feature_pruning.py ✅ adds -datacarriersize=10000
test/functional/feature_largeblocksize.py ❌ — causes the Smoke Test failure (AssertionError: <txid> not found in mempool at line 102)
test/functional/feature_reindex.py ❌ — masked by --failfast, but same vector

Fix: pass -datacarriersize=10000 to nodes in feature_largeblocksize.py::set_test_params and feature_reindex.py (same pattern as feature_pruning.py).

Failing jobs: Linux x86_64, macOS ARM64, macOS x86_64, Linux ARM64.

…largeblocksize to allow large scripts and thereafter large transactions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants