CI with non std functional test#417
Conversation
azuchi
left a comment
There was a problem hiding this comment.
Review of PR #417 — CI with non-std functional test
Verified scope: PR #417 head 575bd1a56, base master. Branched from 1f0759daa (PR #409 merge). 3 commits actually touch only 10 CI/test files (+472 / -163), but the branch base is before PR #411–414 were merged.
✅ Strategic intent — strongly endorsed
The PR description correctly identifies a real testing gap: debug-mode functional tests were never executed in CI, which is precisely why the cs_inventory → cs_main lock-order bug slipped through pre-merge testing of PR #414 and only surfaced in functional testing post-merge. This PR materially raises CI quality.
Content-level changes look sensible:
feature_block.py: replace bare OP_TRUE outputs with P2SH(OP_TRUE) so tests run under default policy — correct standardizationfeature_cltv.py: drop-acceptnonstdtxn=1, adapt testmempoolaccept assertion — consistent with the policy-vs-consensus splitfeature_csv_activation.py: remove BIP9-activation scaffolding andnFeatures=2paths — correct (Tapyrus does not use BIP9 nor accept nFeatures=2)feature_pruning.py+blocktools.py::create_tx_with_large_script: replace ~10 KB push-only non-standard scriptPubKey withOP_RETURN+ null data — the existing test was silently producing tiny blocks and never exercising the prune path; the OP_RETURN form is standard, doesn't create UTXOs, and the-datacarriersize=10000plumbing is correctfeature_nonstd_txn.py(new): two-node fixture (-acceptnonstdtxn=0vs=1) verifying that mempool policy diverges while consensus accepts non-standard txns in blocks — exactly the right shapeheavy-functional-tests.yml(new): weekly schedule with 480-minute timeout, separate from daily matrix — good fit forfeature_pruning.py(1.5–4 h, 4 GB disk)daily-test.yml: invoketest_runner.py ... --debugscriptsin the Debug build_type before early exit so debug-only tests actually run
🚨 Blocker — stale base, MUST rebase
The merge base of this PR is 1f0759daa ("Improve CI stability (#409)", merged before PR #411–414). All four security PRs are absent from this branch:
Missing on PR #417 head:
src/softforkmanager.h
src/issuedcolorids.{cpp,h}
test/functional/feature_cp2sh_softfork.py
(and all the consensus/script/wallet/RPC changes from #411–414)
This is why git diff master..pr417 --stat reports 65 files / +847 / -2414 — the deletions are stale-base artifacts, not the PR's intended changes (git diff $(git merge-base master pr417)..pr417 --stat shows the real scope: 10 files, +472 / -163).
Without a rebase, two real risks at merge time:
B1. After rebase, feature_cp2sh_softfork.py could silently disappear from CI
Master's test_runner.py:185 (added by PR #414) lists:
'feature_cp2sh_softfork.py',in BASE_SCRIPTS. PR #417's test_runner.py does NOT list it (it was written before PR #414 existed). The two branches will conflict on test_runner.py. The rebase MUST preserve master's feature_cp2sh_softfork.py entry; resolving the conflict by taking PR #417's version wholesale will silently drop the entire CP2SH softfork test from CI.
Suggested resolution: keep the new BASE_SCRIPTS ordering from PR #417 AND re-insert feature_cp2sh_softfork.py in the appropriate group.
B2. Debug build_type CI run must have CP2SH_ACTIVATION_TEST_HEIGHT=120 defined
PR #417's daily-test.yml change adds, in the Debug build branch:
if [ "${{ matrix.config.build_type }}" == "Debug" ]; then
python3 test/functional/test_runner.py ... --debugscripts
...
fi--debugscripts does test_list += DEBUG_MODE_SCRIPTS (line 346–347), so the actual list = BASE_SCRIPTS + DEBUG_MODE_SCRIPTS. After rebase, BASE_SCRIPTS will include feature_cp2sh_softfork.py, which requires the binary to be built with -DCP2SH_ACTIVATION_TEST_HEIGHT=120 (otherwise the softfork activates at the production height of 693367 and the test will hang / timeout / falsely pass).
Verify that the Debug build_type matrix entry in daily-test.yml also injects -DCP2SH_ACTIVATION_TEST_HEIGHT=120 into CMAKE_OPTS. If it currently only injects it for the native build path (master's daily-test.yml:547), this needs to be extended.
⚠️ Minor observations
O1. p2p_compactblocks.py moved from DEBUG_MODE_SCRIPTS to EXTENDED_SCRIPTS
The move is justified (compact blocks are standard and should run unconditionally), but EXTENDED tests do not run on every daily build. Given the recently-confirmed M2 finding on master (MarkBlockAsReceived leaks mapBlocksInFlight entries when MAX_CMPCTBLOCKS_INFLIGHT_PER_BLOCK is reached), please verify EXTENDED is run at least nightly so this regression class is exercised on a regular cadence.
O2. heavy-functional-tests.yml weekly cadence — confirm Sunday matches release rhythm
schedule: cron runs every Sunday. If release tagging happens Monday–Wednesday, the most recent heavy-test run will always be 0–2 days stale. Acceptable, but worth confirming.
O3. feature_csv_activation.py CSV-active-from-genesis claim should be cross-checked
The PR removes the BIP9-activation paths because "CSV is always active from genesis in Tapyrus." This is true (Tapyrus never used nVersion-bit signaling), but please verify the residual CSV tests still cover both pre/post activation regimes via timing rather than version-bit signaling. The diff removes a substantial body of bip113, bip68, bip112 test paths — confirm what remains still exercises CSV correctness.
Recommended actions
Required before merge:
- Rebase onto current
master - Resolve
test_runner.pyconflict by preserving master'sfeature_cp2sh_softfork.pyentry alongside PR #417's reorganization (B1) - Verify the Debug build_type CI entry in
daily-test.ymlpasses-DCP2SH_ACTIVATION_TEST_HEIGHT=120to CMake (B2) - After rebase, post a fresh diff so reviewers can verify the security PRs (#411–414) are unaffected
Recommended:
5. Address O1 (compact-block test cadence)
6. Address O3 (residual CSV test coverage)
Verification: merge base computed via git merge-base master pr417 = 1f0759daa (PR #409). Real PR scope via git diff $(git merge-base master pr417)..pr417 --stat = 10 files. Conflict candidates via set intersection of git diff $(merge-base)..pr417 --name-only vs git diff $(merge-base)..master --name-only = daily-test.yml, test_runner.py.
…sts; add debug-mode test and CI integration - feature_block.py: replace bare OP_TRUE outputs with P2SH(OP_TRUE); fix txout_b3 uniqueness (value=2 vs b2.vtx[1] value=1 to ensure different malfixsha256); remove -acceptnonstdtxn=1 from extra_args; move to BASE_SCRIPTS - feature_cltv.py: remove -acceptnonstdtxn=1; adapt testmempoolaccept assertion; move to BASE_SCRIPTS - feature_csv_activation.py: remove nFeatures=2 txns (only features=1 is valid in Tapyrus); fix bip68timetxs (time txs pass at block 581); remove activation code (CSV is always active from genesis in Tapyrus) - p2p_compactblocks.py: remove -acceptnonstdtxn=1; move to EXTENDED_SCRIPTS - feature_nonstd_txn.py: new test with two nodes (standard vs -acceptnonstdtxn=1) verifying mempool policy differs while consensus accepts non-standard txns in blocks; added to DEBUG_MODE_SCRIPTS - test_runner.py: migrate all former DEBUG_MODE_SCRIPTS to BASE/EXTENDED_SCRIPTS; rebuild DEBUG_MODE_SCRIPTS with only feature_nonstd_txn.py - daily-test.yml: run BASE_SCRIPTS + DEBUG_MODE_SCRIPTS (--debugscripts) for debug builds before early exit, so -acceptnonstdtxn tests execute in debug CI Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ETURN
create_tx_with_large_script previously built a ~10 KB push-only scriptPubKey that is
non-standard, causing sendrawtransaction to reject the transaction without
-acceptnonstdtxn. As a result, mine_large_block silently produced tiny blocks and
pruning never triggered ("does not invoke the intended test path").
Fix: replace the non-standard output with an OP_RETURN carrying MAX_SCRIPT_SIZE-10 bytes
of null data. OP_RETURN is a standard null-data script (no UTXO created, no UTXO-set
bloat across 9000 blocks). Add -datacarriersize=10000 to all pruning-test node args so
the ~10 KB OP_RETURN passes mempool standardness checks. Re-enable feature_pruning.py
in EXTENDED_SCRIPTS.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
feature_pruning.py takes 1.5–4 hours and uses 4 GB disk — too long for the daily CI matrix. Move it from EXTENDED_SCRIPTS into a new HEAVY_SCRIPTS category and create a dedicated weekly workflow (heavy-functional-tests.yml) that runs every Sunday on Linux x86_64 RelWithDebInfo with GUI+USDT+Wallet. The workflow shares ccache and depends caches with the daily-test job so rebuilds are fast. A commented-out USDT section is scaffolded for future use. Update the missed_tests checker to cover HEAVY_SCRIPTS and USDT_SCRIPTS so no test file goes untracked. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
575bd1a to
90eb83b
Compare
azuchi
left a comment
There was a problem hiding this comment.
Re-review after rebase (head 90eb83ba8)
✅ Prior blockers resolved
- B1:
test_runner.py:191preservesfeature_cp2sh_softfork.pyinBASE_SCRIPTS✓ - B2:
daily-test.yml:551injects-DCP2SH_ACTIVATION_TEST_HEIGHT=120for all native builds;:698runs--debugscriptsin the Debug branch ✓ - Rebase clean (merge base now
54a27e153, diff scope 10 files / +472 / -163)
🚨 New blocker — Smoke Test #600 fails on 4/6 jobs
Commit 2f66ee7ae rewrote test_framework/blocktools.py::create_tx_with_large_script to emit OP_RETURN [MAX_SCRIPT_SIZE - 10] (≈10 KB). MAX_OP_RETURN_RELAY defaults to 83 bytes (src/script/standard.h:42), so the tx is rejected by mempool standardness unless -datacarriersize=10000 is set.
Only one of three callers was updated:
| Caller | Updated? |
|---|---|
test/functional/feature_pruning.py |
✅ adds -datacarriersize=10000 |
test/functional/feature_largeblocksize.py |
❌ — causes the Smoke Test failure (AssertionError: <txid> not found in mempool at line 102) |
test/functional/feature_reindex.py |
❌ — masked by --failfast, but same vector |
Fix: pass -datacarriersize=10000 to nodes in feature_largeblocksize.py::set_test_params and feature_reindex.py (same pattern as feature_pruning.py).
Failing jobs: Linux x86_64, macOS ARM64, macOS x86_64, Linux ARM64.
…largeblocksize to allow large scripts and thereafter large transactions
Move the functional tests in debug mode to extended tests as debug mode CI does not run the tests at all. So some errors like debug lock order were missed in the previous testing of v0.7.2 changes. Non-standard scripts used in the debug mode are changed to other standard scripts.
A new CI runs every week to verify the long running tests like feature_pruning.