Fix intermittent CI failures in daily test runs by Naviabheeman · Pull Request #415 · chaintope/tapyrus-core

Naviabheeman · 2026-06-01T09:34:03Z

AddressCI failures in 759, 760, 787 upto 790

fees.cpp: a longer confirmation target must never yield a higher fee estimate than a shorter one
test_node.py stop_node: catch ConnectionRefusedError and JSONRPCException (HTTP 503) so a mid-shutdown node does not fail the test
test_node.py assert_start_raises_init_error: kill() and wait() the process on timeout/exception so it releases the data-directory lock before the next start attempt
util_time_GetTime: replace BOOST_CHECK_EQUAL with a <=1-second window check to tolerate second boundaries between sequential clock reads
lint-all.sh: add lint-lock-ordering.py to the lint runner

azuchi

Review of PR #415 — Fix intermittent CI failures in daily test runs

Verified scope: head 703c6d57a, base master (54a27e153, up-to-date). 2 commits, +432 / -6 across 5 files.

CI failure investigation: I downloaded the failure logs for the 5 cited runs to verify what each fix actually addresses:

Run	Failing test	Root cause
759	`feature_fee_estimation.py --scheme SCHNORR`	`Estimated fee (0.000565) larger than last fee (0.000478)` — monotonicity violation in `estimateSmartFee`
760	`wallet_multiwallet.py`	`http.client.RemoteDisconnected` during `encryptwallet` (RPC triggers internal shutdown, response never sent)
787	`wallet_multiwallet.py --usecli`	Stderr mismatch — prior `tapyrusd` held the data-directory lock on the next start attempt
789 / 790	`wallet_multiwallet.py`	`JSONRPCException -342: non-JSON HTTP response '503 Service Unavailable'` — 503 from mid-shutdown HTTP server

✅ Strongly endorsed

`test_node.py` shutdown robustness — directly addresses 4 of 5 failures

Each catch precisely matches one observed failure mode:

ConnectionRefusedError → run 760 (the eventual error after RemoteDisconnected during encryptwallet)
JSONRPCException → runs 789 / 790 (503 from mid-shutdown HTTP server)
process.kill() + process.wait() on init-error → run 787 (data-dir lock not released before next start)

Good, focused robustness work.

`util_tests.cpp::util_time_GetTime` — clean fix for a real race

The original BOOST_CHECK_EQUAL(ntime, GetTimeMillis()/1000) was racing the second boundary between three sequential clock reads. The new (ms_secs >= ntime && ms_secs <= ntime + 1) window is the correct shape (monotonic, ≤1 s skew).

`lint-lock-ordering.py` — high-value addition

Lint script verified empirically:

No false positives on current master (Lock ordering lint: OK (no violations found))

Successfully detects the v0.7.2 regression when net_processing.cpp is reverted to the pre-fix state:

FORBIDDEN: cs_inventory → cs_main  (violates canonical hierarchy)
  Found at: src/net_processing.cpp:1394

Detected canonical orderings look sensible: cs_main → cs_wallet (95 sites), cs_main → mempool.cs (6 sites), cs_inventory → cs_filter (3 sites) etc.

This is exactly the safety net that would have caught the v0.7.2 lock-order bug before merge.

🚨 Concerns — must address before merge

B1. `fees.cpp` monotonicity enforcement diverges from upstream and is O(N²)

The added block in estimateSmartFee:

if (confTarget > 2) {
    CFeeRate prevRate = estimateSmartFee(confTarget - 1, nullptr, conservative);
    if (prevRate.GetFeePerK() > 0 && llround(median) > prevRate.GetFeePerK()) {
        median = static_cast<double>(prevRate.GetFeePerK());
    }
}

Performance: confTarget may be up to 1008 (one week — see fees.h:154). The recursion walks confTarget → 2, so a worst-case call to estimateSmartFee(1008) recurses ~1006 times, each doing the full estimator computation (3× estimateCombinedFee + 1× estimateConservativeFee). Asymptotic cost goes from O(M) to O(N · M). For estimatesmartfee RPC with high targets this is observable latency.

Upstream comparison: Bitcoin Core's src/policy/fees/block_policy_estimator.cpp::estimateSmartFee (bitcoin/bitcoin@master) does not have this recursive enforcement. Instead, upstream relies on max(target/2 est, target est, 2*target est) plus estimateCombinedFee(...,true,...) which checks shorter time horizons inline. Crucially, upstream explicitly documents this as a known limitation:

* Note: In certain rare edge cases, monotonically increasing estimates may
* not be guaranteed. Specifically, given two targets N and M, where M > N,
* if a sub-estimate for target N fails to return a valid fee rate, while
* target M has valid fee rate for that sub-estimate, target M may result
* in a higher fee rate estimate than target N.
*
* See: https://github.com/bitcoin/bitcoin/issues/11800#issuecomment-349697807

So run 759's monotonicity violation is the same rare edge case upstream consciously chose not to fix in code. Adding a divergent fix here increases the maintenance burden of future upstream merges and pays O(N²) cost for what upstream treats as expected behavior.

Alternative — match upstream test instead: Tapyrus's check_estimates (test/functional/feature_fee_estimation.py:148) initialises last_feerate = float(max(fees_seen)), while upstream's check_smart_estimates uses:

mempoolMinFee = node.getmempoolinfo()["mempoolminfee"]
minRelaytxFee = node.getmempoolinfo()["minrelaytxfee"]
feerate_ceiling = max(max(fees_seen), float(mempoolMinFee), float(minRelaytxFee))
last_feerate = feerate_ceiling

This aligns Tapyrus with upstream's tolerance for the documented edge case without paying O(N²) per RPC call. (It does not eliminate flakiness for iterations 2+, but neither does the recursive enforcement when an intermediate target legitimately returns a higher value — the recursion only clips the current estimate.)

Suggested change: drop the fees.cpp recursion, mirror upstream's check_smart_estimates ceiling logic in the Python test. Or, if code-level enforcement is genuinely desired, implement it via memoisation / iteration rather than recursion.

B2. `JSONRPCException` catch in `stop_node` is broader than the documented case

except (subprocess.CalledProcessError, JSONRPCException) as e:
    self.log.debug("Stop command failed, node may already be stopping: %s", e)

The PR description says this handles "HTTP 503", but JSONRPCException matches any RPC error — authentication failures, parameter errors, server errors, etc. — and silently downgrades all of them to a debug-level log. A real shutdown bug that surfaces via a non-503 error would be masked.

Suggested: narrow the catch by inspecting e.error.get('code') for -342 (the specific 503 case observed in runs 789/790), or check '503' in str(e). Re-raise otherwise.

🔍 Minor observations (not blocking)

O1. `lint-lock-ordering.py` function-body extractor is naive

iter_function_bodies treats every top-level {...} block as a single "function body." In C++ source, namespace foo { ... } and class Foo { ... }; are top-level brace blocks, so AssertLockHeld in one method gets propagated to LOCK calls in other methods of the same namespace/class.

Reproduction:

namespace foo {
    void fn1() { AssertLockHeld(cs_a); }   // fn1 only
    void fn2() { LOCK(cs_b); }              // no precondition
}

Script reports cs_a → cs_b (false ordering).

Current impact: zero, because the inferred orderings don't trigger any FORBIDDEN_PAIRS entries on current master. But this is fragile — a future code change could happen to produce a false-positive violation that blocks CI for no real reason.

Fix (future): use a proper parser (libclang, tree-sitter) or restrict AssertLockHeld scope to the innermost enclosing function (track function start via signature regex).

O2. `lint-lock-ordering.py` doesn't strip string literals

const char* msg = "}";

would corrupt the brace depth counter. Rare in practice, but a clean implementation should tokenise.

O3. `lint-lock-ordering.py` TRY_LOCK handling is inconsistent across passes

Pass 2 (AssertLockHeld + LOCK) records TRY_LOCK as an acquisition edge (line 225–230), but Pass 3 (nested-brace) excludes TRY_LOCK from depth_locks (line 257). When TRY_LOCK succeeds, the held lock at outer depth is still followed by TRY_LOCK acquisition, which is a real ordering. Pass 3 should record it too.

O4. `CannotSendRequest` log level downgraded silently

Change from log.exception (full stack trace + ERROR level) to log.debug (silent unless -loglevel=debug). The comment justifies it ("already stopping"), but this also silences genuine CannotSendRequest bugs (e.g. socket exhaustion, race conditions). Consider log.info as a middle ground.

Recommended actions

Required before merge:

B1: drop fees.cpp recursion; mirror upstream's feerate_ceiling pattern in check_estimates instead (or document why Tapyrus deliberately diverges from upstream)
B2: narrow JSONRPCException catch to only the 503 case (e.error.get('code') == -342)

Recommended:
3. O1: at least add a comment to iter_function_bodies flagging the namespace/class FP risk so a future maintainer knows the boundary
4. O3: align TRY_LOCK handling between Pass 2 and Pass 3

Defer: O2 (string literal handling), O4 (log level choice)

Verification: lint script tested against current master (clean) and against 00bc440b9~1-state of net_processing.cpp (correctly reports the cs_inventory → cs_main violation at line 1394). Upstream block_policy_estimator.cpp and feature_fee_estimation.py fetched from bitcoin/bitcoin@master and compared line-by-line. CI failure logs downloaded via gh run view --log-failed --job <id> for runs 759, 760, 787, 790.

- fees.cpp: a longer confirmation target must never yield a higher fee estimate than a shorter one - test_node.py stop_node: catch ConnectionRefusedError and JSONRPCException (HTTP 503) so a mid-shutdown node does not fail the test - test_node.py assert_start_raises_init_error: kill() and wait() the process on timeout/exception so it releases the data-directory lock before the next start attempt - util_time_GetTime: replace BOOST_CHECK_EQUAL with a <=1-second window check to tolerate second boundaries between sequential clock reads - lint-all.sh: add lint-lock-ordering.py to the lint runner Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Naviabheeman force-pushed the FixDailyTest branch from ca73e60 to 703c6d5 Compare June 1, 2026 12:50

azuchi requested changes Jun 2, 2026

View reviewed changes

Naviabheeman and others added 2 commits June 2, 2026 22:16

Add lint-lock-ordering.py lint script

de65433

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Naviabheeman force-pushed the FixDailyTest branch 2 times, most recently from 50ade58 to de65433 Compare June 3, 2026 10:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix intermittent CI failures in daily test runs#415

Fix intermittent CI failures in daily test runs#415
Naviabheeman wants to merge 2 commits into
chaintope:masterfrom
Naviabheeman:FixDailyTest

Naviabheeman commented Jun 1, 2026

Uh oh!

azuchi left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Naviabheeman commented Jun 1, 2026

Uh oh!

azuchi left a comment

Choose a reason for hiding this comment

Review of PR #415 — Fix intermittent CI failures in daily test runs

✅ Strongly endorsed

test_node.py shutdown robustness — directly addresses 4 of 5 failures

util_tests.cpp::util_time_GetTime — clean fix for a real race

lint-lock-ordering.py — high-value addition

🚨 Concerns — must address before merge

B1. fees.cpp monotonicity enforcement diverges from upstream and is O(N²)

B2. JSONRPCException catch in stop_node is broader than the documented case

🔍 Minor observations (not blocking)

O1. lint-lock-ordering.py function-body extractor is naive

O2. lint-lock-ordering.py doesn't strip string literals

O3. lint-lock-ordering.py TRY_LOCK handling is inconsistent across passes

O4. CannotSendRequest log level downgraded silently

Recommended actions

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

`test_node.py` shutdown robustness — directly addresses 4 of 5 failures

`util_tests.cpp::util_time_GetTime` — clean fix for a real race

`lint-lock-ordering.py` — high-value addition

B1. `fees.cpp` monotonicity enforcement diverges from upstream and is O(N²)

B2. `JSONRPCException` catch in `stop_node` is broader than the documented case

O1. `lint-lock-ordering.py` function-body extractor is naive

O2. `lint-lock-ordering.py` doesn't strip string literals

O3. `lint-lock-ordering.py` TRY_LOCK handling is inconsistent across passes

O4. `CannotSendRequest` log level downgraded silently