Tune NEW_ROUND recovery debounce and GET_ROUND polling cadence to reduce duplicate template churn #704
Merged
Agent-Logs-Url: https://github.com/NamecoinGithub/NexusMiner/sessions/f7853b56-97a2-40ed-8323-ef0f50e2472a Co-authored-by: NamecoinGithub <130555019+NamecoinGithub@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix duplicate BLOCK_DATA traffic in miner branch" to "Tune NEW_ROUND recovery debounce and GET_ROUND polling cadence to reduce duplicate template churn" on May 24, 2026
NamecoinGithub approved these changes on May 24, 2026
After the node-side latency improvements, miner-side timing was too aggressive: the 2s NEW_ROUND recovery defer frequently raced real PUSH/BLOCK_DATA delivery, and the 20s GET_ROUND cadence caused avoidable template churn within the ~50s Prime block window. This PR is constant-tuning only, with matching log-text updates.
NEW_ROUND recovery defer: 2s → 5s
GET_ROUND poll interval: 20000ms → 30000ms
30000ms automatically.
Debounce test expectations aligned to 5s
new_round_recovery_debounce_test timing assumptions/messages from 2s to 5s.
Original prompt
Context
After the LLL-TAO node-side improvements landed (PRs #598 → #602), PUSH → BLOCK_DATA round-trip latency on localhost is now reliably ~2 s with bursts staying within ~2 s as well. Field observation shows the miner has never observed a genuinely missed PUSH — every NEW_ROUND/GET_ROUND-triggered GET_BLOCK has been redundant work that races a real PUSH.
The current timings produce two visible problems on the MINER branch:
Problem 1: 2-second deferral is too tight
When a NEW_ROUND is received and no template is available, the miner defers a recovery GET_BLOCK for 2 seconds and then fires it. The intent is "if PUSH arrives within 2 s, cancel the recovery." In practice, with PUSH→BLOCK_DATA cycles regularly landing at 1.7–2.5 s on real nodes, the recovery GET_BLOCK frequently fires moments before the PUSH-driven BLOCK_DATA arrives, producing duplicate BLOCK_DATA traffic (saved only by the node-side 10-second duplicate-BLOCK_DATA suppression window).
Sample evidence (log timestamps from a real run):
That case was fine (40 ms into the window). But operator reports show many cycles where the deferral elapses and the GET_BLOCK fires before PUSH lands, producing redundant BLOCK_DATA seconds later.
Problem 2: 20-second GET_ROUND poll cadence is too aggressive
The Prime channel block cadence is ~50 s. A 20 s GET_ROUND poll means 2–3 polls per block window. Each poll, even when it doesn't schedule a recovery GET_BLOCK, causes the TemplateInterface to invalidate the current template (⚠ Channel 1 height ... template stale), which momentarily disrupts the worker mining loop. The operator's empirical finding: this happens "OFTEN right after a NEW PUSH"; the 20 s poll keeps landing inside the window where it does nothing useful but adds churn.
Required changes: bump only two constants
This PR is constant-tuning only. No structural changes, no new logic, no new tests.
Change 1: GET_BLOCK recovery deferral, 2000 ms → 5000 ms
Find the constant or literal that controls the "deferring recovery GET_BLOCK for 2s" behavior. Likely candidates (search to confirm):
kNewRoundRecoveryDeferMs / RECOVERY_DEFER_MS / kDeferredGetBlockMs style constant.
2000 or std::chrono::milliseconds(2000) near the log string "deferring recovery GET_BLOCK for".
std::chrono::seconds(2) near the same site.
Update the value to 5000 ms (5 seconds) and update the corresponding log string "deferring recovery GET_BLOCK for 2s" to "deferring recovery GET_BLOCK for 5s".
Rationale: 5 s is wide enough to absorb PUSH→BLOCK_DATA cycles up to ~4.5 s (covering localhost worst-case and slow-CPU production nodes) without being so wide that a genuinely missed PUSH delays recovery noticeably.
Change 2: GET_ROUND polling cadence, 20000 ms → 30000 ms
Find the constant or literal that controls the GET_ROUND poll cadence. Likely candidates (search to confirm):
kGetRoundPollIntervalMs / POLL_INTERVAL_MS / kNewRoundPollMs style constant.
20000 or std::chrono::milliseconds(20000) near the log string "poll interval reset to 20000ms" or "poll interval reset to".
Update the value to 30000 ms (30 seconds) and update any log strings that hardcode "20000ms" to "30000ms". Search for and update both:
"poll interval reset to 20000ms" → "poll interval reset to 30000ms" (if found as a hardcoded string)
Rationale: 30 s gives ~1.5 polls per Prime block window instead of 2–3, cutting redundant work meaningfully without losing liveness detection. Operator explicitly rejected 60 s as too lenient.
Acceptance criteria
git grep -nE '\b2000\b|\b20000\b' (or the equivalent build-tree search) does not show any remaining references to the old values in the recovery/poll code paths (other unrelated 2000/20000 literals in the codebase are fine and out of scope).
MINER branch's build docs).
Out of scope (DO NOT do these in this PR)
This pull request was created from Copilot chat.