Tune NEW_ROUND recovery debounce and GET_ROUND polling cadence to reduce duplicate template churn #704

Merged
NamecoinGithub merged 2 commits into MINER from copilot/fix-dupe-block-data-traffic
May 24, 2026
Copilot AI commented May 24, 2026

After the node-side latency improvements, miner-side timing was too aggressive: the 2s NEW_ROUND recovery defer frequently raced real PUSH/BLOCK_DATA delivery, and the 20s GET_ROUND cadence caused avoidable template churn within the ~50s Prime block window. This PR is constant-tuning only, with matching log-text updates.

  • NEW_ROUND recovery defer: 2s → 5s

    • Increased the deferred recovery window used when NEW_ROUND arrives with no valid template.
    • Updated related recovery log strings/comments to reflect 5s behavior.
    • Keeps existing scheduling/cancel logic intact; only timing constants/text changed.
  • GET_ROUND poll interval: 20000ms → 30000ms

    • Increased fixed poll interval constants from 20s to 30s.
    • Existing poll-reset log already formats from the runtime constant, so it now emits 30000ms automatically.
    • No changes to polling control flow or template invalidation behavior.
  • Debounce test expectations aligned to 5s

    • Updated existing new_round_recovery_debounce_test timing assumptions/messages from 2s to 5s.
    • No new tests or structural test changes.

```cpp
// solo.hpp
static constexpr uint32_t POLL_INTERVAL_MIN_MS = 30000;
static constexpr uint32_t POLL_INTERVAL_MAX_MS = 30000;
static constexpr auto kRecoveryDebounceWindow = std::chrono::seconds(5);

// solo.cpp log text
"[Solo GET_ROUND] 📭 NEW_ROUND received but no template — deferring recovery GET_BLOCK for 5s"
```

Original prompt

Context

After the LLL-TAO node-side improvements landed (PRs #598#602), PUSH → BLOCK_DATA round-trip latency on localhost is now reliably ~2 s with bursts staying within ~2 s as well. Field observation shows the miner has never observed a genuinely missed PUSH — every NEW_ROUND/GET_ROUND-triggered GET_BLOCK has been redundant work that races a real PUSH.

The current timings produce two visible problems on the MINER branch:

Problem 1 — 2-second deferral is too tight

When a NEW_ROUND is received and no template is available, the miner defers a recovery GET_BLOCK for 2 seconds and then fires it. The intent is "if PUSH arrives within 2 s, cancel the recovery." In practice, with PUSH→BLOCK_DATA cycles regularly landing at 1.7–2.5 s on real nodes, the recovery GET_BLOCK frequently fires moments before the PUSH-driven BLOCK_DATA arrives, producing duplicate BLOCK_DATA traffic (saved only by the node-side 10-second duplicate-BLOCK_DATA suppression window).

Sample evidence (log timestamps from a real run):

20:37:09.326  NEW_ROUND received but no template — deferring recovery GET_BLOCK for 2s
20:37:09.367  [PRIME_BLOCK_AVAILABLE] Cancelling pending NEW_ROUND recovery GET_BLOCK (deferred 40ms ago — push won the race)

That case was fine (40 ms into the window). But operator reports show many cycles where the deferral elapses and the GET_BLOCK fires before PUSH lands, producing redundant BLOCK_DATA seconds later.
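
The race described above can be reproduced with a small deterministic model. The sketch below is illustrative only: `RecoveryDebounce`, its methods, and `recovery_races_push` are hypothetical names, not taken from the miner source, and the real scheduling/cancel logic may differ. It assumes C++17 (for `std::optional`) and uses explicit time points instead of real timers so the outcome does not depend on wall-clock timing.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Hypothetical model of a deferred recovery with cancellation.
class RecoveryDebounce {
public:
    using Clock = std::chrono::steady_clock;

    explicit RecoveryDebounce(std::chrono::milliseconds window) : window_(window) {
        assert(window.count() > 0);
    }

    // NEW_ROUND arrived with no valid template: start (or restart) the deferral.
    void on_new_round_without_template(Clock::time_point now) { deadline_ = now + window_; }

    // PUSH-driven BLOCK_DATA arrived: cancel any pending recovery.
    // Returns true if a pending recovery was actually cancelled.
    bool on_push_block_data() {
        const bool had_pending = deadline_.has_value();
        deadline_.reset();
        return had_pending;
    }

    // Returns true exactly once, when the deferral has elapsed and a
    // recovery GET_BLOCK would be sent.
    bool should_fire_recovery(Clock::time_point now) {
        if (deadline_ && now >= *deadline_) {
            deadline_.reset();
            return true;
        }
        return false;
    }

private:
    std::chrono::milliseconds window_;
    std::optional<Clock::time_point> deadline_;
};

// Simulates one cycle: NEW_ROUND at t0, PUSH-driven BLOCK_DATA after push_latency.
// Returns true if the recovery fired before the push arrived (a redundant GET_BLOCK).
inline bool recovery_races_push(std::chrono::milliseconds window,
                                std::chrono::milliseconds push_latency) {
    RecoveryDebounce debounce(window);
    const auto t0 = RecoveryDebounce::Clock::time_point{};
    debounce.on_new_round_without_template(t0);
    return debounce.should_fire_recovery(t0 + push_latency);
}
```

Under this model, a 2500 ms PUSH→BLOCK_DATA cycle loses the race against a 2000 ms deferral, while the same cycle fits comfortably inside a 5000 ms deferral.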

Problem 2 — 20-second GET_ROUND poll cadence is too aggressive

The Prime channel block cadence is ~50 s. A 20 s GET_ROUND poll means 2–3 polls per block window. Each poll, even when it doesn't schedule a recovery GET_BLOCK, causes the TemplateInterface to invalidate the current template (⚠ Channel 1 height ... template stale) which momentarily disrupts the worker mining loop. The operator's empirical finding: this happens "OFTEN right after a NEW PUSH" — the 20 s poll keeps landing inside the window where it does nothing useful but adds churn.
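
The polls-per-window arithmetic can be written out as a quick sanity check. This is a sketch under the stated assumption of a ~50 s Prime block window; `kPrimeBlockWindowMs` and `polls_per_window` are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Assumed figure from the discussion above: Prime blocks arrive roughly every 50 s.
constexpr std::uint32_t kPrimeBlockWindowMs = 50000;

// Average number of fixed-interval polls that land inside one block window.
constexpr double polls_per_window(std::uint32_t poll_interval_ms) {
    return static_cast<double>(kPrimeBlockWindowMs) / poll_interval_ms;
}

// A 20 s cadence averages 2.5 polls per window, i.e. 2 or 3 in practice.
static_assert(polls_per_window(20000) == 2.5, "20 s cadence: 2.5 polls per window");

// A 30 s cadence averages about 1.67 polls per window, i.e. 1 or 2 in practice.
static_assert(polls_per_window(30000) > 1.6 && polls_per_window(30000) < 1.7,
              "30 s cadence: roughly 1.67 polls per window");
```

Because each poll can invalidate the current template, the average number of polls per window is a rough proxy for how often the worker loop is disturbed without a new block having arrived.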

Required changes — bump only two constants

This PR is constant-tuning only. No structural changes, no new logic, no new tests.

Change 1 — GET_BLOCK recovery deferral: 2000 ms → 5000 ms

Find the constant or literal that controls the "deferring recovery GET_BLOCK for 2s" behavior. Likely candidates (search to confirm):

  • A kNewRoundRecoveryDeferMs / RECOVERY_DEFER_MS / kDeferredGetBlockMs style constant.
  • A literal 2000 or std::chrono::milliseconds(2000) near the log string "deferring recovery GET_BLOCK for".
  • A std::chrono::seconds(2) near the same site.

Update the value to 5000 ms (5 seconds) and update the corresponding log string "deferring recovery GET_BLOCK for 2s" to "deferring recovery GET_BLOCK for 5s".

Rationale: 5 s is wide enough to absorb PUSH→BLOCK_DATA cycles up to ~4.5 s (covering localhost worst-case and slow-CPU production nodes) without being so wide that a genuinely missed PUSH delays recovery noticeably.

Change 2 — GET_ROUND polling cadence: 20000 ms → 30000 ms

Find the constant or literal that controls the GET_ROUND poll cadence. Likely candidates (search to confirm):

  • A kGetRoundPollIntervalMs / POLL_INTERVAL_MS / kNewRoundPollMs style constant.
  • A literal 20000 or std::chrono::milliseconds(20000) near the log string "poll interval reset to 20000ms" or "poll interval reset to".

Update the value to 30000 ms (30 seconds) and update any log strings that hardcode "20000ms" to "30000ms". Search for and update both:

  • "poll interval reset to 20000ms" → "poll interval reset to 30000ms" (if found as a hardcoded string)
  • Any other log line that prints the cadence value (prefer using the constant in the format string rather than a literal so future bumps don't need string edits).
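
As a minimal sketch of formatting the log line from the constant rather than a literal: the constant name and value below match the solo.hpp excerpt in the PR description, while `poll_reset_message` is a hypothetical helper and not the miner's actual logging call.

```cpp
#include <cstdint>
#include <string>

// Value as shown in the solo.hpp excerpt; redefined here so the sketch is self-contained.
static constexpr std::uint32_t POLL_INTERVAL_MIN_MS = 30000;

// Builds the log text from the runtime value, so a future bump of the
// constant needs no string edits. Hypothetical helper for illustration.
inline std::string poll_reset_message(std::uint32_t interval_ms = POLL_INTERVAL_MIN_MS) {
    return "poll interval reset to " + std::to_string(interval_ms) + "ms";
}
```

With this shape, changing the constant from 20000 to 30000 changes the emitted text automatically, which is the behavior the PR description reports for the existing poll-reset log.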

Rationale: 30 s gives ~1.7 polls per Prime block window (50 s / 30 s) instead of 2–3, cutting redundant work meaningfully without losing liveness detection. The operator explicitly rejected 60 s as too lenient.

Acceptance criteria

  • Each constant is defined in exactly one location. Search for any duplicates and update them all (a stale duplicate is a regression risk).
  • The two log strings reflect the new values.
  • git grep -nE '\b2000\b|\b20000\b' (or the equivalent build-tree search) does not show any remaining references to the old values in the recovery/poll code paths (other unrelated 2000/20000 literals in the codebase are fine and out of scope).
  • Build succeeds with whatever the standard build invocation is for this repo (CMake / make / etc. — check the MINER branch's build docs).
  • No new tests required — this is a config tuning PR.
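
One optional way to make a stale or inconsistent constant fail at build time is a compile-time check next to the definitions. This is only a sketch and was not part of this PR: the constant names and values match the solo.hpp excerpt in the PR description, while `timing_constants_consistent` and the specific invariants it checks are assumptions chosen for illustration. It assumes C++14 or later.

```cpp
#include <chrono>
#include <cstdint>

// Values as shown in the solo.hpp excerpt; redefined here so the sketch is self-contained.
static constexpr std::uint32_t POLL_INTERVAL_MIN_MS = 30000;
static constexpr std::uint32_t POLL_INTERVAL_MAX_MS = 30000;
static constexpr auto kRecoveryDebounceWindow = std::chrono::seconds(5);

// Hypothetical invariants: the poll bounds are ordered, and the recovery
// deferral is shorter than one poll interval.
constexpr bool timing_constants_consistent() {
    return POLL_INTERVAL_MIN_MS <= POLL_INTERVAL_MAX_MS &&
           kRecoveryDebounceWindow < std::chrono::milliseconds(POLL_INTERVAL_MIN_MS);
}

static_assert(timing_constants_consistent(),
              "poll interval bounds or recovery debounce window are inconsistent");
```

A check like this does not replace the `git grep` search for leftover literals; it only guards the relationships between the constants that are defined in one place.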

Out of scope (DO NOT do these in this PR)

  1. Do not decouple GET_ROUND from GET_BLOCK scheduling entirely. The operator wants this as a future Tier-3 change after running with the new cadences for a few days.
  2. Do not investigate whether the cancellation is actually wired correctly (the operator's report that...

This pull request was created from Copilot chat.

Copilot AI changed the title from "[WIP] Fix duplicate BLOCK_DATA traffic in miner branch" to "Tune NEW_ROUND recovery debounce and GET_ROUND polling cadence to reduce duplicate template churn" May 24, 2026
Copilot AI requested a review from NamecoinGithub May 24, 2026 23:00
@NamecoinGithub NamecoinGithub marked this pull request as ready for review May 24, 2026 23:15
Copilot AI review requested due to automatic review settings May 24, 2026 23:15
@NamecoinGithub NamecoinGithub merged commit 3a3da68 into MINER May 24, 2026
1 check failed
Copilot AI removed the request for review from Copilot May 24, 2026 23:39