oracle: fix consensus deadlock from stale attestation timestamps #383

Merged
DigiSwarm merged 1 commit into DigiByte-Core:feature/digidollar-v1 from JohnnyLawDGB:fix/oracle-consensus-deadlock
Feb 27, 2026

@JohnnyLawDGB

Summary

Fixes the oracle attestation deadlock where consensus gets permanently stuck at 0 despite all oracles reporting valid prices. Observed on testnet19 — 8/9 oracles reporting ~$0.00445 but consensus_price_micro_usd: 0 and last_bundle_height: 0 for 19+ hours.

Root Cause

When a consensus round stalls, all oracles create attestations with the same frozen (price, timestamp) tuple. The Phase2 hash — H(oracle_id, price, timestamp) — is identical each cycle, so the duplicate filter in AddOracleMessage() permanently rejects it. Since BroadcastMessage() calls AddOracleMessage() before pushing to P2P, the message never reaches the network. All oracle nodes enter this state simultaneously with no automatic recovery.

The deadlock chain:

  1. pending_messages contains entries with stale timestamp T
  2. ComputeConsensusValues() returns consensus_timestamp = T (median of stale entries)
  3. Oracle creates attestation with (price, T) → Phase2 hash H
  4. AddOracleMessage() finds H in seen_message_hashes → rejected as duplicate
  5. BroadcastMessage() returns false → message never reaches P2P
  6. Other oracles in the same state → no fresh data arrives → goto 1
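The chain above can be reproduced in a few lines. This is a minimal sketch, not the code from the PR: `DuplicateFilterSketch` and its `Add()` method are hypothetical names, and the `(oracle_id, price, timestamp)` tuple itself stands in for the Phase2 hash, on the assumption that equal tuples always produce equal hashes.

```cpp
#include <cstdint>
#include <set>
#include <tuple>

// Hypothetical stand-in for the Phase2 hash H(oracle_id, price, timestamp).
// The tuple is used directly as the key: identical inputs give an identical
// key, which is the only property the deadlock depends on.
using Phase2KeySketch = std::tuple<uint32_t, uint64_t, int64_t>;

// Simplified model of the duplicate filter described for AddOracleMessage().
struct DuplicateFilterSketch {
    std::set<Phase2KeySketch> seen_message_hashes;

    // Returns true if the message is new, false if it is rejected as a duplicate.
    bool Add(uint32_t oracle_id, uint64_t price_micro_usd, int64_t timestamp) {
        return seen_message_hashes
            .insert(Phase2KeySketch{oracle_id, price_micro_usd, timestamp})
            .second;
    }
};
```

With a frozen timestamp T, the first `Add(1, 4450, T)` is accepted and every later call with the same arguments is rejected. Only a different timestamp (or price) yields a key the filter has not seen.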

Three-Part Fix

  1. Stale consensus detection (src/oracle/node.cpp): When consensus timestamp is >5 minutes old, BroadcastCurrentPrice() clears stale state and falls through to individual price broadcast with GetTime(). Fresh timestamp → different Phase2 hash → passes duplicate filter → propagates to network → new consensus forms.

  2. Periodic seen-hash cleanup (src/oracle/bundle_manager.cpp): Clear seen_message_hashes every 300 seconds in AddOracleMessage(). Safety net for automatic recovery. The pending_messages map (keyed by oracle_id, accepting only newer timestamps) provides authoritative dedup.

  3. State reset on stoporacle (src/rpc/digidollar.cpp): stoporacle now calls ClearPendingMessages() to reset the duplicate filter. Gives operators a manual recovery path.
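The three parts can be sketched together as follows. This is an illustration under stated assumptions rather than the merged code: the constant names, `ChooseAttestationTimestamp()`, and `SeenFilterSketch` are hypothetical, time is passed in explicitly instead of calling GetTime(), and the exact comparison operators at the 300-second boundaries are guesses.

```cpp
#include <cstdint>
#include <set>
#include <tuple>

// Both thresholds are 300 seconds per the PR description; the names are hypothetical.
constexpr int64_t STALE_CONSENSUS_SECONDS = 300;
constexpr int64_t SEEN_HASH_CLEANUP_INTERVAL = 300;

// Part 1: if the consensus timestamp is more than 5 minutes old, fall back
// to the current time so the attestation gets a fresh Phase2 hash.
inline int64_t ChooseAttestationTimestamp(int64_t consensus_timestamp, int64_t now) {
    const bool stale = (now - consensus_timestamp) > STALE_CONSENSUS_SECONDS;
    return stale ? now : consensus_timestamp;
}

struct SeenFilterSketch {
    // The tuple stands in for the Phase2 hash (assumption, see lead-in).
    using Key = std::tuple<uint32_t, uint64_t, int64_t>;
    std::set<Key> seen_message_hashes;
    int64_t last_cleanup = 0;

    // Part 2: clear the seen set once the cleanup interval has elapsed,
    // then apply the usual duplicate check.
    bool Add(const Key& key, int64_t now) {
        if (now - last_cleanup >= SEEN_HASH_CLEANUP_INTERVAL) {
            seen_message_hashes.clear();
            last_cleanup = now;
        }
        return seen_message_hashes.insert(key).second;
    }

    // Part 3: manual reset, modelled on what stoporacle triggers.
    void ClearPendingMessages() { seen_message_hashes.clear(); }
};
```

Each part breaks the deadlock independently: part 1 changes the hash input, while parts 2 and 3 forget the old hash so that the same input is accepted again.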

Relationship to RC22 pending_messages.clear() Fix

The RC22 fix (f7a4b1d) removed pending_messages.clear() from AddOracleBundleToBlock() to prevent messages being drained on every CreateNewBlock() call. This fix addresses a different but related deadlock: messages survive template creation but get permanently trapped in the duplicate filter when the consensus timestamp freezes.

Test plan

  • Verify oracle recovers from stale consensus within 5 minutes (automatic)
  • Verify stoporacle/startoracle cycle clears state and allows fresh consensus
  • Verify normal oracle operation (fresh consensus) is unaffected by changes
  • Verify periodic seen-hash cleanup doesn't cause duplicate message processing storms
  • Run existing oracle test suite (oracle_bundle_manager_tests, oracle_phase2_tests)

🤖 Generated with Claude Code

When oracle consensus stalls (e.g., from rapid block production or network
partition), all oracles create attestations with the same frozen
(price, timestamp) tuple. The Phase2 hash—computed from (oracle_id, price,
timestamp)—is identical each cycle, so the duplicate filter in
AddOracleMessage() permanently rejects it. Since BroadcastMessage() calls
AddOracleMessage() before pushing to P2P, the message never reaches the
network. All oracle nodes enter this state simultaneously, creating a
network-wide deadlock with no automatic recovery.

Three-part fix:

1. Stale consensus detection (node.cpp): When the consensus timestamp is
   >5 minutes old, BroadcastCurrentPrice() clears stale pending state and
   falls through to individual price broadcast with GetTime() as timestamp.
   The fresh timestamp produces a different Phase2 hash, breaking the
   duplicate filter collision and allowing the message to propagate.

2. Periodic seen-hash cleanup (bundle_manager.cpp): Clear
   seen_message_hashes every 300 seconds in AddOracleMessage(). This
   provides automatic recovery even without the stale consensus detection.
   The pending_messages map (keyed by oracle_id, accepting only newer
   timestamps) provides authoritative dedup; the seen hash set is a
   best-effort P2P optimization only.

3. State reset on stoporacle (digidollar.cpp): stoporacle now calls
   ClearPendingMessages() to reset seen_message_hashes,
   pending_messages, and pending_attestations. Gives operators a manual
   recovery path via stoporacle/startoracle cycle.
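As an illustration of why clearing the seen-hash set is described as safe, here is a minimal sketch of a pending_messages map keyed by oracle_id that accepts only newer timestamps. `PendingEntrySketch`, `PendingMessagesSketch`, and `Accept()` are hypothetical names, and the real entry type presumably carries more fields than shown.

```cpp
#include <cstdint>
#include <map>

// Hypothetical, reduced view of a pending oracle message.
struct PendingEntrySketch {
    uint64_t price_micro_usd;
    int64_t timestamp;
};

struct PendingMessagesSketch {
    // One slot per oracle, as described for pending_messages.
    std::map<uint32_t, PendingEntrySketch> pending_messages;

    // Accepts a message only if it is strictly newer than the entry already
    // held for that oracle. Replays of old messages are dropped here even if
    // the seen-hash set has just been cleared.
    bool Accept(uint32_t oracle_id, uint64_t price_micro_usd, int64_t timestamp) {
        const auto it = pending_messages.find(oracle_id);
        if (it != pending_messages.end() && timestamp <= it->second.timestamp) {
            return false;
        }
        pending_messages[oracle_id] = {price_micro_usd, timestamp};
        return true;
    }
};
```

Under this model, a message replayed after a seen-hash cleanup still fails the timestamp check, so the periodic clear should cost at most some redundant validation work.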

Fixes the oracle attestation stuck-at-4/5 bug observed on testnet19 where
8 oracles reported valid prices but consensus_price remained 0 for 19+
hours until an operator manually restarted their node.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@DigiSwarm

Hey Johnny, thank you so much for digging into this and putting in the time to track down the root cause. The deadlock analysis is excellent — the frozen timestamp → identical Phase2 hash → permanent duplicate filter rejection chain is exactly what we were seeing on testnet19 and you nailed it.

The three-part oracle fix (stale consensus detection, periodic seen-hash cleanup, and state reset on stoporacle) is well thought out and I want to get this merged.

I ran the full test suite locally — 1,969 C++ unit tests pass and all 311 functional tests pass. However, both CI checks (macOS and Ubuntu) are failing on the PR, and I noticed the branch was forked from before my latest RC22 RPC display fix commit (272d44092b), which means a few of those RC22 fixes got inadvertently reverted in digidollar.cpp and the digidollar_rpc_display_bugs.py test file was removed.

No worries at all — I'm going to cherry-pick your oracle fix commit directly onto our current branch tip so everything stays clean. Your authorship will be preserved in the commit. This way we keep both the oracle deadlock fix and the RC22 RPC corrections in one clean history.

Thank you again for the incredible contribution. This kind of deep debugging work is exactly what we need as we push toward mainnet. The whole testnet19 testing effort from everyone has been invaluable. 🚀💎

@DigiSwarm DigiSwarm merged commit 6963ed5 into DigiByte-Core:feature/digidollar-v1 Feb 27, 2026
0 of 2 checks passed