
Pr2 tmm basic#10706

Open
NomDeTom wants to merge 14 commits into meshtastic:develop from NomDeTom:pr2-tmm-basic

Conversation

NomDeTom commented Jun 12, 2026 (Collaborator)

Reworks the TrafficManagementModule cache layer and adds a routing-hint overflow store.
Stacks on #10705 — review/merge after it; the shared NodeDB/WarmNodeStore files belong to #10705.
Packet-policing behaviour is unchanged from upstream, but the module is now enabled by default with very basic position deduplication (see below).

What it does

  • Flatten the cache: replaces the cuckoo-hashed unified cache and the bucketed PSRAM NodeInfo index with plain flat arrays and a linear scan. At LoRa packet rates an O(n) scan costs negligible time, and it removes the hashing/displacement complexity.
  • Compact 10-byte entries with free-running tick timestamps: per-node state (position fingerprint, rate/unknown counters) packs into 5 bytes. Ages are tracked with free-running modular tick counters (position: 8-bit at 360 s/tick, so it wraps every ~25.6 h; rate/unknown: 4-bit nibbles), so there is no epoch anchor or periodic flush to maintain.
  • Next-hop overflow cache: setNextHop/getNextHopHint store a confirmed last-byte relay, written only from NextHopRouter's ACK-confirmed decision (and mirrored from TraceRoute). getNextHop falls back to it when the hot NodeDB has no hint, so DMs/relays to long-tail nodes keep routing after the node ages out.
  • Persistence: warm-starts the next-hop cache from persisted NodeInfoLite hints on the first maintenance pass; next-hop entries survive the sweep and aren't clobbered by a stale preload.
  • Enabled by default (HAS_TRAFFIC_MANAGEMENT) with position dedup on: a 19-bit grid (~91 m cells, i.e. ±45 m) and an 11 h minimum interval between identical positions; rate limiting is left off. Position dedup only runs on well-known channels (new Channels::isWellKnownChannel gate).
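A minimal sketch of the free-running 8-bit position tick described above, assuming 360 s per tick and the 11 h dedup interval. All names here (posTick, posAgeTicks, withinMinInterval) are hypothetical, not the module's API. Because the age is computed with modular 8-bit subtraction, no epoch anchor needs rebasing as long as the true age stays below 256 ticks (~25.6 h).

```cpp
#include <cstdint>

// Hypothetical sketch: an 8-bit free-running tick counter at 360 s per tick.
constexpr uint32_t kPosTickMs = 360UL * 1000UL;                   // 360 s per tick
constexpr uint8_t kPosMinIntervalTicks = (11UL * 3600UL) / 360UL; // 11 h = 110 ticks

// Current tick. The uint8_t cast wraps every 256 ticks (~25.6 h);
// the 32-bit millis() rollover is ignored in this sketch.
inline uint8_t posTick(uint32_t nowMs) {
    return static_cast<uint8_t>(nowMs / kPosTickMs);
}

// Age in ticks. Modular 8-bit subtraction stays correct across a wrap,
// provided the true age is below 256 ticks.
inline uint8_t posAgeTicks(uint8_t storedTick, uint8_t nowTick) {
    return static_cast<uint8_t>(nowTick - storedTick);
}

// True if an identical position arrives less than 11 h after the stored one,
// i.e. it would be treated as a duplicate.
inline bool withinMinInterval(uint8_t storedTick, uint32_t nowMs) {
    return posAgeTicks(storedTick, posTick(nowMs)) < kPosMinIntervalTicks;
}
```

The 11 h window is 110 ticks, well inside the 256-tick range. An entry left untouched for longer than ~25.6 h would read back as younger than it really is; the PR text does not say how such stale entries are handled.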

Tests: test_traffic_management — upstream policing suite plus next-hop round-trip / persistence cases, and the position-dedup tests now actually exercise the dedup path.

🤝 Attestations

  • I have tested that my proposed changes behave as described.
  • I have tested that my proposed changes do not cause any obvious regressions on the following devices:
    • Heltec (Lora32) V3
    • LilyGo T-Deck
    • LilyGo T-Beam
    • RAK WisBlock 4631
    • Seeed Studio T-1000E tracker card
    • Other (please specify below)

@NomDeTom NomDeTom requested a review from thebentern June 12, 2026 22:29
@NomDeTom NomDeTom added 2.8 needs-review Needs human review ai-generated Possible AI-generated low-quality content labels Jun 12, 2026
@github-actions github-actions Bot added the enhancement New feature or request label Jun 12, 2026
github-actions Bot commented Jun 12, 2026 (Contributor)

⚡ Try this PR in the Web Flasher


Warning

This is an automated, unreviewed CI test build. Back up your device configuration
before flashing, and only flash devices you are able to recover.

Supported boards built by this PR (24)
Device | Board | Platform
Crowpanel Adv 3.5 TFT | elecrow-adv-35-tft | esp32-s3
Heltec HT62 | heltec-ht62-esp32c3-sx1262 | esp32-c3
Heltec Mesh Node 096 | heltec-mesh-node-t096 | nrf52840
Heltec Mesh Node T1 | heltec-mesh-node-t1 | nrf52840
Heltec Mesh Node T114 | heltec-mesh-node-t114 | nrf52840
Heltec V3 | heltec-v3 | esp32-s3
Heltec V4 | heltec-v4 | esp32-s3
Raspberry Pi Pico | pico | rp2040
Raspberry Pi Pico W | picow | rp2040
RAK WisMesh Tag | rak_wismeshtag | nrf52840
RAK WisBlock 11200 | rak11200 | esp32
RAK WisBlock 11310 | rak11310 | rp2040
RAK3312 | rak3312 | esp32-s3
RAK WisBlock 4631 | rak4631 | nrf52840
Seeed Wio Tracker L1 | seeed_wio_tracker_L1 | nrf52840
Seeed Xiao NRF52840 Kit | seeed_xiao_nrf52840_kit | nrf52840
Seeed Xiao ESP32-S3 | seeed-xiao-s3 | esp32-s3
Station G2 | station-g2 | esp32-s3
Station G3 | station-g3 | esp32-s3
LILYGO T-Deck | t-deck-tft | esp32-s3
LILYGO T-Echo | t-echo | nrf52840
LILYGO T-Echo Plus | t-echo-plus | nrf52840
LilyGo T3-C6 | tlora-c6 | esp32-c6
Seeed SenseCAP T1000-E | tracker-t1000-e | nrf52840

Build artifacts expire on 2026-07-14. Updated for ab56f45.

@NomDeTom NomDeTom removed the enhancement New feature or request label Jun 12, 2026
@NomDeTom NomDeTom added the enhancement New feature or request label Jun 13, 2026
NomDeTom and others added 5 commits June 13, 2026 16:04
…ty retention)

Introduces a tiered NodeDB so the device retains identity (public key,
last_heard) for far more nodes than fit in the full-record hot store,
without growing heap or the persisted nodes.proto unboundedly.

- Hot store: full NodeInfoLite, MAX_NUM_NODES (120 on nRF52).
- Satellite maps: position/telemetry/environment/status capped at
  MAX_SATELLITE_NODES (40 freshest); eviction via enforceSatelliteCaps /
  evictSatelliteOverCap.
- Warm tier (WarmNodeStore): 40 B {num,last_heard,public_key} records for
  evicted nodes so DMs to/from long-tail nodes keep encrypting/decrypting.
  Persisted to /prefs/warm.dat, or on nRF52840 a dedicated 12 KB raw-flash
  record-ring below LittleFS (3x4 KB pages; see linker scripts + the
  nrf52_warm_region.py post-link guard).

NodeDB::getOrCreateMeshNode now demotes evicted nodes into the warm tier and
re-admits them (restoring key/last_heard). Router PKI decrypt/encode resolve
the peer key via NodeDB::copyPublicKey (hot store, then warm tier).
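As a rough illustration of the lookup order above, here is a sketch assuming the 40 B warm record is laid out as a 4-byte node number, a 4-byte last_heard and a 32-byte public key. WarmRecord, HotNode and copyPublicKeySketch() are hypothetical stand-ins, not NodeDB's real types or the actual signature of NodeDB::copyPublicKey.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical warm-tier record matching the 40 B {num, last_heard, public_key} layout.
struct WarmRecord {
    uint32_t num;          // node number
    uint32_t lastHeard;    // last_heard timestamp
    uint8_t publicKey[32]; // 32-byte public key
};
static_assert(sizeof(WarmRecord) == 40, "4 + 4 + 32 bytes, no padding");

// Hypothetical hot-store entry (the real NodeInfoLite carries much more).
struct HotNode {
    uint32_t num;
    bool hasKey;           // hot entry may exist without a key
    uint8_t publicKey[32];
};

// Lookup order described above: hot store first, then the warm tier.
bool copyPublicKeySketch(const std::vector<HotNode>& hot, const std::vector<WarmRecord>& warm,
                         uint32_t nodeNum, uint8_t out[32]) {
    for (const auto& n : hot) {
        if (n.num == nodeNum && n.hasKey) {
            std::memcpy(out, n.publicKey, 32);
            return true;
        }
    }
    for (const auto& w : warm) {
        if (w.num == nodeNum) {
            std::memcpy(out, w.publicKey, 32);
            return true;
        }
    }
    return false; // unknown peer: PKI encrypt/decrypt cannot proceed
}
```

With only the hot-store scan, any evicted node would fall through to `false`; the 40 B warm record is what keeps DMs to long-tail nodes encryptable.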

NodeInfoLite gains snr_q4 (sint32, Q4-encoded dB); the float snr is zeroed on
disk. NodeInfoLite grows 105 -> 112 B; backup 2432 -> 2468 B.
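A minimal encode/decode sketch, assuming "Q4" means four fractional bits (stored value = SNR in dB × 16, giving 0.0625 dB resolution). snrToQ4 and q4ToSnr are hypothetical helpers, not functions from the PR.

```cpp
#include <cmath>
#include <cstdint>

// Assumed Q4 encoding: round(snr_dB * 16) stored in an integer field.
inline int32_t snrToQ4(float snrDb) {
    return static_cast<int32_t>(std::lround(snrDb * 16.0f));
}

// Decode back to dB; exact for any value representable in 1/16 dB steps.
inline float q4ToSnr(int32_t q4) {
    return static_cast<float>(q4) / 16.0f;
}
```

Because protobuf sint32 fields use ZigZag encoding on the wire, the small negative SNR values typical of LoRa links stay compact.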

Note: the snr_q4 .proto change still needs to land in the protobufs submodule
(generated header is updated here; submodule pointer left at upstream).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Hardens how ignored/favourite nodes are received over admin and retained,
closing paths where a block could be lost or accidentally cleared.

- Blocking keeps the node's public key (admin set_ignored_node and
  addFromContact no longer zero it / drop the warm-tier key), so a blocked
  peer stays a verifiable identity.
- set_ignored_node creates the node if absent, so a block by node ID sticks
  even for a node we've never heard from (e.g. pushed by a remote admin) with
  no NodeInfo or key.
- Eviction protection (favourite/ignored/manually-verified) now also applies to
  the load-time hot-store migration and is never undone by cleanupMeshDB, which
  previously purged ignored nodes that lacked user info.
- The hot-store migration leaves our own node (index 0) in place and prefers to
  demote non-protected nodes, like the runtime eviction scan.

Caps the protected set (favourite + ignored + verified) at MAX_NUM_NODES-2 via
NodeDB::setProtectedFlag(), so at least two evictable slots always remain and
getOrCreateMeshNode can always make room — replacing the previous unconditional
append that could run off the end of the node vector when every node was
protected. A locally-set favourite/ignore that hits the cap reports back to the
phone via a ClientNotification.
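The cap rule can be sketched as follows. NodeFlags and setFavoriteCapped() are hypothetical; only the favourite flag is shown, and the real NodeDB::setProtectedFlag() may differ in signature and in how it treats the local node.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-node protection flags.
struct NodeFlags {
    bool favorite = false;
    bool ignored = false;
    bool verified = false;
    bool isProtected() const { return favorite || ignored || verified; }
};

// Refuse to newly protect a node if that would leave fewer than two evictable
// slots (cap = maxNumNodes - 2, assumes maxNumNodes >= 2). Returns false when
// the cap is hit, so the caller can report the failure to the phone.
bool setFavoriteCapped(std::vector<NodeFlags>& nodes, std::size_t idx, bool value,
                       std::size_t maxNumNodes) {
    const std::size_t cap = maxNumNodes - 2;
    NodeFlags& n = nodes[idx];
    if (value && !n.isProtected()) {
        std::size_t protectedCount = 0;
        for (const auto& other : nodes)
            if (other.isProtected()) ++protectedCount;
        if (protectedCount >= cap) return false;
    }
    n.favorite = value; // clearing, or re-setting an already protected node, always succeeds
    return true;
}
```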

Adds test_nodedb_blocked covering the migration, favourite/ignored eviction
protection, ignored-survives-cleanup, and the protected-node cap. The
maintenance methods stay private in production; the test reaches them through a
PIO_UNIT_TESTING-guarded friend shim.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

# Conflicts:
#	src/mesh/NodeDB.h
Zero-initialise `stranded[]` and `seqs[]/order[]` VLAs so cppcheck can
verify there are no unguarded reads of uninitialised memory (the guards
exist but are not visible to static analysis). Mark two local pointers
`const` where the pointed-to entry is never mutated after assignment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
github-actions Bot commented Jun 13, 2026 (Contributor)

Firmware Size Report

22 targets | vs develop: 22 increased, net +251,660 (+245.8 KB)

Target | Size | vs develop
rak11200 | 1,846,144 | +15,472 (+15.1 KB)
rak3312 | 2,257,184 | +15,248 (+14.9 KB)
seeed-xiao-s3 | 2,261,088 | +15,152 (+14.8 KB)
t-deck-tft | 3,796,240 | +14,736 (+14.4 KB)
station-g3 | 2,251,024 | +14,208 (+13.9 KB)
heltec-vision-master-e213-inkhud | 2,209,088 | +14,032 (+13.7 KB)
elecrow-adv-35-tft | 3,401,616 | +14,000 (+13.7 KB)
t-eth-elite | 2,475,280 | +13,712 (+13.4 KB)
heltec-v3 | 2,248,576 | +13,648 (+13.3 KB)
heltec-ht62-esp32c3-sx1262 | 2,119,632 | +12,832 (+12.5 KB)
tlora-c6 | 2,353,184 | +12,768 (+12.5 KB)
picow | 1,232,880 | +12,132 (+11.8 KB)
pico2w | 1,208,912 | +11,632 (+11.4 KB)
rak11310 | 794,376 | +11,288 (+11.0 KB)
pico | 771,776 | +11,280 (+11.0 KB)
seeed_xiao_rp2040 | 769,976 | +11,280 (+11.0 KB)
pico2 | 759,184 | +10,792 (+10.5 KB)
seeed_xiao_rp2350 | 757,328 | +10,784 (+10.5 KB)
heltec-v4 | 2,260,480 | +5,008 (+4.9 KB)
station-g2 | 2,251,040 | +4,640 (+4.5 KB)
rak3172 | 183,892 | +3,592 (+3.5 KB)
wio-e5 | 236,100 | +3,424 (+3.3 KB)

Updated for 02e0ed4

@jp-bennett jp-bennett requested a review from Copilot June 13, 2026 18:35

Copilot AI left a comment (Contributor)

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Reworks the TrafficManagementModule cache implementation (linear-scan unified cache + epoch rebase) and introduces persistence-backed “warm” long-tail node retention plus a next-hop overflow hint cache to keep routing/PKI working after NodeDB eviction.

Changes:

  • Replace TMM’s cuckoo-hash cache/indexing with flat arrays + linear scan, add sliding epoch rebase, and add next-hop overflow hint storage/preload.
  • Add WarmNodeStore (file-backed generally, raw-flash ring on nRF52840) and integrate it into NodeDB eviction/migration + PKI DM key lookup.
  • Add/adjust unit tests and nRF52 linker/build guards to reserve the warm-store flash region.

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 5 comments.

File | Description
variants/nrf52840/nrf52840.ini | Pins nRF52840 base env to a warm-store-safe linker script for S140 v6.
variants/nrf52840/nrf52.ini | Adds a post-link build guard to prevent the image overlapping the warm-store flash region.
test/test_warm_store/test_main.cpp | New unit tests for WarmNodeStore admission/eviction/take() and persistence.
test/test_traffic_management/test_main.cpp | Extends TMM tests for congestion-gated hop exhaustion and next-hop overflow cache behavior.
test/test_nodedb_blocked/test_main.cpp | New tests for NodeDB migration + favorite/ignored retention with warm-tier demotion.
src/platform/nrf52/nrf52840_s140_v7.ld | Shrinks FLASH region to reserve 0xEA000–0xED000 for WarmNodeStore ring.
src/platform/nrf52/nrf52840_s140_v6.ld | Adds a new S140 v6 linker script variant with the same warm-store reservation.
src/modules/TrafficManagementModule.h | Updates docs and adds APIs/fields for next-hop cache + epoch rebase.
src/modules/TrafficManagementModule.cpp | Implements flat caches, next-hop overflow cache, preload, and epoch rebase logic.
src/modules/TraceRouteModule.cpp | Mirrors traceroute-derived next-hop info into TMM overflow cache.
src/modules/AdminModule.cpp | Routes favorite/ignore changes through NodeDB’s protected-node cap enforcement.
src/mesh/mesh-pb-constants.h | Adjusts MAX_NUM_NODES defaults, adds MAX_SATELLITE_NODES/WARM_NODE_COUNT, enables HAS_TRAFFIC_MANAGEMENT by default.
src/mesh/generated/meshtastic/deviceonly.pb.h | Updates generated max size constant due to proto size changes.
src/mesh/WarmNodeStore.h | Introduces WarmNodeStore API + persistence design and nRF52840 ring layout.
src/mesh/WarmNodeStore.cpp | Implements WarmNodeStore memory/persistence (raw-flash ring or warm.dat snapshot).
src/mesh/Router.cpp | Uses NodeDB::copyPublicKey() so PKI DMs can decrypt/encrypt for warm-tier nodes.
src/mesh/NodeDB.h | Adds warm tier, protected-node cap API, satellite caps, and public key copy helper.
src/mesh/NodeDB.cpp | Implements migration demotion to warm tier, satellite caps, protected-node cap, and warm-tier persistence.
src/mesh/NextHopRouter.cpp | Stores ACK-confirmed next hops in both NodeDB and TMM, and consults TMM as fallback.
src/mesh/Default.h | Changes default position-dedup grid precision (24 → 19 bits).
src/graphics/draw/MenuHandler.cpp | Surfaces protected-node-cap failures when favoriting/ignoring via UI.
extra_scripts/nrf52_warm_region.py | New post-link guard to fail builds that overlap reserved warm-store flash.

Comment thread src/modules/TrafficManagementModule.cpp Outdated
Comment thread test/test_traffic_management/test_main.cpp Outdated
Comment thread src/mesh/NodeDB.cpp
Comment thread src/mesh/NodeDB.cpp Outdated
Comment thread src/mesh/NodeDB.cpp
NomDeTom and others added 7 commits June 14, 2026 15:24
…store

Reworks the TrafficManagementModule cache layer (policing behaviour unchanged
from upstream) and adds a routing-hint overflow store:

- Flatten the ring: replace the cuckoo-hashed unified cache and the bucketed
  PSRAM NodeInfo index with plain flat arrays + linear scan (same idiom as
  WarmNodeStore). At LoRa packet rates an O(n) scan of the cache is negligible,
  and it removes a large amount of hashing/displacement complexity. The cache
  entry is 11 B; timestamps use a uniform +1 presence-offset so a 0 byte always
  means "empty" across every sub-store. Adds rebaseEpoch() so cached state
  survives the ~19 h relative-timestamp horizon instead of being flushed.

- Next-hop overflow cache: setNextHop/getNextHopHint store a confirmed last-byte
  relay for a destination, written only from NextHopRouter's ACK-confirmed
  decision (and mirrored from TraceRoute). NextHopRouter::getNextHop falls back
  to this cache when the hot NodeDB has no hint, so DMs/relays to long-tail
  nodes keep routing after the node ages out of NodeInfoLite.

- Persistence: preloadNextHopsFromNodeDB warm-starts the cache from persisted
  NodeInfoLite hints on first maintenance pass; next_hop entries are kept alive
  across the maintenance sweep (no TTL) and never clobbered by a stale preload.
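A minimal sketch of a flat next-hop overflow cache (fixed array plus linear scan) with the hot-NodeDB-first fallback. The method names echo setNextHop/getNextHopHint, but the class, its slot count, the round-robin replacement when full, and chooseNextHop() are assumptions for illustration, not the PR's implementation. Zero is used as "no hint", following the convention that a node's last byte is remapped away from 0.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical flat overflow cache: no hashing, just a scan over a few slots.
class NextHopOverflowSketch {
  public:
    static constexpr std::size_t kSlots = 8; // illustrative size

    // Record an ACK-confirmed relay (last byte of the relay's node number).
    void setNextHop(uint32_t dest, uint8_t relayByte) {
        if (dest == 0 || relayByte == 0) return;
        for (auto& e : entries_) // update an existing entry in place
            if (e.dest == dest) { e.relay = relayByte; return; }
        for (auto& e : entries_) // otherwise take the first free slot
            if (e.dest == 0) { e.dest = dest; e.relay = relayByte; return; }
        entries_[victim_].dest = dest; // full: round-robin replacement (assumption)
        entries_[victim_].relay = relayByte;
        victim_ = (victim_ + 1) % kSlots;
    }

    // Returns 0 when no hint is cached for dest.
    uint8_t getNextHopHint(uint32_t dest) const {
        for (const auto& e : entries_)
            if (e.dest == dest) return e.relay;
        return 0;
    }

  private:
    struct Entry {
        uint32_t dest = 0;
        uint8_t relay = 0;
    };
    std::array<Entry, kSlots> entries_{};
    std::size_t victim_ = 0;
};

// Fallback order: hot NodeDB hint first, then the overflow cache.
inline uint8_t chooseNextHop(uint8_t hotDbHint, const NextHopOverflowSketch& cache, uint32_t dest) {
    return hotDbHint != 0 ? hotDbHint : cache.getNextHopHint(dest);
}
```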

All packet-policing logic (rate limit, position dedup, unknown-packet drop,
NodeInfo direct response, hop exhaustion) is the existing upstream behaviour,
untouched. HAS_TRAFFIC_MANAGEMENT defaults on, so the module is compiled in (see note).

Tests: upstream policing suite now actually runs (adds the MeshTypes.h include
that gates HAS_TRAFFIC_MANAGEMENT) plus 4 next-hop tests. Role-aware throttles,
politeness, precision clamp, port-interval and mesh-radius gating — and the
rate-limit >255 saturation fix — are deferred to the advanced-TMM branch.

Note: the default dedup movement grid becomes ~91 m, which also means a node has to move roughly 1.5 km before it can end up with the same position signature again. This is coarser than before, so that repeat distance is also longer than before.
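The ~91 m and ~1.5 km figures can be checked with a short calculation. This assumes coordinates are stored as degrees × 10^7 in a 32-bit integer (Meshtastic's latitude_i convention), that the grid keeps the top 19 of those 32 bits, and that the signature repeats every 16 cells per axis; that last point is inferred from the quoted 1.5 km rather than stated in the PR. Using ≈111.32 km per degree at the equator:

$$\Delta_{19} = 2^{32-19}\times 10^{-7}\,^{\circ} = 8.192\times 10^{-4}\,^{\circ} \approx 8.192\times 10^{-4}\times 111.32\ \mathrm{km} \approx 91\ \mathrm{m}$$

$$16\,\Delta_{19} \approx 16\times 91.2\ \mathrm{m} \approx 1.46\ \mathrm{km}, \qquad 16\,\Delta_{24} = 16\times 2^{8}\times 10^{-7}\,^{\circ}\times 111.32\ \mathrm{km/^{\circ}} \approx 45.6\ \mathrm{m}$$

Longitude cells shrink by a factor of cos(latitude), so east-west distances are shorter away from the equator.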

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`node` in preloadNextHopsFromNodeDB() is never written through — mark
it const to satisfy cppcheck's constVariablePointer check in CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Position dedup in TrafficManagementModule::handleReceived is gated on
channels.isWellKnownChannel(mp.channel). The test helper
installWellKnownPrimaryChannel() sets up channelFile/config.lora so that
gate is true, but it was defined and never called — so the dedup path was
never reached. test_tm_positionDedup_dropsDuplicateWithinWindow therefore
failed (duplicate forwarded -> CONTINUE instead of STOP), and
test_tm_positionDedup_allowsMovedPosition passed only vacuously.

Call installWellKnownPrimaryChannel() in both dedup tests so the dedup
path is genuinely exercised.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nt-0

Copilot review (PR meshtastic#10706):
- preloadNextHopsFromNodeDB() now returns bool; runOnce only latches
  nextHopPreloaded once the preload actually ran (retries if nodeDB wasn't
  ready), instead of skipping it forever.
- Remove the empty `#if HAS_VARIABLE_HOPS` blocks in the test.
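The retry-until-success latch can be sketched like this. PreloadLatchSketch and its members are hypothetical stand-ins for runOnce(), nextHopPreloaded and preloadNextHopsFromNodeDB(); the flag is set only after a pass in which the preload actually ran, so a pass that happens before the NodeDB is ready is simply retried.

```cpp
// Hypothetical sketch of the latch; names are illustrative only.
struct PreloadLatchSketch {
    bool nextHopPreloaded = false;
    int attempts = 0;

    // Stand-in for preloadNextHopsFromNodeDB(): returns false while the NodeDB is not ready.
    bool preload(bool nodeDbReady) {
        ++attempts;
        if (!nodeDbReady) return false;
        // ... copy persisted next-hop hints into the overflow cache here ...
        return true;
    }

    // Stand-in for one maintenance pass: latch only on success.
    void maintenancePass(bool nodeDbReady) {
        if (!nextHopPreloaded && preload(nodeDbReady)) nextHopPreloaded = true;
    }
};
```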

Test correctness:
- Three more position-dedup tests were missing installWellKnownPrimaryChannel()
  (dropsDuplicate/allowsMoved were fixed earlier; allowsDuplicateAfterInterval,
  cacheFlush, priorRateState were not) — without the well-known-channel gate the
  dedup path never runs, so their STOP assertions failed.

Fake-time injection (no more real sleeps):
- Add TrafficManagementModule::s_testNowMs + nowMs(), mirroring HopScalingModule;
  route all TMM tick/time reads through nowMs(). Tests advance a virtual clock via
  s_testNowMs instead of testDelay() sleeping real 5-6 min across a tick — the
  suite drops from ~15 min to ~30 s. Production behaviour is unchanged (nowMs()
  inlines to millis()).
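A minimal sketch of the injectable-clock pattern, assuming a zero override means "use the real clock". ClockSketch, g_fakeMillis and fakeMillis() are hypothetical; fakeMillis() stands in for Arduino's millis() so the example builds off-target.

```cpp
#include <cstdint>

// Hypothetical stand-in for Arduino's millis() so the sketch builds off-target.
static uint32_t g_fakeMillis = 0;
inline uint32_t fakeMillis() { return g_fakeMillis; }

// Injectable clock. In firmware the override would be compiled only into test
// builds (e.g. behind PIO_UNIT_TESTING) so nowMs() reduces to millis(); here it
// is unconditional so the sketch stays runnable.
struct ClockSketch {
    static uint32_t s_testNowMs; // 0 = no override (assumption)
    static uint32_t nowMs() { return s_testNowMs != 0 ? s_testNowMs : fakeMillis(); }
};
uint32_t ClockSketch::s_testNowMs = 0;
```

A test then advances time by assignment: `ClockSketch::s_testNowMs += 360000;` crosses one 360 s position tick instantly instead of sleeping.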

Fingerprint-0 fix:
- computePositionFingerprint() never returns 0 now (remap 0 -> 0xFF, mirroring
  getLastByteOfNodeNum), so a real position that hashes to 0 doesn't collide with
  the "no position seen" sentinel and its duplicates dedup correctly.
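A hedged sketch of a grid-based position fingerprint with the 0 → 0xFF remap. It swaps in simple bit-packing (low 4 bits of each axis' cell index) for whatever hash the PR actually uses, and assumes degrees × 10^7 int32 coordinates and a 19-bit grid; positionFingerprintSketch() is hypothetical. With this packing, a position 16 cells (~1.46 km) away aliases to the same byte.

```cpp
#include <cstdint>

// Hypothetical fingerprint, not the module's function. Assumes lat/lon are
// degrees * 1e7 in int32, the grid keeps the top `bits` (1..32) of each
// coordinate, and the byte packs the low 4 bits of each axis' cell index.
inline uint8_t positionFingerprintSketch(int32_t latI, int32_t lonI, unsigned bits = 19) {
    const unsigned shift = 32u - bits; // 13 for a 19-bit grid
    const uint32_t latCell = static_cast<uint32_t>(latI) >> shift;
    const uint32_t lonCell = static_cast<uint32_t>(lonI) >> shift;
    const uint8_t fp = static_cast<uint8_t>(((latCell & 0x0Fu) << 4) | (lonCell & 0x0Fu));
    // 0 is reserved as the "no position seen" sentinel, so remap it to 0xFF.
    return fp == 0 ? 0xFF : fp;
}
```

Like the node-byte remap, this moves the zero collision onto 0xFF rather than removing collisions altogether, which is tolerable for a dedup hint.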

test_traffic_management: 34/34 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Labels

2.8 ai-generated Possible AI-generated low-quality content enhancement New feature or request needs-review Needs human review


2 participants