Pr2 tmm basic#10706
…ty retention)
Introduces a tiered NodeDB so the device retains identity (public key,
last_heard) for far more nodes than fit in the full-record hot store,
without growing heap or the persisted nodes.proto unboundedly.
- Hot store: full NodeInfoLite, MAX_NUM_NODES (120 on nRF52).
- Satellite maps: position/telemetry/environment/status capped at
MAX_SATELLITE_NODES (40 freshest); eviction via enforceSatelliteCaps /
evictSatelliteOverCap.
- Warm tier (WarmNodeStore): 40 B {num,last_heard,public_key} records for
evicted nodes so DMs to/from long-tail nodes keep encrypting/decrypting.
Persisted to /prefs/warm.dat, or on nRF52840 a dedicated 12 KB raw-flash
record-ring below LittleFS (3x4 KB pages; see linker scripts + the
nrf52_warm_region.py post-link guard).
NodeDB::getOrCreateMeshNode now demotes evicted nodes into the warm tier and
re-admits them (restoring key/last_heard). Router PKI decrypt/encode resolve
the peer key via NodeDB::copyPublicKey (hot store, then warm tier).
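A minimal sketch of the "hot store, then warm tier" lookup described above, for readers who want to see the shape of it. Only the record fields and the 40 B size come from this commit message; the 32-byte key length is inferred from 40 - 4 - 4, and `HotNode`, `TieredDb` and the vector-backed storage are hypothetical stand-ins, not the real NodeDB types.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical warm-tier record: {num, last_heard, public_key} in 40 bytes.
struct WarmRecord {
    uint32_t num;           // node number
    uint32_t last_heard;    // last time the node was heard
    uint8_t public_key[32]; // key length inferred from the 40 B total
};
static_assert(sizeof(WarmRecord) == 40, "warm record is expected to be 40 bytes");

// Hypothetical, heavily trimmed stand-in for a full hot-store record.
struct HotNode {
    uint32_t num;
    uint8_t public_key[32];
    bool has_key;
};

struct TieredDb {
    std::vector<HotNode> hot;
    std::vector<WarmRecord> warm;

    // Copies the peer key into out[32]. The hot store is consulted first,
    // then the warm tier; returns false if neither tier has a key.
    bool copyPublicKey(uint32_t num, uint8_t *out) const {
        for (const HotNode &n : hot) {
            if (n.num == num && n.has_key) {
                std::memcpy(out, n.public_key, 32);
                return true;
            }
        }
        for (const WarmRecord &r : warm) {
            if (r.num == num) {
                std::memcpy(out, r.public_key, 32);
                return true;
            }
        }
        return false;
    }
};
```

A caller on the PKI path would treat a `false` return as "no key known for this peer", exactly as it would for a node that was never heard.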
NodeInfoLite gains snr_q4 (sint32, Q4-encoded dB); the float snr is zeroed on
disk. NodeInfoLite grows 105 -> 112 B; backup 2432 -> 2468 B.
Note: the snr_q4 .proto change still needs to land in the protobufs submodule
(generated header is updated here; submodule pointer left at upstream).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
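For reference, a small sketch of what a Q4-encoded SNR could look like. It assumes "Q4" means fixed point with 4 fractional bits (a step of 1/16 dB); the helper names `snrToQ4` and `q4ToSnr` are hypothetical and do not appear in this PR.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumption: Q4 = 4 fractional bits, so one unit is 1/16 dB.
constexpr int32_t kQ4Scale = 16;

// Encode a float SNR in dB as a signed Q4 integer, rounding to nearest.
inline int32_t snrToQ4(float snrDb) {
    return static_cast<int32_t>(std::lround(snrDb * kQ4Scale));
}

// Decode a Q4 integer back to dB.
inline float q4ToSnr(int32_t q4) {
    return static_cast<float>(q4) / kQ4Scale;
}
```

Under that assumption, -7.25 dB is stored as -116 and decodes back exactly, since LoRa SNR readings in quarter-dB steps are multiples of 1/16 dB.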
Hardens how ignored/favourite nodes are received over admin and retained, closing paths where a block could be lost or accidentally cleared.
- Blocking keeps the node's public key (admin set_ignored_node and addFromContact no longer zero it / drop the warm-tier key), so a blocked peer stays a verifiable identity.
- set_ignored_node creates the node if absent, so a block by node ID sticks even for a node we've never heard from (e.g. pushed by a remote admin) with no NodeInfo or key.
- Eviction protection (favourite/ignored/manually-verified) now also applies to the load-time hot-store migration and is never undone by cleanupMeshDB, which previously purged ignored nodes that lacked user info.
- The hot-store migration leaves our own node (index 0) in place and prefers to demote non-protected nodes, like the runtime eviction scan.
Caps the protected set (favourite + ignored + verified) at MAX_NUM_NODES-2 via NodeDB::setProtectedFlag(), so at least two evictable slots always remain and getOrCreateMeshNode can always make room. This replaces the previous unconditional append that could run off the end of the node vector when every node was protected. A locally-set favourite/ignore that hits the cap reports back to the phone via a ClientNotification.
Adds test_nodedb_blocked covering the migration, favourite/ignored eviction protection, ignored-survives-cleanup, and the protected-node cap. The maintenance methods stay private in production; the test reaches them through a PIO_UNIT_TESTING-guarded friend shim.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts:
# src/mesh/NodeDB.h
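A rough sketch of the protected-node cap, to make the "MAX_NUM_NODES-2" rule concrete. Everything here is a hypothetical stand-in (`Node`, `Db`, `Flag`, and a tiny `kMaxNumNodes`); the real NodeDB::setProtectedFlag() signature is not shown in this page, so treat the shape below as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr size_t kMaxNumNodes = 8; // small stand-in for MAX_NUM_NODES

struct Node {
    uint32_t num = 0;
    bool favourite = false;
    bool ignored = false;
    bool verified = false;
    bool isProtected() const { return favourite || ignored || verified; }
};

enum class Flag { Favourite, Ignored, Verified };

struct Db {
    std::vector<Node> nodes;

    size_t protectedCount() const {
        size_t n = 0;
        for (const Node &node : nodes)
            if (node.isProtected())
                ++n;
        return n;
    }

    // Returns false, changing nothing, when setting the flag would grow the
    // protected set past kMaxNumNodes - 2. Clearing a flag, or adding a second
    // flag to an already-protected node, never hits the cap.
    bool setProtectedFlag(Node &node, Flag flag, bool on) {
        if (on && !node.isProtected() && protectedCount() >= kMaxNumNodes - 2)
            return false;
        switch (flag) {
        case Flag::Favourite: node.favourite = on; break;
        case Flag::Ignored:   node.ignored = on;   break;
        case Flag::Verified:  node.verified = on;  break;
        }
        return true;
    }
};
```

A `false` return is the point where a caller would surface the failure to the user, which is the role the ClientNotification plays in the commit above.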
Zero-initialise `stranded[]` and `seqs[]/order[]` VLAs so cppcheck can verify there are no unguarded reads of uninitialised memory (the guards exist but are not visible to static analysis). Mark two local pointers `const` where the pointed-to entry is never mutated after assignment.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Reworks the TrafficManagementModule cache implementation (linear-scan unified cache + epoch rebase) and introduces persistence-backed “warm” long-tail node retention plus a next-hop overflow hint cache to keep routing/PKI working after NodeDB eviction.
Changes:
- Replace TMM’s cuckoo-hash cache/indexing with flat arrays + linear scan, add sliding epoch rebase, and add next-hop overflow hint storage/preload.
- Add WarmNodeStore (file-backed generally, raw-flash ring on nRF52840) and integrate it into NodeDB eviction/migration + PKI DM key lookup.
- Add/adjust unit tests and nRF52 linker/build guards to reserve the warm-store flash region.
Reviewed changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| variants/nrf52840/nrf52840.ini | Pins nRF52840 base env to a warm-store-safe linker script for S140 v6. |
| variants/nrf52840/nrf52.ini | Adds a post-link build guard to prevent the image overlapping the warm-store flash region. |
| test/test_warm_store/test_main.cpp | New unit tests for WarmNodeStore admission/eviction/take() and persistence. |
| test/test_traffic_management/test_main.cpp | Extends TMM tests for congestion-gated hop exhaustion and next-hop overflow cache behavior. |
| test/test_nodedb_blocked/test_main.cpp | New tests for NodeDB migration + favorite/ignored retention with warm-tier demotion. |
| src/platform/nrf52/nrf52840_s140_v7.ld | Shrinks FLASH region to reserve 0xEA000–0xED000 for WarmNodeStore ring. |
| src/platform/nrf52/nrf52840_s140_v6.ld | Adds a new S140 v6 linker script variant with the same warm-store reservation. |
| src/modules/TrafficManagementModule.h | Updates docs and adds APIs/fields for next-hop cache + epoch rebase. |
| src/modules/TrafficManagementModule.cpp | Implements flat caches, next-hop overflow cache, preload, and epoch rebase logic. |
| src/modules/TraceRouteModule.cpp | Mirrors traceroute-derived next-hop info into TMM overflow cache. |
| src/modules/AdminModule.cpp | Routes favorite/ignore changes through NodeDB’s protected-node cap enforcement. |
| src/mesh/mesh-pb-constants.h | Adjusts MAX_NUM_NODES defaults, adds MAX_SATELLITE_NODES/WARM_NODE_COUNT, enables HAS_TRAFFIC_MANAGEMENT by default. |
| src/mesh/generated/meshtastic/deviceonly.pb.h | Updates generated max size constant due to proto size changes. |
| src/mesh/WarmNodeStore.h | Introduces WarmNodeStore API + persistence design and nRF52840 ring layout. |
| src/mesh/WarmNodeStore.cpp | Implements WarmNodeStore memory/persistence (raw-flash ring or warm.dat snapshot). |
| src/mesh/Router.cpp | Uses NodeDB::copyPublicKey() so PKI DMs can decrypt/encrypt for warm-tier nodes. |
| src/mesh/NodeDB.h | Adds warm tier, protected-node cap API, satellite caps, and public key copy helper. |
| src/mesh/NodeDB.cpp | Implements migration demotion to warm tier, satellite caps, protected-node cap, and warm-tier persistence. |
| src/mesh/NextHopRouter.cpp | Stores ACK-confirmed next hops in both NodeDB and TMM, and consults TMM as fallback. |
| src/mesh/Default.h | Changes default position-dedup grid precision (24 → 19 bits). |
| src/graphics/draw/MenuHandler.cpp | Surfaces protected-node-cap failures when favoriting/ignoring via UI. |
| extra_scripts/nrf52_warm_region.py | New post-link guard to fail builds that overlap reserved warm-store flash. |
…store
Reworks the TrafficManagementModule cache layer (policing behaviour unchanged from upstream) and adds a routing-hint overflow store:
- Flatten the ring: replace the cuckoo-hashed unified cache and the bucketed PSRAM NodeInfo index with plain flat arrays + linear scan (same idiom as WarmNodeStore). At LoRa packet rates an O(n) scan of the cache is negligible, and it removes a large amount of hashing/displacement complexity. The cache entry is 11 B; timestamps use a uniform +1 presence-offset so a 0 byte always means "empty" across every sub-store. Adds rebaseEpoch() so cached state survives the ~19 h relative-timestamp horizon instead of being flushed.
- Next-hop overflow cache: setNextHop/getNextHopHint store a confirmed last-byte relay for a destination, written only from NextHopRouter's ACK-confirmed decision (and mirrored from TraceRoute). NextHopRouter::getNextHop falls back to this cache when the hot NodeDB has no hint, so DMs/relays to long-tail nodes keep routing after the node ages out of NodeInfoLite.
- Persistence: preloadNextHopsFromNodeDB warm-starts the cache from persisted NodeInfoLite hints on first maintenance pass; next_hop entries are kept alive across the maintenance sweep (no TTL) and never clobbered by a stale preload.
All packet-policing logic (rate limit, position dedup, unknown-packet drop, NodeInfo direct response, hop exhaustion) is the existing upstream behaviour, untouched. HAS_TRAFFIC_MANAGEMENT defaults on so the module is compiled in (see note).
Tests: upstream policing suite now actually runs (adds the MeshTypes.h include that gates HAS_TRAFFIC_MANAGEMENT) plus 4 next-hop tests.
Role-aware throttles, politeness, precision clamp, port-interval and mesh-radius gating, and the rate-limit >255 saturation fix, are deferred to the advanced-TMM branch.
Note: the default dedup movement grid moves to ~91 m, which also means a node must move about 1.5 km to end up with the same signature position again; the grid is coarser, so that distance is further than before.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
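To illustrate the "+1 presence-offset" and rebaseEpoch() ideas from the commit above, here is a small sketch of a flat, linearly scanned cache. The 16-bit tick width, the slot count, the `kKeep` window and the clamp-to-oldest policy are all assumptions made for the example; the real 11 B entry layout and tick unit are not given on this page.

```cpp
#include <cassert>
#include <cstdint>

struct FlatCache {
    static constexpr int kSlots = 4;
    static constexpr uint32_t kMaxRel = 0xFFFE; // largest relative tick that still fits after +1
    static constexpr uint32_t kKeep = 0x8000;   // history kept behind "now" after a rebase

    uint32_t epoch = 0;          // absolute tick that relative value 0 maps to
    uint32_t node[kSlots] = {};
    uint16_t stamp[kSlots] = {}; // 0 = empty, otherwise (tick - epoch) + 1

    // Linear scan; only occupied slots (stamp != 0) can match.
    int find(uint32_t n) const {
        for (int i = 0; i < kSlots; ++i)
            if (stamp[i] != 0 && node[i] == n)
                return i;
        return -1;
    }

    // Slide the epoch forward so "now" fits again. Entries older than the new
    // epoch are clamped to the oldest representable value instead of dropped.
    void rebaseEpoch(uint32_t now) {
        const uint32_t newEpoch = now > kKeep ? now - kKeep : 0;
        if (newEpoch <= epoch)
            return;
        const uint32_t shift = newEpoch - epoch;
        for (int i = 0; i < kSlots; ++i) {
            if (stamp[i] == 0)
                continue;
            const uint32_t rel = stamp[i] - 1u;
            const uint32_t moved = rel > shift ? rel - shift : 0u;
            stamp[i] = static_cast<uint16_t>(moved + 1u);
        }
        epoch = newEpoch;
    }

    // Record that node n was seen at absolute tick now (assumed monotonic).
    void touch(uint32_t n, uint32_t now) {
        if (now - epoch > kMaxRel)
            rebaseEpoch(now);
        int slot = find(n);
        if (slot < 0) {
            slot = 0; // fall back to the oldest entry if no slot is empty
            for (int i = 0; i < kSlots; ++i) {
                if (stamp[i] == 0) { slot = i; break; }
                if (stamp[i] < stamp[slot]) slot = i;
            }
        }
        node[slot] = n;
        stamp[slot] = static_cast<uint16_t>((now - epoch) + 1u);
    }

    bool lastSeen(uint32_t n, uint32_t &out) const {
        const int slot = find(n);
        if (slot < 0)
            return false;
        out = epoch + (stamp[slot] - 1u);
        return true;
    }
};
```

The point of the offset is that an entry seen exactly at the epoch is stored as 1, so a zeroed array is a valid empty cache and no separate "occupied" bit is needed.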
`node` in preloadNextHopsFromNodeDB() is never written through; mark it const to satisfy cppcheck's constVariablePointer check in CI.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Position dedup in TrafficManagementModule::handleReceived is gated on channels.isWellKnownChannel(mp.channel). The test helper installWellKnownPrimaryChannel() sets up channelFile/config.lora so that gate is true, but it was defined and never called, so the dedup path was never reached. test_tm_positionDedup_dropsDuplicateWithinWindow therefore failed (duplicate forwarded -> CONTINUE instead of STOP), and test_tm_positionDedup_allowsMovedPosition passed only vacuously. Call installWellKnownPrimaryChannel() in both dedup tests so the dedup path is genuinely exercised.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nt-0
Copilot review (PR meshtastic#10706):
- preloadNextHopsFromNodeDB() now returns bool; runOnce only latches nextHopPreloaded once the preload actually ran (retries if nodeDB wasn't ready), instead of skipping it forever.
- Remove the empty `#if HAS_VARIABLE_HOPS` blocks in the test.
Test correctness:
- Three more position-dedup tests were missing installWellKnownPrimaryChannel() (dropsDuplicate/allowsMoved were fixed earlier; allowsDuplicateAfterInterval, cacheFlush, priorRateState were not). Without the well-known-channel gate the dedup path never runs, so their STOP assertions failed.
Fake-time injection (no more real sleeps):
- Add TrafficManagementModule::s_testNowMs + nowMs(), mirroring HopScalingModule; route all TMM tick/time reads through nowMs(). Tests advance a virtual clock via s_testNowMs instead of testDelay() sleeping real 5-6 min across a tick, so the suite drops from ~15 min to ~30 s. Production behaviour is unchanged (nowMs() inlines to millis()).
Fingerprint-0 fix:
- computePositionFingerprint() never returns 0 now (remap 0 -> 0xFF, mirroring getLastByteOfNodeNum), so a real position that hashes to 0 doesn't collide with the "no position seen" sentinel and its duplicates dedup correctly.
test_traffic_management: 34/34 green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
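Two of the ideas in this commit are easy to show in isolation: an injectable clock and a fingerprint that never collides with the 0 sentinel. The sketch below is an assumption-heavy illustration: the commit puts `s_testNowMs` and `nowMs()` on TrafficManagementModule and compiles the override out of production builds, whereas this version uses free functions and a runtime check; `positionFingerprint` and its 4+4-bit packing are hypothetical and not the real computePositionFingerprint().

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Virtual clock for tests: 0 means "use the real clock" (a simplification
// for this sketch; the firmware is described as inlining to millis()).
static uint32_t s_testNowMs = 0;

// Stand-in for Arduino millis(), so the example builds on a host machine.
inline uint32_t realMillis() {
    using namespace std::chrono;
    return static_cast<uint32_t>(
        duration_cast<milliseconds>(steady_clock::now().time_since_epoch()).count());
}

// Every time read goes through here, so a test can advance time instantly.
inline uint32_t nowMs() {
    return s_testNowMs != 0 ? s_testNowMs : realMillis();
}

// Hypothetical 8-bit fingerprint: low 4 bits of the latitude cell and low
// 4 bits of the longitude cell, after truncating to precisionBits (1..31).
// A raw result of 0 is remapped to 0xFF so 0 can mean "no position seen".
inline uint8_t positionFingerprint(int32_t lat, int32_t lon, unsigned precisionBits) {
    const unsigned shift = 32u - precisionBits;
    const uint32_t latCell = static_cast<uint32_t>(lat) >> shift;
    const uint32_t lonCell = static_cast<uint32_t>(lon) >> shift;
    const uint8_t raw = static_cast<uint8_t>(((latCell & 0xFu) << 4) | (lonCell & 0xFu));
    return raw == 0 ? 0xFF : raw;
}
```

In a test, setting `s_testNowMs = 1000` and then adding six minutes' worth of milliseconds moves the module across a tick boundary without any real sleep.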
Reworks the TrafficManagementModule cache layer and adds a routing-hint overflow store.
Stacks on #10705 — review/merge after it; the shared NodeDB/WarmNodeStore files belong to #10705.
Overall packet-policing behaviour is unchanged from upstream, but is now enabled with very basic deduplication - see below.
What it does
- `setNextHop`/`getNextHopHint` store a confirmed last-byte relay, written only from `NextHopRouter`'s ACK-confirmed decision (and mirrored from TraceRoute). `getNextHop` falls back to it when the hot NodeDB has no hint, so DMs/relays to long-tail nodes keep routing after the node ages out.
- … `NodeInfoLite` hints on the first maintenance pass; next-hop entries survive the sweep and aren't clobbered by a stale preload.
- … (`HAS_TRAFFIC_MANAGEMENT`) with position-dedup on: 19-bit grid (~90 m / ±45 m) and an 11 h min-interval between identical positions; rate-limiting left off. Position dedup only runs on well-known channels (new `Channels::isWellKnownChannel` gate).

Tests:
- `test_traffic_management`: upstream policing suite plus next-hop round-trip / persistence cases, and the position-dedup tests now actually exercise the dedup path.
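As a sanity check on the "19-bit grid (~90 m / ±45 m)" figure, the cell size can be worked out from the coordinate encoding. This sketch assumes coordinates are 32-bit integers in units of 1e-7 degrees, that precision means "keep the top N bits", and that one degree of latitude is about 111.32 km; `gridCellMetres` and `snapToGrid` are hypothetical helpers, and whether the firmware masks or rounds is not stated here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Approximate metres per degree of latitude (spherical-earth assumption).
constexpr double kMetresPerDegree = 111320.0;

// Size of one grid cell along a meridian when only the top precisionBits
// of a 32-bit, 1e-7-degree coordinate are kept. Valid for 1..31 bits.
inline double gridCellMetres(unsigned precisionBits) {
    const double unitsPerCell = std::ldexp(1.0, 32 - static_cast<int>(precisionBits));
    return unitsPerCell * 1e-7 * kMetresPerDegree;
}

// Clear the low bits so every coordinate in a cell maps to the same value.
inline int32_t snapToGrid(int32_t coord, unsigned precisionBits) {
    const uint32_t mask = ~((1u << (32u - precisionBits)) - 1u);
    return static_cast<int32_t>(static_cast<uint32_t>(coord) & mask);
}
```

Under these assumptions `gridCellMetres(19)` is about 91.2 m, so any point lies within roughly 45 m of its cell centre, and `gridCellMetres(24)` is about 2.85 m, which is the scale of the previous default listed in the file table.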