Handle addrList allocation failures safely in RF24Mesh master nodes#266
Conversation
|
Woah! Was that prompt generated? It is ridiculously long. |
What is ridiculously long? This PR is all prompt generated. |
|
Click "Original prompt" in the description (at the bottom), and you'll see what I mean. |
|
lol, i see. Yeah, I just asked it to find memory leaks in the RF24Mesh repo lol. @copilot run clang-tidy and clang-format on this pr |
|
I think Anthropic (& some exclusive friends) has been doing the same for a lot of package registries recently. I've seen a lot of CVE alerts about python, npm, and rust deps lately. |
Ran both locally on this PR.
|
|
We may need to add an AGENTS.md file in these repos, so that they know how to setup the build env (including RF24* deps and their git HEAD refs). |
|
I have no idea how to do that :p |
|
It is basically like a CONTRIBUTING.md doc for any AI agents: totally free-form, no standard convention. |
Memory usage change @ 55a6ce3
Click for full report per board
|
| Sketch | Flash | % | RAM for global variables | % |
|---|---|---|---|---|
| examples/RF24Mesh_Example | 0 | 0 | 0 | 0 |
| examples/RF24Mesh_Example_Master_Statics | +16 | +0.05 | 0 | 0 |
| examples/RF24Mesh_Example_Master_To_Nodes | +20 | +0.07 | 0 | 0 |
| examples/RF24Mesh_Example_Node2Node | 0 | 0 | 0 | 0 |
| examples/RF24Mesh_Example_Node2NodeExtra | 0 | 0 | 0 | 0 |
| examples/RF24Mesh_SerialConfig | +8 | +0.03 | 0 | 0 |
| examples/RF24Mesh_Example_Master | +20 | +0.07 | 0 | 0 |
arduino:samd:mkrzero
| Sketch | Flash | % | RAM for global variables | % |
|---|---|---|---|---|
| examples/RF24Mesh_Example | +4 | 0 | 0 | 0 |
| examples/RF24Mesh_Example_Master_Statics | +12 | 0 | 0 | 0 |
| examples/RF24Mesh_Example_Master_To_Nodes | +16 | +0.01 | 0 | 0 |
| examples/RF24Mesh_Example_Node2Node | +4 | 0 | 0 | 0 |
| examples/RF24Mesh_Example_Node2NodeExtra | +4 | 0 | 0 | 0 |
| examples/RF24Mesh_SerialConfig | +4 | 0 | 0 | 0 |
| examples/RF24Mesh_Example_Master | +16 | +0.01 | 0 | 0 |
|
Would you want to tackle that, or are you leaving it up to me? |
|
I was just thinking about how I'd make copilot do it for all the RF24 repos at once... I think I can do it in a VS Code workspace with all the local RF24 repos added. PS - I've been rewriting the |
|
Hmm, having an agent build RF24Ethernet is very different from other Linux-compatible RF24 repos. It would have to use arduino-cli, and I doubt we can use clang-tidy unless the arduino-cli fixed their compilation database support. I think I'll skip that one for now. Are you planning on tasking AI to analyze RF24Ethernet? |
Already done to find that previous memory leak, this time I used it just to confirm what I already suspected and found another one. Its just a matter of testing and figuring out lwIP process now without crashing the MCU. This whole memory leak stuff is what is killing me as far as a timeframe for addressing these other PRs, |
|
You heavily rely on calling Almost done with the AGENTS.md and clang-tidy configs. I usually waste about 3 prompts before I throw my hands in the air and just say aloud, "Damn it, I'll just do it myself!" |
Another reason why I F&#$ing ❤️ Rust (🦀). |
2bndy5
left a comment
There was a problem hiding this comment.
It is a good idea to check the result of (re)malloc calls, and only proceed normally when successfully allocated the requested mem block.
If this wasn't embedded C++, then I'd insist that an exception is thrown to propagate alloc errors to the caller. Unfortunately, throwing exceptions in embedded C++ isn't really a thing, at least not for most implementations (I think the ESP32 core actually allows exceptions).
I say this because if alloc fails, this patch will silently drop any incoming requests for newly assigned mesh addresses. As it stands now, this patch just prevents data loss of previously assigned addresses.
- Is there some kind of error handling elsewhere that I am missing? I think this is heavily relying on child nodes to interpret a master's response, be it silent, good, or somehow erroneous/unexpected.
- I'm guessing the child node's double-check of a newly assigned address would surface this alloc problem in an obscure way. Didn't we add a Error message type in the reserved system message types?
|
I'm running this code on all of my Linux/Gateway nodes with no apparent issues so far.
The child nodes should either see themselves de-allocated or allocated with the wrong address, meaning they should just continue retrying to renew their addresses. There should be nothing new to interpret. I'm not sure how to test this fully, other than hard-coding it to fail once or twice, but in a working situation, there should be no real "unexpected" behaviour that I can see. |
|
I'm envisioning a DoS when mem is saturated on master node. |
|
What else? If memory is saturated, the |
|
yeah, that's what I'm thinking too. I was just thinking it would be better to have a message saying "we're at capacity" to prospecting child nodes, instead of letting them clog the bandwidth trying to reconnect. I'm probably overthinking again. |
|
Haha, yeah I'd agree with that expectation for Linux devices, but for embedded systems I think that is probably overkill. There should typically be a timeout on address renewals, so it's not that bad, but yeah a large number of nodes trying to connect constantly is a burden & could even create a DOS for the whole system. |
|
Wondering if you approve this PR or have more concerns @2bndy5 ? |
2bndy5
left a comment
There was a problem hiding this comment.
Sorry. thought my last review was an approval
@copilotclang-format-14on the modified PR file (RF24Mesh.cpp)clang-tidy-14againstRF24Mesh.cppand review diagnosticsOriginal prompt
Create a pull request in repository
nRF24/RF24Meshagainst base branchmasterthat hardens allocation failure handling for the master node address list inRF24Mesh.cpp.Scope:
RF24Mesh.cpp.ESBMesh<network_t, radio_t>::begin(...), check the result of the initialmalloc()used to allocateaddrList.falseand do not setaddrMemAllocated = true.ESBMesh<network_t, radio_t>::setAddress(...), replace the direct assignment ofrealloc()back toaddrListwith a temporary pointer.realloc()fails, return early without losing the originaladdrListpointer.Constraints:
Rationale:
realloc()directly toaddrList, which can lose the original allocation on failure and lead to a memory leak plus null dereference.malloc()inbegin()is also unchecked.Use this PR metadata:
Title:
Handle addrList allocation failures safely in RF24Mesh master nodesBody:
Summary
This PR hardens dynamic memory handling for the master node's
addrListinRF24Mesh.Specifically:
malloc()inbegin()addrListdirectly withrealloc()insetAddress()realloc()failsaddrListafter allocation failureProblem
RF24Meshallocates the master node's address list dynamically:begin()setAddress()Before this change,
setAddress()assignedrealloc()directly back toaddrList:If
realloc()fails, it returnsNULLwhile the original allocation remains valid. Assigning that return value directly toaddrListloses the only pointer to the original allocation, which creates:addrListSimilarly, the initial
malloc()inbegin()was not checked beforeloadDHCP()and later use.Changes
begin()malloc()falseif allocation failsaddrMemAllocated = trueafter successful allocationsetAddress()realloc()addrListuntouched on failureWhy this is useful
On many embedded targets allocation failure may be rare, but this still improves correctness and makes failure behavior safer and more predictable, especially on:
addrListstateBehavior impact
There should be no behavior change during normal operation.
On allocation failure:
begin()now fails cleanly instead of proceeding with an invalidaddrListsetAddress()now leaves the existing address list intact instead of losing itFiles changed
RF24Mesh.cppNotes
This PR intentionally keeps the current allocation model and only hardens failure handling. It does not change the overall ownership/lifetime design of
addrList.The following is the prior conversation context from the user's chat exploration (may be truncated):
User: Can you find any memory leaks in RF24Mesh?
Assistant: Yes — I found at least one real memory-management bug that can leak memory and also lose the only pointer to allocated storage.
Most likely leak: unchecked
realloc()insetAddress().Why this is a bug
If
realloc()fails, it returnsNULLand the original pointer is not freed. But this code assigns the return value directly back toaddrList, so on failure:addrList, which can also crashSo this is both:
Safer pattern
Use a temporary pointer:
Another memory-lifetime concern
The master node allocates
addrListwithmalloc()inbegin():