[Comgr] Fix use-after-free in metadata merge for multiple MsgPack notes#1827
[Comgr] Fix use-after-free in metadata merge for multiple MsgPack notes#1827
Conversation
When merging metadata from multiple MsgPack notes, the code was reusing the same Document for parsing each note while accumulating results in a separate DocNode (Root). Since DocNode contains pointers to memory owned by its Document, clearing and reusing the Document for subsequent notes invalidated the pointers held by the accumulated Root node, causing a use-after-free crash. Fix this by using a temporary Document for parsing each note, then deep copying the parsed nodes into the main Document. This ensures all nodes in Root point to memory owned by a Document that is never cleared. Co-Authored-By: Claude <noreply@anthropic.com>
|
PSDB Build Link: http://mlse-bdc-20dd129:8065/#/builders/11/builds/68 |
slinder1
left a comment
There was a problem hiding this comment.
I am still not entirely comfortable with the fact that we don't seem to fully understand what we are fixing. I would have to look at the ASAN report and do some debugging to be more helpful, and I don't have the time right now.
The code here LGTM, and because Document never frees the backing arrays/maps/strings it doesn't seem unreasonable to first parse into a temporary one anyway.
Final note: I believe the Document APIs around converting nodes in-place to maps and arrays has always had some lingering UB in that it effectively reinterprets an object allocated as the base type as a derived type without any placement new or anything to actually create the derived object. It may be that UB which ultimately confuses ASAN (which would be right to assume the derived type object is never actually created).
If I rebuild Comgr with BUILD_TYPE=Debug, I get this error: bad map key type I think the root issue is coming from this line that Sam added: Before we were never comparing Maps between To/From, just returning early if both had printf, otherwise copy/pasting Printf to the new Document. The comparison is pulling in some extra logic we didn't have before, and isn't doing what it appears to: https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/BinaryFormat/MsgPackDocument.h#L155. If the maps are coming from the same document, it hits the llvm_unreachable(), due to amdhsa.printf being an array: If they're different documents (i.e. we create new documents like in this PR), it just always returns true: TLDR: I don't think MsgPackDocument was set up for cross-document comparison @yxsamliu do we need to do a proper comparison of the amdhsa.printf content? From your commit message, I see "The amdhsa.printf content is identical across all partitions". But I guess we're worried about cases where we may mergeNoteRecords() that weren't created by splitting a single device module via LTO partitions? |
When merging metadata from multiple MsgPack notes, the code was reusing the same Document for parsing each note while accumulating results in a separate DocNode (Root). Since DocNode contains pointers to memory owned by its Document, clearing and reusing the Document for subsequent notes invalidated the pointers held by the accumulated Root node, causing a use-after-free crash.
Fix this by using a temporary Document for parsing each note, then deep copying the parsed nodes into the main Document. This ensures all nodes in Root point to memory owned by a Document that is never cleared.