Description
Describe the bug
UTILITY::Clean_name() in SOFIE_common.cxx (line 513) erases dot characters (.) from tensor names instead of replacing them with a safe character (e.g., _). This causes tensor name collisions when parsing TorchScript models whose ONNX graphs contain dotted intermediate names.
The Collision
TorchScript generates intermediate tensor names using a sequential dotted pattern:
input.1 → input0 → input1 → input2 → ...
When Clean_name() processes these:
| Original Name | After Clean_name() | Intended Role |
|---|---|---|
| input.1 | input1 | Model input tensor |
| input1 | input1 | Intermediate tensor (output of first op) |
Both map to the same C++ variable name input1, causing the generated inference code to contain duplicate declarations or silently overwrite tensor data.
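The erase behavior can be modeled in a few lines of Python (an illustrative re-implementation for this report, not the actual C++ source) to show the collision directly:

```python
def clean_name_current(name: str) -> str:
    # Python model of the current C++ UTILITY::Clean_name():
    # '-' is replaced with '_', every other non-alphanumeric
    # character (including '.') is simply erased.
    s = name.replace('-', '_')
    return ''.join(c for c in s if c.isalnum() or c == '_')

print(clean_name_current("input.1"))  # input1
print(clean_name_current("input1"))   # input1
```

Both names collapse to the same identifier `input1`, which the generated inference code then declares twice.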
Root Cause
```cpp
// SOFIE_common.cxx, lines 513-518
std::string UTILITY::Clean_name(std::string input_tensor_name){
   std::string s (input_tensor_name);
   std::replace( s.begin(), s.end(), '-', '_');
   s.erase(std::remove_if(s.begin(), s.end(),
      []( char const& c ) -> bool { return !std::isalnum(c) && c != '_'; } ), s.end());
   return s;
}
```

The function replaces `-` → `_` explicitly, but all other non-alphanumeric characters (including `.`) are erased rather than replaced. This is inconsistent: `-` gets a safe replacement while `.` is deleted.
Affected Parsers
- PyTorch parser (`RModelParser_PyTorch.cxx`): TorchScript ONNX graphs consistently use dotted names (`input.1`, `x.1`, etc.).
- ONNX parser (`RModelParser_ONNX.cxx`): some ONNX models from external tools also use dotted tensor names.
- Keras parser: not directly affected (Keras uses `_` in names natively).
Why Existing Tests Pass
The existing PyTorch parser tests (SEQUENTIAL_MODEL, MODULE_MODEL, CONVOLUTION_MODEL) pass by coincidence. TorchScript happens to generate non-colliding name sequences for those specific architectures. Models using operators like Tanh, Softmax, or BatchNormalization trigger the collision because TorchScript assigns sequential dotted names without gaps.
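Under the same illustrative Python model of `Clean_name()` (a sketch for this report, not the C++ source), a gapped name sequence stays collision-free while a consecutive one does not:

```python
def clean(name: str) -> str:
    # Illustrative model of the current Clean_name(): '.' is erased
    s = name.replace('-', '_')
    return ''.join(c for c in s if c.isalnum() or c == '_')

gapped = ["input", "input.3", "input.5"]       # sequence with gaps
consecutive = ["input.1", "input1", "input2"]  # consecutive sequence

print(len(set(map(clean, gapped))))       # 3 distinct names -> no collision
print(len(set(map(clean, consecutive))))  # 2 distinct names -> collision
```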
Proposed Fix
Replace dots with underscores before erasing, consistent with the existing `-` → `_` treatment:

```cpp
std::string UTILITY::Clean_name(std::string input_tensor_name){
   std::string s (input_tensor_name);
   std::replace( s.begin(), s.end(), '-', '_');
   std::replace( s.begin(), s.end(), '.', '_'); // <-- add this line
   s.erase(std::remove_if(s.begin(), s.end(),
      []( char const& c ) -> bool { return !std::isalnum(c) && c != '_'; } ), s.end());
   return s;
}
```

This makes `input.1` → `input_1` (distinct from `input1`), eliminating the collision.
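Modeling the proposed fix in Python (illustrative only; the real change is the single `std::replace` line above) shows the two names staying distinct:

```python
def clean_name_fixed(name: str) -> str:
    # '.' now gets the same '_' replacement as '-'
    s = name.replace('-', '_').replace('.', '_')
    return ''.join(c for c in s if c.isalnum() or c == '_')

print(clean_name_fixed("input.1"))  # input_1
print(clean_name_fixed("input1"))   # input1
```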
Alternative (parser-level workaround): Normalize dotted names on the Python side before they reach Clean_name(). This is what PR #21528 (PyTorch parser extension) implements as a workaround, but the root cause remains in Clean_name() and affects all parsers.
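A sketch of such a parser-level normalization (the helper name is hypothetical; the actual implementation in PR #21528 may differ in detail):

```python
def normalize_debug_name(name: str) -> str:
    # Hypothetical helper mirroring the parser-level workaround:
    # dots are rewritten before the name ever reaches Clean_name().
    return name.replace('.', '_')

print([normalize_debug_name(n) for n in ["input.1", "input1", "x.1"]])
# ['input_1', 'input1', 'x_1'] -- already distinct
```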
Reproducer
- Create a PyTorch model with `nn.Tanh()` or any activation that causes TorchScript to assign sequential dotted tensor names:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 8),
    nn.Tanh(),
    nn.Linear(8, 6),
)
model.eval()
m = torch.jit.script(model)
torch.jit.save(m, "test_collision.pt")
```

- Parse with SOFIE:

```cpp
using namespace TMVA::Experimental;
std::vector<size_t> shape{2, 4};
SOFIE::RModel model = SOFIE::PyTorch::Parse("test_collision.pt", {shape});
model.Generate();
```

- Inspect the generated code: the model input `input.1` and the intermediate output of Gemm (`input1`) both become `input1`, producing a name collision.
ROOT version
master
Installation method
build from source
Operating system
Linux (Ubuntu 24.04)
Additional context
Affected parsers:
- PyTorch parser (`RModelParser_PyTorch.cxx`) - TorchScript consistently uses dotted names
- ONNX parser (`RModelParser_ONNX.cxx`) - some external ONNX tools also produce dotted names
Why existing tests pass: The current SEQUENTIAL_MODEL, MODULE_MODEL, and CONVOLUTION_MODEL tests use architectures where TorchScript happens to generate non-colliding name sequences (e.g., `input`, `input.3`, `input.5` - skipping numbers). Models with Tanh, Softmax, or BatchNormalization trigger the collision because TorchScript assigns consecutive names without gaps.
Workaround: Normalize dots to underscores on the Python side before names reach `Clean_name()`, e.g., `x.debugName().replace('.','_')`. This is implemented in PR #21528 as a parser-level workaround.