
[tmva][sofie] Clean_name() erases dots causing tensor name collisions in PyTorch parser #21527

@AdityaDRathore

Description


Describe the bug

UTILITY::Clean_name() in SOFIE_common.cxx (line 513) erases dot characters (.) from tensor names instead of replacing them with a safe character (e.g., _). This causes tensor name collisions when parsing TorchScript models whose ONNX graphs contain dotted intermediate names.

The Collision

TorchScript generates intermediate tensor names using a sequential dotted pattern:

input.1 → input0 → input1 → input2 → ...

When Clean_name() processes these:

Original Name    After Clean_name()    Intended Role
-------------    ------------------    -------------
input.1          input1                Model input tensor
input1           input1                Intermediate tensor (output of first op)

Both map to the same C++ variable name input1, causing the generated inference code to contain duplicate declarations or silently overwrite tensor data.

Root Cause

// SOFIE_common.cxx line 513-518
std::string UTILITY::Clean_name(std::string input_tensor_name){
   std::string s (input_tensor_name);
   std::replace( s.begin(), s.end(), '-', '_');
   s.erase(std::remove_if(s.begin(), s.end(), 
       []( char const& c ) -> bool { return !std::isalnum(c) && c != '_'; } ), s.end());
   return s;
}

The function explicitly replaces '-' with '_', but all other non-alphanumeric characters (including '.') are erased rather than replaced. This is inconsistent: '-' gets a safe replacement while '.' is deleted outright.

Affected Parsers

  • PyTorch parser (RModelParser_PyTorch.cxx): TorchScript ONNX graphs consistently use dotted names (input.1, x.1, etc.).
  • ONNX parser (RModelParser_ONNX.cxx): Some ONNX models from external tools also use dotted tensor names.
  • Keras parser: Not directly affected (Keras uses _ in names natively).

Why Existing Tests Pass

The existing PyTorch parser tests (SEQUENTIAL_MODEL, MODULE_MODEL, CONVOLUTION_MODEL) pass by coincidence. TorchScript happens to generate non-colliding name sequences for those specific architectures. Models using operators like Tanh, Softmax, or BatchNormalization trigger the collision because TorchScript assigns sequential dotted names without gaps.

Proposed Fix

Replace dots with underscores before the erase, consistent with the existing '-' → '_' treatment:

std::string UTILITY::Clean_name(std::string input_tensor_name){
   std::string s (input_tensor_name);
   std::replace( s.begin(), s.end(), '-', '_');
   std::replace( s.begin(), s.end(), '.', '_');  // <-- add this line
   s.erase(std::remove_if(s.begin(), s.end(), 
       []( char const& c ) -> bool { return !std::isalnum(c) && c != '_'; } ), s.end());
   return s;
}

This maps input.1 → input_1 (distinct from input1), eliminating the collision.

Alternative (parser-level workaround): Normalize dotted names on the Python side before they reach Clean_name(). This is what PR #21528 (PyTorch parser extension) implements as a workaround, but the root cause remains in Clean_name() and affects all parsers.

Reproducer

  1. Create a PyTorch model with nn.Tanh() or any activation that causes TorchScript to assign sequential dotted tensor names:
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 8),
    nn.Tanh(),
    nn.Linear(8, 6),
)
model.eval()
m = torch.jit.script(model)
torch.jit.save(m, "test_collision.pt")
  2. Parse with SOFIE:
using namespace TMVA::Experimental;
std::vector<size_t> shape{2, 4};
SOFIE::RModel model = SOFIE::PyTorch::Parse("test_collision.pt", {shape});
model.Generate();
  3. Inspect the generated code: the model input input.1 and the intermediate output of Gemm (input1) both become input1, producing a name collision.


ROOT version

master

Installation method

build from source

Operating system

Linux (Ubuntu 24.04)

Additional context

Why the existing tests pass: the current SEQUENTIAL_MODEL, MODULE_MODEL, and CONVOLUTION_MODEL tests use architectures where TorchScript happens to generate non-consecutive name sequences (e.g. input, input.3, input.5 — skipping numbers), so the cleaned names never collide. Models with Tanh, Softmax, or BatchNormalization trigger the collision because TorchScript assigns consecutive names without gaps.

Workaround: normalize dots to underscores on the Python side before the names reach Clean_name(), e.g. x.debugName().replace('.', '_'). This is implemented in PR #21528 as a parser-level workaround.

cc: @lmoneta @sanjibansg @guitargeek
