[IR Container] Phase 2 Copy-Move Semantics by mdavis36 · Pull Request #5964 · NVIDIA/Fuser

mdavis36 · 2026-02-12T23:31:36Z

Copy constructor now shares the source's container pointer instead of creating a new one. Fusion::copy clones directly from per-Fusion filtered vals rather than delegating to IrContainer::copy. Swap changed from content-based (IrContainer::swap) to pointer-based with per-Fusion ownership tracking for both same-container and different-container cases.

mdavis36 · 2026-02-12T23:31:46Z

!test

github-actions · 2026-02-12T23:32:29Z

Description

Complete rewrite of Fusion::swap to use pointer-based approach with per-Fusion ownership tracking
Modified Fusion::copy to clone directly from per-Fusion filtered vals instead of delegating to IrContainer::copy
Changed copy constructor to share source's container pointer instead of creating new container
Added self-assignment check to move assignment operator for safety

Changes walkthrough

Relevant files

Enhancement

fusion.cpp `Phase 2 Copy-Move Semantics Implementation` csrc/fusion.cpp Completely rewrote Fusion::swap function with pointer-based approach and per-Fusion ownership tracking Modified Fusion::copy to clone vals directly instead of delegating to IrContainer::copy Changed copy constructor to share source's container pointer Added self-assignment check to move assignment operator	+84/-36

PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review
Memory Safety The copy constructor now shares the source's container pointer (line 304), which could lead to dangling references if the source Fusion is destroyed while the copy still exists. This changes fundamental ownership semantics and requires careful review of lifetime management and container reference counting. Fusion::Fusion(const Fusion& other) : ir_container_(other.ir_container_) { FUSER_PERF_SCOPE("Fusion copy"); ir_container_->addFusion(this); Fusion::copy(&other, this); } Complex Swap Logic The new swap implementation (lines 107-192) is significantly more complex with multiple phases of pointer updates and ownership transfers. The complexity increases the risk of bugs, especially in edge cases like self-swap or when containers are null. The logic for handling same-container vs different-container cases needs thorough validation. void Fusion::swap(Fusion& a, Fusion& b) noexcept { FUSER_PERF_SCOPE("Fusion swap"); if (&a == &b) { return; } // Collect statements owned by each Fusion BEFORE swap std::vector<Val> a_owned_vals, b_owned_vals; std::vector<Expr> a_owned_exprs, b_owned_exprs; if (a.ir_container_) { const auto& av = a.ir_container_->valsOwnedBy(&a); const auto& ae = a.ir_container_->exprsOwnedBy(&a); a_owned_vals.assign(av.begin(), av.end()); a_owned_exprs.assign(ae.begin(), ae.end()); } if (b.ir_container_) { const auto& bv = b.ir_container_->valsOwnedBy(&b); const auto& be = b.ir_container_->exprsOwnedBy(&b); b_owned_vals.assign(bv.begin(), bv.end()); b_owned_exprs.assign(be.begin(), be.end()); } // Transfer Fusion registrations between containers before pointer swap. // After swap, a will own b's container and b will own a's container. if (a.ir_container_ && b.ir_container_ && a.ir_container_.get() != b.ir_container_.get()) { a.ir_container_->transferFusion(&a, &b); b.ir_container_->transferFusion(&b, &a); } // Swap container pointers std::swap(a.ir_container_, b.ir_container_); // Swap all Fusion-level members std::swap(a.inputs_, b.inputs_); std::swap(a.outputs_, b.outputs_); std::swap(a.io_alias_, b.io_alias_); std::swap(a.all_tv_uses_valid_, b.all_tv_uses_valid_); std::swap(a.is_during_update_uses_, b.is_during_update_uses_); std::swap(a.managed_data_, b.managed_data_); std::swap(a.managed_named_data_, b.managed_named_data_); std::swap(a.expected_dynamic_smem_bytes_, b.expected_dynamic_smem_bytes_); std::swap(a.all_tvs_ptr_, b.all_tvs_ptr_); std::swap(a.zero_val_, b.zero_val_); std::swap(a.one_val_, b.one_val_); std::swap(a.true_val_, b.true_val_); std::swap(a.false_val_, b.false_val_); std::swap(a.magic_zero_val_, b.magic_zero_val_); std::swap(a.axioms_, b.axioms_); std::swap(a.metadata_, b.metadata_); // Update Statement::ir_container_ pointers: a's old statements now belong // to b, and b's old statements now belong to a for (auto* val : a_owned_vals) { val->ir_container_ = &b; } for (auto* expr : a_owned_exprs) { expr->ir_container_ = &b; } for (auto* val : b_owned_vals) { val->ir_container_ = &a; } for (auto* expr : b_owned_exprs) { expr->ir_container_ = &a; } // Update per-Fusion tracking keys in containers if (a.ir_container_ && b.ir_container_) { if (a.ir_container_.get() == b.ir_container_.get()) { // Same container: directly swap per-Fusion tracking entries auto* c = a.ir_container_.get(); std::swap(c->per_fusion_vals_[&a], c->per_fusion_vals_[&b]); std::swap(c->per_fusion_exprs_[&a], c->per_fusion_exprs_[&b]); } else { // Different containers: rename tracking keys to match new owners a.ir_container_->transferStatementOwnership(&b, &a); b.ir_container_->transferStatementOwnership(&a, &b); } } else if (a.ir_container_) { a.ir_container_->transferStatementOwnership(&b, &a); } else if (b.ir_container_) { b.ir_container_->transferStatementOwnership(&a, &b); } } Missing Tests No test changes are visible in the diff despite fundamental changes to copy and swap semantics. These are core operations that affect memory management and object lifecycle. Comprehensive tests should be added to verify correct behavior, especially for edge cases like self-assignment, container sharing, and ownership transfers. IrCloner Fusion::copy(const Fusion* from, Fusion* to) { to->clear(); IrCloner ir_cloner(to); // Clone from's vals in insertion order for (auto val : from->deterministic_vals()) { ir_cloner.clone(val); } // Wire up definitions and uses on cloned vals for (auto val : from->vals()) { ir_cloner.clone(val)->setDefinition(ir_cloner.clone(val->definition_)); ir_cloner.clone(val)->setUses(ir_cloner.clone(val->uses_)); } // Remap cached special val pointers if (from->zero_val_) { to->zero_val_ = ir_cloner.clone(from->zero_val_); } if (from->one_val_) { to->one_val_ = ir_cloner.clone(from->one_val_); } if (from->true_val_) { to->true_val_ = ir_cloner.clone(from->true_val_); } if (from->false_val_) { to->false_val_ = ir_cloner.clone(from->false_val_); } if (from->magic_zero_val_) { to->magic_zero_val_ = ir_cloner.clone(from->magic_zero_val_)->as<NamedScalar>(); } to->inputs_ = ir_cloner.clone(from->inputs_); to->outputs_ = ir_cloner.clone(from->outputs_); for (auto inp : to->inputs_) { inp->setIsFusionInput(true); } for (auto out : to->outputs_) { out->setIsFusionOutput(true); } for (Val* out : from->outputs_) { const AliasInfo& alias = from->io_alias_.get(out); if (alias.type == AllocationType::New) { continue; } Val* copied_out = ir_cloner.clone(out); Val* copied_in = ir_cloner.clone(alias.aliased_io); to->io_alias_.add(copied_out, copied_in, alias.type, alias.visibility); } to->all_tv_uses_valid_ = from->all_tv_uses_valid_; to->is_during_update_uses_ = from->is_during_update_uses_; for (const auto& i : from->managed_data_) { if (i.first.has_value()) { to->managed_data_.emplace_back(i.second(ir_cloner, i.first), i.second); } else { to->managed_data_.emplace_back(i.first, i.second); } }

Test failures

(High, 195) CUDA InvalidAddressSpace errors in nvFuser direct test_repro suite

Test Name	A100	GB200	H100
tests.python.direct.test_repro.test_issue1246[nvfuser_direct_test=eager]	❌	❌	❌
tests.python.direct.test_repro.test_issue1270[nvfuser_direct_test=eager]	❌	❌	❌
tests.python.direct.test_repro.test_issue1270[nvfuser_direct_test=lru_cache]	❌	❌	❌
tests.python.direct.test_repro.test_issue1273[nvfuser_direct_test=eager]	❌	❌	❌
tests.python.direct.test_repro.test_issue1273[nvfuser_direct_test=lru_cache]			❌
tests.python.direct.test_repro.test_issue1277[nvfuser_direct_test=eager]	❌	❌	❌
tests.python.direct.test_repro.test_issue1277[nvfuser_direct_test=lru_cache]	❌	❌	❌
tests.python.direct.test_repro.test_issue1279[nvfuser_direct_test=eager]	❌	❌	❌
tests.python.direct.test_repro.test_issue1279[nvfuser_direct_test=lru_cache]	❌	❌	❌
tests.python.direct.test_repro.test_issue1310[nvfuser_direct_test=eager]	❌	❌	❌
... with 57 more test failures omitted. Check internal logs.

(High, 66) CUDA uniform_ random kernel ‘InvalidAddressSpace’ errors across NVFuser repro tests

Test Name	A100	GB200	H100
tests.python.direct.test_repro.test_ca_map_concrete_loop_id[nvfuser_direct_test=eager]	❌	❌	❌
tests.python.direct.test_repro.test_ca_map_concrete_loop_id[nvfuser_direct_test=lru_cache]	❌	❌	❌
tests.python.direct.test_repro.test_domain_map_hang[nvfuser_direct_test=eager]	❌	❌	❌
tests.python.direct.test_repro.test_domain_map_hang[nvfuser_direct_test=lru_cache]	❌	❌	❌
tests.python.direct.test_repro.test_issue3292[nvfuser_direct_test=eager]	❌	❌	❌
tests.python.direct.test_repro.test_issue3292[nvfuser_direct_test=lru_cache]	❌	❌	❌
tests.python.direct.test_repro.test_issue3369[nvfuser_direct_test=eager]	❌	❌	❌
tests.python.direct.test_repro.test_issue3369[nvfuser_direct_test=lru_cache]	❌	❌	❌
tests.python.direct.test_repro.test_issue5377[nvfuser_direct_test=eager]	❌	❌	❌
tests.python.direct.test_repro.test_issue5377[nvfuser_direct_test=lru_cache]	❌	❌	❌
... with 13 more test failures omitted. Check internal logs.

(High, 3) NVFuser CUDA_ERROR_INVALID_ADDRESS_SPACE in tests.python.direct.test_repro.test_issue1246

Test Name A100 GB200 H100 Source

tests.python.direct.test_repro.test_issue1246[nvfuser_direct_test=lru_cache] ❌ ❌ ❌
(High, 1) CUDA unavailable during Llama4MoE initialization in tests.python.test_moe on dlcluster_h100

Test Name H100 Source

tests.python.test_moe.test_llama4_moe_thunderfx ❌

(Medium, 137) NVFuser internal assertion/validation failures (alias_memory.cpp:835 & validator_utils.cpp) across multiple nvFuser test suites

Test Name	A100	GB200	H100	Source
DynamicTransformTest.DynamicSqueezeTrivialReduction	❌	❌	❌	Link
DynamicTransformTest.DynamicSqueezeTrivialWelford	❌	❌	❌	Link
DynamicTransformTest.DynamicTransform3	❌	❌	❌	Link
DynamicTransformTest.DynamicTransformFusionExecutorCache	❌	❌	❌	Link
DynamicTransformTest.DynamicTransformIssue418	❌	❌	❌	Link
Gpu2Test.FusionBNRepro2_CUDA		❌	❌	Link
Gpu2Test.FusionVarMean_CUDA		❌	❌	Link
InsertReshardingTest.Execute/1	❌	❌	❌	Link
MoveRepeatForwardTest.MoveOverRotation	❌	❌	❌	Link
MoveRepeatForwardTest.Simple	❌	❌	❌	Link
... with 40 more test failures omitted. Check internal logs.

(Medium, 61) NVFuser codegen errors (duplicate symbols / alias_memory asserts) in multiple python/direct and multidevice test modules

Test Name	A100	A100 (dist.)	GB200	GB200 (dist.)	H100	H100 (dist.)
tests.python.direct.test_python_frontend.test_output_stride_order_with_reduction[nvfuser_direct_test=eager]	❌		❌		❌
tests.python.direct.test_python_frontend.test_output_stride_order_with_reduction[nvfuser_direct_test=lru_cache]	❌		❌		❌
tests.python.multidevice.test_communication.test_allreduce	❌	❌	❌	❌		❌
tests.python.multidevice.test_communication.test_reduce_scatter_noncontiguous	❌		❌
tests.python.multidevice.test_matmul.test_linear_reduce_scatter	❌	❌	❌	❌		❌
tests.python.multidevice.test_matmul.test_row_parallel_linear_with_bias	❌	❌	❌	❌		❌
tests.python.multidevice.test_multidevice.test_privatize_squeeze	❌	❌	❌	❌		❌
tests.python.opinfo.test_direct_ops.test_correctness_var_mean_float32	❌		❌
thunder.tests.test_grad.test_phantom_grad_vs_torch_consistency_softmin_nvfuser_cuda_thunder.dtypes.bfloat16			❌		❌
thunder.tests.test_grad.test_phantom_grad_vs_torch_consistency_softmin_nvfuser_cuda_thunder.dtypes.float16			❌		❌
... with 9 more test failures omitted. Check internal logs.

(Medium, 12) NVFuser codegen duplicate symbol compile errors across Alias, DynamicTransform, InsertResharding, Reshape test suites

Test Name	A100	GB200	H100	Source
AliasTest.Bookend_Issue2375		❌	❌	Link
DynamicTransformTest.FusionDynamicReshapeReductionShmoo	❌	❌	❌	Link
Gpu2Test.FusionVarMean_CUDA	❌			Link
InsertReshardingTest.Execute/3	❌	❌	❌	Link
ReshapeTest.ReductionFlatten1	❌	❌	❌	Link

(Medium, 8) nvFuser alias_memory internal assertion failures in MatmulNodeParameterizedTest on A100

Test Name	A100	Source
ReductionAxisIsOne/MatmulNodeParameterizedTest.MatmulNodeConcrete/11	❌	Link
ReductionAxisIsOne/MatmulNodeParameterizedTest.MatmulNodeConcrete/15	❌	Link
ReductionAxisIsOne/MatmulNodeParameterizedTest.MatmulNodeConcrete/17	❌	Link
ReductionAxisIsOne/MatmulNodeParameterizedTest.MatmulNodeSymbolic/12	❌	Link
ReductionAxisIsOne/MatmulNodeParameterizedTest.MatmulNodeSymbolic/13	❌	Link
ReductionAxisIsOne/MatmulNodeParameterizedTest.MatmulNodeSymbolic/14	❌	Link
ReductionAxisIsOne/MatmulNodeParameterizedTest.MatmulNodeSymbolic/19	❌	Link
ReductionAxisIsOne/MatmulNodeParameterizedTest.MatmulNodeSymbolic/3	❌	Link

(Medium, 8) nvFuser multi-device communication result mismatches (reduce_scatter_noncontiguous & insert_resharding_after tests)

Test Name	A100	A100 (dist.)	GB200	GB200 (dist.)	H100 (dist.)	Source
tests.python.multidevice.test_communication.test_reduce_scatter_noncontiguous		❌		❌	❌
tests.python.multidevice.test_multidevice.test_insert_resharding_after	❌	❌	❌	❌	❌

(Medium, 6) Phantom grad vs torch consistency mismatches in Thunder NVFuser (bf16/fp16)

Test Name	A100	GB200	H100	Source
thunder.tests.test_grad.test_phantom_grad_vs_torch_consistency_outer_nvfuser_cuda_thunder.dtypes.bfloat16	❌	❌	❌
thunder.tests.test_grad.test_phantom_grad_vs_torch_consistency_outer_nvfuser_cuda_thunder.dtypes.float16	❌	❌	❌

(Medium, 3) Thunder nvFuser VJP correctness mismatch for take (float64) on CUDA

Test Name A100 GB200 H100 Source

thunder.tests.test_grad.test_vjp_correctness_take_nvfuser_cuda_thunder.dtypes.float64 ❌ ❌ ❌
(Medium, 3) NVFuser IndexingTest.Reshape inline-expression mismatch across multiple GPU runners

Test Name A100 GB200 H100 Source

IndexingTest.Reshape ❌ ❌ ❌ Link

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[IR Container] Phase 2 Copy-Move Semantics#5964

[IR Container] Phase 2 Copy-Move Semantics#5964
mdavis36 wants to merge 1 commit intomd/phase2-per-fusionfrom
md/phase2-copy-move

mdavis36 commented Feb 12, 2026

Uh oh!

mdavis36 commented Feb 12, 2026

Uh oh!

github-actions bot commented Feb 12, 2026 •

edited by xwang233

Loading

Changes walkthrough

PR Reviewer Guide

Test failures

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

mdavis36 commented Feb 12, 2026

Uh oh!

mdavis36 commented Feb 12, 2026

Uh oh!

github-actions bot commented Feb 12, 2026 • edited by xwang233 Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Changes walkthrough

PR Reviewer Guide

Test failures

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions bot commented Feb 12, 2026 •

edited by xwang233

Loading