[IR Container] Phase 2 Copy-Move Semantics #5964
mdavis36 wants to merge 1 commit into md/phase2-per-fusion
Copy constructor now shares the source's container pointer instead of creating a new one. Fusion::copy clones directly from per-Fusion filtered vals rather than delegating to IrContainer::copy. Swap changed from content-based (IrContainer::swap) to pointer-based with per-Fusion ownership tracking for both same-container and different-container cases.
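The ownership model described above can be illustrated with a minimal, self-contained sketch. It is not nvFuser code: `MiniContainer`, `MiniFusion`, `ownedVals`, `add`, and `retag` are hypothetical names invented for illustration, and plain `int` values stand in for IR nodes. The sketch assumes each value in a shared container is tagged with the fusion that owns it, so that copying clones only the source's filtered values and swapping exchanges pointers while retagging ownership.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; not nvFuser's real classes.
struct MiniFusion;

struct MiniContainer {
  // Each value is stored together with the fusion that owns it.
  std::vector<std::pair<const MiniFusion*, int>> vals;
};

struct MiniFusion {
  std::shared_ptr<MiniContainer> container = std::make_shared<MiniContainer>();

  MiniFusion() = default;

  // Copy: share the source's container pointer instead of creating a new
  // container, then clone only the values the source owns.
  MiniFusion(const MiniFusion& other) : container(other.container) {
    // ownedVals() returns a snapshot, so appending while looping is safe.
    for (int v : other.ownedVals()) add(v);
  }
  // Copy assignment is left out of this sketch.
  MiniFusion& operator=(const MiniFusion&) = delete;

  // Register a value in the container, owned by this fusion.
  void add(int v) { container->vals.emplace_back(this, v); }

  // Per-fusion filtered view: only the values tagged with this fusion.
  std::vector<int> ownedVals() const {
    std::vector<int> out;
    for (const auto& entry : container->vals) {
      if (entry.first == this) out.push_back(entry.second);
    }
    return out;
  }

  // Exchange ownership tags between x and y inside one container.
  static void retag(MiniContainer& c, const MiniFusion* x, const MiniFusion* y) {
    for (auto& entry : c.vals) {
      if (entry.first == x) {
        entry.first = y;
      } else if (entry.first == y) {
        entry.first = x;
      }
    }
  }

  // Pointer-based swap: retag ownership, then exchange container pointers.
  // A shared container is retagged once; distinct containers once each.
  friend void swap(MiniFusion& a, MiniFusion& b) {
    retag(*a.container, &a, &b);
    if (a.container != b.container) retag(*b.container, &a, &b);
    std::swap(a.container, b.container);
  }
};
```

In this sketch, values belonging to a third fusion that shares the container are untouched by a swap, because only entries tagged with one of the two swapped fusions are retagged. Whether the PR tracks ownership with per-value tags or some other structure is not stated in the description, so the tagging scheme here is an assumption.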
!test
PR Reviewer Guide
Here are some key observations to aid the review process:
- 🧪 No relevant tests
- 🔒 No security concerns identified
- ⚡ Recommended focus areas for review: Memory Safety
Test failures
- (High, 195) CUDA InvalidAddressSpace errors in nvFuser direct test_repro suite

  | Test Name | Failures (A100 / GB200 / H100) |
  |---|---|
  | tests.python.direct.test_repro.test_issue1246[nvfuser_direct_test=eager] | ❌ ❌ ❌ |
  | tests.python.direct.test_repro.test_issue1270[nvfuser_direct_test=eager] | ❌ ❌ ❌ |
  | tests.python.direct.test_repro.test_issue1270[nvfuser_direct_test=lru_cache] | ❌ ❌ ❌ |
  | tests.python.direct.test_repro.test_issue1273[nvfuser_direct_test=eager] | ❌ ❌ ❌ |
  | tests.python.direct.test_repro.test_issue1273[nvfuser_direct_test=lru_cache] | ❌ |
  | tests.python.direct.test_repro.test_issue1277[nvfuser_direct_test=eager] | ❌ ❌ ❌ |
  | tests.python.direct.test_repro.test_issue1277[nvfuser_direct_test=lru_cache] | ❌ ❌ ❌ |
  | tests.python.direct.test_repro.test_issue1279[nvfuser_direct_test=eager] | ❌ ❌ ❌ |
  | tests.python.direct.test_repro.test_issue1279[nvfuser_direct_test=lru_cache] | ❌ ❌ ❌ |
  | tests.python.direct.test_repro.test_issue1310[nvfuser_direct_test=eager] | ❌ ❌ ❌ |

  ... with 57 more test failures omitted. Check internal logs.

- (High, 66) CUDA uniform_ random kernel ‘InvalidAddressSpace’ errors across NVFuser repro tests

  | Test Name | A100 | GB200 | H100 |
  |---|---|---|---|
  | tests.python.direct.test_repro.test_ca_map_concrete_loop_id[nvfuser_direct_test=eager] | ❌ | ❌ | ❌ |
  | tests.python.direct.test_repro.test_ca_map_concrete_loop_id[nvfuser_direct_test=lru_cache] | ❌ | ❌ | ❌ |
  | tests.python.direct.test_repro.test_domain_map_hang[nvfuser_direct_test=eager] | ❌ | ❌ | ❌ |
  | tests.python.direct.test_repro.test_domain_map_hang[nvfuser_direct_test=lru_cache] | ❌ | ❌ | ❌ |
  | tests.python.direct.test_repro.test_issue3292[nvfuser_direct_test=eager] | ❌ | ❌ | ❌ |
  | tests.python.direct.test_repro.test_issue3292[nvfuser_direct_test=lru_cache] | ❌ | ❌ | ❌ |
  | tests.python.direct.test_repro.test_issue3369[nvfuser_direct_test=eager] | ❌ | ❌ | ❌ |
  | tests.python.direct.test_repro.test_issue3369[nvfuser_direct_test=lru_cache] | ❌ | ❌ | ❌ |
  | tests.python.direct.test_repro.test_issue5377[nvfuser_direct_test=eager] | ❌ | ❌ | ❌ |
  | tests.python.direct.test_repro.test_issue5377[nvfuser_direct_test=lru_cache] | ❌ | ❌ | ❌ |

  ... with 13 more test failures omitted. Check internal logs.

- (High, 3) NVFuser CUDA_ERROR_INVALID_ADDRESS_SPACE in tests.python.direct.test_repro.test_issue1246

  | Test Name | A100 | GB200 | H100 |
  |---|---|---|---|
  | tests.python.direct.test_repro.test_issue1246[nvfuser_direct_test=lru_cache] | ❌ | ❌ | ❌ |

- (High, 1) CUDA unavailable during Llama4MoE initialization in tests.python.test_moe on dlcluster_h100

  | Test Name | H100 |
  |---|---|
  | tests.python.test_moe.test_llama4_moe_thunderfx | ❌ |

- (Medium, 137) NVFuser internal assertion/validation failures (alias_memory.cpp:835 & validator_utils.cpp) across multiple nvFuser test suites

  | Test Name | Failures (A100 / GB200 / H100) |
  |---|---|
  | DynamicTransformTest.DynamicSqueezeTrivialReduction | ❌ ❌ ❌ |
  | DynamicTransformTest.DynamicSqueezeTrivialWelford | ❌ ❌ ❌ |
  | DynamicTransformTest.DynamicTransform3 | ❌ ❌ ❌ |
  | DynamicTransformTest.DynamicTransformFusionExecutorCache | ❌ ❌ ❌ |
  | DynamicTransformTest.DynamicTransformIssue418 | ❌ ❌ ❌ |
  | Gpu2Test.FusionBNRepro2_CUDA | ❌ ❌ |
  | Gpu2Test.FusionVarMean_CUDA | ❌ ❌ |
  | InsertReshardingTest.Execute/1 | ❌ ❌ ❌ |
  | MoveRepeatForwardTest.MoveOverRotation | ❌ ❌ ❌ |
  | MoveRepeatForwardTest.Simple | ❌ ❌ ❌ |

  ... with 40 more test failures omitted. Check internal logs.

- (Medium, 61) NVFuser codegen errors (duplicate symbols / alias_memory asserts) in multiple python/direct and multidevice test modules

  | Test Name | Failures (A100 / A100 (dist.) / GB200 / GB200 (dist.) / H100 / H100 (dist.)) |
  |---|---|
  | tests.python.direct.test_python_frontend.test_output_stride_order_with_reduction[nvfuser_direct_test=eager] | ❌ ❌ ❌ |
  | tests.python.direct.test_python_frontend.test_output_stride_order_with_reduction[nvfuser_direct_test=lru_cache] | ❌ ❌ ❌ |
  | tests.python.multidevice.test_communication.test_allreduce | ❌ ❌ ❌ ❌ ❌ |
  | tests.python.multidevice.test_communication.test_reduce_scatter_noncontiguous | ❌ ❌ |
  | tests.python.multidevice.test_matmul.test_linear_reduce_scatter | ❌ ❌ ❌ ❌ ❌ |
  | tests.python.multidevice.test_matmul.test_row_parallel_linear_with_bias | ❌ ❌ ❌ ❌ ❌ |
  | tests.python.multidevice.test_multidevice.test_privatize_squeeze | ❌ ❌ ❌ ❌ ❌ |
  | tests.python.opinfo.test_direct_ops.test_correctness_var_mean_float32 | ❌ ❌ |
  | thunder.tests.test_grad.test_phantom_grad_vs_torch_consistency_softmin_nvfuser_cuda_thunder.dtypes.bfloat16 | ❌ ❌ |
  | thunder.tests.test_grad.test_phantom_grad_vs_torch_consistency_softmin_nvfuser_cuda_thunder.dtypes.float16 | ❌ ❌ |

  ... with 9 more test failures omitted. Check internal logs.

- (Medium, 12) NVFuser codegen duplicate symbol compile errors across Alias, DynamicTransform, InsertResharding, Reshape test suites

  | Test Name | Failures (A100 / GB200 / H100) |
  |---|---|
  | AliasTest.Bookend_Issue2375 | ❌ ❌ |
  | DynamicTransformTest.FusionDynamicReshapeReductionShmoo | ❌ ❌ ❌ |
  | Gpu2Test.FusionVarMean_CUDA | ❌ |
  | InsertReshardingTest.Execute/3 | ❌ ❌ ❌ |
  | ReshapeTest.ReductionFlatten1 | ❌ ❌ ❌ |

- (Medium, 8) nvFuser alias_memory internal assertion failures in MatmulNodeParameterizedTest on A100

  | Test Name | A100 |
  |---|---|
  | ReductionAxisIsOne/MatmulNodeParameterizedTest.MatmulNodeConcrete/11 | ❌ |
  | ReductionAxisIsOne/MatmulNodeParameterizedTest.MatmulNodeConcrete/15 | ❌ |
  | ReductionAxisIsOne/MatmulNodeParameterizedTest.MatmulNodeConcrete/17 | ❌ |
  | ReductionAxisIsOne/MatmulNodeParameterizedTest.MatmulNodeSymbolic/12 | ❌ |
  | ReductionAxisIsOne/MatmulNodeParameterizedTest.MatmulNodeSymbolic/13 | ❌ |
  | ReductionAxisIsOne/MatmulNodeParameterizedTest.MatmulNodeSymbolic/14 | ❌ |
  | ReductionAxisIsOne/MatmulNodeParameterizedTest.MatmulNodeSymbolic/19 | ❌ |
  | ReductionAxisIsOne/MatmulNodeParameterizedTest.MatmulNodeSymbolic/3 | ❌ |

- (Medium, 8) nvFuser multi-device communication result mismatches (reduce_scatter_noncontiguous & insert_resharding_after tests)

  | Test Name | Failures (A100 / A100 (dist.) / GB200 / GB200 (dist.) / H100 (dist.)) |
  |---|---|
  | tests.python.multidevice.test_communication.test_reduce_scatter_noncontiguous | ❌ ❌ ❌ |
  | tests.python.multidevice.test_multidevice.test_insert_resharding_after | ❌ ❌ ❌ ❌ ❌ |

- (Medium, 6) Phantom grad vs torch consistency mismatches in Thunder NVFuser (bf16/fp16)

  | Test Name | A100 | GB200 | H100 |
  |---|---|---|---|
  | thunder.tests.test_grad.test_phantom_grad_vs_torch_consistency_outer_nvfuser_cuda_thunder.dtypes.bfloat16 | ❌ | ❌ | ❌ |
  | thunder.tests.test_grad.test_phantom_grad_vs_torch_consistency_outer_nvfuser_cuda_thunder.dtypes.float16 | ❌ | ❌ | ❌ |

- (Medium, 3) Thunder nvFuser VJP correctness mismatch for take (float64) on CUDA

  | Test Name | A100 | GB200 | H100 |
  |---|---|---|---|
  | thunder.tests.test_grad.test_vjp_correctness_take_nvfuser_cuda_thunder.dtypes.float64 | ❌ | ❌ | ❌ |

- (Medium, 3) NVFuser IndexingTest.Reshape inline-expression mismatch across multiple GPU runners

  | Test Name | A100 | GB200 | H100 |
  |---|---|---|---|
  | IndexingTest.Reshape | ❌ | ❌ | ❌ |