Do Not Merge: Integration Branch for GT4Py Next #1

philip-paul-mueller · 2025-04-30T10:04:09Z

This is the PR/branch that GT4Py.Next uses to pull DaCe.
It is essentially DaCe main together with our fixes that, for various reasons have not made it yet into DaCe main.

The process for updating this branch is is as follows there are no exceptions:

You start with current DaCe main.
Then you include the PR that enables automatic Python index update, by squash merge it.
Then squash merge the PRs that are listed below, check if they have been merged into DaCe proper and if so remove them from the list.
Then update the version.py file in the dace/ subfolder. Make sure that there is no new line at the end. For next we are using the epoch 43, cartesian would use 42. As version number the date is used. Thus the version (for next) would look something like: '43!YYYY.MM.DD'.
Force push your changes to this branch (gt4py-next-integration).
Create a tag with the pattern __gt4py-next-integration_YYYY_MM_DD and push it as well.
Make sure that the workflow has been triggered.

Afterwards you have to update GT4Py's pyproject.toml file.
For this you have to update the version requirement of DaCe in the dace-next group at the beginning of the file to the version you just created, i.e. change it to dace==43!YYYY.MM.DD.
Then you have to update the the source in the uv specific parts of the file, there you have to change the source to the new tag you have just created.
Then you have to update the uv look by running uv sync --extra next --group dace-next, if you have installed the precommit hooks then this will be done automatically.

NOTE: Once PR#2423 has been merged the second step, i.e. adapting the tag in the uv specific parts is no longer needed.

On top of DaCe/main we are using the following PRs:

Fix DFS traversal in MapFusionVertical

No Longer Needed

Fix InlineMultistateSDFG: find new name for all control-flow blocks
Fixing concurrency bug in DaCe.Config
Fix codegen for cuda memory pool
Avoided costly copy operation
Better Use Of Simplify
Improve PruneSymbols
Fixing scope_tree_recursive()
Updated edge consolidation in MapFusion
Fixed other_subset validation
Handling of Empty Memlets by GPU Codegen
Modification of CUDA Codegen, setting streams
Fixes positional arguments if restored
Uses anonymous file to store configuration
Make GPU Code Generation Reentrant
Updated GPU Copies
Fixes Configuration Contextmanager
PruneConnector now uses the isolation function for nested SDFG
Modification of state_fission()
Update SubgraphView
Updated Map fusion
Refine try_initialize() edges in Map fusion
Improving symbol usage
Added Fine Grain Control to MapFusionVertical
Fixed MapFusion
Modifications of the DaCe Default Configuration - Instead, dace is configured in gt4py as required.
Faster check of SDFG return value
Canonicalize Memlets
Fixed RedundantSecondArray
Disable AddThreadBlockMap transformation in cuda codegen; no longer needed since DaCe PR#2202 has been merged.
Relocate import in fast_call()
Added compiled_sdfg_call_hooks_manager
Make self._lastargs Mutable No longer needed since we now use GT4Py PR#2353 and DaCe PR#2206.
MapFusion: Multiple Consumer of Second Map
Added canonicalized Memlet functions
Make self._lastargs Mutable (Should be replaced by a more permanent solution).
Refactored Reloading Scheme
Deterministic Map Labels in MapFusion*
Fixed AddThreadBlockMap
Fix apply_transformation_once_everywhere()
Fix convert block_size config
Add NVTX/rocTX ranges
CompiledSDFG refactoring (archive):
For some reason the original PR has been "taken over" by Tal.
Due to the inherent dependency that GT4Py has on this PR we should use the the archive (liked at the top).
Fix gpu reduction templates
Add #pragma unroll to LoopRegion generated code
Add HIP-codegen for stream-order memory allocation

Instead of pulling directly from the official DaCe repo, we now (for the time being) pull from [this PR](GridTools/dace#1). This became necessary as we have a lot of open PR in DaCe and need some custom fixes (that can in their current form not be merged into DaCe). In the long term however, we should switch back to the main DaCe repo.

Increased pytest timeout from 300 to 600 seconds.

## Refactor `dace/data.py` into `dace/data/` package ### Summary This PR refactors the monolithic `dace/data.py` file into a modular `dace/data/` package with separate files for different functionality, improving code organization and maintainability. ### Changes - [x] **`dace/data/core.py`**: Core data descriptor classes (`Data`, `Scalar`, `Array`, `ContainerArray`, `Stream`, `Structure`, `View`, `Reference` and their subclasses) - [x] **`dace/data/tensor.py`**: Tensor/sparse tensor support (`Tensor`, `TensorIndex*` classes) - [x] **`dace/data/creation.py`**: Data descriptor creation functions (`create_datadescriptor`, `make_array_from_descriptor`, `make_reference_from_descriptor`) - [x] **`dace/data/ctypes_interop.py`**: Ctypes interoperability (`make_ctypes_argument`) - [x] **`dace/data/ml.py`**: ML-related descriptors (`ParameterArray`) - [x] **`dace/data/__init__.py`**: Re-exports all public API for backward compatibility - [x] **`dace/utils.py`**: Utility functions (`find_new_name`, `deduplicate`, `prod`) - [x] **`dace/properties.py`**: Updated to handle circular import gracefully - [x] **`dace/autodiff/library/library.py`**: Updated to import `ParameterArray` from the new location - [x] **Deleted** old `dace/data.py` file - [x] **Removed** `Number` and `ArrayLike` from `dace/data/__init__.py` (other places import directly) - [x] **Moved** `_prod` to `dace/utils.py` as `prod` (kept `_prod` export for backward compat) - [x] **Fixed** broken imports in `data_report.py`, `data_layout_tuner.py`, and `cutout.py` ### Backward Compatibility All public APIs are re-exported from `dace.data`, ensuring backward compatibility with existing code.  <details> <summary>Original prompt</summary> > > ---- > > *This section details on the original issue you should resolve* > > <issue_title>Refactor `dace/data.py`</issue_title> > <issue_description>`data.py` is a monolithic file containing classes for core data containers (Data, Scalar, Array, Stream, View, Reference, and their subclasses `*{View, Reference}`; functionality to get data descriptors from arbitrary objects; derived objects for Tensors and sparse tensors; and other functions. > > This issue will be resolved once `data.py` is refactored to a `dace/data/*` folder, which will contain separate files for: > 1. core descriptor classes > 2. structures (the Structure class and similar functionality) > 3. tensors/sparse tensors > 4. descriptor creation > 5. ML-related data descriptors, such as parameter arrays (see `dace/autodiff/library/library.py`) > 6...N. Other functions and classes categorized by their semantic meaning. > > The code for `dace/data/*` will be refactored out of `data.py` (which should not exist at the end of this issue), `dtypes.py` (which may exist but be shorter), and other files that contain data descriptors (subclasses of Data/Array/Stream/Structure/View/Reference, such as ParameterArray. Try to find all such subclasses in the codebase barring tests/* and samples/*). > > Lastly, utility functions in `data.py` and `dtypes.py` (only those two files for this issue), such as `find_new_name` from data.py and `deduplicate` from dtypes.py, should find themselves in a new `dace/utils.py` file.</issue_description> > > ## Comments on the Issue (you are @copilot in this section) > > <comments> > </comments> > </details> - Fixes spcl#2244  --- 💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more [Copilot coding agent tips](https://gh.io/copilot-coding-agent-tips) in the docs. --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: tbennun <8348955+tbennun@users.noreply.github.com>

…to seq. maps inside GPU kernels or gpu dev. maps (spcl#2088) GPU codegen crashes and generates incorrect code with dynamic inputs to seq. maps inside GPU kernels or gpu dev. maps --------- Co-authored-by: alexnick83 <31545860+alexnick83@users.noreply.github.com> Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com>

…pcl#2246) Updated the documentation for proposed pass decomposition, including changes to pass names and descriptions for clarity.

The current version of DaCe only supports CUDA 12 since CUDA 13 has a few breaking changes. This PR updates the runtime headers according to https://nvidia.github.io/cccl/cccl/3.0_migration_guide.html, replacing classes that moved from cub to thrust. As I have no way of testing on CUDA 12, I have conservatively left the old code intact using preprocessor macros. Tested on CUDA 13.1

…spcl#2251) The values retrieved from `Config` are strings. For block_size configuration, the values need to be converted to `int` type. Co-authored-by: Philipp Schaad <schaad.phil@gmail.com> Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com>

As part of bringing our schedule tree branch from `v1/maintenance` to `main` (see spcl#2262), I have fixed a couple of typos. I'd like to pull these out into a separate PR to keep the schedule tree PR as small as possible (which will still be large once done). Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

The DaCe FPGA backend will be supported out-of-tree. This PR extracts the capabilities to a separate repository: https://github.com/spcl/dace-fpga This PR also reduces circular imports and encourages more object-oriented design for code generation.

- Added NVTX/rocTX markers to SDFGs, states, scopes and copies in the CPU code to analyze and measure easily the DaCe runtime using NVIDIA/AMD profiling tools - Added related test and documentation --------- Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com> Co-authored-by: Philip Müller <147368808+philip-paul-mueller@users.noreply.github.com> Co-authored-by: Philip Mueller <philip.paul.mueller@bluemain.ch> Co-authored-by: Tal Ben-Nun <tbennun@gmail.com>

Although the GPU codegen is being reworked, this small change would be useful for the time being. It a an extension to the HIP codegen of the current solution for stream-order memory allocation. It has been tested in GT4Py.

- [x] Remove ArrayInterface - [x] Remove other FPGA-related enumeration entries - [x] Mitigate extraneous arguments in `dace.codegen.targets.cpp` - [x] Remove runtime includes and references to DaCe-FPGA macros - [x] Address TODOs from spcl#2252

Allow the compiler to unroll itself the `LoopRegion`s instead of just having a manual unroll feature with the `LoopUnroll` pass

This enables array-like objects to be used in callbacks and retain the original object.

This new type enables DaCe users to perform calculations with stochastic rounding in single precision. This change is validated with unit tests. --------- Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com>

Allow compilation with latest LLVM versions. Without this change there's the following error: ``` > raise cgx.CompilationError('Compiler failure:\n' + ex.output) E dace.codegen.exceptions.CompilationError: Compiler failure: E [ 20%] Building CXX object CMakeFiles/vertically_implicit_solver_at_predictor_step.dir/capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/dycore_711_res_tmps_rcu/.gt4py_cache/vertically_implicit_solver_at_predictor_step_c2100e7df02fe12899d3acf60eb0b974edf9029c01b12b0414e215257065bc8c/src/cpu/vertically_implicit_solver_at_predictor_step.cpp.o E [ 40%] Building HIP object CMakeFiles/vertically_implicit_solver_at_predictor_step.dir/capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/dycore_711_res_tmps_rcu/.gt4py_cache/vertically_implicit_solver_at_predictor_step_c2100e7df02fe12899d3acf60eb0b974edf9029c01b12b0414e215257065bc8c/src/cuda/hip/vertically_implicit_solver_at_predictor_step_cuda.cpp.o E In file included from /capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/dycore_711_res_tmps_rcu/.gt4py_cache/vertically_implicit_solver_at_predictor_step_c2100e7df02fe12899d3acf60eb0b974edf9029c01b12b0414e215257065bc8c/src/cuda/hip/vertically_implicit_solver_at_predictor_step_cuda.cpp:3: E In file included from /capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/.venv/lib/python3.12/site-packages/dace/codegen/../runtime/include/dace/dace.h:17: E /capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/.venv/lib/python3.12/site-packages/dace/codegen/../runtime/include/dace/math.h:509:37: warning: 'host' attribute only applies to functions [-Wignored-attributes] E 509 | static const __attribute__((host)) __attribute__((device)) typeless_pi pi{}; E | ^ E In file included from /capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/dycore_711_res_tmps_rcu/.gt4py_cache/vertically_implicit_solver_at_predictor_step_c2100e7df02fe12899d3acf60eb0b974edf9029c01b12b0414e215257065bc8c/src/cuda/hip/vertically_implicit_solver_at_predictor_step_cuda.cpp:3: E In file included from /capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/.venv/lib/python3.12/site-packages/dace/codegen/../runtime/include/dace/dace.h:30: E /capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/.venv/lib/python3.12/site-packages/dace/codegen/../runtime/include/dace/cuda/copy.cuh:772:41: error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw] E 772 | wcr_custom<T>::template reduce( E | ^ E /capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/.venv/lib/python3.12/site-packages/dace/codegen/../runtime/include/dace/cuda/copy.cuh:779:45: error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw] E 779 | wcr_custom<T>::template reduce( E | ^ E /capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/.venv/lib/python3.12/site-packages/dace/codegen/../runtime/include/dace/cuda/copy.cuh:796:49: error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw] E 796 | wcr_fixed<REDTYPE, T>::template reduce_atomic( E | ^ E /capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/.venv/lib/python3.12/site-packages/dace/codegen/../runtime/include/dace/cuda/copy.cuh:803:53: error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw] E 803 | wcr_fixed<REDTYPE, T>::template reduce_atomic( E | ^ E 1 warning and 4 errors generated when compiling for gfx942. E gmake[2]: *** [CMakeFiles/vertically_implicit_solver_at_predictor_step.dir/build.make:92: CMakeFiles/vertically_implicit_solver_at_predictor_step.dir/capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/dycore_711_res_tmps_rcu/.gt4py_cache/vertically_implicit_solver_at_predictor_step_c2100e7df02fe12899d3acf60eb0b974edf9029c01b12b0414e215257065bc8c/src/cuda/hip/vertically_implicit_solver_at_predictor_step_cuda.cpp.o] Error 1 E gmake[1]: *** [CMakeFiles/Makefile2:90: CMakeFiles/vertically_implicit_solver_at_predictor_step.dir/all] Error 2 E gmake: *** [Makefile:91: all] Error 2 .venv/lib/python3.12/site-packages/dace/codegen/compiler.py:254: CompilationError ```

This PR refactors how calling an SDFG works. [PR#1467](github.com/spcl/pull/1467) introduced the `fast_call()` API, which allowed to call a compiled SDFG and skipping some tests. This was done to support the use case of calling the same SDFG with the same (as in pointers) multiple times. However, the PR did not introduced a simple way to generate the argument vector that had to be passed to `fast_call()` without relying on internal implementation details of the class. This PR, beside other things, introduces this use case and give access to the all steps needed to call an SDFG: - `construct_arguments()`: It accepts Python arguments, such as `int` or NumPy arrays and turns them into an argument vector in the right order and converted to the required C type. - `fast_call()`: Performs the actual call using the _passed_ argument vectors, if needed it will also run initialization. Note that this function is not new, but was slightly modified and no longer handles the return values, see below. - `convert_return_values()`: This function performs the actual `return` operation, i.e. composes the specified return type, i.e. either a single array or a tuple. Note, before this function was called by `fast_call()` but it was moved outside to reduce the hot path, because usually return values are passed using inout or out arguments directly. Beside these changes the PR also modifies the following things: - It was possible to pass return values, i.e. `__return`, as ordinary arguments, this is still supported, but now a warning is returned. - `CompiledSDFG` was technically able to handle scalar return values, however, due to technical limitation this is [not possible](spcl#1609), thus the feature was removed. However, the feature was not dropped completely and is still used to handle `pyobjects` since they are passed as pointer, see below. - The handling of `pyobject` return values was modified. Before it was not possible to use `pyobject` instances as return values that were manged _outside_, i.e. not allocated by, `CompiledSDFG`, now they are "handled". It is important that an _array_, i.e. multiple instances, of `pyobject`s are handled as a single object (this is the correct behaviour and retained for bug compatibility with the unit tests), however, a warning is generated. - It was possible to pass an argument as named argument and as positional argument, this is now forbidden. - `safe_call()` is not possible to handle return values, if the method is called on such an SDFG an error is generated. - Before it was not possible to return a `tuple` with a single argument, in that case the value was always directly returned, this has been fixed and is correctly handled. - The allocation of return values was inconsistent. If there was no change in size, then `__call__()` would always return the _same_ arrays, which might lead to very sudden bugs. The new behaviour is to always allocate new memory, this is done by `construct_arguments()`. - Shared return values. - Before `CompiledSDFG` had a member `_lastargs` which "cached" the last pointer arguments that were used to call the SDFG. It was updated by `_construct_args()` (old version of `construct_arguments()`), which did not make much sense. The original intention was to remove it, but this proved to be harder and it is thus maintained. However, it is now updated by `__call__()` and `initialize()` to support the use case for `{get, set}_workspace_size()`. --------- Co-authored-by: Philipp Schaad <schaad.phil@gmail.com> Co-authored-by: Tal Ben-Nun <tbennun@gmail.com>

commit 7e05c76 Author: Philip Mueller <philip.mueller@cscs.ch> Date: Wed Jan 7 10:08:42 2026 +0100 Rename of the Index. commit 32224bb Author: Philip Mueller <philip.mueller@cscs.ch> Date: Thu Dec 18 09:53:57 2025 +0100 Updated the workflow file. commit ecb2785 Author: Philip Mueller <philip.mueller@cscs.ch> Date: Wed Dec 17 08:19:40 2025 +0100 Updated the dace updater workflow file. commit f3198ef Author: Philip Mueller <philip.mueller@cscs.ch> Date: Wed Dec 17 07:41:26 2025 +0100 Made the update point to the correct repo. commit 96f963a Merge: 8b7cce5 387f1e8 Author: Philip Mueller <philip.mueller@cscs.ch> Date: Wed Dec 17 07:37:48 2025 +0100 Merge remote-tracking branch 'spcl/main' into automatic_gt4py_deployment commit 8b7cce5 Author: Philip Mueller <philip.mueller@cscs.ch> Date: Mon Dec 1 09:18:22 2025 +0100 Restored the original workflow files. commit 362ab70 Author: Philip Mueller <philip.mueller@cscs.ch> Date: Mon Dec 1 07:41:40 2025 +0100 Now it has run once, so let's make it less runnable. commit 81b8cfa Author: Philip Mueller <philip.mueller@cscs.ch> Date: Mon Dec 1 07:39:09 2025 +0100 Made it run always. commit 6d71466 Author: Philip Mueller <philip.mueller@cscs.ch> Date: Mon Dec 1 07:38:11 2025 +0100 Small update. commit eb31e6c Author: Philip Mueller <philip.mueller@cscs.ch> Date: Fri Nov 21 15:23:33 2025 +0100 Empty commit in the branch containing the workflow file. commit 2970a75 Author: Philip Mueller <philip.mueller@cscs.ch> Date: Fri Nov 21 15:21:09 2025 +0100 Next step. commit f5d3d9d Author: Philip Mueller <philip.mueller@cscs.ch> Date: Fri Nov 21 15:17:56 2025 +0100 Let's disable everything. commit 211e415 Author: Philip Mueller <philip.mueller@cscs.ch> Date: Fri Nov 21 15:10:43 2025 +0100 Disabled the kickstarter. commit d012c26 Author: Philip Mueller <philip.mueller@cscs.ch> Date: Fri Nov 21 15:05:38 2025 +0100 Updated everything.

commit 7c8e1d3 Merge: 5b068e7 a27e416 Author: Afif <37773945+affifboudaoud@users.noreply.github.com> Date: Thu Jan 15 14:40:56 2026 +0100 Merge branch 'main' into fix_map_fusion_vertical_dfs commit 5b068e7 Author: Affifboudaoud <hk_boudaoud@esi.dz> Date: Sun Nov 23 22:50:46 2025 +0100 Add visited set to avoid visiting same node multiple times

philip-paul-mueller marked this pull request as draft April 30, 2025 10:04

This was referenced Apr 30, 2025

Do Not Merge: Integration Branch for GT4Py philip-paul-mueller/dace#4

Closed

build[dace][next]: Changed DaCe Source GridTools/gt4py#2012

Merged

philip-paul-mueller force-pushed the gt4py-next-integration branch from 7fcf8f9 to 8b9b674 Compare May 1, 2025 10:14

philip-paul-mueller mentioned this pull request May 7, 2025

Do Not Merge: Changes to DaCe CUDA compilation flags #2

Closed

philip-paul-mueller force-pushed the gt4py-next-integration branch from 8b9b674 to 268fc18 Compare May 7, 2025 12:29

philip-paul-mueller force-pushed the gt4py-next-integration branch 3 times, most recently from 964e84b to 2d85437 Compare May 26, 2025 05:22

philip-paul-mueller force-pushed the gt4py-next-integration branch from 2d85437 to 9f72250 Compare June 5, 2025 07:43

philip-paul-mueller mentioned this pull request Jun 5, 2025

build[dace][next]: Updated DaCe GridTools/gt4py#2064

Merged

philip-paul-mueller force-pushed the gt4py-next-integration branch 2 times, most recently from 88c99f4 to d779cd1 Compare June 10, 2025 11:50

edopao force-pushed the gt4py-next-integration branch from d779cd1 to 4f40029 Compare June 12, 2025 12:46

philip-paul-mueller force-pushed the gt4py-next-integration branch from 4f40029 to 09dfda3 Compare June 13, 2025 07:18

philip-paul-mueller force-pushed the gt4py-next-integration branch from 09dfda3 to 87c77ef Compare June 24, 2025 10:49

edopao force-pushed the gt4py-next-integration branch from 87c77ef to c2a4e42 Compare June 27, 2025 14:08

philip-paul-mueller force-pushed the gt4py-next-integration branch from c2a4e42 to 0deba99 Compare July 2, 2025 14:24

philip-paul-mueller mentioned this pull request Jul 11, 2025

feat[cartesian]: DaCe bridge refactor: OIR -> TreeIR -> ScheduleTree -> SDFG GridTools/gt4py#2067

Merged

2 tasks

edopao force-pushed the gt4py-next-integration branch from 178037a to 9114985 Compare July 14, 2025 08:42

philip-paul-mueller changed the title ~~Do Not Merge: Integration Branch for GT4Py~~ Do Not Merge: Integration Branch for GT4Py Next Jul 15, 2025

philip-paul-mueller force-pushed the gt4py-next-integration branch 4 times, most recently from 33b63a1 to 2417e09 Compare July 21, 2025 07:42

philip-paul-mueller force-pushed the gt4py-next-integration branch 4 times, most recently from 3472895 to bed3b0e Compare July 24, 2025 07:24

affifboudaoud and others added 5 commits December 8, 2025 06:12

Increase timeout for ML tests (spcl#2243)

0efa622

Increased pytest timeout from 300 to 600 seconds.

Support dace.map syntax for struct fields (spcl#2187)

cc59d77

Modular Code Generation Docs: Add LowerConsume and remove numbering (s…

312f37f

…pcl#2246) Updated the documentation for proposed pass decomposition, including changes to pass names and descriptions for clarity.

edopao force-pushed the gt4py-next-integration branch from 3eec6f6 to ab9eaef Compare December 11, 2025 13:27

Update C++ standard to C++20 (spcl#2253)

387f1e8

philip-paul-mueller force-pushed the gt4py-next-integration branch 2 times, most recently from e552808 to d4db8e7 Compare December 17, 2025 07:26

edopao force-pushed the gt4py-next-integration branch from d4db8e7 to cd52c4b Compare December 18, 2025 08:33

zero9178 and others added 5 commits December 22, 2025 10:13

edopao force-pushed the gt4py-next-integration branch 2 times, most recently from 0362ff3 to 54b3349 Compare January 12, 2026 22:40

edopao and others added 12 commits January 13, 2026 14:25

Add HIP-codegen for stream-order memory allocation (spcl#2270)

ee38c13

Although the GPU codegen is being reworked, this small change would be useful for the time being. It a an extension to the HIP codegen of the current solution for stream-order memory allocation. It has been tested in GT4Py.

Add #pragma unroll <X> to LoopRegion generated code (spcl#2269)

a27e416

Allow the compiler to unroll itself the `LoopRegion`s instead of just having a manual unroll feature with the `LoopUnroll` pass

Remove the deprecated NestedSDFG.schedule property (spcl#2272)

e36c0a7

Remove the deprecated GPU_Default schedule type (spcl#2273)

ec0bb2f

Recover original Python object in callbacks (spcl#2275)

846d791

This enables array-like objects to be used in callbacks and retain the original object.

Add dace::float32sr type to DaCe (spcl#2148)

c9576df

This new type enables DaCe users to perform calculations with stochastic rounding in single precision. This change is validated with unit tests. --------- Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com>

Set version file

139389d

edopao force-pushed the gt4py-next-integration branch from 513e824 to 139389d Compare January 21, 2026 09:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Do Not Merge: Integration Branch for GT4Py Next #1

Do Not Merge: Integration Branch for GT4Py Next #1

Uh oh!

philip-paul-mueller commented Apr 30, 2025 •

edited by edopao

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

10 participants

Do Not Merge: Integration Branch for GT4Py Next #1

Are you sure you want to change the base?

Do Not Merge: Integration Branch for GT4Py Next #1

Uh oh!

Conversation

philip-paul-mueller commented Apr 30, 2025 • edited by edopao Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

No Longer Needed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

10 participants

philip-paul-mueller commented Apr 30, 2025 •

edited by edopao

Loading