Still unreleased, but good to track cuTile Python commits related to it:

- `ct.exp(x, rounding_mode=...)`: expose RoundingMode.FULL/APPROX (https://github.com/NVIDIA/cutile-python/commit/b2d3f82)
- `@ct.kernel(num_worker_warps=...)`: entry hint for warp-specialized kernels (https://github.com/NVIDIA/cutile-python/commit/bfb2960)
- `ct.mma(..., use_fast_acc=True)`: fp8 MMA fast accumulator (https://github.com/NVIDIA/cutile-python/commit/0d172bf)
- `ct.atomic_add` on bfloat16 (sm_90+) (https://github.com/NVIDIA/cutile-python/commit/e517b6d)
- Tiled-view atomic ops: `tv.atomic_add`, `atomic_max`, `atomic_min`, `atomic_and`, `atomic_or`, `atomic_xor` (https://github.com/NVIDIA/cutile-python/commit/e517b6d)
- `ct.tiled_view(..., traversal_steps=...)` + load/store via `StridedView` (https://github.com/NVIDIA/cutile-python/commit/85da1e3)
- `ct.load_advanced_indexing` / `ct.store_advanced_indexing`: GatherScatterView via advanced indexing (https://github.com/NVIDIA/cutile-python/commit/c2360bd, renamed in https://github.com/NVIDIA/cutile-python/commit/d10a5da)