[VERSION] Init triton_v3.6.x branch from triton_v3.5.x #429
Open
zhzhcookie wants to merge 46 commits into triton_v3.6.x from
Conversation
--------- Co-authored-by: zhengyang <zhengyang@baai.ac.cn>
* [feat] support tle extension (#188)
  * Support Hints: shared_memory
  * Add a new pass 'ProcessSharedMemoryHintPass' for shared_memory hint
  * [feat]: support tle extension
    1. Complete TLE implementation with comprehensive testing
    2. Support tle.alloc, tle.copy, tle.local_load, tle.local_store, tle.pipeline
    3. xcore support tle pipeline
  * [chore/refactor] add tle marker && move tle from language extra to experimental
  * [chore] translate Chinese comments to English
  Co-authored-by: liuxin <liuxin01@hnu.edu.cn>
* [CI] Add auto code-format
* Fix auto commit
* Apply code-format changes
* Clean code & [CI] Add tle tests
* Fix tle pybind
* [HINTS] [TLE] Fix comments
* [HINTS] Fix LoadOp::build
* Apply code-format changes
* [TEST] Add --only_unit_test to tutorials
* [TEST] [CI/CD] Add --only_unit_test to tutorials and add to workflow
* Apply code-format changes
* [TEST] [CI/CD] Add --only_unit_test to tutorials and add to workflow
* [TEST] [CI/CD] Add --only_unit_test to tutorials and add to workflow
* Move tle mlir test suites to third_party/tle/test
* Apply code-format changes
* [CI] Move import sys
* Apply code-format changes
* Remove TLE test runner implementation
* Apply code-format changes
* [TLE] Add Copyright
* Apply code-format changes
---------
Co-authored-by: huanghaoXcore <huanghao@xcoresigma.com>
Co-authored-by: liuxin <liuxin01@hnu.edu.cn>
Co-authored-by: sunnycase <sunnycase@users.noreply.github.com>
Co-authored-by: zhengyang <zhengyang@baai.ac.cn>
Co-authored-by: zhzhcookie <zhzhcookie@users.noreply.github.com>
* [tle-lite] Add tle.load async lowering for GPU
* Add wgmma optimize for async load
* Add copyright
* Apply code-format changes
* Add python/tutorials/tle/01-sparse-mla.py
* Apply code-format changes
* Fix code format
---------
Co-authored-by: sunnycase <sunnycase@users.noreply.github.com>
--------- Co-authored-by: flagtree-bot <flagtree_ai@163.com>
--------- Co-authored-by: flagtree-bot <flagtree_ai@163.com>
* Setup the framework of edsl
* support scf.for loop
* support softmax
* support matmul
* modified: ../triton/experimental/flagtree/edsl/runtime.py (#216)
* add topk example skeleton
* implement topk with edsl
* replace while-loop-inside part with edsl
* fix CMake dependence bug (#247)
* [FEAT] enable using a wheel as external LLVM (#248), resolves #211
  * update setup to support llvm from wheel
  * allow multiple attempts in one python process
  * delete redundant code
  * delete redundant code _ 2
  * delete redundant code_3
* [BACKEND] Solve addresspace issue (#251), resolves #209
* Reverse order of dimensions for memory descriptor
* Update typing.py
* support llvm.ptr as edsl input type
* move the examples to tutorials
* put edsl into tle
* remove unused extension flag
* tag modifications as tle raw
* Refactor: Move LLVM wheel logic to setup_helper.py
* Apply code-format changes
---------
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
Co-authored-by: Phoenixtree <sunwenjia04@163.com>
Co-authored-by: starrryz <760668919@qq.com>
Co-authored-by: flagtree-bot <flagtree_ai@163.com>
Co-authored-by: zhzhcookie <zhengyang@baai.ac.cn>
* WIP topk [skip ci]
* modified: python/examples/flagtree/03-matrix-multiplication.py
  modified: python/src/flagtree_ir.cc
* modified: python/examples/flagtree/03-matrix-multiplication.py
* Update FlagTree IR and semantic implementation
* Add arg_type_hints attribute and type conversion validation
  - Add arg_type_hints attribute to DSLRegionOp to store EDSL function type declarations
  - Add comprehensive comments explaining the TT IR type -> EDSL func type -> LLVM type conversion
  - Add error reporting for unsupported types and mismatched address spaces
  - All tests (examples 01-03) passing
* Update comments in flagtree_ir.cc
* Current implementation before replacing loop with edsl1
* Add error reporting with parameter names and strict type consistency checks
  - Add edsl_param_names attribute to store parameter names
  - Print parameter names in error messages for easier debugging
  - Add strict type consistency checks before type conversion
  - Rename arg_type_hints to edsl_param_types for clarity
---------
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
Co-authored-by: SabYic <sunwenjia04@163.com>
Co-authored-by: lxy <15910307812@163.com>
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
--------- Co-authored-by: zhzhcookie <zhengyang@baai.ac.cn>
* [CI] Fix check docs only
* [BUILD] Add PYTHONPATH when set_llvm_env
* [DOC] Fix doc link
* [BUILD] Fix plugin mode check
Updated the title to reflect the correct name of the extension.
* modified: python/triton/experimental/tle/raw/mlir/codegen.py
  modified: python/triton/experimental/tle/raw/mlir/utils.py
  Author: SabYic <sunwenjia04@163.com>
  Date: Mon Jan 26 12:42:41 2026 +0000
* Apply code-format changes
* Fix string encoding in global_string method
* modified: python/triton/experimental/tle/raw/mlir/utils.py
* modified: python/triton/experimental/tle/raw/mlir/utils.py
---------
Co-authored-by: flagtree-bot <flagtree_ai@163.com>
modified stride computation method for matrix mul
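The commit above only notes that the stride computation changed, not how. For context, a minimal sketch of how row-major (C-contiguous) strides are usually derived for the matmul tutorials; this is the standard convention, not code from this PR:

```python
def row_major_strides(shape):
    """Strides (in elements) for a C-contiguous (row-major) tensor.

    The last dimension has stride 1; each earlier dimension's stride is
    the product of all dimension sizes after it.
    """
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return tuple(strides)

# For an (M, K) x (K, N) matmul, element A[m, k] lives at
# a_ptr + m * stride_am + k * stride_ak, with:
M, K, N = 4, 8, 16
stride_am, stride_ak = row_major_strides((M, K))  # (8, 1)
stride_bk, stride_bn = row_major_strides((K, N))  # (16, 1)
```

Getting one of these strides wrong typically still runs but silently reads a transposed or skewed tile, which is the sort of bug a "stride computation" fix addresses.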
* initial design for vassert func
* implement vassert func in edsl
* Apply code-format changes
* remove anno
* add CI tests && restore work for fix of issue #328
* Apply code-format changes
* [CI] export LLVM_SYSPATH to env for test
* utilize vassert in 5-topk && add test
* Apply code-format changes
---------
Co-authored-by: flagtree-bot <flagtree_ai@163.com>
Co-authored-by: Jinjie Liu <jjliu@baai.ac.cn>
Co-authored-by: zhengyang <zhengyang@baai.ac.cn>
* Implement Local Pointers Support in TLE
  - Introduced `tle.local_ptr()` and `tl.load`/`tl.store` for accessing shared/tensor memory through pointer tensors.
  - Added `LocalPointersOp` to the TLE dialect for managing local pointers.
  - Implemented conversion patterns for `LocalPointersOp` to LLVM.
  - Enhanced the TLE README to reflect new local operations and usage examples.
  - Created passes for assigning shared encodings to local pointers and inserting barriers between local pointer operations.
  - Updated the C++ bridge to expose new builder methods for local pointers in Triton.
  - Added tests and documentation for the new features in TLE.
* Apply code-format changes
* Fix tests
* Apply code-format changes
* Redesign tle.local_ptr
* Apply code-format changes
* Update sparse-mla
* Fix build
* Add topk & fft tutorials
* Update CI test
* Restore sparse mla
* Apply code-format changes
* Fix build
* Fix build
* Avoid bar.sync between atomics
* Apply code-format changes
* Update lib/Analysis/Alias.cpp
* Update lib/Dialect/TritonGPU/Transforms/Pipeliner/AssignLatencies.cpp
* Update third_party/tle/dialect/lib/Transforms/TleInsertLocalPointerBarriers.cpp
* Apply code-format changes
* Update
* Improve perf
* Apply code-format changes
* Fix _moe_realistic_shapes
* Vectorize moe_align_block_size_vllm_stage2_kernel
* More vectorize
* Fix
* Apply code-format changes
---------
Co-authored-by: flagtree-bot <flagtree_ai@163.com>
Co-authored-by: Jinjie Liu <jjliu@baai.ac.cn>
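The local-pointers work above routes `tl.load`/`tl.store` through tensors of shared-memory pointers. The PR log does not show the lowering itself, but the address arithmetic such a lowering must perform can be sketched in plain Python; every name below is illustrative, not part of the TLE API:

```python
def local_ptr_offsets(base_addr, rows, cols, row_stride, elem_size):
    """Byte addresses into a shared-memory buffer for a tile of elements,
    as a pointer-tensor load/store lowering would have to compute them
    (illustrative sketch only, not TLE's actual implementation).

    base_addr:  byte offset of the allocation within shared memory
    rows, cols: per-element row/column indices (same length)
    row_stride: elements per row of the shared buffer
    elem_size:  bytes per element (e.g. 2 for fp16)
    """
    assert len(rows) == len(cols)
    return [base_addr + (r * row_stride + c) * elem_size
            for r, c in zip(rows, cols)]

# A 2x2 tile at the top-left of a 64-wide fp16 shared buffer:
addrs = local_ptr_offsets(0, [0, 0, 1, 1], [0, 1, 0, 1],
                          row_stride=64, elem_size=2)
# addrs == [0, 2, 128, 130]
```

Because distinct pointer-tensor elements may alias the same shared bytes, passes like the barrier-insertion pass listed above have to conservatively order such accesses, which is why `bar.sync` placement is a recurring theme in these commits.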
* modified: python/triton/experimental/tle/raw/mlir/codegen.py
  modified: python/triton/experimental/tle/raw/mlir/utils.py
  Author: SabYic <sunwenjia04@163.com>
  Date: Mon Jan 26 12:42:41 2026 +0000
* Apply code-format changes
* Fix string encoding in global_string method
* modified: python/triton/experimental/tle/raw/mlir/utils.py
* modified: python/tutorials/03-matrix-multiplication.py
  modified: python/tutorials/tle/raw/03-matrix-multiplication.py
  modified: third_party/tle/dialect/lib/Conversion/TleToLLVM/ExtractOpToLLVM.cpp
  modified: third_party/tle/dialect/lib/Transforms/ConvertArgToMemDesc.cpp
* Apply code-format changes
* modified: python/tutorials/03-matrix-multiplication.py
  modified: python/tutorials/tle/raw/03-matrix-multiplication.py
* Apply code-format changes
* modified: python/tutorials/tle/raw/03-matrix-multiplication.py
* modified: third_party/tle/dialect/lib/Conversion/TleToLLVM/ExtractOpToLLVM.cpp
* modified: python/triton/experimental/tle/language/raw/core.py
* modified: python/triton/experimental/tle/language/raw/core.py
  modified: third_party/tle/dialect/include/IR/TleOps.td
  modified: third_party/tle/triton_tle.cc
  modified: third_party/tle/triton_tle_raw.cc
  modified: third_party/tle/utils/lib/Protocol.cpp
* modified: third_party/tle/triton_tle_raw.cc
* modified: third_party/tle/triton_tle_raw.cc
* modified: third_party/tle/dialect/lib/Transforms/ConvertArgToMemDesc.cpp
  modified: third_party/tle/triton_tle_raw.cc
* modified: third_party/tle/dialect/lib/Transforms/ConvertArgToMemDesc.cpp
* Apply code-format changes
* modified: third_party/tle/dialect/lib/Transforms/ConvertArgToMemDesc.cpp
  modified: third_party/tle/triton_tle_raw.cc
  modified: third_party/tle/utils/lib/Protocol.cpp
* modified: python/tutorials/tle/raw/04-hello-world.py
* Apply code-format changes
* modified: third_party/tle/triton_tle_raw.cc
* modified: third_party/tle/triton_tle_raw.cc
  modified: third_party/tle/utils/lib/Protocol.cpp
* modified: third_party/tle/triton_tle_raw.cc
* modified: third_party/tle/utils/lib/Protocol.cpp
* modified: third_party/tle/dialect/lib/Transforms/ConvertArgToMemDesc.cpp
* modified: third_party/tle/dialect/lib/Transforms/ConvertArgToMemDesc.cpp
* modified: third_party/tle/dialect/lib/Transforms/ConvertArgToMemDesc.cpp
* modified: third_party/tle/triton_tle_raw.cc
* modified: third_party/tle/dialect/lib/Transforms/ConvertArgToMemDesc.cpp
  modified: third_party/tle/triton_tle_raw.cc
  modified: third_party/tle/utils/lib/Protocol.cpp
* modified: third_party/tle/dialect/lib/Transforms/ConvertArgToMemDesc.cpp
* modified: third_party/tle/triton_tle_raw.cc
* modified: third_party/tle/dialect/lib/Transforms/ConvertArgToMemDesc.cpp
* modified: third_party/tle/triton_tle_raw.cc
* modified: third_party/tle/triton_tle_raw.cc
* modified: third_party/tle/triton_tle_raw.cc
* Apply code-format changes
---------
Co-authored-by: flagtree-bot <flagtree_ai@163.com>
* modified: python/setup_tools/setup_helper.py
  modified: python/setup_tools/utils/__init__.py
  modified: python/setup_tools/utils/aipu.py
  modified: python/setup_tools/utils/tools.py
  modified: setup.py
* Apply code-format changes
---------
Co-authored-by: flagtree-bot <flagtree_ai@163.com>
Co-authored-by: Jinjie Liu <jjliu@baai.ac.cn>
Co-authored-by: Galaxy1458 <55453380+Galaxy1458@users.noreply.github.com>
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
…#362)
* [BUILD] Print color when building
* [BUILD] Add unpack .triton shell for offline-building
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
* add hint_manager for cuda
* Apply code-format changes
* update hintmanager: remove backend & version config and so on
* Apply code-format changes
* add TODO, remove priority 3
* Apply code-format changes
---------
Co-authored-by: flagtree-bot <flagtree_ai@163.com>
* [TLERaw] revert llvm::CallOp related codes
* [TLERaw] enable topk tle in ci
* remove flake8 comments (#338)
---------
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
…ml files in the workflow folder on triton_v3.5.x (#397) Co-authored-by: zhzhcookie <zhengyang@baai.ac.cn>
* [BACKEND] support enflame backend
* Apply code-format changes
* update according to reviewer's comments
* fix build issue of tle dialect and add MAX JOB 4
* remove 64 test
* Apply code-format changes
* correct llvm file name
* remove executable attr according to pre-commit
* update unit test
* remove fake .coveragerc
* update readme for latest
* exclude tle test binary
* fix opt-not-installed issue
* Apply code-format changes
* fix pre-commit
* Apply code-format changes
* fix opt-not-installed issue
* update readme with FlagTree instead of flagtree
* fix build issue in yaml
* fix build issue of uninstalled triton
* offline case
* roll back warning-as-error change
* remove redundant space
---------
Co-authored-by: flagtree-bot <flagtree_ai@163.com>
* Add initial tle dist impl
* Add tle docs
* Implement submesh distributed barrier support in TLE
  - Added support for submesh distributed barriers in the TLE dialect, allowing more granular synchronization across submeshes.
  - Introduced new attributes for the `DistributedBarrierOp` to handle group metadata, including `group_kind`, `group_rank`, `group_shape`, `group_axes`, and `group_mask`.
  - Enhanced the `remote` function to validate and handle buffered tensors with appropriate metadata.
  - Updated the `device_mesh` class to include launch shape and dimension names for better submesh handling.
  - Implemented verification logic for the new attributes in `DistributedBarrierOp`.
  - Added tests to ensure correct behavior of the new submesh barrier functionality and validation of remote buffered tensors.
* Add AxisInfoExt and RemotePointerUtils for Triton dialect enhancements
  - Introduced AxisInfoExt.h and AxisInfoExt.cpp to extend axis information analysis with new visitors for LocalPointersOp and RemotePointersOp.
  - Implemented RemotePointerUtils.h and RemotePointerUtils.cpp to handle remote pointer information extraction and inference of vector sizes/layouts.
  - Added utility functions for managing remote pointer metadata and determining vectorization hints based on tensor properties.
  - Enhanced axis analysis capabilities to support new Triton operations, improving optimization opportunities in the Triton dialect.
* Fix TLE local pointer encoding and retune fused launch params
* tle: add grid distributed barrier + optimize moe atomic fused
* remove tle docs from repo and ignore them
* ignore tle docs paths
* remove unintended triton_shared gitlink
* revert unnecessary cluster wait fence insertion
* refine markers and relocate tle alignment test
* Apply code-format changes
* restore core alignment test and split tle coverage
* update tle moe tuning and backend/lowering adjustments
* Apply code-format changes
* Fix
* [TLERaw] Revert `LLVM::CallOp`-Related Modification (#387)
  * [TLERaw] revert llvm::CallOp related codes
  * [TLERaw] enable topk tle in ci
  * remove flake8 comments (#338)
* Apply code-format changes
* Update .gitignore to remove specific entries
* Clean up .gitignore by removing unused entries (removed entries for Test and Agent)
* Enhance TLE Support in TritonNVIDIAGPUToLLVM
  - Added conditional compilation for TLE support in LoadStoreOpToLLVM.cpp, TargetInfo.cpp, TargetInfo.h, and TritonGPUToLLVM.cpp.
  - Introduced new functions and modified existing ones to handle remote pointer information and shared memory access for TLE.
  - Updated the TLE documentation to include examples for scalar shared-memory lookups and clarified the usage of local pointers.
  - Ensured compatibility with both TLE and non-TLE builds by using preprocessor directives.
* Apply code-format changes
* Fix
* Refactor TLE-related code in TritonNVIDIAGPUToLLVM
  - Removed unnecessary comments marking the beginning and end of TLE sections.
  - Consolidated TLE-specific code under preprocessor directives to improve readability.
  - Streamlined the inclusion of TLE headers and related logic in LoadStoreOpToLLVM, TargetInfo, and TritonGPUToLLVM files.
  - Ensured consistent handling of shared and cluster shared pointers across various operations.
  - Enhanced the clarity of cache policy handling in load and store operations.
* Apply code-format changes
---------
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
Co-authored-by: flagtree-bot <flagtree_ai@163.com>
Co-authored-by: Jinjie Liu <jjliu@baai.ac.cn>
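The `group_shape`/`group_rank`/`group_axes`/`group_mask` attributes above describe which devices of a mesh a distributed barrier spans. The log does not spell out their encoding; one plausible reading, sketched in plain Python purely for illustration (the semantics here are an assumption, not `DistributedBarrierOp`'s documented behavior):

```python
def submesh_group(mesh_shape, coords, group_axes):
    """Derive per-device group metadata for a barrier over a submesh.

    mesh_shape: full device-mesh shape, e.g. (2, 4)
    coords:     this device's coordinates in the mesh
    group_axes: mesh axes the barrier spans (the submesh)
    Returns (group_shape, group_rank, group_mask). The encoding below
    is an assumption for illustration, not the TLE dialect's.
    """
    # The submesh shape is the mesh restricted to the participating axes.
    group_shape = tuple(mesh_shape[a] for a in group_axes)
    # Row-major rank of this device within its submesh.
    group_rank = 0
    for a in group_axes:
        group_rank = group_rank * mesh_shape[a] + coords[a]
    # One bit per mesh axis that participates in the group.
    group_mask = sum(1 << a for a in group_axes)
    return group_shape, group_rank, group_mask

# On a 2x4 mesh, a barrier over axis 1 groups the 4 devices of each row;
# device (1, 2) is rank 2 within its row, and the mask marks axis 1:
meta = submesh_group((2, 4), coords=(1, 2), group_axes=(1,))
# meta == ((4,), 2, 0b10)
```

A verifier for the op would then only need to check mutual consistency: `group_rank < prod(group_shape)`, and the set bits of `group_mask` matching `group_axes`.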
…#400)
* insert syncthreads before tle_dslregion automatically
* check whether copy exists before inserting nvvm.barrier0
---------
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
Co-authored-by: Phoenixtree <sunwenjia04@163.com>
zhzhcookie commented Mar 13, 2026
    #define TRITON_ATTR_DEFS

    include "mlir/IR/EnumAttr.td"
    include "mlir/IR/AttrTypeBase.td"
zhzhcookie commented Mar 13, 2026
    }

    } // namespace
zhzhcookie commented Mar 13, 2026
    auto ptrElems = unpackLLElements(loc, llPtr, rewriter);
    assert(ptrElems.size() == numElems);
    const bool isSharedPtr = isSharedPointerValue(ptrElems);
FlagTree does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have already signed the CLA but the status is still pending? Let us recheck it.
No description provided.