[TLE]: add triton language extension (tle) support #399

Open: small-cat wants to merge 13 commits into triton_v3.2.x from feature/triton_ascend_tle

@small-cat (Collaborator)

This pull request adds Triton Language Extension (TLE) support to Triton. TLE exposes high-performance computing features such as on-chip memory management, pipeline compilation hints, and the corresponding arithmetic operations. The extension is currently optimized mainly for Ascend 910B devices.

Key Operations in TLE

  • Memory & Tensor Operations

    • tle.dsa.alloc
    • tle.dsa.copy
    • tle.dsa.to_tensor
    • tle.dsa.to_buffer
    • tle.dsa.extract_slice
    • tle.dsa.insert_slice
    • tle.dsa.extract_element
    • tle.dsa.subview
  • Arithmetic Operations

    • tle.dsa.add
    • tle.dsa.sub
    • tle.dsa.mul
    • tle.dsa.div
    • tle.dsa.min
    • tle.dsa.max
  • Pipeline & Parallelism

    • tle.dsa.pipeline: pipelines loop iterations
    • tle.dsa.parallel: expresses parallel execution
    • tle.dsa.hint: provides compilation hints for optimization
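
The slice operations listed above share their names with MLIR's `tensor.extract_slice` and `tensor.insert_slice`. As a rough illustration only, and assuming the TLE versions follow the same offset/size/stride semantics (this PR does not state that), the behaviour can be modeled in plain Python on 1-D lists. The helper functions below are hypothetical and are not part of the TLE API; the actual signatures should be taken from the tests under python/test/tle.

```python
# Hypothetical pure-Python model of offset/size/stride slicing (1-D only).
# This is NOT the TLE API; it assumes semantics similar to MLIR's
# tensor.extract_slice / tensor.insert_slice.


def extract_slice(src, offset, size, stride=1):
    """Return `size` elements of `src`, starting at `offset`, stepping by `stride`."""
    if offset < 0 or size < 0 or stride < 1:
        raise ValueError("offset and size must be >= 0, stride must be >= 1")
    if size > 0 and offset + (size - 1) * stride >= len(src):
        raise IndexError("slice runs past the end of the source")
    return [src[offset + i * stride] for i in range(size)]


def insert_slice(src, dst, offset, stride=1):
    """Return a copy of `dst` with `src` written at `offset`, stepping by `stride`."""
    if offset < 0 or stride < 1:
        raise ValueError("offset must be >= 0, stride must be >= 1")
    size = len(src)
    if size > 0 and offset + (size - 1) * stride >= len(dst):
        raise IndexError("slice runs past the end of the destination")
    out = list(dst)  # value semantics: the destination is not modified in place
    for i, value in enumerate(src):
        out[offset + i * stride] = value
    return out


data = list(range(8))
tile = extract_slice(data, offset=2, size=3, stride=2)
merged = insert_slice([0, 0, 0], data, offset=2, stride=2)
print(tile)    # elements taken from indices 2, 4, 6
print(merged)  # copy of data with indices 2, 4, 6 overwritten
```

The model returns new lists instead of mutating its inputs, which mirrors the value semantics of tensor-level slicing; buffer-level operations such as a subview would instead alias the underlying memory.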

Examples

Example usage can be found under python/test/tle.

CLAassistant commented Mar 9, 2026

CLA assistant check
All committers have signed the CLA.

* move tle.ascend to tle.dsa.ascend
* move tle_ir to third_party/tle/dsa
* reimplement alloc/to_tensor/to_buffer with reference to buffer_ir in third_party/ascend
* reimplement tle.dsa.ascend scope with address_space in ascend
* [FIX] remove memory_space_cast in dsa_to_tensor because the op removes the memory space attribute, which results in compilation errors
* [TESTING] add collect_single method in ascend/testing.py to preserve the original benchmark statistics
* decouple TleOps from TritonOps and move them to third_party/tle/dsa/dialect
* implement the TleOp conversion in third_party/tle/dsa rather than in flir directly; flir just calls the conversion in its pass
* backend/ascend/spec/triton/compiler/code_generator.py still uses tle.dsa in its visitor to visit the Python AST
* fix tle.dsa.hint for nested usage, see python/test/tle/test_tle_with_hints.py
* implement extract_tle in experimental/tle
…mpiler/code_generator.py and add sparse_flash_attn_tle.py
small-cat force-pushed the feature/triton_ascend_tle branch 3 times, most recently from 08c1606 to b6fb294 (March 10, 2026 09:07)
zhzhcookie (Collaborator) commented Mar 10, 2026

Please add tle tests to .github/workflows/ascend-build-and-test.yml

small-cat force-pushed the feature/triton_ascend_tle branch 2 times, most recently from 0cb44f1 to d8fb02d (March 11, 2026 08:46)
small-cat force-pushed the feature/triton_ascend_tle branch 3 times, most recently from 9f8793a to bcd2520 (March 12, 2026 08:10)
small-cat force-pushed the feature/triton_ascend_tle branch 4 times, most recently from 426b4e5 to 860a1f8 (March 12, 2026 10:02)
small-cat force-pushed the feature/triton_ascend_tle branch from 860a1f8 to 980243b (March 13, 2026 08:20)
small-cat force-pushed the feature/triton_ascend_tle branch from 980243b to bdd0e47 (March 13, 2026 09:16)