Benchmark/v5.0.0 by chilianyi · Pull Request #6 · RACE-org/FlagGems

chilianyi · 2026-07-03T05:24:54Z

PR Category

Type of Change

Description

Issue

Progress

Change is properly reviewed (1 reviewer required, 2 recommended).
Change responds to an issue.
Change is fully covered by a UT.

Performance

**Summary**
- Fixes Kunlunxin `bmm_out` call signature to match `bmm_kernel`, avoiding duplicate constexpr binding errors (e.g., `TILE_M`).
- Removes unintended stride arguments that are not accepted by the kernel.

**Why**
- `bmm_out` was passing extra parameters, leading to a runtime `TypeError` during JIT binding in tests.
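The failure mode described above can be illustrated with a plain-Python analogy. This is only a sketch: `bmm_kernel_stub` and its parameters are hypothetical stand-ins, not the real Kunlunxin `bmm_kernel`, and the exact error raised by the Triton JIT may be worded differently. It shows how an extra positional argument can land on a parameter that is also bound by keyword, which Python rejects with a `TypeError`.

```python
def bmm_kernel_stub(a, b, out, TILE_M=32, TILE_N=32):
    """Hypothetical stand-in for a kernel launcher (not the real bmm_kernel)."""
    return (TILE_M, TILE_N)


def call_with_extra_binding():
    # The fourth positional argument lands on TILE_M, and TILE_M is then
    # bound a second time by keyword, so Python rejects the call.
    try:
        bmm_kernel_stub("a", "b", "out", 64, TILE_M=32)
    except TypeError as exc:
        return str(exc)
    return None


def call_matching_signature():
    # Pass only the arguments the callee declares, each exactly once.
    return bmm_kernel_stub("a", "b", "out", TILE_M=32, TILE_N=32)
```

Keeping the caller's argument list in step with the callee's declared parameters, as the summary describes, avoids the double binding.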

* add test_perf_per_token_group_quant_fp8
  add benchmark for per_token_group_quant_fp8
* Update core_shapes.yaml

---------

Signed-off-by: kiddyjinjin <54064850+kiddyjinjin@users.noreply.github.com>
Co-authored-by: kiddyjinjin <54064850+kiddyjinjin@users.noreply.github.com>

* add test_perf_concat_and_cache_mla
  add benchmark for concat_and_cache_mla
* fix code-format-check
* Update core_shapes.yaml
* fix code-style

---------

Signed-off-by: kiddyjinjin <54064850+kiddyjinjin@users.noreply.github.com>
Co-authored-by: kiddyjinjin <54064850+kiddyjinjin@users.noreply.github.com>

* Add special_i0e operator implementation, tests and benchmark
  - Migrated from experimental_ops to ops
  - Added unit tests in tests/test_unary_pointwise_ops.py
  - Added to forward_operations in benchmark/test_unary_pointwise_perf.py
  - All 18 unit tests passed
  - Speedup: 1.0-1.9x across dtypes

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: add KernelGen source comment
* fix: codestyle fixes from pre-commit

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

… ops (#1912)

* Migrate _upsample_nearest_exact1d from experimental_ops to ops

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: correct isort ordering in ops/__init__.py
* chore: add KernelGen source comment
* feat: add logger.debug for tracing
* fix: add missing blank lines in test_special_ops.py
* fix: repair broken lift_fresh_copy benchmark
* fix: remove blank lines between decorators (E304) and add missing assert (F841)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Migrate logit_ from experimental to main ops

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use sigmoid-generated input in logit_ test for numerical stability

  Match the experimental test pattern using torch.sigmoid(uniform(-4,4)) to generate inputs in (0,1) range. Add manual_seed for reproducibility.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add upcast=True for logit_ reference comparison

  Without upcast, ref stays in float16 causing precision mismatch with the Triton kernel which computes in float32.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: rename kernel from logit__kernel to logit_kernel for consistency
* feat: add logger.debug for tracing
* fix: add missing blank lines in test_unary_pointwise_ops.py
* fix: add missing assert in absolute test and fix formatting
* fix: add missing assert for rrelu_with_noise_backward
* fix: codestyle fixes from pre-commit

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: factnn <factnn@example.com>
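The input-generation idea in the logit_ fix can be sketched without torch. This is an illustrative stand-in using only the Python standard library, not the actual FlagGems test: the helper names `sigmoid`, `logit`, and `make_inputs` are hypothetical. Passing uniform samples from (-4, 4) through a sigmoid keeps every input strictly inside (0, 1), away from the endpoints where the logit diverges, and a fixed seed makes the inputs reproducible.

```python
import math
import random


def sigmoid(x):
    # Maps any real x into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    # Inverse of sigmoid; only finite for 0 < p < 1.
    return math.log(p / (1.0 - p))


def make_inputs(n, seed=0):
    # Seeded generator, analogous in spirit to fixing a manual seed in a test.
    rng = random.Random(seed)
    return [sigmoid(rng.uniform(-4.0, 4.0)) for _ in range(n)]
```

With this range the generated values stay roughly between 0.018 and 0.982, so the logit of every input is finite and well conditioned.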

dongjibin1996 and others added 30 commits February 16, 2026 00:32

[KUNLUNXIN] enable vdot benchmark test (#1604)

f626e65

Extract PR ID determination logic from individual workflows (#1628)

52d36e5

Merge unit tests for operators, examples, and utils (#1630)

cc574f8

Remove redundant settings for coverage tool (#1631)

ebca0a7

[Iluvatar] use torch_moe_align_block_size as reference. (#1626)

a85ce5f

[TSINGMICRO] add custom op for tsingmicro backend

7cfdf6d

[TSINGMICRO] fix hard code cuda in flag_gems.fused.FLA.utils.input_guard

99594ee

and add test case for input_guard

Auto label a PR regarding its size (#1654)

67e21a5

Kicking this in for testing. The new step only works after it lives on the `master` branch.

Bump wait-for-workflow version and revise size labels (#1665)

f266fff

Make the labeler sync the labels (#1666)

9e3117b

Update preprocess step to refetch PR labels (#1667)

4e5bcbe

Update pull request types in triage workflow

b1ce680

Removed 'labeled' and 'unlabeled' types from pull request triggers. Signed-off-by: Qiming Teng <tengqm@outlook.com>

[kunlunxin] fusion rmsnorm bwd (#1669)

0a34c73

Revert "[KUNLUNXIN] use the default bmm operator (#1593)" (#1680)

015d315

This reverts commit c3b6d5f.

Fix a nit in error message (#1670)

c51929c

[KUNLUNXIN] fix max_pool2d and avg_pool2d (#1602)

678ae41

[BACKEND] Fix triton_extra_name on ascend (#1684)

8715200

[KUNLUNXIN] Remove unnecessary comments (#1685)

68b0a06

[kunlunxin] fix fused add rms norm (#1688)

76f69ae

【KMPL】add test_perf_reshape_and_cache (#1585)

74650ac

* add test_perf_reshape_and_cache add benchmark for reshape_and_cache * update core shapes

[kunlunxin] remove skip (#1689)

a13b204

[FlagTree]: update the version of flagtree for all backends.

e1625a6

Revert "[BACKEND] Fix triton_extra_name on ascend"

5fd2dc9

This reverts commit 97c7e18.

[Fix] mean kernel grid Y overflow for large K dimensions (#1692)

9bdf551

improve corner case for topk (#1578)

6f2585d

[TSINGMICRO] skip mm and addmm with float32 input since low precision…

c7b3e98

… of (#1673) tx81 hw

Unify test script names and the command sequence (#1792)

844254f

factnn and others added 22 commits March 24, 2026 21:37

Update operator inventory (20260325) (#2079)

4fbc709

change version to 5.0.0 (#2089)

405b2dd

[kunlunxin] fix arange operation (#2083)

8ddc7f7

* [kunlunxin] fix arange operation * add pad

Sort operator lists where vendors have specialized version (#2086)

a21ba05

The CI job failure is not introduced in this PR.

add select_backward (#2080)

268dfe6

Replace docs with hugo-docs (#2091)

991a6ea

Update operator data for 5.0 release (#2092)

f41f592

Update workflow for docs preparation (#2093)

89fdfd4

Update configuration for hugo docs (#2097)

4546c30

Mask out mkdocs workflow (#2098)

29be2a1

Update changelog for 5.0 release (#2095)

76d59ae

init (#2099)

dc997a8

Update project readme (#2100)

d1e5093

Fix a nit in link (#2101)

4e08b44

Signed-off-by: Qiming Teng <tengqm@outlook.com>

Align benchmark suite and add no-torch export

d162160

Use v500 benchmark result filenames

a447891

Ensure local package import in tests

d20f139

Restore llama benchmark shapes

57f10db

add some ops & tests & benchmark (#4501)

682e670

chilianyi marked this pull request as draft July 3, 2026 05:25

chilianyi changed the title ~~[WIP ]Benchmark/v5.0.0~~ Benchmark/v5.0.0 Jul 3, 2026

huangyiqun and others added 6 commits July 3, 2026 15:38

Benchmark/v5.0.0 (#4507)

b54662d

fix: update benchmark pytest marks and fix mm.py hasattr check (#4516)

018ca52

Benchmark/v5.0.0 (#4519)

259c6e8

update tests & benchmark; add w8a8_block_fp8_matmul benchmark (#4520)

1b28499

Benchmark/v5.0.0 (#4521)

57ae569

update tests (#4524)

e4341a5

chilianyi wants to merge 1098 commits into RACE-org:Flaggems127+50 from flagos-ai:benchmark/v5.0.0

20 participants