Benchmark/v5.0.0#6

Draft
chilianyi wants to merge 1098 commits into RACE-org:Flaggems127+50 from flagos-ai:benchmark/v5.0.0

@chilianyi

PR Category

Type of Change

Description

Issue

Progress

  • Change is properly reviewed (1 reviewer required, 2 recommended).
  • Change responds to an issue.
  • Change is fully covered by unit tests.

Performance

dongjibin1996 and others added 30 commits February 16, 2026 00:32
Kicking this in for testing. The new step only works after it lives on the `master` branch.
Removed 'labeled' and 'unlabeled' types from pull request triggers.

Signed-off-by: Qiming Teng <tengqm@outlook.com>
**Summary**
- Fixes Kunlunxin `bmm_out` call signature to match `bmm_kernel`, avoiding duplicate constexpr binding errors (e.g., `TILE_M`).
- Removes unintended stride arguments that are not accepted by the kernel.

**Why**
- `bmm_out` was passing extra parameters, leading to a runtime `TypeError` during JIT binding in tests.
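
The failure mode described above can be reproduced in plain Python, without Triton. The sketch below is only an analogy: the parameter lists, the stride names, and the `_buggy`/`_fixed` wrappers are assumptions made for illustration, not the actual Kunlunxin code. It shows how extra positional arguments spill into the slots meant for `TILE_M` and `TILE_N`, so that the later keyword binding collides with them.

```python
def bmm_kernel(a, b, out, M, N, K, TILE_M, TILE_N):
    """Stand-in for a JIT kernel; the real kernel's parameters are assumed."""
    return (TILE_M, TILE_N)


def bmm_out_buggy(a, b, out, M, N, K):
    # Hypothetical extra stride arguments that the kernel does not accept.
    stride_am, stride_ak = 1, 1
    # The two strides land positionally in the TILE_M / TILE_N slots,
    # so the keyword TILE_M=32 binds the same parameter a second time.
    return bmm_kernel(a, b, out, M, N, K, stride_am, stride_ak,
                      TILE_M=32, TILE_N=32)


def bmm_out_fixed(a, b, out, M, N, K):
    # The call now matches the kernel signature exactly.
    return bmm_kernel(a, b, out, M, N, K, TILE_M=32, TILE_N=32)


if __name__ == "__main__":
    try:
        bmm_out_buggy(None, None, None, 4, 4, 4)
    except TypeError as exc:
        print("buggy call failed:", exc)
    print("fixed call returned:", bmm_out_fixed(None, None, None, 4, 4, 4))
```

In ordinary Python the error surfaces at call time as a `TypeError` about multiple values for `TILE_M`; the commit message reports the same class of error during JIT binding.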
* add test_perf_reshape_and_cache

add benchmark for reshape_and_cache

* update core shapes
* add test_perf_per_token_group_quant_fp8

add benchmark for per_token_group_quant_fp8

* Update core_shapes.yaml

---------

Signed-off-by: kiddyjinjin <54064850+kiddyjinjin@users.noreply.github.com>
Co-authored-by: kiddyjinjin <54064850+kiddyjinjin@users.noreply.github.com>
* add test_perf_concat_and_cache_mla

add benchmark for concat_and_cache_mla

* fix code-format-check

* Update core_shapes.yaml

* fix code-style

---------

Signed-off-by: kiddyjinjin <54064850+kiddyjinjin@users.noreply.github.com>
Co-authored-by: kiddyjinjin <54064850+kiddyjinjin@users.noreply.github.com>
factnn and others added 22 commits March 24, 2026 21:37
* Add special_i0e operator implementation, tests and benchmark

- Migrated from experimental_ops to ops
- Added unit tests in tests/test_unary_pointwise_ops.py
- Added to forward_operations in benchmark/test_unary_pointwise_perf.py
- All 18 unit tests passed
- Speedup: 1.0-1.9x across dtypes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: add KernelGen source comment

* fix: codestyle fixes from pre-commit

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
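
For readers unfamiliar with the operator: `special_i0e` computes the exponentially scaled modified Bessel function of the first kind of order zero, i0e(x) = exp(-|x|) · I0(x). A minimal scalar reference, assuming a naive power series that is only suitable for moderate |x|, might look like the sketch below. The function name `i0e_reference` is hypothetical, and this is not the Triton kernel added by the commit.

```python
import math


def i0e_reference(x: float, max_terms: int = 500) -> float:
    """Return exp(-|x|) * I0(x) using the power series of I0.

    I0(x) = sum over k >= 0 of ((x / 2) ** (2 * k)) / (k!) ** 2.
    Intended as a readable reference for moderate |x| only.
    """
    quarter_x2 = (x / 2.0) ** 2
    term = 1.0   # k = 0 term of the series
    total = 1.0
    for k in range(1, max_terms):
        # Ratio of consecutive terms is (x/2)^2 / k^2.
        term *= quarter_x2 / (k * k)
        total += term
        if term < 1e-17 * total:
            break
    return math.exp(-abs(x)) * total


if __name__ == "__main__":
    for value in (0.0, 0.5, 1.0, 3.0):
        print(value, i0e_reference(value))
```

A scalar reference like this is useful mainly for spot-checking a handful of values; a real unit test would compare against the framework's own implementation across dtypes.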
… ops (#1912)

* Migrate _upsample_nearest_exact1d from experimental_ops to ops

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: correct isort ordering in ops/__init__.py

* chore: add KernelGen source comment

* feat: add logger.debug for tracing

* fix: add missing blank lines in test_special_ops.py

* fix: repair broken lift_fresh_copy benchmark

* fix: remove blank lines between decorators (E304) and add missing assert (F841)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Migrate logit_ from experimental to main ops

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use sigmoid-generated input in logit_ test for numerical stability

Match the experimental test pattern using torch.sigmoid(uniform(-4,4))
to generate inputs in (0,1) range. Add manual_seed for reproducibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add upcast=True for logit_ reference comparison

Without upcast, ref stays in float16 causing precision mismatch with
the Triton kernel which computes in float32.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: rename kernel from logit__kernel to logit_kernel for consistency

* feat: add logger.debug for tracing

* fix: add missing blank lines in test_unary_pointwise_ops.py

* fix: add missing assert in absolute test and fix formatting

* fix: add missing assert for rrelu_with_noise_backward

* fix: codestyle fixes from pre-commit

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: factnn <factnn@example.com>
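
The input-generation fix in the `logit_` commits above rests on a simple property: the sigmoid of a value drawn from uniform(-4, 4) always lies strictly inside (0, 1), so the logit of it is finite and stays within roughly [-4, 4]. The sketch below illustrates the idea with the standard library only. It swaps `torch.sigmoid` and `torch.manual_seed` for `math` and a seeded `random.Random`, and the helper names are hypothetical; it is not the test code from the PR.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    # Only defined for 0 < p < 1; blows up at the endpoints.
    return math.log(p / (1.0 - p))


def make_logit_inputs(n: int, seed: int = 0) -> list:
    """Draw n values in (0, 1) as sigmoid(uniform(-4, 4)), reproducibly."""
    rng = random.Random(seed)  # plays the role of a fixed manual seed
    return [sigmoid(rng.uniform(-4.0, 4.0)) for _ in range(n)]


if __name__ == "__main__":
    inputs = make_logit_inputs(5)
    for p in inputs:
        print(f"p={p:.6f}  logit(p)={logit(p):+.6f}")
```

Sampling this way keeps inputs away from 0 and 1, where logit diverges and low-precision results become unstable, and fixing the seed makes any remaining mismatch reproducible.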
* [kunlunxin] fix arange operation

* add pad
The CI job failure is not introduced in this PR.
Signed-off-by: Qiming Teng <tengqm@outlook.com>
chilianyi marked this pull request as draft July 3, 2026 05:25
chilianyi changed the title from "[WIP ]Benchmark/v5.0.0" to "Benchmark/v5.0.0" Jul 3, 2026