feat: align quant and fused kernels with Triton in FlyDSL #421

Open
cschenjunlin wants to merge 5 commits into main from quant_cjl
Conversation

@cschenjunlin cschenjunlin commented Apr 21, 2026

Motivation

Align quant and fused kernels with Triton in FlyDSL

Technical Details

Test Plan

Test Result

====================================================================================================
Perf Compare (gpu us): FlyDSL vs AIter
====================================================================================================
op         shape              dtype  FlyDSL(gpu us)  AIter(gpu us)    speedup
rmsnorm_dq 64x256             f32              45.7           66.4      1.45x
rmsnorm_dq 128x1024           f32              43.0           65.2      1.52x
rmsnorm_dq 32x128             f16              43.4           64.1      1.48x
rmsnorm_dq 64x2000            f32              42.0           67.2      1.60x
rmsnorm_dq 16x512             bf16             42.8           63.6      1.49x
rmsnorm_dq 1024x8192          bf16             42.7           65.8      1.54x
rmsnorm_dq 32768x8192         bf16            398.5        1,000.6      2.51x
rmsnorm_sq 64x256             f32              46.5           68.7      1.48x
rmsnorm_sq 128x1024           f32              46.2           70.0      1.52x
rmsnorm_sq 32x128             f16              48.1           70.2      1.46x
rmsnorm_sq 64x2000            f32              47.0           69.8      1.49x
rmsnorm_sq 16x512             bf16             47.3           68.8      1.45x
rmsnorm_sq 1024x8192          bf16             46.6           70.0      1.50x
rmsnorm_sq 32768x8192         bf16            478.2        1,155.5      2.42x
====================================================================================================

Submission Checklist

rmsnorm

  • quant_rms_norm_kernel
  • fused_add_rmsnorm_kernel
  • quant_fused_add_rmsnorm_kernel
  • rmsnorm_kernel_large_m_small_n

layernorm

  • fused_add_layernorm_kernel
  • quant_layernorm_kernel
  • quant_fused_add_layernorm_kernel

@i-chaochen i-chaochen requested a review from coderfeli April 21, 2026 09:10
@coderfeli
Collaborator

@cschenjunlin conflicts.

@cschenjunlin
Author

> @cschenjunlin conflicts.

The conflicts are caused by the import of vector; I have not yet replaced all vector usage in the quant kernels. I will finish the replacement in the next commit.
