44 commits
aa34638  grouped_gemm.py: initial version (aryaman-gupta, Mar 27, 2026)
986c110  adds tests for grouped_gemm (aryaman-gupta, Mar 27, 2026)
8bd38a6  corrects test_grouped_gemm (aryaman-gupta, Mar 31, 2026)
405bfb6  test_grouped_gemm: add argparse CLI entry point (aryaman-gupta, Mar 31, 2026)
ef75618  test_grouped_gemm: use test_common verify_output and run_perftest (aryaman-gupta, Mar 31, 2026)
de7ee4a  test_grouped_gemm: add large_shape marks to expensive test cases (aryaman-gupta, Mar 31, 2026)
d888d28  test_grouped_gemm: add m_per_group sweep mode (--m_per_group 0) (aryaman-gupta, Mar 31, 2026)
7cebf2d  test_grouped_gemm: default to sweep mode (--m_per_group 0) (aryaman-gupta, Mar 31, 2026)
1cb5e45  test_grouped_gemm: match perf output format to blockscale test (aryaman-gupta, Mar 31, 2026)
d18edbc  test_grouped_gemm: wire argparse args through to test functions (aryaman-gupta, Mar 31, 2026)
d4794b8  test_grouped_gemm: fix device mismatch with set_default_device (aryaman-gupta, Mar 31, 2026)
0eafe32  grouped_gemm.py: fixes bug in buffer_load offsets (aryaman-gupta, Mar 31, 2026)
d278331  grouped_gemm.py: adds LDS ping-pong (aryaman-gupta, Mar 31, 2026)
84936b9  grouped_gemm.py: adds XOR swizzle to LDS (aryaman-gupta, Apr 1, 2026)
c9aed0a  grouped_gemm, test: implements preshuffle optimization for weights (aryaman-gupta, Apr 1, 2026)
a8e8397  group_gemm.py: implements prefetch optimization (aryaman-gupta, Apr 2, 2026)
d687bd3  adds masked group gemm kernels (aryaman-gupta, Apr 2, 2026)
fa6570b  test_grouped_gemm_blockscale_masked.py: corrects import (aryaman-gupta, Apr 2, 2026)
546b5c5  grouped_gemm_blockscale_masked.py: corrects group_index assignment (aryaman-gupta, Apr 2, 2026)
bc06150  Merge branch 'main' into aryaman/group-gemm (aryaman-gupta, Apr 10, 2026)
c59a4d2  Merge branch 'main' into aryaman/group-gemm-optimizations (aryaman-gupta, Apr 10, 2026)
3ce8d4e  Merge branch 'aryaman/group-gemm' into aryaman/group-gemm-optimizations (aryaman-gupta, Apr 10, 2026)
b9bcc29  grouped gemm masked: adds A0 prefetch optimization (aryaman-gupta, Apr 10, 2026)
53442a9  group gemm contiguous: renames files (aryaman-gupta, Apr 10, 2026)
7e82c87  group gemm kernels: adds cshuffle epilogue stores (aryaman-gupta, Apr 13, 2026)
25b449a  group gemm kernels: two-phase A load (prefetch + store separation) (aryaman-gupta, Apr 13, 2026)
abec5b6  group gemm kernels: adds sched_group_barrier instruction scheduling (aryaman-gupta, Apr 13, 2026)
52db079  group gemm kernels: adds waves_per_eu compile hint support (aryaman-gupta, Apr 13, 2026)
6a6f661  group gemm tests: improved correctness coverage and fixed masked f16 bug (aryaman-gupta, Apr 13, 2026)
806816c  group gemm kernels: use blockscale intrinsic for gfx950 (aryaman-gupta, Apr 17, 2026)
e89141b  group gemm tests: fix DS-V3 test shape N=2112→2048 for scale_block_n … (aryaman-gupta, Apr 17, 2026)
2f8a43f  group gemm kernels: use blockscale intrinsic for gfx950 (aryaman-gupta, Apr 23, 2026)
3b3cb08  group gemm kernels: fix E8M0 conversion to use ArithValue operators (aryaman-gupta, Apr 23, 2026)
68bfd1c  group gemm tests: align quantization with hardware E8M0 block scaling (aryaman-gupta, Apr 23, 2026)
1ce790b  group gemm kernels: broadcast wave-uniform scale_b via readfirstlane (aryaman-gupta, Apr 23, 2026)
78fe200  group gemm: pre-pack E8M0 scales as uint8 on host (aryaman-gupta, Apr 23, 2026)
964e1e3  group gemm: gate gfx942 SW scale loads behind not _is_gfx950 (aryaman-gupta, Apr 23, 2026)
98c9d10  Merge branch 'main' into aryaman/group-gemm-optimizations (aryaman-gupta, Apr 23, 2026)
c9ae27b  group gemm: prefetch scales across K-tile boundaries (aryaman-gupta, Apr 23, 2026)
1e7bc57  group gemm tests: align with repo conventions (aryaman-gupta, Apr 23, 2026)
06d7697  group gemm: clean up dead code and stale comments before PR (aryaman-gupta, Apr 23, 2026)
7cb0795  group gemm: rename compile entry points to match file names (aryaman-gupta, Apr 23, 2026)
f58db56  group gemm tests: tidies up comments (aryaman-gupta, Apr 24, 2026)
0d35bfe  Merge branch 'main' into aryaman/group-gemm (aryaman-gupta, Apr 24, 2026)