Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #6573 +/- ##
==========================================
- Coverage 93.18% 93.17% -0.02%
==========================================
Files 832 829 -3
Lines 266714 265989 -725
==========================================
- Hits 248545 247839 -706
+ Misses 18169 18150 -19
Pull request overview
This PR extends the Vulkan GEMM implementation to support runtime packing (elempack 1/4) and to optionally write packed outputs directly from the shaders, reducing the need for explicit pre/post packing conversions in gemm_vulkan.cpp.
Changes:
- Add `out_elempack`, `A_elempack`, and `B_elempack` push constants and introduce pack-aware load/store paths in GEMM shaders.
- Update `gemm_vulkan.cpp` to stop forcing A/B to pack1, compute logical M/N/K using elempack, and allocate `top_blob` with the chosen `out_elempack`.
- Add additional descriptor bindings to provide alternate scalar/vec4 views for packed I/O.
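
To make the "logical M/N/K using elempack" point concrete, here is a minimal C++ sketch of the arithmetic, not the code from this PR. It assumes a 2-D blob whose rows are packed in groups of `elempack` (1 or 4), with the lanes of each group stored contiguously. The `PackedMat2D` struct and the `logical_rows`, `packed_offset`, and `choose_out_elempack` helpers are hypothetical names introduced only for illustration; in particular, the rule used to pick the output packing is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 2-D blob whose rows are packed in groups of
// `elempack`. This is not the real Mat/VkMat type.
struct PackedMat2D
{
    int w;        // number of columns (not packed)
    int h;        // number of packed rows, i.e. logical rows / elempack
    int elempack; // 1 (scalar) or 4 (vec4)
};

// Logical row count seen by the GEMM, e.g. M when this blob is A.
inline int logical_rows(const PackedMat2D& m)
{
    return m.h * m.elempack;
}

// Scalar offset of logical element (row, col), assuming each packed element
// stores `elempack` consecutive rows of one column side by side.
inline std::size_t packed_offset(const PackedMat2D& m, int row, int col)
{
    assert(row >= 0 && row < logical_rows(m));
    assert(col >= 0 && col < m.w);
    const int group = row / m.elempack; // which packed row
    const int lane = row % m.elempack;  // position inside the pack
    return (static_cast<std::size_t>(group) * m.w + col) * m.elempack + lane;
}

// Assumed selection rule: write pack4 output only when M is a multiple of 4.
inline int choose_out_elempack(int M)
{
    return (M % 4 == 0) ? 4 : 1;
}
```

With `elempack == 1` the offset reduces to the usual row-major `row * w + col`, which is why a single pack-aware load path can serve both layouts.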
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/layer/vulkan/shader/gemm_sg.comp | Adds elempack-aware loads and pack4 output stores for the subgroup GEMM shader. |
| src/layer/vulkan/shader/gemm_cm.comp | Adds elempack-aware load paths and pack4 output stores for the cooperative-matrix GEMM shader. |
| src/layer/vulkan/shader/gemm.comp | Adds new bindings/constants and implements pack4 output write paths for the non-subgroup GEMM shader. |
| src/layer/vulkan/gemm_vulkan.cpp | Removes forced input packing, computes M/N/K with elempack, allocates packed output, and updates bindings/constants passed to shaders. |
I'm not satisfied!
900aa37 to dd45db7
511dea3 to e68d933
Pull request overview
Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.
z-image-turbo 1024 end2end (s)