Unify Gemm by eugenebokhan · Pull Request #411 · trymirai/uzu

eugenebokhan · 2026-05-14T14:09:40Z

No description provided.

fix leftover references from the main merge: unified gemm and matmul kernel signatures, plus the size() deref tweak in dense_buffer.rs.

DataType now derives Serialize/Deserialize directly with per-variant #[serde(rename)] ("bfloat16", "float16", "int8", ...), so the wire format matches what ConfigDataType produced. ConfigDataType and config/common.rs are deleted; every caller now uses crate::DataType. QuantizationMode variants are renamed UINT4/INT8/UINT8 -> U4/I8/U8 to match the DataType convention, with #[serde(rename)] preserving the JSON wire format. Includes the matching kernel-side updates in quant_embedding.metal and the regenerated quantization.h header.
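A minimal sketch of the wire-name idea, not the code from this PR. It uses only the standard library so it runs without serde; the attribute each match arm stands in for is shown in a comment. The variant names BF16 and F16, the helper names wire_name and from_wire_name, and the QuantizationMode wire strings ("UINT4", "INT8", "UINT8", assumed equal to the old variant names) are assumptions.

```rust
/// Hypothetical mirror of DataType; the real enum likely has more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    BF16, // would carry #[serde(rename = "bfloat16")]
    F16,  // would carry #[serde(rename = "float16")]
    I8,   // would carry #[serde(rename = "int8")]
}

impl DataType {
    /// The string written to the wire for this variant.
    fn wire_name(self) -> &'static str {
        match self {
            DataType::BF16 => "bfloat16",
            DataType::F16 => "float16",
            DataType::I8 => "int8",
        }
    }

    /// Inverse of wire_name; unknown strings are rejected.
    fn from_wire_name(name: &str) -> Option<Self> {
        match name {
            "bfloat16" => Some(DataType::BF16),
            "float16" => Some(DataType::F16),
            "int8" => Some(DataType::I8),
            _ => None,
        }
    }
}

/// Renamed variants, unchanged wire strings (the wire strings are assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum QuantizationMode {
    U4, // assumed #[serde(rename = "UINT4")]
    I8, // assumed #[serde(rename = "INT8")]
    U8, // assumed #[serde(rename = "UINT8")]
}

impl QuantizationMode {
    fn wire_name(self) -> &'static str {
        match self {
            QuantizationMode::U4 => "UINT4",
            QuantizationMode::I8 => "INT8",
            QuantizationMode::U8 => "UINT8",
        }
    }
}

fn main() {
    // Every variant survives a round trip through its wire name.
    for data_type in [DataType::BF16, DataType::F16, DataType::I8] {
        assert_eq!(DataType::from_wire_name(data_type.wire_name()), Some(data_type));
    }
    // The Rust-side rename does not leak into the serialized form.
    for mode in [QuantizationMode::U4, QuantizationMode::I8, QuantizationMode::U8] {
        println!("{:?} -> {}", mode, mode.wire_name());
    }
}
```

The point of the sketch is that the Rust identifier and the serialized string are decoupled, so renaming a variant leaves existing config files readable.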

the metal codegen emits the full path crate::backends::common::gpu_types::quantization::QuantizationMode for the SPECIALIZE arg, and the build script's cross-backend signature equality check requires both backends to agree on the type string.
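A rough sketch of why the type string matters, assuming the equality check compares argument descriptions as plain strings. KernelArg, signature_mismatch and the argument name quantization_mode are hypothetical stand-ins; the actual build script is not shown in this PR text.

```rust
/// Hypothetical description of one kernel argument as the build script sees it.
#[derive(Debug, Clone, PartialEq, Eq)]
struct KernelArg {
    name: &'static str,
    type_string: &'static str,
}

/// Returns a description of the first difference between two backends'
/// signatures, or None when they agree argument by argument.
fn signature_mismatch(metal: &[KernelArg], cpu: &[KernelArg]) -> Option<String> {
    if metal.len() != cpu.len() {
        return Some(format!(
            "argument count differs: {} vs {}",
            metal.len(),
            cpu.len()
        ));
    }
    metal
        .iter()
        .zip(cpu)
        .find(|(m, c)| m != c)
        .map(|(m, c)| {
            format!(
                "`{}: {}` vs `{}: {}`",
                m.name, m.type_string, c.name, c.type_string
            )
        })
}

fn main() {
    const FULL_PATH: &str =
        "crate::backends::common::gpu_types::quantization::QuantizationMode";

    let metal = [KernelArg { name: "quantization_mode", type_string: FULL_PATH }];
    let cpu_short = [KernelArg { name: "quantization_mode", type_string: "QuantizationMode" }];
    let cpu_full = [KernelArg { name: "quantization_mode", type_string: FULL_PATH }];

    // Same Rust type, different spelling: a string comparison still fails.
    assert!(signature_mismatch(&metal, &cpu_short).is_some());
    // Both backends spell out the full path: the check passes.
    assert!(signature_mismatch(&metal, &cpu_full).is_none());
}
```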

gpu_types:
- promote QuantizedFormat (MLX | AWQ) into gpu_types/, replacing the kernel-layer QuantizedMatmulType { Mlx, ZeroPoint }; loader and matmul callers speak the same type now
- promote GemmTilingConfig into gpu_types/unified_gemm/; collapse the three separate Threadgroup/Simdgroup/Fragment tile structs into the one 11-u32-field aggregate that already lived kernel-side
- drop the BitsPerWeight enum and its companion bits_per_weight.h header; bit width is derived on demand from QuantizationMode via DataType::size_in_bits

kernel layer:
- flatten WeightsStorageFormat to FullPrecision | Quantized { format, mode, group_size }; collapse the quantized_storage/ subdirectory into a single weights_storage_format.rs at gemm/
- gemm.metal kernel signature: rename a/b/d -> activations/weights/result; drop the separate full-precision-only b buffer (weights is always present and reinterpreted in body); add scales/biases/zero_points OPTIONAL slots gated on use_mlx_quant / use_zero_points bool SPECIALIZEs; collapse the three tile constant-buffer args into one GemmTilingConfig
- introduce GemmWeightsBuffers enum (FullPrecision | Mlx | Awq) so the Rust-side encode() takes one typed bundle instead of four loose buffers
- inline GemmTile::validate into UnifiedGemmSpecialization::validate
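The type shapes described above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the field layout of each GemmWeightsBuffers variant, the generic buffer parameter B, and the helpers size_in_bits, quantized_bits and optional_slots are hypothetical. In particular the bit width is computed directly on QuantizationMode here, whereas the PR derives it through DataType::size_in_bits.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum QuantizedFormat {
    MLX,
    AWQ,
}

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum QuantizationMode {
    U4,
    I8,
    U8,
}

impl QuantizationMode {
    /// Stand-in for the lookup that replaces the dropped BitsPerWeight enum.
    fn size_in_bits(self) -> u32 {
        match self {
            QuantizationMode::U4 => 4,
            QuantizationMode::I8 | QuantizationMode::U8 => 8,
        }
    }
}

/// Flattened storage description: one enum, no nested module of helper types.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WeightsStorageFormat {
    FullPrecision,
    Quantized {
        format: QuantizedFormat,
        mode: QuantizationMode,
        group_size: u32,
    },
}

impl WeightsStorageFormat {
    /// Bit width of one quantized weight, or None for full precision.
    fn quantized_bits(&self) -> Option<u32> {
        match self {
            WeightsStorageFormat::FullPrecision => None,
            WeightsStorageFormat::Quantized { mode, .. } => Some(mode.size_in_bits()),
        }
    }
}

/// One typed bundle instead of four loose buffers (field layout is assumed).
#[allow(dead_code)]
enum GemmWeightsBuffers<'a, B> {
    FullPrecision { weights: &'a B },
    Mlx { weights: &'a B, scales: &'a B, biases: &'a B },
    Awq { weights: &'a B, scales: &'a B, zero_points: &'a B },
}

impl<'a, B> GemmWeightsBuffers<'a, B> {
    /// (scales, biases, zero_points) mapped onto the optional kernel slots.
    fn optional_slots(&self) -> (Option<&'a B>, Option<&'a B>, Option<&'a B>) {
        match *self {
            GemmWeightsBuffers::FullPrecision { .. } => (None, None, None),
            GemmWeightsBuffers::Mlx { scales, biases, .. } => (Some(scales), Some(biases), None),
            GemmWeightsBuffers::Awq { scales, zero_points, .. } => {
                (Some(scales), None, Some(zero_points))
            }
        }
    }
}

fn main() {
    let storage = WeightsStorageFormat::Quantized {
        format: QuantizedFormat::AWQ,
        mode: QuantizationMode::U4,
        group_size: 64,
    };
    assert_eq!(storage.quantized_bits(), Some(4));

    // Byte vectors stand in for GPU buffers.
    let (weights, scales, zero_points) = (vec![0u8; 32], vec![0u8; 4], vec![0u8; 4]);
    let bundle = GemmWeightsBuffers::Awq {
        weights: &weights,
        scales: &scales,
        zero_points: &zero_points,
    };
    let (s, b, z) = bundle.optional_slots();
    assert!(s.is_some() && b.is_none() && z.is_some());
}
```

With the bundle as an enum, an invalid combination such as MLX weights with zero points but no biases cannot be constructed, which is presumably the motivation for replacing the loose buffers.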

# Conflicts:
#   Cargo.lock
#   crates/backend-uzu/build/cpu/compiler.rs
#   crates/backend-uzu/build/metal/bindgen.rs
#   crates/backend-uzu/build/metal/compiler.rs
#   crates/backend-uzu/build/metal/mod.rs
#   crates/backend-uzu/build/metal/wrapper.rs
#   crates/backend-uzu/src/backends/common/gpu_types/mod.rs
#   crates/backend-uzu/src/backends/common/kernel/quant_matmul.rs
#   crates/backend-uzu/src/backends/metal/metal_extensions/function_constant_values_extensions_set_value.rs
#   crates/backend-uzu/src/encodable_block/embedding.rs
#   crates/backend-uzu/src/encodable_block/linear/quantized.rs
#   crates/backend-uzu/tests/unit/backends/common/kernel/quant_matmul_test.rs

# Conflicts:
#   crates/backend-uzu/src/backends/common/kernel/quant_matmul.rs
#   crates/backend-uzu/src/encodable_block/embedding.rs
#   crates/backend-uzu/src/encodable_block/linear/quantized.rs
#   crates/backend-uzu/tests/unit/backends/common/kernel/quant_matmul_test.rs
#   crates/backend-uzu/tests/unit/kernel/quant_matmul/qmm_transposed_test.rs
#   crates/backend-uzu/tests/unit/kernel/quant_matmul/qmv_fast_test.rs
#   crates/backend-uzu/tests/unit/kernel/quant_matmul/qmv_test.rs

# Conflicts:
#   crates/backend-uzu/src/backends/metal/kernel/generated/quantization_method.h
#   crates/backend-uzu/tests/unit/kernel/quant_matmul/qmm_transposed_test.rs
#   crates/backend-uzu/tests/unit/kernel/quant_matmul/qmv_fast_test.rs

# Conflicts:
#   crates/backend-uzu/src/backends/metal/kernel/matmul/gemm.metal
#   crates/backend-uzu/src/backends/metal/kernel/matmul/gemm.rs
#   crates/backend-uzu/src/backends/metal/kernel/matmul/mod.rs
#   crates/backend-uzu/tests/performance/matmul/bench.rs
#   crates/backend-uzu/tests/unit/kernel/matmul/gemm_mpp_test.rs
#   crates/backend-uzu/tests/unit/kernel/matmul/gemm_test.rs

eugenebokhan added 30 commits April 30, 2026 19:11

extend thread context

be12a1d

introduce unified matmul kernel

d79256c

add types

cf6621a

remove signed from quantized storage

b1d2994

wip

92ddb15

specialization

385395b

wip

171bead

unified bits per weight

b551723

Merge branch 'main' into unify-matmul

bb8b488

metal: finish buffer to dense_buffer migration

34d10cc

kernel: pass typed quantization_mode through quantized embedding

bbd9eca

codegen: type-aware optional expression rewriter

92c971f

gemm: drop use_mlx_quant/use_zero_points specialize bools

e477056

codegen: type-aware optional expression rewriter

9eae976

migrate to quantization method enum

f968854

fmt

728c64e

introduce rewriter

b3cb3ea

decouple bindgen

a6002ec

fmt

e75abc1

satisfy ci

063ff4b

fix after merge

58e2de9

migrate to unified dispatch

2111894

remove error omitting

fbadfcc

Merge branch 'main' into type-aware-optional-expression-rewriter

5da6acc

Merge branch 'type-aware-optional-expression-rewriter' into unify-matmul

eb8c23c

bind full precision computations

0373188

eugenebokhan added 30 commits May 12, 2026 11:29

fix merge resolution

770bf9a

fmt

93dca87

move metal only files to metal mod

2f2052d

put back morton

f8cd3a4

gate bf16 and f16 for now

d19dd8c

merge main

a59423a

merge 'type-aware-optional-expression-rewriter'

579d9a1

migrate to defines

4f46674

fmt

f9556b9

remove debug-display

0b8909c

fix comments

0d21556

fmt

ffea4bc

merge type-aware-optional-expression-rewriter

d49a76d

fix build after merge

2d80581

linear weights

ff30a01

clean up gemm tests

ab36194

perf parity

c8e7d92

move types to common

09fbd51

Merge branch 'main' into unify-matmul

809d473

cleanup

c766825

fmt

9433c80

remove gemm and gemm mpp

3e7736c

rename

78c4e31

more renames

6d4ad58

fmt

888e003

fix thread context and reuse block id

42e21b5

update doc

31a121e

remove dead code

3228915

eugenebokhan wants to merge 75 commits into main from unify-matmul
