Thanks for sharing your work, it helps me a lot.
However, I am confused about the performance and implementation of the default order: the code here that handles the default order seems to differ from the original GPTQ code. When the default order and a groupsize are used, the original GPTQ re-computes the scales and zeros during the calibration steps using the following code:
if groupsize != -1:
    if not static_groups:
        if (i1 + i) % groupsize == 0:
            self.quantizer.find_params(W[:, (i1 + i):(i1 + i + groupsize)], weight=True)
    else:
        idx = i1 + i
        if actorder:
            idx = perm[idx]
        self.quantizer = groups[idx // groupsize]
This logic has been removed in this repository, so the scales are fixed before any calibration step runs. That makes the scales of the later quantization groups/blocks sub-optimal, because they do not account for the weight updates applied while the earlier columns were quantized. I wonder why it was removed, since it does not seem to add any inference overhead. If I have misunderstood something, please point it out; I would really appreciate it!
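To make the concern concrete, here is a toy sketch in pure Python. It is not the code of either repository. `find_params`, `quantize` and `quantize_row` are hypothetical helpers written only for this illustration, and the error propagation (half of each column's error pushed onto the next column) is a crude stand-in for GPTQ's Hessian-based update. It only shows that a scale computed up front can differ from one computed when the group is reached.

```python
def find_params(values, bits):
    """Min/max scale and zero point for one group (hypothetical helper)."""
    maxq = 2 ** bits - 1
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    if hi == lo:
        return 1.0, 0
    scale = (hi - lo) / maxq
    zero = round(-lo / scale)
    return scale, zero


def quantize(x, scale, zero, bits):
    """Round x onto the group's grid and return the dequantized value."""
    maxq = 2 ** bits - 1
    q = min(max(round(x / scale) + zero, 0), maxq)
    return scale * (q - zero)


def quantize_row(row, groupsize, static_groups, bits=2, spread=0.5):
    """Quantize one weight row left to right.

    static_groups=True : every group's scale is fixed before the loop.
    static_groups=False: the scale is re-computed when a group starts,
                         so it sees the already-updated weights.
    """
    w = list(row)
    if static_groups:
        groups = [find_params(w[g:g + groupsize], bits)
                  for g in range(0, len(w), groupsize)]
    out, scales = [], []
    for i in range(len(w)):
        if static_groups:
            scale, zero = groups[i // groupsize]
        elif i % groupsize == 0:
            scale, zero = find_params(w[i:i + groupsize], bits)
        if i % groupsize == 0:
            scales.append(scale)
        q = quantize(w[i], scale, zero, bits)
        out.append(q)
        if i + 1 < len(w):
            # Toy error compensation: assumption, not the real GPTQ update.
            w[i + 1] += spread * (w[i] - q)
    return out, scales


row = [1.0, 0.4, 2.0, 0.0]
_, static_scales = quantize_row(row, groupsize=2, static_groups=True)
_, dynamic_scales = quantize_row(row, groupsize=2, static_groups=False)
print("static :", static_scales)
print("dynamic:", dynamic_scales)
```

In this toy, the first group gets the same scale in both modes, while the second group's scale differs because its first weight has already absorbed part of the error from the first group.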