
About default order implementation and performance #22

@JeffLee1874


Thanks for sharing your work, it helps me a lot.
But I have some confusion about the default-order performance and implementation: the code here that handles default order seems to mismatch the original GPTQ code. When default order and groupsize are applied, the original GPTQ re-computes the scale and zeros during the calibration steps using the following code:

if groupsize != -1:
    if not static_groups:
        # recompute scale/zero at every group boundary
        if (i1 + i) % groupsize == 0:
            self.quantizer.find_params(W[:, (i1 + i):(i1 + i + groupsize)], weight=True)
    else:
        # static groups: look up the precomputed quantizer for this column
        idx = i1 + i
        if actorder:
            idx = perm[idx]
        self.quantizer = groups[idx // groupsize]

But this recomputation is removed in this repository, so the scale is fixed before all calibration steps, which makes the scale of the later quantization groups/blocks sub-optimal. I wonder why it was removed, since it does not seem to add any overhead at inference time. If I have misunderstood something, please point it out; I would really appreciate it!
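To illustrate why fixing the scale up front can hurt, here is a minimal, self-contained sketch (not code from either repository) comparing one set of min-max quantization parameters computed over a whole row against parameters recomputed per group, as the original GPTQ loop does. The helpers `find_params` and `quantize` are hypothetical stand-ins for the quantizer's behavior:

```python
import random

def find_params(w, bits=4):
    # Hypothetical min-max asymmetric params, analogous in spirit
    # to quantizer.find_params in GPTQ (not the actual implementation).
    qmax = 2 ** bits - 1
    lo, hi = min(w), max(w)
    scale = (hi - lo) / qmax or 1e-8
    zero = round(-lo / scale)
    return scale, zero

def quantize(w, scale, zero, bits=4):
    # Quantize then dequantize, clamping to the integer grid.
    qmax = 2 ** bits - 1
    return [(max(0, min(qmax, round(x / scale + zero))) - zero) * scale
            for x in w]

random.seed(0)
groupsize = 32
# Four groups whose magnitudes differ a lot, as weight columns often do.
weights = [random.gauss(0, 0.01 * (g + 1))
           for g in range(4) for _ in range(groupsize)]

# Scale/zero fixed once over the whole row (what the issue describes).
s, z = find_params(weights)
err_fixed = sum((a - b) ** 2
                for a, b in zip(weights, quantize(weights, s, z)))

# Scale/zero recomputed at every group boundary (original GPTQ behavior).
err_grouped = 0.0
for i in range(0, len(weights), groupsize):
    g = weights[i:i + groupsize]
    s, z = find_params(g)
    err_grouped += sum((a - b) ** 2 for a, b in zip(g, quantize(g, s, z)))

print(err_grouped < err_fixed)
```

Under these assumptions the per-group parameters give a lower squared quantization error, because a single row-wide scale is dominated by the largest group and quantizes the small-magnitude groups coarsely; this is only a sketch of the effect, not a claim about either repository's exact numbers.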
