
Detailed Debugging Journey: Still hitting 'TypeError: PlainLayout must define __torch_dispatch__' after workarounds for UnpicklingError and AttributeError #13

@erozekdk

Description

Hello ZenCtrl developers and community,

I've been trying extensively to get ZenCtrl's app/gradio_app.py running on Windows 10/11 with an NVIDIA RTX 3050 Ti Laptop GPU (4 GB VRAM), Python 3.10, and PyTorch 2.5.1 (CUDA 12.1), and I'm hitting a series of issues while loading the quantized model sayakpaul/flux.1-schell-int8wo-improved.
Following up on the discussion here, particularly the _pickle.UnpicklingError caused by weights_only=True and the subsequent AttributeError: Can't get attribute 'PlainAQTLayout', I've gone through a detailed debugging process.

Environment:
OS: Windows 10/11
GPU: NVIDIA RTX 3050 Ti Laptop GPU (4GB VRAM)
Conda Environment: Python 3.10 (located at C:\Users\Antonio\anaconda3\envs\zenctrl_env)
PyTorch: 2.5.1 (installed via conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia)
ZenCtrl: Cloned fresh from GitHub into D:\ZenCtrl.
requirements.txt: Installed as per the repository (which installs torchao==0.11.0 by default when no version is pinned).

Summary of Issues and Attempts:

  1. Initial Error (with original gradio_app.py and torchao==0.11.0):
    The application fails with:
    _pickle.UnpicklingError: Weights only load failed. ... WeightsUnpickler error: Unsupported global: GLOBAL torchao.dtypes.affine_quantized_tensor.PlainAQTLayout was not an allowed global by default.
    This occurs because accelerate calls torch.load with weights_only=True, and PlainAQTLayout is not a default safe global, nor is it directly importable from torchao 0.11.0 to be added via torch.serialization.add_safe_globals.
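To make the dead end concrete: the clean fix PyTorch suggests for this error is torch.serialization.add_safe_globals, but that requires importing the class first. A minimal sketch of that attempt, wrapped in a guard so it simply reports failure in environments (like torchao 0.11.0) where the class no longer exists:

```python
def try_allowlist_plain_aqt_layout() -> bool:
    """Attempt the 'official' fix: allow-list PlainAQTLayout for
    weights_only=True loads.  Returns False when the class (or torch /
    torchao itself) cannot be imported -- which is the case on
    torchao 0.11.0, making this approach a dead end there."""
    try:
        from torchao.dtypes.affine_quantized_tensor import PlainAQTLayout
        from torch.serialization import add_safe_globals
    except ImportError:
        return False
    add_safe_globals([PlainAQTLayout])
    return True

print(try_allowlist_plain_aqt_layout())
```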

  2. Attempting umerkayvyro's Global torch.load Monkey Patch (from Issue torch.load fails due to weights_only=True in PyTorch >=2.6 when loading quantized model #11):
    This patch successfully forces weights_only=False and bypasses the initial _pickle.UnpicklingError.
    The patch applied in D:\ZenCtrl\app\gradio_app.py involved these core lines:

    ```python
    original_torch_load = torch.load

    def patched_torch_load(*args, **kwargs):
        kwargs['weights_only'] = False
        return original_torch_load(*args, **kwargs)

    torch.load = patched_torch_load
    ```
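As an aside, the same patch can be scoped with a context manager so torch.load is only overridden around the model-loading call instead of for the rest of the process. A sketch; the module is passed in as a parameter purely so the pattern can be shown (and tested) without torch itself:

```python
import contextlib

@contextlib.contextmanager
def weights_only_false(mod):
    """Temporarily force mod.load(..., weights_only=False).

    `mod` would be the imported `torch` module; taking it as a
    parameter keeps this sketch independent of torch.
    """
    original = mod.load

    def patched(*args, **kwargs):
        kwargs["weights_only"] = False  # override whatever the caller set
        return original(*args, **kwargs)

    mod.load = patched
    try:
        yield
    finally:
        mod.load = original  # restore even if loading raised
```

Usage would be `with weights_only_false(torch): pipe = load_model(...)`, leaving torch.load untouched everywhere else.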

  3. Error After Applying Global torch.load Patch:
    As Jandown also reported, after forcing weights_only=False, the next error is:
    AttributeError: Can't get attribute 'PlainAQTLayout' on <module 'torchao.dtypes.affine_quantized_tensor' from 'C:\Users\Antonio\anaconda3\envs\zenctrl_env\lib\site-packages\torchao\dtypes\affine_quantized_tensor.py'>
    This happens because the unpickler, now allowed to run more freely, cannot find the definition for PlainAQTLayout within the torchao.dtypes.affine_quantized_tensor module of torchao 0.11.0.

  4. Attempting to Fix AttributeError for PlainAQTLayout with an Alias:
    Based on the torchao 0.11.0 source, I tried aliasing the expected name to the existing PlainLayout.
    The relevant lines added to D:\ZenCtrl\app\gradio_app.py (in addition to the global torch.load patch) were:

    ```python
    import torchao.dtypes.affine_quantized_tensor
    from torchao.dtypes.utils import PlainLayout as UtilsPlainLayout

    torchao.dtypes.affine_quantized_tensor.PlainAQTLayout = UtilsPlainLayout
    ```

  5. Error After Aliasing PlainAQTLayout (with global torch.load patch still active):
    This resolved the AttributeError for PlainAQTLayout, but a new one appeared:
    AttributeError: Can't get attribute 'PlainLayoutType' on <module 'torchao.dtypes.utils' from 'C:\Users\Antonio\anaconda3\envs\zenctrl_env\lib\site-packages\torchao\dtypes\utils.py'>
    This indicates the model pickle also expects a PlainLayoutType class/attribute within torchao.dtypes.utils.
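Rather than discovering these missing names one AttributeError at a time, the checkpoint's pickle stream can be scanned up front for every global it references. A diagnostic sketch using only the standard library (it tracks recently pushed strings to resolve STACK_GLOBAL, which is approximate -- memoized references via BINGET are not resolved; for a torch checkpoint you would feed it the bytes of the pickled record stored inside the .pt zip archive):

```python
import pickle
import pickletools

def referenced_globals(data: bytes) -> list:
    """Every module.attribute a pickle stream will ask find_class for."""
    found, strings = [], []
    for opcode, arg, _ in pickletools.genops(data):
        if opcode.name == "GLOBAL":           # legacy form: arg is "module name"
            found.append(arg.replace(" ", "."))
        elif opcode.name == "STACK_GLOBAL":   # protocol >= 2: two strings on stack
            found.append(".".join(strings[-2:]))
        if isinstance(arg, str):
            strings.append(arg)
    return found

# Demo on a stream built locally.
print(referenced_globals(pickle.dumps(pickletools.genops)))
# ['pickletools.genops']
```

Running this over the checkpoint would have listed PlainAQTLayout and PlainLayoutType (and anything else) in one pass.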

  6. Attempting to Fix AttributeError for PlainLayoutType with a Second Alias:
    Again, assuming PlainLayout from torchao.dtypes.utils (v0.11.0) is the intended functional equivalent.
    The relevant lines added to D:\ZenCtrl\app\gradio_app.py (in addition to the global torch.load patch and the first alias) were:

    ```python
    import torchao.dtypes.utils

    # UtilsPlainLayout was already imported above as:
    # from torchao.dtypes.utils import PlainLayout as UtilsPlainLayout
    torchao.dtypes.utils.PlainLayoutType = UtilsPlainLayout
    ```
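An alternative to patching module attributes one by one is a custom Unpickler whose find_class remaps all the legacy names through a single table. A sketch; the torchao entries mirror the aliases above and carry the same caveat that PlainLayout may not be a functional substitute:

```python
import pickle

# Legacy (module, name) pairs from the old checkpoint, mapped to where a
# candidate stand-in lives today.
LEGACY_NAMES = {
    ("torchao.dtypes.affine_quantized_tensor", "PlainAQTLayout"):
        ("torchao.dtypes.utils", "PlainLayout"),
    ("torchao.dtypes.utils", "PlainLayoutType"):
        ("torchao.dtypes.utils", "PlainLayout"),
}

class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that transparently redirects renamed globals."""
    def find_class(self, module, name):
        module, name = LEGACY_NAMES.get((module, name), (module, name))
        return super().find_class(module, name)
```

Since torch.load accepts a pickle_module argument, a small module exposing this Unpickler could be passed there instead of monkey-patching two torchao modules.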

  7. Final Error (with global torch.load patch + both aliases for PlainAQTLayout and PlainLayoutType, using torchao==0.11.0):
    After applying all the above patches, the _pickle.UnpicklingError and the AttributeErrors are gone. However, the loading process now fails with:
    TypeError: PlainLayout must define __torch_dispatch__
    This occurs deep in the torch.load -> _load -> unpickler.load() -> _rebuild_from_type_v2 -> _rebuild_wrapper_subclass call stack.
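For intuition about this final TypeError: the checkpoint rebuilds the quantized layout as a torch.Tensor wrapper subclass, and PyTorch refuses any wrapper subclass that never defines __torch_dispatch__. A pure-Python illustration of that refusal (the stand-in classes and the check are simplified sketches, not torch's or torchao's actual code):

```python
def rebuild_wrapper_subclass(cls):
    """Illustrative stand-in for the check inside
    torch.Tensor._make_wrapper_subclass (the real one is more involved)."""
    if not hasattr(cls, "__torch_dispatch__"):
        raise TypeError(f"{cls.__name__} must define __torch_dispatch__")
    return object.__new__(cls)

class PlainLayout:
    """Mimics torchao 0.11.0's PlainLayout: a plain layout-description
    class with no dispatch hook, hence unusable as the alias target."""

class PlainAQTLayout:
    """Mimics what the old tensor-subclass presumably provided."""
    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        raise NotImplementedError

try:
    rebuild_wrapper_subclass(PlainLayout)
except TypeError as e:
    print(e)  # PlainLayout must define __torch_dispatch__

rebuild_wrapper_subclass(PlainAQTLayout)  # passes the check
```

This is why name aliasing alone cannot work: the unpickler finds a class, but not one that satisfies the wrapper-subclass contract.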

Conclusion from these steps:
It appears that the sayakpaul/flux.1-schell-int8wo-improved model was serialized with an older/different version of torchao where:
a. torchao.dtypes.affine_quantized_tensor.PlainAQTLayout was a defined class/type.
b. torchao.dtypes.utils.PlainLayoutType was a defined class/type.
c. These types were likely proper torch.Tensor subclasses or compatible types that correctly interacted with PyTorch's dispatch mechanism (e.g., by defining __torch_dispatch__ or implementing the necessary tensor protocols).

In torchao==0.11.0 (and nearby versions like 0.9.0, 0.10.0, which were also tested and gave the initial PlainAQTLayout unpickling error), these exact names/definitions do not exist. While aliasing them to torchao.dtypes.utils.PlainLayout allows the unpickler to find the names, the underlying PlainLayout class in torchao 0.11.0 is not a functional substitute, as it lacks the required __torch_dispatch__ method, leading to the TypeError.
The version of transformers installed by requirements.txt also requires a relatively recent torchao (versions like 0.7.0, 0.8.0 cause an ImportError: cannot import name 'Int4WeightOnlyConfig').

This creates a difficult compatibility situation for users with GPUs that necessitate the quantized model.

Current State:
Unable to run ZenCtrl with the sayakpaul/flux.1-schell-int8wo-improved model on torchao==0.11.0 due to these deep unpickling/type compatibility issues, even when weights_only=False is forced and name aliasing is attempted.
Any guidance or an update to the model/requirements would be greatly appreciated. My 4GB VRAM GPU makes the quantized model a necessity.

Thank you for your work on ZenCtrl.
