Blackwell GPU compatibility #5

@sistar2020

On a Blackwell GPU, running the inference demo fails with the following error:

$ bash inference_demo.sh
......
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
:
Traceback (most recent call last):
  File "/opt/modeling/molecule-generation/ODesign/src/utils/inference/infer_runner.py", line 201, in run
    pred_backbone_output, all_sequence_variants = self.predict(data)
  File "/opt/anaconda3/envs/odesign/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/modeling/molecule-generation/ODesign/src/utils/inference/infer_runner.py", line 114, in predict
    data = to_device(data, self.device)
  File "/opt/modeling/molecule-generation/ODesign/src/utils/model/torch_utils.py", line 81, in to_device
    obj[k] = to_device(v, device)
  File "/opt/modeling/molecule-generation/ODesign/src/utils/model/torch_utils.py", line 100, in to_device
    return attr.evolve(obj, **updates)
  File "/opt/anaconda3/envs/odesign/lib/python3.10/site-packages/attr/_make.py", line 634, in evolve
    return cls(**changes)
  File "<attrs generated methods src.api.data_interface.OFeatureData>", line 124, in __init__
    self.__attrs_post_init__()
  File "/opt/modeling/molecule-generation/ODesign/src/api/_base.py", line 108, in __attrs_post_init__
    convert_types(self)
  File "/opt/modeling/molecule-generation/ODesign/src/api/_base.py", line 102, in convert_types
    converted_value = converter(current_value)
  File "/opt/modeling/molecule-generation/ODesign/src/api/_base.py", line 17, in to_mask_type
    return tensor.bool()
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

It would be great to have an ODesign version that supports CUDA 13.0.
However, the required Python packages, such as torch-scatter, would also need to support that CUDA version, so I suspect running ODesign on Blackwell is not feasible for now.
Are there any workarounds?
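
The "no kernel image is available" error usually means the installed binaries were not compiled for the GPU's architecture. A minimal sketch for checking this against the installed PyTorch build is below. The helper `build_covers_device` is a hypothetical name introduced here, not part of ODesign or PyTorch, and its matching rule is a simplification: an `sm_XY` entry is treated as covering devices with the same major version and an equal or higher minor version, and a `compute_XY` (PTX) entry as covering any equal or newer device. It only inspects PyTorch itself; compiled extensions such as torch-scatter ship their own kernels and would need a separate check.

```python
def build_covers_device(arch_list, capability):
    """Return True if any entry in arch_list can run on the given device.

    arch_list:  strings such as "sm_90" or "compute_90", in the format
                returned by torch.cuda.get_arch_list().
    capability: (major, minor) tuple, in the format returned by
                torch.cuda.get_device_capability().
    Simplification: suffixes such as the "a" in "sm_90a" are ignored.
    """
    dev_major, dev_minor = capability
    for arch in arch_list:
        kind, _, version = arch.partition("_")
        digits = "".join(ch for ch in version if ch.isdigit())
        if not digits:
            continue
        # The last digit is the minor version, the rest is the major version.
        major, minor = divmod(int(digits), 10)
        # Native binaries: same major version, minor not newer than the device.
        if kind == "sm" and major == dev_major and minor <= dev_minor:
            return True
        # PTX can be JIT-compiled for any equal or newer device.
        if kind == "compute" and (major, minor) <= (dev_major, dev_minor):
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        if not torch.cuda.is_available():
            print("No CUDA device is visible to PyTorch")
        else:
            capability = torch.cuda.get_device_capability(0)
            arch_list = torch.cuda.get_arch_list()
            print("PyTorch:", torch.__version__, "built for CUDA", torch.version.cuda)
            print("Device capability:", capability)
            print("Compiled architectures:", arch_list)
            print("Build covers device:", build_covers_device(arch_list, capability))
```

For example, `build_covers_device(["sm_80", "sm_90"], (12, 0))` returns `False`. If the check fails for the real device, the PyTorch build in the `odesign` environment has no kernels for that GPU, and a build that lists the device's architecture would be needed before anything else can run.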
