Errors running structure tokenizer

Hello, thanks for all the work! My goal is to generate structure tokens for genbio-ai/AIDO.Protein-RAG-16B, and I am trying to use the structure tokenizer genbio-ai/AIDO.StructureTokenizer. I followed the instructions on the Hugging Face page: I ran register_dataset.py and then the mgen predict command. However, the mgen predict step fails with:

`AttributeError: 'EquiformerEncoderLightning' object has no attribute 'check_data_compatibility'` 

This comes from modelgenerator/main.py. I thought it might be a PyTorch Lightning compatibility error, but I can't find this method anywhere in the Lightning docs. I tried downgrading from 2.6 to 2.4 anyway, but got the same error.

I also tried commenting out that call, but then I get a different error; the stack trace is below.

register_dataset.py ran fine, so I'm not sure what the issue could be; my dataset is just a set of standard PDB files (the AF2 ones from ProteinGym). I was also wondering whether genbio-ai/AIDO.Protein2StructureToken-16B might work instead, since I can run that programmatically. But since it was trained using the earlier model, I suspect it wouldn't. Anyway, thanks for your help; I would appreciate any insight.

Attaching my conda environment here for reference. 

[env.txt](https://github.com/user-attachments/files/26631539/env.txt)

And here is the output after commenting out line 38 of main.py:
```
/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/fabric/utilities/seed.py:44: No seed found, seed set to 0
[rank: 0] Seed set to 0
Loading weights from local directory
/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/fabric/plugins/environments/slurm.py:204: The `srun` command is available on your system but is not used. HINT: If your intention is to run Lightning on SLURM, prepend your python command with `srun` like so: srun python3 /location/anaconda3/envs/env ...
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
💡 Tip: For seamless cloud logging and experiment tracking, try installing [litlogger](https://pypi.org/project/litlogger/) to enable LitLogger, which logs metrics and artifacts automatically to the Lightning Experiments platform.
💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
You are using a CUDA device ('NVIDIA L40') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/pytorch/utilities/_pytree.py:21: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
Traceback (most recent call last):
  File "/location/anaconda3/envs/env/bin/mgen", line 10, in <module>
    sys.exit(cli_main())
             ^^^^^^^^^^
  File "/location/repos/ModelGenerator/modelgenerator/main.py", line 61, in cli_main
    MyLightningCLI(
  File "/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/pytorch/cli.py", line 421, in __init__
    self._run_subcommand(self.subcommand)
  File "/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/pytorch/cli.py", line 759, in _run_subcommand
    fn(**fn_kwargs)
  File "/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/pytorch/trainer/trainer.py", line 941, in predict
    return call._call_and_handle_interrupt(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/pytorch/trainer/call.py", line 49, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/pytorch/trainer/trainer.py", line 990, in _predict_impl
    results = self._run(model, ckpt_path=ckpt_path, weights_only=weights_only)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/pytorch/trainer/trainer.py", line 1079, in _run
    results = self._run_stage()
              ^^^^^^^^^^^^^^^^^
  File "/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/pytorch/trainer/trainer.py", line 1118, in _run_stage
    return self.predict_loop.run()
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/pytorch/loops/utilities.py", line 179, in _decorator
    return loop_run(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/pytorch/loops/prediction_loop.py", line 122, in run
    batch, batch_idx, dataloader_idx = next(data_fetcher)
                                       ^^^^^^^^^^^^^^^^^^
  File "/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/pytorch/loops/fetchers.py", line 134, in __next__
    batch = super().__next__()
            ^^^^^^^^^^^^^^^^^^
  File "/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/pytorch/loops/fetchers.py", line 61, in __next__
    batch = next(self.iterator)
            ^^^^^^^^^^^^^^^^^^^
  File "/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/pytorch/utilities/combined_loader.py", line 341, in __next__
    out = next(self._iterator)
          ^^^^^^^^^^^^^^^^^^^^
  File "/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/pytorch/utilities/combined_loader.py", line 142, in __next__
    out = next(self.iterators[0])
          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/location/anaconda3/envs/env/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 741, in __next__
    data = self._next_data()
           ^^^^^^^^^^^^^^^^^
  File "/location/anaconda3/envs/env/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 801, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/location/anaconda3/envs/env/lib/python3.12/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
            ~~~~~~~~~~~~^^^^^
  File "/location/repos/ModelGenerator/modelgenerator/structure_tokenizer/datasets/protein_dataset.py", line 166, in __getitem__
    chain = row["chain"] if not np.isnan(row["chain"]) else "nan"
                                ^^^^^^^^^^^^^^^^^^^^^^
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
```
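
For what it's worth, here is a minimal sketch of what I suspect is going on in that last frame. It assumes the `chain` column in my registered dataset holds string chain IDs such as `"A"` (I have not confirmed this), in which case `np.isnan` would raise exactly this `TypeError`, because it only accepts numeric input. The helper `chain_or_nan` is a hypothetical name of mine, not something from ModelGenerator; it swaps in `pd.isna`, which accepts strings, `None`, and float NaN alike.

```python
import numpy as np
import pandas as pd


def chain_or_nan(value):
    """Return the chain ID unchanged, or the string "nan" when it is missing.

    Hypothetical helper (not part of ModelGenerator). pd.isna works on
    strings, None and float NaN, whereas np.isnan only accepts numbers.
    """
    return "nan" if pd.isna(value) else value


# Reproduce the failure: np.isnan raises TypeError on a string chain ID.
try:
    np.isnan("A")
    raised = False
except TypeError:
    raised = True

print("np.isnan raised TypeError on a string:", raised)
print(chain_or_nan("A"), chain_or_nan(float("nan")), chain_or_nan(None))
```

If that assumption holds, replacing the `np.isnan` check on line 166 of protein_dataset.py with something along these lines might get past this error, though I don't know whether it is the intended fix.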
