
Errors running structure tokenizer #28

@morgankohler


Hello, thanks for all the work! My goal is to generate structure tokens for genbio-ai/AIDO.Protein-RAG-16B, and I am trying to use the structure tokenizer genbio-ai/AIDO.StructureTokenizer. I followed the instructions on the Hugging Face page to run `register_dataset.py` followed by the `mgen predict` command. However, running it gives me this error:

`AttributeError: 'EquiformerEncoderLightning' object has no attribute 'check_data_compatibility'`

The error is raised from `modelgenerator/main.py`. I thought this might be a PyTorch Lightning compatibility issue, but I can't find this method anywhere in the Lightning docs. I tried downgrading Lightning from 2.6 to 2.4 anyway, but got the same error.

I also tried commenting out that call, but then I get a different error; the stack trace is below.
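A less blunt alternative to deleting the line might be to guard the call so it only runs when the model actually defines the method. The helper below is a hypothetical sketch, not ModelGenerator code; since I don't know the real signature of `check_data_compatibility`, it simply forwards whatever arguments it is given, and the two classes are stand-ins rather than the real model classes.

```python
from typing import Any


def call_if_present(obj: Any, method_name: str, *args: Any, **kwargs: Any) -> Any:
    """Call obj.<method_name>(*args, **kwargs) if it exists; otherwise return None.

    Hypothetical helper: illustrates skipping an optional hook such as
    check_data_compatibility on a model class that does not define it.
    """
    method = getattr(obj, method_name, None)
    if callable(method):
        return method(*args, **kwargs)
    return None  # hook not defined: skip instead of raising AttributeError


# Stand-ins for illustration only (not the real ModelGenerator classes).
class EncoderWithoutHook:
    pass


class ModelWithHook:
    def check_data_compatibility(self, data: str) -> str:
        return f"checked {data}"


print(call_if_present(EncoderWithoutHook(), "check_data_compatibility", "dm"))  # None
print(call_if_present(ModelWithHook(), "check_data_compatibility", "dm"))  # checked dm
```

Skipping the check like this only hides the missing hook; it does not by itself make the model and the data module compatible.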

`register_dataset.py` ran without errors, so I'm not sure what the problem is; my dataset is just a set of standard PDB files (the AF2 structures from ProteinGym). I also wondered whether genbio-ai/IDO.Protein2StructureToken-16B might work instead, since I can run that programmatically. However, since it was trained using the earlier model, I suspect it probably wouldn't work. Anyway, thanks for your help; I'd appreciate any insight.

Attaching my conda environment here for reference.

env.txt

And here is the output after commenting out line 38 of `main.py`:

    /location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/fabric/utilities/seed.py:44: No seed found, seed set to 0
    [rank: 0] Seed set to 0
    Loading weights from local directory
    /location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/fabric/plugins/environments/slurm.py:204: The `srun` command is available on your system but is not used. HINT: If your intention is to run Lightning on SLURM, prepend your python command with `srun` like so: srun python3 /location/anaconda3/envs/env ...
    GPU available: True (cuda), used: True
    TPU available: False, using: 0 TPU cores
    💡 Tip: For seamless cloud logging and experiment tracking, try installing [litlogger](https://pypi.org/project/litlogger/) to enable LitLogger, which logs metrics and artifacts automatically to the Lightning Experiments platform.
    💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
    You are using a CUDA device ('NVIDIA L40') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
    LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
    /location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/pytorch/utilities/_pytree.py:21: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.

    Traceback (most recent call last):
      File "/location/anaconda3/envs/env/bin/mgen", line 10, in <module>
        sys.exit(cli_main())
                 ^^^^^^^^^^
      File "/location/repos/ModelGenerator/modelgenerator/main.py", line 61, in cli_main
        MyLightningCLI(
      File "/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/pytorch/cli.py", line 421, in __init__
        self._run_subcommand(self.subcommand)
      File "/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/pytorch/cli.py", line 759, in _run_subcommand
        fn(**fn_kwargs)
      File "/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/pytorch/trainer/trainer.py", line 941, in predict
        return call._call_and_handle_interrupt(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/pytorch/trainer/call.py", line 49, in _call_and_handle_interrupt
        return trainer_fn(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/pytorch/trainer/trainer.py", line 990, in _predict_impl
        results = self._run(model, ckpt_path=ckpt_path, weights_only=weights_only)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/pytorch/trainer/trainer.py", line 1079, in _run
        results = self._run_stage()
                  ^^^^^^^^^^^^^^^^^
      File "/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/pytorch/trainer/trainer.py", line 1118, in _run_stage
        return self.predict_loop.run()
               ^^^^^^^^^^^^^^^^^^^^^^^
      File "/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/pytorch/loops/utilities.py", line 179, in _decorator
        return loop_run(self, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/pytorch/loops/prediction_loop.py", line 122, in run
        batch, batch_idx, dataloader_idx = next(data_fetcher)
                                           ^^^^^^^^^^^^^^^^^^
      File "/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/pytorch/loops/fetchers.py", line 134, in __next__
        batch = super().__next__()
                ^^^^^^^^^^^^^^^^^^
      File "/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/pytorch/loops/fetchers.py", line 61, in __next__
        batch = next(self.iterator)
                ^^^^^^^^^^^^^^^^^^^
      File "/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/pytorch/utilities/combined_loader.py", line 341, in __next__
        out = next(self._iterator)
              ^^^^^^^^^^^^^^^^^^^^
      File "/location/anaconda3/envs/env/lib/python3.12/site-packages/lightning/pytorch/utilities/combined_loader.py", line 142, in __next__
        out = next(self.iterators[0])
              ^^^^^^^^^^^^^^^^^^^^^^^
      File "/location/anaconda3/envs/env/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 741, in __next__
        data = self._next_data()
               ^^^^^^^^^^^^^^^^^
      File "/location/anaconda3/envs/env/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 801, in _next_data
        data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/location/anaconda3/envs/env/lib/python3.12/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
                ~~~~~~~~~~~~^^^^^
      File "/location/repos/ModelGenerator/modelgenerator/structure_tokenizer/datasets/protein_dataset.py", line 166, in __getitem__
        chain = row["chain"] if not np.isnan(row["chain"]) else "nan"
                                    ^^^^^^^^^^^^^^^^^^^^^^
    TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
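
For reference, the final `TypeError` can be reproduced outside ModelGenerator: `np.isnan` only accepts numeric input, so it presumably fails as soon as `row["chain"]` holds a string chain ID such as `"A"` rather than a float NaN. Below is a minimal sketch, assuming `row` is a pandas row whose `chain` field is either a chain ID string or a missing value; `chain_or_nan` is a hypothetical helper, not part of ModelGenerator, that mirrors the failing line with a type-tolerant `pd.isna` check.

```python
import numpy as np
import pandas as pd


def chain_or_nan(value):
    """Return the chain ID unchanged, or the string "nan" if it is missing.

    Hypothetical type-tolerant version of the failing line:
        chain = row["chain"] if not np.isnan(row["chain"]) else "nan"
    pd.isna accepts strings, None, and float NaN, whereas np.isnan only
    accepts numeric input.
    """
    return "nan" if pd.isna(value) else value


# np.isnan raises a TypeError like the one in the traceback for a string chain ID.
try:
    np.isnan("A")
except TypeError as err:
    print("np.isnan failed:", err)

print(chain_or_nan("A"))           # A
print(chain_or_nan(float("nan")))  # nan
print(chain_or_nan(None))          # nan
```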
