! cd multi_token && python scripts/serve_model.py \
--model_name_or_path mistralai/Mistral-7B-Instruct-v0.1 \
--model_lora_path sshh12/Mistral-7B-LoRA-AudioCLAP \
--port 7860
2024-02-25 00:08:32.729122: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-25 00:08:32.729175: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-25 00:08:32.730543: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-25 00:08:34.708616: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/usr/local/lib/python3.10/dist-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
INFO:root:Loading base model from mistralai/Mistral-7B-Instruct-v0.1 as 16 bits
Downloading shards: 100% 2/2 [00:00<00:00, 3.66it/s]
INFO:accelerate.utils.modeling:We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
Loading checkpoint shards: 100% 2/2 [01:05<00:00, 32.67s/it]
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
INFO:root:Loading projector weights for ['audio_clap']
INFO:root:Loading pretrained weights: ['audio_clap_lmm_projector.mlps.0.0.weight', 'audio_clap_lmm_projector.mlps.0.0.bias', 'audio_clap_lmm_projector.mlps.0.2.weight', 'audio_clap_lmm_projector.mlps.0.2.bias', 'audio_clap_lmm_projector.mlps.1.0.weight', 'audio_clap_lmm_projector.mlps.1.0.bias', 'audio_clap_lmm_projector.mlps.1.2.weight', 'audio_clap_lmm_projector.mlps.1.2.bias', 'audio_clap_lmm_projector.mlps.2.0.weight', 'audio_clap_lmm_projector.mlps.2.0.bias', 'audio_clap_lmm_projector.mlps.2.2.weight', 'audio_clap_lmm_projector.mlps.2.2.bias', 'audio_clap_lmm_projector.mlps.3.0.weight', 'audio_clap_lmm_projector.mlps.3.0.bias', 'audio_clap_lmm_projector.mlps.3.2.weight', 'audio_clap_lmm_projector.mlps.3.2.bias', 'audio_clap_lmm_projector.mlps.4.0.weight', 'audio_clap_lmm_projector.mlps.4.0.bias', 'audio_clap_lmm_projector.mlps.4.2.weight', 'audio_clap_lmm_projector.mlps.4.2.bias']
INFO:root:Loading and merging LoRA weights from sshh12/Mistral-7B-LoRA-AudioCLAP
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py", line 286, in hf_raise_for_status
response.raise_for_status()
File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/sshh12/Mistral-7B-LoRA-AudioCLAP/resolve/main/adapter_model.bin
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/peft/utils/save_and_load.py", line 307, in load_peft_weights
filename = hf_hub_download(model_id, WEIGHTS_NAME, **hf_hub_download_kwargs)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1238, in hf_hub_download
metadata = get_hf_file_metadata(
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1631, in get_hf_file_metadata
r = _request_wrapper(
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 385, in _request_wrapper
response = _request_wrapper(
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 409, in _request_wrapper
hf_raise_for_status(response)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py", line 296, in hf_raise_for_status
raise EntryNotFoundError(message, response) from e
huggingface_hub.utils._errors.EntryNotFoundError: 404 Client Error. (Request ID: Root=1-65da8557-336c3dff333e363725b1f3e0;9a03876c-65da-4c5b-b7e7-75f834f4ed81)
Entry Not Found for url: https://huggingface.co/sshh12/Mistral-7B-LoRA-AudioCLAP/resolve/main/adapter_model.bin.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/content/multi_token/scripts/serve_model.py", line 31, in <module>
model, tokenizer = load_trained_lora_model(
File "/content/multi_token/multi_token/inference.py", line 72, in load_trained_lora_model
model = PeftModel.from_pretrained(model, model_lora_path)
File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 354, in from_pretrained
model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 695, in load_adapter
adapters_weights = load_peft_weights(model_id, device=torch_device, **hf_hub_download_kwargs)
File "/usr/local/lib/python3.10/dist-packages/peft/utils/save_and_load.py", line 309, in load_peft_weights
raise ValueError(
ValueError: Can't find weights for sshh12/Mistral-7B-LoRA-AudioCLAP in sshh12/Mistral-7B-LoRA-AudioCLAP or in the Hugging Face Hub. Please check that the file adapter_model.bin or adapter_model.safetensors is present at sshh12/Mistral-7B-LoRA-AudioCLAP.
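
The final ValueError says PEFT found neither `adapter_model.bin` nor `adapter_model.safetensors` at `sshh12/Mistral-7B-LoRA-AudioCLAP`, and the 404 above it is for `adapter_model.bin` on the `main` branch. A quick way to narrow this down is to list what the repo actually contains and check for those two names. The snippet below is a minimal diagnostic sketch, not part of `multi_token`: the helper names `find_adapter_weights` and `report` are made up for illustration, and it assumes `huggingface_hub` is installed and the repo is reachable without authentication.

```python
# File names PEFT reports looking for, copied from the ValueError above.
ADAPTER_WEIGHT_NAMES = ("adapter_model.safetensors", "adapter_model.bin")


def find_adapter_weights(filenames):
    """Return the expected adapter weight files present at the top level of a listing."""
    present = set(filenames)
    return [name for name in ADAPTER_WEIGHT_NAMES if name in present]


def report(repo_id):
    """Print the repo's file listing and which adapter weight files it has.

    Hypothetical helper: needs network access and huggingface_hub.
    """
    # Imported lazily so find_adapter_weights stays usable offline.
    from huggingface_hub import list_repo_files

    files = list_repo_files(repo_id)
    found = find_adapter_weights(files)
    print("files in repo:", files)
    print("adapter weights found:", found if found else "none")
    return found
```

Calling `report("sshh12/Mistral-7B-LoRA-AudioCLAP")` in the same environment should show whether the weights are missing outright or are stored under a different name or path than the one PEFT requests. If the listing does contain one of the two files, the problem is more likely on the download side (for example the installed `peft` or `huggingface_hub` version) than in the repo itself.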