Do you have an example of training a modality that has no pretrained encoder? I want to only train the projector on ready embeddings.
My use case is a dataset of an array of numbers (each number indicating a voxel (from fmri data) intensity) and their corresponding english sentence.
I want to treat the voxel array as an embedding vector that needs to be projected into a higher dimension according to the textual embeddings of each array and its corresponding sentence.
Any help would be appreciated.
Do you have an example of training a modality that has no pretrained encoder? I want to only train the projector on ready embeddings.
My use case is a dataset of an array of numbers (each number indicating a voxel (from fmri data) intensity) and their corresponding english sentence.
I want to treat the voxel array as an embedding vector that needs to be projected into a higher dimension according to the textual embeddings of each array and its corresponding sentence.
Any help would be appreciated.