PyTorch implementation for MSCA proposed in the following paper:
Multi-view Semantic Contrastive Alignment for Multimodal Recommendation
Jiuqiang Li, Hongjun Wang*
In WWW 2026
Paper
- Python 3.8.10
- PyTorch 1.11.0+cu113
For dependency details, refer to requirements.txt.
Download from Google Drive: Baby/Sports/Electronics (Raw Data). The data includes image and text features provided by the MMRec framework, extracted from VGG and Sentence-Transformers. Preprocessing from raw data can be found here.
Download a supplementary dataset for micro-video recommendation: MicroLens (Raw Data) within MMRec.
- Download the datasets and place them in the `data` folder.
- Set the hyperparameters in the `src/configs/model/MSCA.yaml` file.
- Run:

      cd ./src
      python main.py -m MSCA -d {dataset_name}

- Test:

      python test.py -m MSCA -d {dataset_name} -c {checkpoint_path}

We report the best hyperparameters of MSCA to reproduce the results in Tables 2 and 6 of our paper.
| Dataset | n_layers | fusion_coeff | cl_weight | reg_weight |
|---|---|---|---|---|
| Baby | 2 | 0.4 | 0.005 | 3e-7 |
| Sports | 3 | 0.3 | 0.005 | 5e-8 |
| Electronics | 4 | 0.2 | 0.01 | 5e-10 |
| MicroLens | 4 | 0.3 | 0.01 | 5e-9 |
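When scripting runs over several datasets, it can help to keep the table above as a small lookup. The following is a minimal sketch and not part of the repository: `BEST_HPARAMS` and `best_hparams` are hypothetical names, and the values are copied from the table. It assumes the keys in `src/configs/model/MSCA.yaml` match the column names, which should be checked against the actual file before editing it.

```python
from typing import Dict, Union

Number = Union[int, float]

# Values copied from the hyperparameter table above.
# Assumption: the config keys are named exactly like the table columns.
BEST_HPARAMS: Dict[str, Dict[str, Number]] = {
    "Baby": {"n_layers": 2, "fusion_coeff": 0.4, "cl_weight": 0.005, "reg_weight": 3e-7},
    "Sports": {"n_layers": 3, "fusion_coeff": 0.3, "cl_weight": 0.005, "reg_weight": 5e-8},
    "Electronics": {"n_layers": 4, "fusion_coeff": 0.2, "cl_weight": 0.01, "reg_weight": 5e-10},
    "MicroLens": {"n_layers": 4, "fusion_coeff": 0.3, "cl_weight": 0.01, "reg_weight": 5e-9},
}


def best_hparams(dataset: str) -> Dict[str, Number]:
    """Return a copy of the reported settings for one dataset."""
    if dataset not in BEST_HPARAMS:
        raise KeyError(f"No reported hyperparameters for dataset: {dataset!r}")
    return dict(BEST_HPARAMS[dataset])


if __name__ == "__main__":
    for name in BEST_HPARAMS:
        print(name, best_hparams(name))
```

The helper returns a copy so that a caller can tweak one setting for an ablation without altering the reported values.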
The training logs and model checkpoints are provided below:
| Dataset | Log | Checkpoint |
|---|---|---|
| Baby | log | checkpoint |
| Sports | log | checkpoint |
| Electronics | log | checkpoint |
| MicroLens | log | checkpoint |
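Once a checkpoint has been downloaded, evaluation can be scripted. The following is a minimal sketch that only wraps the `test.py` flags given in the usage steps (`-m`, `-d`, `-c`). `build_test_command` is a hypothetical helper, and the dataset name and checkpoint path in the example are placeholders rather than values confirmed by the repository. It assumes the command is launched from the `src` directory.

```python
import shlex
import sys
from typing import List


def build_test_command(dataset_name: str, checkpoint_path: str) -> List[str]:
    """Assemble the argument list for `python test.py -m MSCA -d ... -c ...`."""
    return [
        sys.executable, "test.py",
        "-m", "MSCA",
        "-d", dataset_name,
        "-c", checkpoint_path,
    ]


if __name__ == "__main__":
    # Placeholder values: substitute the real dataset name and checkpoint file.
    cmd = build_test_command("baby", "path/to/checkpoint")
    print(shlex.join(cmd))
    # To execute from ./src instead of printing:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems when the checkpoint path contains spaces.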
If you find MSCA helpful to your research, please consider citing the following paper.
@inproceedings{li2026multi,
title={Multi-view Semantic Contrastive Alignment for Multimodal Recommendation},
author={Li, Jiuqiang and Wang, Hongjun},
booktitle={Proceedings of the ACM Web Conference 2026},
pages={5941--5952},
year={2026}
}

Licensed under the GNU GPL v3.0. See LICENSE.
This repository is based on MMRec. Thanks for their work.

