Official PyTorch implementation of Variational Bayesian Personalized Ranking (IEEE TPAMI, 2026). Should you use this work in your research, please cite the following paper:
# bibtex
@article{11429075,
author={Liu, Bin and Liu, Xiaohong and Luo, Qin and Shang, Ziqiao and Chu, Jielei and Ma, Lin and Li, Zhaoyu and Teng, Fei and Zhai, Guangtao and Li, Tianrui},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
title={Variational Bayesian Personalized Ranking},
year={2026},
pages={1-16},
doi={10.1109/TPAMI.2026.3672705}
}
📖 ArXiv: Variational Bayesian Personalized Ranking
🎯 VarBPR is a unified variational framework that integrates preference alignment, popularity debiasing, and denoising into a single pairwise learning objective for implicit collaborative filtering. Whether you're pushing the boundaries of recommendation research or building real-world systems with fairness requirements, VarBPR offers both performance and policy control in one elegant package.
- Unified Noise & Bias Handling: Integrates denoising, popularity debiasing, and preference alignment in a single variational framework.
- Controllable Long-tail Exposure: Enables controllable long-tail exposure through a flexible direction-strength variational inference mechanism.
- Linear-Time Scalability: Retains the time complexity and peak GPU memory usage of BPR.
- Theoretical Guarantees: Provides interpretable generalization guarantees, reveals the opportunity cost of prioritizing certain exposure patterns (e.g., long-tail), and offers an analytical tool for analyzing and designing controllable recommendation systems.
🚀 Get started in minutes — Transform your recommender from a black box into a controllable, interpretable system.
- Plug-and-play replacement for standard BPR loss
- Support for MF, LightGCN, XSimGCL backbones
- Dual implementation: efficient plug-in & exact ELBO versions
- Full reproducibility with open datasets
- Practical generalization guarantees and controllable exposure policies.
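For context, the standard BPR loss that VarBPR is meant to replace can be sketched as below. This is the classic pairwise objective, not the VarBPR objective itself. The function name `bpr_loss` and the tensor layout (M positives and N negatives per user) are assumptions made for illustration and are not taken from this repository.

```python
import torch
import torch.nn.functional as F


def bpr_loss(pos_scores: torch.Tensor, neg_scores: torch.Tensor) -> torch.Tensor:
    """Standard BPR loss averaged over all (positive, negative) pairs.

    pos_scores: (batch, M) predicted scores of interacted items.
    neg_scores: (batch, N) predicted scores of sampled negatives.
    """
    # Pairwise score differences, shape (batch, M, N).
    diff = pos_scores.unsqueeze(2) - neg_scores.unsqueeze(1)
    # -log(sigmoid(diff)), computed in a numerically stable way.
    return -F.logsigmoid(diff).mean()


# Toy usage with random embeddings (illustrative shapes only).
torch.manual_seed(0)
user = torch.randn(8, 64)      # one embedding per user in the batch
pos = torch.randn(8, 2, 64)    # M = 2 positives per user
neg = torch.randn(8, 4, 64)    # N = 4 negatives per user
pos_scores = torch.einsum("bd,bmd->bm", user, pos)
neg_scores = torch.einsum("bd,bnd->bn", user, neg)
print(bpr_loss(pos_scores, neg_scores).item())
```

A plug-in replacement would keep the same inputs (scores of positives and negatives) and change only how the pairs are weighted and aggregated.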
- Introduced a designable prior (π⁺, π⁻) that encodes customizable exposure objectives (such as promoting long-tail items, ensuring content quality, or enhancing diversity) into the variational inference framework.
- Explicitly separated and implemented the two-stage training procedure: variational inference (solving for posteriors) and variational learning (updating model parameters).
- Updated the license from MIT to Apache License 2.0 for broader compatibility and explicit patent protection in open-source and commercial use.
- Improved evaluation by adding long-tail exposure (APLT) and Top-5 evaluation.
- Refined the multi-threaded data loading pipeline to minimize idle time.
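APLT (average percentage of long-tail items) is commonly computed as the mean, over users, of the fraction of each top-k list that falls in a long-tail item set. The sketch below follows that common definition. How this repository defines the long-tail set is not stated here, so the `long_tail_items` helper (which treats the least popular 80% of items as long-tail) and both function names are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def long_tail_items(interactions: Iterable[Tuple[int, int]],
                    tail_fraction: float = 0.8) -> Set[int]:
    """Return the least popular `tail_fraction` of items (assumed definition)."""
    counts = Counter(item for _, item in interactions)
    # Sort items from least to most popular; ties are broken by item id.
    ranked = sorted(counts, key=lambda item: (counts[item], item))
    n_tail = int(len(ranked) * tail_fraction)
    return set(ranked[:n_tail])


def aplt(top_k: Dict[int, List[int]], tail: Set[int]) -> float:
    """Average, over users, of the share of recommended items that are in `tail`."""
    shares = [sum(item in tail for item in items) / len(items)
              for items in top_k.values() if items]
    return sum(shares) / len(shares) if shares else 0.0


# Toy example: item 0 is popular, items 1-4 each have a single interaction.
train = [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 2), (2, 3), (3, 4)]
tail = long_tail_items(train)                      # items 1, 2, 3, 4
recs = {0: [0, 1, 2, 3, 4], 1: [0, 1, 2, 3, 4]}    # top-5 lists per user
print(aplt(recs, tail))
```

A higher APLT means more long-tail exposure in the recommendation lists; it says nothing about accuracy, so it is usually reported alongside a ranking metric.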
We thank the anonymous reviewers of IEEE TPAMI for their insightful comments and constructive suggestions, which have greatly improved VarBPR.
- Python 3.7
- PyTorch 1.12.1
| File | Description |
|---|---|
| `utils.py` | Data loading utilities and GPU-optimized dataset organization |
| `model.py` | Backbone architecture implementation |
| `evaluation.py` | Top-k performance evaluation on validation set |
| `main.py` | Central workflow controller (data processing, training, evaluation) |
--loss: Loss function, choose from ['VarBPRExact', 'VarBPRPlugIn'].
  - VarBPRExact: The VarBPR implementation without the plug-in approximation (exact ELBO). For small M and N, choose VarBPRExact to achieve better performance.
  - VarBPRPlugIn: The VarBPR implementation with the plug-in approximation. For large M and N, choose VarBPRPlugIn to ensure efficiency.
--dataset: Dataset name, choose from ['100k', '1M', 'gowalla', 'yelp2018'].
--backbone: Backbone model used to encode feature representations, choose from ['MF', 'LightGCN'].
(VarBPR-specific Hyperparameters)
--M: Number of positive samples.
--N: Number of negative samples.
--cpos: Regularization strength for denoising and exposure control on the positive side.
--cneg: Regularization strength for denoising and exposure control on the negative side.
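The options above could be declared with `argparse` roughly as follows. This is a sketch of how such a parser might look, not a copy of `main.py`: only the flag names and the listed choices come from the descriptions above, while the default values and the `int`/`float` types are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the documented flags (defaults are placeholders)."""
    parser = argparse.ArgumentParser(description="VarBPR training (illustrative sketch)")
    parser.add_argument("--loss", default="VarBPRPlugIn",
                        choices=["VarBPRExact", "VarBPRPlugIn"],
                        help="Loss function")
    parser.add_argument("--dataset", default="100k",
                        choices=["100k", "1M", "gowalla", "yelp2018"],
                        help="Dataset name")
    parser.add_argument("--backbone", default="MF",
                        choices=["MF", "LightGCN"],
                        help="Backbone model")
    # VarBPR-specific hyperparameters; types and defaults are assumed.
    parser.add_argument("--M", type=int, default=2, help="Number of positive samples")
    parser.add_argument("--N", type=int, default=4, help="Number of negative samples")
    parser.add_argument("--cpos", type=float, default=4.0,
                        help="Regularization strength on the positive side")
    parser.add_argument("--cneg", type=float, default=4.0,
                        help="Regularization strength on the negative side")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is reproducible.
args = build_parser().parse_args(
    ["--dataset", "gowalla", "--backbone", "LightGCN", "--N", "20"])
print(args.loss, args.dataset, args.backbone, args.M, args.N)
```

Restricting `--dataset` and `--backbone` with `choices` catches typos early, but a custom dataset or encoder name would then have to be added to the corresponding list.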
For each dataset, the backbone hyperparameters used with VarBPR are kept fixed, as listed in the tables below. For example, run the following command to train embeddings, selecting the dataset with `--dataset`:
python main.py
| Dataset | Backbone | Feature dim | Learning Rate | l2 | Batch Size | Hop |
|---|---|---|---|---|---|---|
| MovieLens 100K | MF | 64 | 1e-3 | 1e-5 | 1024 | - |
| MovieLens 1M | MF | 64 | 1e-3 | 1e-6 | 1024 | - |
| Gowalla | MF | 128 | 1e-3 | 1e-6 | 1024 | - |
| Yelp2018 | MF | 128 | 1e-3 | 1e-6 | 1024 | - |
| MovieLens 100K | LightGCN | 64 | 1e-3 | 1e-6 | 1024 | 1 |
| MovieLens 1M | LightGCN | 64 | 1e-3 | 1e-6 | 1024 | 2 |
| Gowalla | LightGCN | 128 | 1e-3 | 5e-7 | 1024 | 1 |
| Yelp2018 | LightGCN | 128 | 1e-3 | 1e-7 | 1024 | 2 |
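For scripted runs, the table above can be kept as a small lookup structure. The sketch below only transcribes the table. The `BackboneConfig` type, the `backbone_config` helper, and the mapping of dataset names to the `--dataset` choices are illustrative assumptions and do not reflect how `main.py` stores these values.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class BackboneConfig(NamedTuple):
    feature_dim: int
    learning_rate: float
    l2: float
    batch_size: int
    hops: Optional[int]  # "Hop" column; None where the table shows "-"


# Keys are (dataset, backbone); dataset names are assumed to match --dataset.
BACKBONE_CONFIGS: Dict[Tuple[str, str], BackboneConfig] = {
    ("100k", "MF"): BackboneConfig(64, 1e-3, 1e-5, 1024, None),
    ("1M", "MF"): BackboneConfig(64, 1e-3, 1e-6, 1024, None),
    ("gowalla", "MF"): BackboneConfig(128, 1e-3, 1e-6, 1024, None),
    ("yelp2018", "MF"): BackboneConfig(128, 1e-3, 1e-6, 1024, None),
    ("100k", "LightGCN"): BackboneConfig(64, 1e-3, 1e-6, 1024, 1),
    ("1M", "LightGCN"): BackboneConfig(64, 1e-3, 1e-6, 1024, 2),
    ("gowalla", "LightGCN"): BackboneConfig(128, 1e-3, 5e-7, 1024, 1),
    ("yelp2018", "LightGCN"): BackboneConfig(128, 1e-3, 1e-7, 1024, 2),
}


def backbone_config(dataset: str, backbone: str) -> BackboneConfig:
    """Look up the tabulated settings, failing loudly for unknown combinations."""
    try:
        return BACKBONE_CONFIGS[(dataset, backbone)]
    except KeyError:
        raise ValueError(
            f"No tabulated settings for {dataset!r} with {backbone!r}") from None


cfg = backbone_config("gowalla", "LightGCN")
print(cfg.feature_dim, cfg.l2, cfg.hops)
```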
| Dataset | Backbone | M | N | cpos | cneg |
|---|---|---|---|---|---|
| MovieLens 100K | MF | 2 | 4 | 4 | 4 |
| MovieLens 1M | MF | 2 | 4 | 4 | 4 |
| Gowalla | MF | 2 | 16 | 8 | 8 |
| Yelp2018 | MF | 2 | 16 | 8 | 8 |
| MovieLens 100K | LightGCN | 2 | 4 | 4 | 4 |
| MovieLens 1M | LightGCN | 2 | 4 | 4 | 4 |
| Gowalla | LightGCN | 2 | 20 | 8 | 8 |
| Yelp2018 | LightGCN | 2 | 20 | 8 | 8 |
| Component | pos_rarity | pos_quality | pos_hardnees | neg_popularity | neg_badquality | neg_hardnees |
|---|---|---|---|---|---|---|
| MovieLens100k/1M | 0 | 1 | 0 | 0 | 0.5 | 0.5 |
| Yelp2018/Gowalla | 0.2 | 0 | 0.8 | 0 | 0 | 1 |
To train on a private dataset or use a customized encoder, follow these steps:
- Data Formatting:
  - Organize the data into (u, i) tuples, as shown in the example data in the data directory.
  - Split the data into `tran.txt` and `test.txt`.
- Encoder Configuration:
  - Rewrite `model.py` to implement YOUR_OWN_Backbone architecture.
  - Ensure that, for any user or item, the encoder produces a $d$-dimensional feature representation.
- Parameter Configuration:

      # In main.py
      parser.add_argument('--dataset', default='YOUR_DATA_SET', type=str, help='Dataset name')
      parser.add_argument('--backbone', default='YOUR_BACKBONE', type=str, help='Backbone model')
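As a starting point for the Encoder Configuration step, a custom backbone might look like the sketch below. The class name `MyBackbone`, its constructor arguments, and the `forward` and `score` signatures are hypothetical. The interface that `main.py` actually expects from `model.py` should be checked against the existing MF and LightGCN implementations. The only requirement taken from the steps above is that every user and item is mapped to a d-dimensional vector.

```python
import torch
import torch.nn as nn


class MyBackbone(nn.Module):
    """Hypothetical encoder: ID embeddings plus a small MLP on the item side."""

    def __init__(self, num_users: int, num_items: int, dim: int = 64):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, dim)
        self.item_emb = nn.Embedding(num_items, dim)
        nn.init.normal_(self.user_emb.weight, std=0.01)
        nn.init.normal_(self.item_emb.weight, std=0.01)
        # Any custom transformation can go here as long as the output stays d-dimensional.
        self.item_mlp = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, users: torch.Tensor, items: torch.Tensor):
        """users: (batch,), items: (batch, K) -> (batch, d) and (batch, K, d)."""
        u = self.user_emb(users)
        i = self.item_mlp(self.item_emb(items))
        return u, i

    def score(self, users: torch.Tensor, items: torch.Tensor) -> torch.Tensor:
        """Inner-product scores of each user against its K candidate items."""
        u, i = self.forward(users, items)
        return torch.einsum("bd,bkd->bk", u, i)


# Toy usage: 8 users, each scored against 6 candidate items.
torch.manual_seed(0)
model = MyBackbone(num_users=100, num_items=200, dim=64)
users = torch.randint(0, 100, (8,))
items = torch.randint(0, 200, (8, 6))
u, i = model(users, items)
scores = model.score(users, items)
print(u.shape, i.shape, scores.shape)
```

Keeping the output dimension equal to the `dim` argument means the scoring and loss code can stay unchanged when the encoder is swapped.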
This project is licensed under the Apache License 2.0; see the LICENSE file for details.

