Accepted by SIGIR 2026 (Full Paper)
AMPO introduces an adaptive margin mechanism for pairwise preference optimization. Rather than applying a uniform margin to every preference pair, it calibrates optimization strength according to model confidence, making training more stable under heterogeneous recommendation signals and varying pair difficulty.
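The sketch below makes the idea concrete. It is a DPO-style pairwise loss whose margin shrinks as the model becomes more confident about a pair. It is not AMPO's published objective. The reward-gap form, the `beta` and `base_margin` parameters, and the `base_margin * (1 - confidence)` schedule are all illustrative assumptions. The code also works on scalar log-probabilities in plain Python rather than on tensors.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def adaptive_margin_loss(
    policy_chosen: float,
    policy_rejected: float,
    ref_chosen: float,
    ref_rejected: float,
    beta: float = 0.1,
    base_margin: float = 1.0,
) -> float:
    """Illustrative pairwise loss with a confidence-dependent margin.

    Inputs are sequence log-probabilities under the policy and a frozen
    reference model. This is NOT the published AMPO objective.
    """
    # DPO-style implicit reward gap between the chosen and rejected items.
    h = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # Confidence that `chosen` is ranked above `rejected`: sigmoid(h).
    confidence = math.exp(log_sigmoid(h))
    # Hypothetical schedule (an assumption): the margin shrinks as confidence grows.
    margin = base_margin * (1.0 - confidence)
    # Loss = -log sigmoid(h - margin); with base_margin=0 this reduces to plain DPO.
    return -log_sigmoid(h - margin)
```

Because the margin decreases as confidence rises, pairs the model already ranks correctly get little extra push. Uncertain or misranked pairs are optimized against a larger margin. A batched tensor version might also detach the confidence term, so that the margin acts as a per-pair constant rather than contributing its own gradient.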
The repository is organized for direct experimentation, with a lightweight training entry point and a compact implementation path for extending optimization objectives in practical recommendation settings.
```bash
git clone https://github.com/jumbo-q/ampo.git
cd ampo
pip install -r requirements.txt
```
Modify Configuration: Edit `configs/default.yaml` and fill in your `<model_path>` and `<data_path>`:

```yaml
model_args:
  model_name_or_path: "<your_model_path>"
data:
  train_files:
    - "<your_train_data_path>"
```
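A small check before launch can catch placeholders that were never filled in. This helper is hypothetical and not part of the repository. It assumes placeholders follow the `<your_...>` pattern used in the default config, and it scans the raw text with a regular expression, so no YAML library is needed.

```python
import re
from pathlib import Path

# Assumed placeholder pattern, e.g. "<your_model_path>" or "<your_train_data_path>".
PLACEHOLDER = re.compile(r"<your_[A-Za-z0-9_]+>")


def find_unfilled_placeholders(config_text: str) -> list[str]:
    """Return the distinct placeholder tokens still present in the config text."""
    return sorted(set(PLACEHOLDER.findall(config_text)))


def check_config(path: str = "configs/default.yaml") -> None:
    """Exit with an error message if the config still contains placeholders."""
    text = Path(path).read_text(encoding="utf-8")
    leftovers = find_unfilled_placeholders(text)
    if leftovers:
        raise SystemExit(f"{path}: fill in {', '.join(leftovers)} before training")
```

Run `check_config()` from the repository root. If you point `CONFIG_FILE` at a different file, pass that path to the helper instead.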
Launch Training: Run the provided shell script, which supports both single-GPU and multi-GPU training via DeepSpeed.

```bash
# Multi-GPU training (default: 8 GPUs)
bash scripts/train.sh

# Single-GPU training
NUM_GPUS=1 bash scripts/train.sh
```
Note: `scripts/train.sh` uses DeepSpeed for distributed training by default. You can override the GPU count and the config file path with the `NUM_GPUS` and `CONFIG_FILE` environment variables.
AMPO expects pairwise preference data with the following logical fields:
| Column | Description |
|---|---|
| `prompt` | Input context or user history |
| `chosen` | Preferred response / item |
| `rejected` | Non-preferred response / item |
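The sketch below sanity-checks records against the three fields in the table. It assumes one JSON object per line (JSON Lines), which the format description does not specify. The check that `chosen` differs from `rejected` is our own addition. `validate_record` and `validate_jsonl_lines` are hypothetical helpers, not repository code.

```python
import json
from typing import Iterable

REQUIRED_FIELDS = ("prompt", "chosen", "rejected")


def validate_record(record: dict) -> list[str]:
    """Return the problems found in one preference record (empty if valid)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")
    # Extra assumption: a pair with identical sides carries no preference signal.
    if not problems and record["chosen"] == record["rejected"]:
        problems.append("chosen and rejected are identical")
    return problems


def validate_jsonl_lines(lines: Iterable[str]) -> dict[int, list[str]]:
    """Validate JSON Lines input; map 1-based line numbers to their problems."""
    report = {}
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            report[lineno] = [f"invalid JSON: {exc.msg}"]
            continue
        if not isinstance(record, dict):
            report[lineno] = ["record is not a JSON object"]
            continue
        problems = validate_record(record)
        if problems:
            report[lineno] = problems
    return report
```

To check a file, open it with `open(path, encoding="utf-8")` and pass the handle to `validate_jsonl_lines`. An empty dict means every record passed.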
Example:

```json
{
  "prompt": "User history ...",
  "chosen": "Preferred item",
  "rejected": "Rejected item"
}
```

The default workflow is intentionally minimal:
- implement the core optimization logic in `src/ampo/trainer.py`;
- define the training and evaluation flow in `main_ampo.py`.
AMPO supports custom loss extensions while preserving the standard tokenization, collation, logging, and optimization pipeline.
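One common way to keep custom objectives pluggable is to express each loss as a function of the same per-pair reward gap and dispatch by name. The sketch below illustrates only that pattern. `LOSS_FNS`, `register_loss`, and `pairwise_loss` are hypothetical names, not the interface of `src/ampo/trainer.py`. The actual extension hook should be taken from that file.

```python
import math
from typing import Callable

# Hypothetical registry mapping a loss name to a function of the reward gap h.
LOSS_FNS: dict[str, Callable[[float], float]] = {}


def register_loss(name: str) -> Callable[[Callable[[float], float]], Callable[[float], float]]:
    """Decorator that adds a pairwise loss to LOSS_FNS under `name`."""
    def wrap(fn: Callable[[float], float]) -> Callable[[float], float]:
        LOSS_FNS[name] = fn
        return fn
    return wrap


@register_loss("sigmoid")
def sigmoid_loss(h: float) -> float:
    # DPO-style -log(sigmoid(h)), computed in a numerically stable way.
    if h >= 0:
        return math.log1p(math.exp(-h))
    return -h + math.log1p(math.exp(h))


@register_loss("hinge")
def hinge_loss(h: float, margin: float = 1.0) -> float:
    # Hinge alternative: zero once the reward gap exceeds `margin`.
    return max(0.0, margin - h)


def pairwise_loss(
    name: str,
    policy_chosen: float,
    policy_rejected: float,
    ref_chosen: float,
    ref_rejected: float,
    beta: float = 0.1,
) -> float:
    """Compute the shared reward gap once, then dispatch to the named loss."""
    h = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    return LOSS_FNS[name](h)
```

With this pattern, adding an objective means registering one more function of `h`. Tokenization, collation, and the log-probability computation stay untouched.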
To be released