OpenConceptLab/ocl_issues#2388 | custom encoder model for reranking#8
snyaggarwal wants to merge 1 commit into main
paynejd left a comment
Review — OpenConceptLab/ocl_issues#2388
1. Add drop-in models to ENCODER_MODEL_OPTIONS with descriptions
Currently, rerankerModels.js only has the default. Add three more drop-in-compatible models (no backend code changes needed; all of them work with the existing CrossEncoder in sentence-transformers 3.3.1):
export const ENCODER_MODEL_OPTIONS = [
{
id: 'BAAI/bge-reranker-v2-m3',
description: 'Multilingual, general-purpose (0.6B)',
},
{
id: 'cross-encoder/ms-marco-MiniLM-L-6-v2',
description: 'Fast and lightweight, English-only (23M)',
},
{
id: 'ncbi/MedCPT-Cross-Encoder',
description: 'Biomedical domain, trained on PubMed (110M)',
},
{
id: 'Alibaba-NLP/gte-reranker-modernbert-base',
description: 'Balanced quality, supports longer descriptions (149M)',
},
]
This changes the data shape from a string array to an object array, so RerankerConfig.jsx needs to be updated to use option.id as the value and display option.description in the dropdown. Each of the new models offers a different tradeoff:
- ms-marco-MiniLM: ~27x smaller than the default and ~10x faster; good for latency-critical or large batch runs
- MedCPT: the only biomedical-domain cross-encoder available, trained on 18M PubMed query-article pairs; most relevant for health terminology mapping
- gte-modernbert: near-default quality at about a quarter of the size, with an 8192-token context window (vs. 128 for the default) for longer concept descriptions
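Because the options become objects, the dropdown has to read `id` for the stored value and `description` for the label. A minimal sketch in plain JavaScript rather than JSX, assuming hypothetical helpers `toDropdownItems` and `findModelOption`, and assuming (not confirmed by the PR) that the first entry is the default and is used as a fallback when a saved project holds an unknown model id:

```javascript
// Object-shaped options, mirroring the proposed ENCODER_MODEL_OPTIONS.
const ENCODER_MODEL_OPTIONS = [
  { id: 'BAAI/bge-reranker-v2-m3', description: 'Multilingual, general-purpose (0.6B)' },
  { id: 'cross-encoder/ms-marco-MiniLM-L-6-v2', description: 'Fast and lightweight, English-only (23M)' },
  { id: 'ncbi/MedCPT-Cross-Encoder', description: 'Biomedical domain, trained on PubMed (110M)' },
  { id: 'Alibaba-NLP/gte-reranker-modernbert-base', description: 'Balanced quality, supports longer descriptions (149M)' },
]

// Hypothetical helper: map each option to a value/label pair for a dropdown.
// The value stays the plain model id string, so what is stored can remain a string.
function toDropdownItems(options) {
  return options.map((option) => ({
    value: option.id,
    label: `${option.id} - ${option.description}`,
  }))
}

// Hypothetical helper: resolve a saved model id to its option object.
// Assumption: fall back to the first entry (the default) when the id is not listed.
function findModelOption(id, options = ENCODER_MODEL_OPTIONS) {
  return options.find((option) => option.id === id) ?? options[0]
}
```

Keeping the id as the stored value means only the rendering code changes shape; anything downstream that expects a model name string is unaffected.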
2. Log the encoder model on rerank events
At line 2226, the rerank_finished log should include which model was used:
log({action: 'rerank_finished', description: `Reranked with ${encoderModel}`}, index)
Same for rerank_failed at line 2231:
log({action: 'rerank_failed', description: `Rerank failed with ${encoderModel}`}, index)
These entries are visible in the row's Discuss/log panel and persist with the project, which is important for debugging and reproducibility when users are experimenting with different models.
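If the same wording is wanted in both branches, the two calls could share a small builder. A sketch under stated assumptions: `buildRerankLogEntry` is a hypothetical helper, the `log(entry, index)` call shape is taken from the snippets above, and recording a placeholder when no model is set is an illustrative choice, not existing behavior:

```javascript
// Hypothetical helper that builds the entry passed to log(entry, index)
// for rerank events, so the model name is recorded the same way in both cases.
function buildRerankLogEntry(succeeded, encoderModel) {
  // Assumption: record an explicit placeholder rather than "undefined" when no model is set.
  const model = encoderModel || 'unknown model'
  return succeeded
    ? { action: 'rerank_finished', description: `Reranked with ${model}` }
    : { action: 'rerank_failed', description: `Rerank failed with ${model}` }
}

// Usage inside the existing handlers (log and index come from the surrounding code):
// log(buildRerankLogEntry(true, encoderModel), index)
// log(buildRerankLogEntry(false, encoderModel), index)
```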
3. Fix prop naming inconsistency
MapProject.jsx passes rerankerConfig={encoderModel} and setRerankerConfig={setEncoderModel} to ConfigurationForm, but the value is a plain string, not a config object, so the naming is misleading. Suggest renaming the props to encoderModel/setEncoderModel or rerankerModel/setRerankerModel throughout.
4. Fix Spanish translation accents
In es/translations.json:
- "Configuracion del reranker" → "Configuración del reranker"
- "automaticamente" → "automáticamente"
5. Coordinate Closes keyword with oclapi2#839
Both PRs say "Closes OpenConceptLab/ocl_issues#2388", so whichever merges first will auto-close the issue prematurely. Suggest that this PR (oclmap) keep the Closes keyword, since it is the user-facing final piece, and that oclapi2#839 switch to "Ref OpenConceptLab/ocl_issues#2388" (a bare #2388 in oclapi2 would point at oclapi2's own issue #2388).
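GitHub closes an issue when a PR whose description contains a closing keyword (close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved) followed by an issue reference is merged into the default branch, and a bare `#N` resolves to the PR's own repository. A rough sketch of a pre-merge check for which issues a PR body would close, assuming a hypothetical `findClosingRefs` helper; it handles only the `#N` and `owner/repo#N` forms and is not GitHub's actual parser:

```javascript
// GitHub's documented closing keywords (matched case-insensitively below).
const CLOSING_KEYWORDS = ['close', 'closes', 'closed', 'fix', 'fixes', 'fixed', 'resolve', 'resolves', 'resolved']

// Hypothetical helper: return fully qualified "owner/repo#N" refs that a PR body
// would close. Bare "#N" refs are resolved against the PR's own repository.
function findClosingRefs(body, prRepo) {
  const pattern = new RegExp(
    `\\b(?:${CLOSING_KEYWORDS.join('|')}):?\\s+((?:[\\w.-]+/[\\w.-]+)?#\\d+)`,
    'gi',
  )
  const refs = []
  for (const match of body.matchAll(pattern)) {
    const ref = match[1]
    refs.push(ref.startsWith('#') ? `${prRepo}${ref}` : ref)
  }
  return refs
}
```

Run against both PR descriptions as currently written, it would report OpenConceptLab/ocl_issues#2388 for each; a "Ref ..." line matches no keyword and reports nothing. The repository names used when calling it are whatever the caller passes in.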
Related follow-up tickets filed:
- OpenConceptLab/ocl_issues#2463: Upgrade sentence-transformers from 3.3.1 to 5.4+
- OpenConceptLab/ocl_issues#2464: Add Qwen3-Reranker models (blocked by #2463)
Linked Issue
Closes OpenConceptLab/ocl_issues#2388