KingsleyXW/Vision-Language-Model-for-Text-object-Retrieval

COCOdataset.py
FPN_ROI align.ipynb
FPN_ROI.py
README.md
a4_helper.py
bpe_simple_vocab_16e6.txt.gz
common.py
dataset.ipynb
simple_tokenizer.py


This code implements a vision-language model that detects objects from textual input. We replace the detector's standard classification output with 512-dimensional embeddings. Since the available dataset lacks object-level text annotations, we train the object detector with prompts built from category names. Furthermore, to improve performance on arbitrary text inputs, we use knowledge distillation from the CLIP model to transfer object-embedding knowledge to the detector. Combining these techniques yields a vision-language object detector that can localize objects described by free-form text.
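The three ideas above (a 512-dimensional embedding head in place of a classifier, training with category-name prompts, and distillation from CLIP) can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not this repository's code: the prompt template, the temperature of 0.01, the loss weight of 1.0, the L1 distillation loss (borrowed from ViLD-style distillation), and every helper name (`build_prompts`, `region_text_logits`, `distillation_loss`, `retrieve`) are hypothetical. Random vectors stand in for the CLIP text-encoder and image-encoder outputs.

```python
import numpy as np

EMBED_DIM = 512  # embedding size stated in the README


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def build_prompts(category_names, template="a photo of a {}."):
    """Turn category names into text prompts (hypothetical template)."""
    return [template.format(name) for name in category_names]


def region_text_logits(region_emb, text_emb, temperature=0.01):
    """Score every region against every prompt, replacing a fixed classifier layer.

    region_emb: (R, EMBED_DIM) embeddings predicted by the detector head.
    text_emb:   (C, EMBED_DIM) text-encoder embeddings of the C prompts.
    Returns (R, C) logits (hypothetical temperature).
    """
    return l2_normalize(region_emb) @ l2_normalize(text_emb).T / temperature


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, stabilized by subtracting the row maximum."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def distillation_loss(region_emb, teacher_emb):
    """L1 distance between normalized student and teacher (e.g. CLIP crop) embeddings.

    The L1 form is an assumption borrowed from ViLD-style distillation.
    """
    diff = l2_normalize(region_emb) - l2_normalize(teacher_emb)
    return float(np.abs(diff).sum(axis=1).mean())


def retrieve(region_emb, boxes, query_emb):
    """Return the box whose region embedding best matches a text-query embedding."""
    scores = l2_normalize(region_emb) @ l2_normalize(query_emb)
    return boxes[int(np.argmax(scores))], scores


# --- Toy usage: random vectors stand in for real encoder outputs. ---
rng = np.random.default_rng(0)
categories = ["person", "dog", "car"]
prompts = build_prompts(categories)
text_emb = rng.normal(size=(len(categories), EMBED_DIM))  # stand-in for CLIP text encoder
labels = np.array([2, 0, 1, 0])  # ground-truth category index per region
region_emb = text_emb[labels] + 0.1 * rng.normal(size=(len(labels), EMBED_DIM))
teacher_emb = region_emb + 0.05 * rng.normal(size=region_emb.shape)  # stand-in for CLIP crops
boxes = np.array([[0, 0, 10, 10], [5, 5, 20, 20], [8, 2, 30, 12], [1, 1, 4, 4]])

logits = region_text_logits(region_emb, text_emb)
# Hypothetical total loss: prompt classification + weighted distillation term.
total_loss = cross_entropy(logits, labels) + 1.0 * distillation_loss(region_emb, teacher_emb)
# Query with the embedding of the "dog" prompt; a real system would encode arbitrary text.
best_box, _ = retrieve(region_emb, boxes, text_emb[1])
print(prompts[1], best_box, round(total_loss, 4))
```

Scoring regions by cosine similarity against text embeddings is what lets the same head handle both the fixed category prompts used during training and arbitrary text queries at inference time; the distillation term pulls region embeddings toward CLIP's image-embedding space so that unseen phrasings still land near the right objects.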
