A curated list of awesome Multimodal studies.
Awesome Multimodal Assistant is a curated list of multimodal chatbots/conversational assistants that utilize various modes of interaction, such as text, speech, images, and videos, to provide a seamless and versatile user experience.
[ECCV'24] Official code for "BI-MDRG: Bridging Image History in Multimodal Dialogue Response Generation"
[Paperlist] Awesome paper list of multimodal dialogue, covering methods, datasets, and metrics
Recent Advances in Visual Dialog
Paper, dataset and code list for multimodal dialogue.
[EMNLP'23 Oral] ReSee: Responding through Seeing Fine-grained Visual Knowledge in Open-domain Dialogue (PyTorch implementation)
Code for a vision-enabled dialogue system that combines dialogue and visual inputs to improve contextual awareness. The system uses GPT-4 to summarize prompt images for brevity, and it can run as a standalone application with a webcam or be integrated into a Furhat robot (see the sketch after this list).
Summary of Visual Dialogue Papers
Official PyTorch implementation of ACL 2023 paper "Listener Model for the PhotoBook Referential Game with CLIPScores as Implicit Reference Chain"
Work supporting a Master's stage and paper on multimodal datasets and schema-guided dialogue datasets.
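The image-summarization step mentioned for the vision-enabled dialogue system above can be illustrated with a minimal sketch. This is not the repository's actual code: it assumes the OpenAI Python SDK, a vision-capable model name (`gpt-4o`), and a hypothetical helper `summarize_frame` that turns a captured webcam frame into a one-sentence caption to prepend to the dialogue context.

```python
# Minimal sketch (assumptions, not the repo's implementation): summarize a
# camera frame with a GPT-4 vision model so the caption can be added to the
# dialogue context.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def summarize_frame(jpeg_bytes: bytes) -> str:
    """Return a one-sentence summary of a captured camera frame."""
    image_b64 = base64.b64encode(jpeg_bytes).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed vision-capable model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Summarize this image in one short sentence for dialogue context."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
        max_tokens=60,
    )
    return response.choices[0].message.content.strip()
```

In such a pipeline the returned caption would typically be inserted into the chat history as context (e.g., "The user's camera shows: ...") before generating the next dialogue response.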