mutli-modal

Here are 10 public repositories matching this topic...

krantiparida / awesome-audio-visual

A curated list of different papers and datasets in various areas of audio-visual processing

awesome localization awesome-list cross-modal source-separation audio-visual mutli-modal

Updated Jan 30, 2024

LittlePey / SFD

Sparse Fuse Dense: Towards High Quality 3D Detection with Depth Completion (CVPR 2022, Oral)

point-clouds 3d-object-detection depth-completion mutli-modal

Updated Jul 30, 2022
Python

youngbin-ro / audiotext-transformer

Multimodal Transformer for Korean Sentiment Analysis with Audio and Text Features

natural-language-processing sentiment-analysis transformer audio-processing mutli-modal

Updated Sep 7, 2021
Python

VachanVY / Transfusion.torch

PyTorch Implementation of Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

ai deep-learning transformers pytorch artificial-intelligence pytorch-implementation mutli-modal multimodal-transformer multimodal-large-language-models

Updated Oct 10, 2024
Python

atlas-2192 / Multi-AI-Chat-APP

python agent ai chatbot assistant openai llama mutli-modal

Updated Dec 16, 2024
Python

Kcrypto126 / Multi-Ai-Chat-App

chatting app

python agent ai chatbot assistant openai llama mutli-modal

Updated Jul 19, 2025
Python

zjukg / MANS

[Paper][IJCNN2023] Modality-Aware Negative Sampling for Multi-modal Knowledge Graph Embedding

pytorch knowledge-graph negative-sampling mutli-modal

Updated Feb 10, 2024
Python

atahabilder1 / DocuMind

Multi-modal AI agent that extracts information from PDFs, images, and documents to answer questions. Combines vision models with RAG architecture for intelligent document understanding. Upload any file and chat with your documents. Built with LangChain, vision APIs, and vector embeddings.

question-answering ai-agents rag mutli-modal vision-ai document-processing-pipeline

Updated Dec 15, 2025
Python

Siva-Dev-001 / AI_invoice_Extractor

A multi-language invoice data extractor tool using Google Gemini Pro and Streamlit with Prompt Engineering.

python python-library python-script invoice invoice-pdf gemini-api streamlit mutli-modal streamlit-webapp prompt-tuning llm prompt-engineering gemini-ai

Updated May 12, 2024
Python

jayenliao / MAE

This repo reproduces key findings from Masked Autoencoders Are Scalable Vision Learners (MAE) on CIFAR-10: self-supervised pretraining improves downstream classification versus training from scratch, and we studied how decoder depth and decoder width affect MAE pretraining and downstream results.

image-classification vit vlm masked-autoencoder mutli-modal

Updated Oct 14, 2025
Jupyter Notebook
