MediVoxAI 🩺🤖

MediVoxAI is a next-generation Medical Chatbot powered by a Multimodal Large Language Model (LLM) with both Vision and Voice capabilities. MediVoxAI can converse with patients, understand spoken questions, analyze medical images, and respond with empathetic and informative answers — making healthcare assistance more accessible and interactive. LINK: https://huggingface.co/spaces/jv456/MediVoxAI

Features

Multimodal LLM: Handles both medical images and text inputs.
Speech-to-Text (STT): Records and transcribes patient voice input.
Text-to-Speech (TTS): Responds with realistic doctor voice output.
Intuitive UI: User-friendly interface using Gradio.

Project Layout

Phase 1: Setup the Brain of the Doctor (Multimodal LLM)

Configure GROQ API key for fast AI inference.
Prepare images in required format.
Integrate the Llama 3 Vision model for image and text understanding.

Phase 2: Setup Voice of the Patient

Set up audio recording using ffmpeg and portaudio.
Implement speech-to-text transcription with OpenAI Whisper.

Phase 3: Setup Voice of the Doctor

Integrate TTS using gTTS and ElevenLabs.
Convert model-generated text responses into human-like voice.

Phase 4: User Interface

Build an interactive UI with Gradio for seamless conversation.

Technical Architecture

How It Works

User speaks or uploads an image via the Gradio UI.
Voice input is transcribed to text using Whisper.
Image and text are processed by the multimodal LLM (Llama 3 Vision).
AI generates response as text.
Doctor's response is converted to voice using TTS and played back.
All interaction happens in a web UI (Gradio).

Tools and Technologies

Groq for AI Inference
OpenAI Whisper for transcription
Llama 3 Vision for multimodal understanding
gTTS & ElevenLabs for speech synthesis
Gradio for UI
Python, VS Code

Output

⭐ Support

If you like this project, please give it a star!

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
.gradio/flagged		.gradio/flagged
AI_Medical_Chatbot.egg-info		AI_Medical_Chatbot.egg-info
__pycache__		__pycache__
.gitignore		.gitignore
README.md		README.md
doctor_brain.py		doctor_brain.py
doctor_voice.py		doctor_voice.py
elevenlabs_testing.mp3		elevenlabs_testing.mp3
final.mp3		final.mp3
gradio_app.py		gradio_app.py
gtts_testing.mp3		gtts_testing.mp3
gtts_testing_autoplay.mp3		gtts_testing_autoplay.mp3
patient_voice.py		patient_voice.py
patient_voice_test_for_patient.mp3		patient_voice_test_for_patient.mp3
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MediVoxAI 🩺🤖

Features

Project Layout

Phase 1: Setup the Brain of the Doctor (Multimodal LLM)

Phase 2: Setup Voice of the Patient

Phase 3: Setup Voice of the Doctor

Phase 4: User Interface

Technical Architecture

How It Works

Tools and Technologies

Output

⭐ Support

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

MediVoxAI 🩺🤖

Features

Project Layout

Phase 1: Setup the Brain of the Doctor (Multimodal LLM)

Phase 2: Setup Voice of the Patient

Phase 3: Setup Voice of the Doctor

Phase 4: User Interface

Technical Architecture

How It Works

Tools and Technologies

Output

⭐ Support

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages