A fully voice-controlled personal desktop assistant built in Python, inspired by Jarvis from the Iron Man movies. This assistant can listen, speak, execute system actions, open apps, control volume/brightness, search YouTube, play songs on Spotify, interact with ChatGPT, and display a fully animated UI.
Important
Operating System: This project uses system-level commands (e.g., os.system, ctypes, powershell) that are designed specifically for Windows 10/11. It will not function correctly on macOS or Linux without modification.
Jarvis uses:
- Speech Recognition
- Text-To-Speech
- OpenAI API
- Tkinter animated GUI
- Local OS automation
- Custom command execution engine
This project started from a very basic speech-to-text experiment using PyAudio & SpeechRecognition. No AI, no UI — just a simple goal: make the computer listen and respond.
Later, the project evolved step-by-step:
- Started with PyAudio + speech_recognition
- Recognized simple speech commands
- Executed local Python functions
- No AI used at this stage
- Added OS-level control using os, webbrowser, ctypes, pyautogui
- Could open apps, browsers, control volume/brightness
- Added pyttsx3
- Jarvis started speaking responses
- Basic conversational ability
- YouTube search + auto-play
- Spotify search
- Open Google and search
- Open any website
- Open system apps
- Shutdown, restart, sleep, lock
- Close all apps
- Built a full Tkinter UI
- Added animated 3D globe effect
- Added audio visualizer bars
- Added typewriter text animation
- Added fade-out effects
- Clean “Jarvis-like” theme
- Added direct OpenAI API calls
- Built a custom JSON action interpreter
- Jarvis could now:
  - Understand natural language
  - Answer general questions
  - Use AI fallback when speech commands fail
  - Execute system actions through AI JSON
- Fixed YouTube search
- Fixed repeated answers
- Fixed Spotify handling
- Added command normalization
- Added ambient noise reduction
- Improved streaming text display
- Added multi-threaded listening
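The README does not show how command normalization works, so the following is only a minimal sketch of one plausible approach: lowercase the recognized text, strip punctuation, and drop a few filler words before matching. The `FILLER_WORDS` set and the `normalize_command` name are assumptions made for illustration, not taken from jarvis.py.

```python
import re

# Hypothetical filler words; the actual list used by the project is not documented.
FILLER_WORDS = {"hey", "jarvis", "please"}


def normalize_command(text: str) -> str:
    """Lowercase, replace punctuation with spaces, drop filler words, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation becomes a space
    words = [word for word in text.split() if word not in FILLER_WORDS]
    return " ".join(words)
```

With this sketch, a phrase such as "Hey Jarvis, please OPEN Chrome!" would be reduced to "open chrome" before it reaches the command matcher.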
The project has now evolved into a full-featured desktop AI assistant.
- Continuous listening mode
- Google Speech Recognition
- Ambient noise cancellation
- Real-time display of what the user said
- pyttsx3 engine
- Professional Jarvis-style narration
- Adjustable speed + volume
- UI typewriter animation
- Open apps:
  - CMD
  - PowerShell
  - File Explorer
  - Notepad
  - Calculator
  - Chrome
  - Spotify
  - YouTube
  - Settings
  - Task Manager
- System power actions:
  - Shutdown
  - Restart
  - Sleep
  - Lock
- Volume control:
  - Increase
  - Decrease
  - Mute
- Brightness control:
  - Increase
  - Decrease
- Open YouTube
- Search + auto-play top YouTube result
- Spotify play/search
- ChatGPT website
- ChatGPT voice mode
- General questions answered by GPT
- Fallback to local commands if API fails
- JSON action extraction
- Unified command execution
- Custom system prompt for Jarvis personality
- Fully animated 3D rotating globe
- Audio visualizer
- Fade-out effects
- Typewriter text rendering
- Smooth UX
- Modern neon theme
- Central status panel (“Listening… / Processing…”)
- Real-time user text + AI response
- Background listening thread
- Robust command interpreter
- Clean error handling
- YouTube ID regex extraction
- URL encoding
- OS-level process control
- App protection list (python, VSCode)
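The "YouTube ID regex extraction" and "URL encoding" items above could look roughly like the sketch below. It assumes the search results page contains links of the form `watch?v=<11-character id>`; that page layout is an assumption and can change without notice. The function names are hypothetical and not taken from jarvis.py.

```python
import re
from typing import Optional
from urllib.parse import quote_plus

# YouTube video IDs are 11 characters drawn from letters, digits, "_" and "-".
VIDEO_ID_RE = re.compile(r"watch\?v=([A-Za-z0-9_-]{11})")


def build_search_url(query: str) -> str:
    """URL-encode the spoken query into a YouTube search URL."""
    return "https://www.youtube.com/results?search_query=" + quote_plus(query)


def first_video_id(html: str) -> Optional[str]:
    """Return the first video ID found in the page source, or None if there is none."""
    match = VIDEO_ID_RE.search(html)
    return match.group(1) if match else None


def watch_url(video_id: str) -> str:
    """Build the playback URL for a video ID."""
    return f"https://www.youtube.com/watch?v={video_id}"
```

In a full flow, the page source would be fetched from `build_search_url(...)` (for example with `requests.get(url).text`) and the resulting `watch_url(...)` opened with `webbrowser.open`.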
Jarvis Assistant/
│
├── jarvis.py # Main assistant code
├── requirements.txt # Dependencies
├── README.md # Documentation
└── assets/ # (Optional) icons, images
git clone https://github.com/yourusername/jarvis-assistant
cd jarvis-assistant
pip install -r requirements.txt
speechrecognition
pyttsx3
pyautogui
psutil
screen_brightness_control
openai
requests
tkinter (built-in)
pyaudio
If PyAudio fails:
pip install pipwin
pipwin install pyaudio
Inside the code:
OPENAI_KEY = "your-key-here"
Or use an environment variable (setx only takes effect in newly opened terminals):
setx OPENAI_API_KEY "your-key-here"
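A minimal sketch of how the two options above might be combined, assuming the code prefers the `OPENAI_API_KEY` environment variable and falls back to a value set in the source. The `load_api_key` helper is hypothetical and may not match how jarvis.py actually reads the key.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY", fallback: str = "") -> str:
    """Return the key from the environment, else the fallback; fail if neither is set."""
    key = os.environ.get(env_var, "").strip() or fallback.strip()
    # Treat the README placeholder as "not configured".
    if not key or key == "your-key-here":
        raise RuntimeError(f"No API key found; set {env_var} or edit the key in the code.")
    return key
```

Failing early with a clear message avoids a less obvious authentication error on the first API call.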
python jarvis.py
open chrome
open calculator
open file explorer
open chatgpt
open youtube
play despacito on youtube
play songs on spotify
shutdown
restart
lock system
increase volume
decrease brightness
close all apps
who is sundar pichai?
what is quantum computing?
explain python in simple words
bye
stop
exit
Speech → Microphone → Google STT → Text
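The pipeline above runs in a background listening thread. The sketch below shows one way such a thread could hand recognized text to the rest of the program through a queue. The `listen_once` callable is a stand-in: in the real assistant it would presumably wrap the SpeechRecognition calls (`Recognizer.listen` followed by `Recognizer.recognize_google`), but that wiring is an assumption and is left out so the example runs without a microphone.

```python
import queue
import threading
from typing import Callable, Optional


def start_listener(
    listen_once: Callable[[], Optional[str]],
    out_queue: "queue.Queue[str]",
    stop_event: threading.Event,
) -> threading.Thread:
    """Call listen_once() in a loop on a daemon thread and queue any recognized text."""

    def worker() -> None:
        while not stop_event.is_set():
            try:
                text = listen_once()
            except Exception:
                # Recognition errors are skipped; pause briefly to avoid a busy loop.
                stop_event.wait(0.1)
                continue
            if text:
                out_queue.put(text)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The main (UI) thread can then poll `out_queue` without blocking, and set `stop_event` when the user says "bye", "stop", or "exit".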
Jarvis checks if your command matches:
- system actions
- app open
- YouTube
- Spotify
- volume/brightness
If matched → executes instantly.
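The local matching step could be sketched as a small function that maps recognized text to an action name and an argument, returning `None` when nothing matches so the caller can fall through to the AI path. The phrase patterns are taken from the example commands in this README; the function name and the returned tuples are hypothetical.

```python
from typing import Optional, Tuple


def match_local_command(text: str) -> Optional[Tuple[str, str]]:
    """Return (action, argument) for a known command, or None if there is no local match."""
    text = text.lower().strip()
    if text in {"shutdown", "restart", "sleep", "lock system"}:
        return ("power", text)
    for site in ("youtube", "spotify"):
        suffix = f" on {site}"
        if text.startswith("play ") and text.endswith(suffix):
            return (site, text[len("play "):-len(suffix)].strip())
    for target in ("volume", "brightness"):
        for verb in ("increase", "decrease"):
            if text == f"{verb} {target}":
                return (target, verb)
    if text.startswith("open "):
        return ("open_app", text[len("open "):].strip())
    return None
```

For example, "play despacito on youtube" yields `("youtube", "despacito")`, while a question such as "what is quantum computing?" yields `None`.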
If the command is NOT recognized locally:
- Sent to ChatGPT
- GPT responds with:
  - A normal text answer
  - OR a JSON command like:
{
  "action": { "type": "open_app", "params": { "app": "calculator" } },
  "speak": "Opening calculator"
}
- Typewriter text
- Animated globe
- Audio bars
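For the JSON command format shown earlier, a reply handler has to decide whether GPT returned plain text or an action. The sketch below is one possible approach, assuming the JSON object may be surrounded by extra text; `parse_reply` is a hypothetical name and the actual extraction logic in jarvis.py may differ.

```python
import json
from typing import Any, Dict, Optional, Tuple


def parse_reply(reply: str) -> Tuple[Optional[Dict[str, Any]], str]:
    """Return (action, text_to_speak); action is None for a plain text answer."""
    start = reply.find("{")
    end = reply.rfind("}")
    if start != -1 and end > start:
        try:
            data = json.loads(reply[start:end + 1])
        except json.JSONDecodeError:
            data = None  # braces were present but did not form valid JSON
        if isinstance(data, dict) and isinstance(data.get("action"), dict):
            return data["action"], str(data.get("speak", ""))
    return None, reply.strip()
```

When an action is returned, its `type` and `params` fields can be passed to the same executor that handles locally matched commands, which is what keeps command execution unified.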
Try:
pipwin install pyaudio
Make sure a microphone is selected.
Fixed in:
- Improved JSON extraction
- Stripping duplicate replies
Fixed using:
- Regex video ID extractor
- Correct search URL
- Offline STT with Whisper
- Real-time wake word (“Hey Jarvis”)
- Better 3D UI
- Add system tray mode
- Add weather, news APIs
- Add email automation
- Add reminders + notes
Free to use for personal projects.