Privacy-first, AI-powered focus tracking for students and remote workers — 100% local inference, zero cloud uploads.
- Overview
- Key Features
- How It Works
- Tech Stack
- Project Structure
- Getting Started
- Usage
- Privacy Guarantee
- Contributing
- License
Focus Guardian is a real-time, AI-powered desktop proctoring and attention-tracking application. It monitors your focus during study sessions or deep work, detects signs of distraction, and gently nudges you back on track — all without ever sending your video data to the cloud.
Whether you're a student preparing for exams or a remote worker trying to stay in flow, Focus Guardian acts as your personal productivity co-pilot: silent when you're focused, helpful when you drift.
Uses advanced facial landmarking via MediaPipe and OpenCV's solvePnP to calculate your head's Yaw, Pitch, and Roll in real time. The geometry is carefully calibrated using nose, chin, and eye anchor points, so natural facial expressions like smiling don't trigger false distraction alerts.
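As a rough illustration of this step, the rotation that `cv2.solvePnP` returns (after conversion to a 3×3 matrix with `cv2.Rodrigues`) can be decomposed into yaw, pitch, and roll. The sketch below uses NumPy only; the Z-Y-X angle convention is an assumption and may differ from what `pose_estimator.py` actually uses:

```python
import numpy as np

def euler_from_rotation(R: np.ndarray) -> tuple:
    """Decompose a 3x3 rotation matrix (e.g. cv2.Rodrigues output after
    cv2.solvePnP) into (yaw, pitch, roll) in degrees.

    Assumes a Z-Y-X intrinsic convention -- an illustrative choice,
    not necessarily the one used in pose_estimator.py.
    """
    yaw = np.degrees(np.arctan2(R[1, 0], R[0, 0]))      # rotation about Z
    pitch = np.degrees(np.arctan2(-R[2, 0], np.hypot(R[2, 1], R[2, 2])))  # about Y
    roll = np.degrees(np.arctan2(R[2, 1], R[2, 2]))     # about X
    return yaw, pitch, roll
```

In practice, sustained yaw or pitch beyond some threshold (e.g. the head turned well away from the screen) is what would flag a "looking away" event.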
Continuously monitors the Eye Aspect Ratio (EAR) to detect signs of sleepiness. Iris placement tracking ensures you're actually looking at your screen rather than zoning out.
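The Eye Aspect Ratio itself is a simple ratio of vertical to horizontal eye-landmark distances. A minimal sketch, using the standard six-point EAR landmark ordering (the exact landmark indices in `pose_estimator.py` may differ):

```python
import numpy as np

def eye_aspect_ratio(eye: np.ndarray) -> float:
    """Compute the Eye Aspect Ratio from six eye landmarks.

    `eye` is a (6, 2) array ordered p1..p6: p1/p4 are the horizontal eye
    corners, and (p2, p6), (p3, p5) are the two vertical landmark pairs.
    """
    v1 = np.linalg.norm(eye[1] - eye[5])  # |p2 - p6|
    v2 = np.linalg.norm(eye[2] - eye[4])  # |p3 - p5|
    h = np.linalg.norm(eye[0] - eye[3])   # |p1 - p4|
    return (v1 + v2) / (2.0 * h)
```

An open eye yields a roughly constant EAR, while a closing eye drives it toward zero; a threshold around 0.2 held over several consecutive frames is a common drowsiness heuristic (the app's actual threshold is not specified here).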
Leverages Ultralytics YOLOv8 object detection to scan your environment for cell phones held in front of the camera — one of the most common study distractors.
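YOLOv8 returns per-box class ids and confidences, so the phone check reduces to a small post-filter. The sketch below assumes detections have already been flattened into `(label, confidence)` pairs; both that shape and the 0.5 confidence threshold are illustrative assumptions, not the actual `phone_detector.py` logic ("cell phone" is the COCO class name YOLOv8 uses):

```python
def phone_detected(detections, min_conf: float = 0.5) -> bool:
    """Return True if any detection is a sufficiently confident cell phone.

    `detections` is a list of (label, confidence) pairs, e.g. built from a
    YOLOv8 result via
        [(model.names[int(b.cls[0])], float(b.conf[0])) for b in results[0].boxes]
    The tuple shape and 0.5 threshold are illustrative assumptions.
    """
    return any(label == "cell phone" and conf >= min_conf
               for label, conf in detections)
```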
A proprietary scoring algorithm maintains a live Focus Score that drains when you look away or pick up your phone, and gradually recovers as you regain focus.
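The drain-and-recover behaviour can be sketched as a clamped accumulator. The rates below are illustrative placeholders, not the actual (proprietary) algorithm in `focus_score.py`:

```python
class FocusScore:
    """Toy drain/recover focus score; rates are illustrative assumptions."""

    def __init__(self, start: float = 100.0, drain: float = 5.0, recover: float = 1.0):
        self.score = start
        self.drain = drain      # points lost per distracted frame
        self.recover = recover  # points regained per focused frame

    def update(self, distracted: bool) -> float:
        if distracted:
            self.score = max(0.0, self.score - self.drain)
        else:
            self.score = min(100.0, self.score + self.recover)
        return self.score
```

Asymmetric rates (fast drain, slow recovery) match the behaviour described above: a moment of distraction costs more than a moment of focus earns back.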
Built in Next.js, the live dashboard shows:
- Current Focus Score with live visual feedback
- Active Focus Streak timer
- Session duration tracker
- Instant Red/Green alert banners when distraction is detected
- Audio nudges via the Web Audio API if distraction persists beyond 5 seconds
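The "nudge only after 5 seconds of sustained distraction" behaviour is essentially a debounce timer. The frontend implements this around the Web Audio API; a backend-style Python sketch of the same state machine (class and method names are assumptions) looks like:

```python
from typing import Optional

class NudgeTimer:
    """Fire a nudge once distraction persists beyond a threshold.

    Illustrative sketch of the 5-second debounce described above;
    the real logic lives in the frontend's React hooks.
    """

    def __init__(self, threshold_s: float = 5.0):
        self.threshold_s = threshold_s
        self._since: Optional[float] = None  # when distraction started
        self._fired = False                  # nudge already played?

    def update(self, distracted: bool, now_s: float) -> bool:
        if not distracted:
            self._since = None   # focus regained: reset the timer
            self._fired = False
            return False
        if self._since is None:
            self._since = now_s
        if not self._fired and now_s - self._since >= self.threshold_s:
            self._fired = True   # fire once per distraction episode
            return True
        return False
```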
Browser (Next.js)
│
│ Compressed JPEG frames @ ~10 FPS
▼
FastAPI Backend
├── MediaPipe Face Landmarker → 468 facial points → Head pose, EAR, gaze
└── YOLOv8 Object Detection → Phone detection in frame
│
│ JSON response: { distraction_type, focus_score, alerts }
▼
React Hooks
├── Update Focus Score
├── Trigger visual banners (Red / Green)
└── Fire audio nudge (Web Audio API) if distracted > 5 seconds
The Next.js frontend captures your webcam feed and sends aggressively compressed JPEG frames to the FastAPI backend at approximately 10 FPS. The backend routes each frame through MediaPipe and YOLO sequentially and returns a lightweight JSON payload describing your current focus state. The React frontend interprets the response and instantly updates the UI — no page reload, no lag.
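The per-frame flow above (pose stage, then phone stage, then score update, then JSON) can be sketched as a small orchestration function. The stage interfaces, dictionary keys, and alert strings here are assumptions for illustration; only the response fields (`distraction_type`, `focus_score`, `alerts`) come from the diagram:

```python
from typing import Callable, Optional

def analyze_frame(
    frame: bytes,
    estimate_pose: Callable[[bytes], dict],    # MediaPipe stage (assumed interface)
    detect_phone: Callable[[bytes], bool],     # YOLOv8 stage (assumed interface)
    update_score: Callable[[bool], float],     # focus-score stage (assumed interface)
) -> dict:
    """Run one frame through both detectors sequentially and build the
    JSON-serializable payload the frontend consumes."""
    pose = estimate_pose(frame)
    distraction: Optional[str] = None
    if detect_phone(frame):
        distraction = "phone"
    elif not pose.get("looking_at_screen", True):
        distraction = "looking_away"
    elif pose.get("drowsy", False):
        distraction = "drowsy"
    return {
        "distraction_type": distraction,
        "focus_score": update_score(distraction is not None),
        "alerts": [distraction] if distraction else [],
    }
```

In the real app this would sit inside the FastAPI route handler, with the three callables backed by `pose_estimator.py`, `phone_detector.py`, and `focus_score.py`.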
| Technology | Purpose |
|---|---|
| Next.js & React | Live dashboard and real-time UI |
| HTML5 Media API | Webcam frame capture and streaming |
| Web Audio API | Local audio nudges for distraction alerts |
| Technology | Purpose |
|---|---|
| FastAPI | Async Python server handling incoming video frames |
| Google MediaPipe | Face Landmarker — maps 468 facial points per frame |
| Ultralytics YOLOv8 | Spatial distractor detection (cell phone recognition) |
| OpenCV (opencv-python-headless) | Image array manipulation and 3D geometric calculations |
focus-guardian/
├── backend/
│ ├── main.py # FastAPI app and frame routing
│ ├── pose_estimator.py # Head pose (solvePnP), EAR, gaze logic
│ ├── phone_detector.py # YOLOv8 phone detection
│ ├── focus_score.py # Dynamic Focus Score algorithm
│ └── requirements.txt
│
├── frontend/
│ ├── app/
│ │ ├── page.tsx # Main dashboard
│ │ └── components/
│ │ ├── FocusScore.tsx
│ │ ├── AlertBanner.tsx
│ │ └── SessionTimer.tsx
│ ├── hooks/
│ │ └── useFocusStream.ts # Webcam capture + backend polling
│ └── package.json
│
└── README.md
- Python 3.10 or higher
- Node.js 18 or higher
- A system webcam
- (Optional but recommended) A CUDA-capable GPU for faster YOLO inference
# 1. Clone the repository
git clone https://github.com/your-username/focus-guardian.git
cd focus-guardian/backend
# 2. Create and activate a virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Start the FastAPI server
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
The backend will be available at http://localhost:8000.
# From the project root
cd frontend
# 1. Install dependencies
npm install
# 2. Start the development server
npm run dev
The dashboard will be available at http://localhost:3000.
Note: Make sure the backend server is running before opening the frontend, as the dashboard begins streaming frames on load.
- Open http://localhost:3000 in your browser.
- Grant camera access when prompted.
- Click Start Session to begin focus tracking.
- Work normally — Focus Guardian runs silently in the background.
- If you look away, get drowsy, or pick up your phone, you'll receive an instant on-screen alert and an audio nudge (after 5 seconds of sustained distraction).
- Review your session summary — streak duration, average focus score, and distraction events — at the end of each session.
No video data ever leaves your machine.
- All webcam frames are processed locally by the FastAPI backend running on your own hardware.
- Only a minimal JSON payload (focus state, score, distraction type) is exchanged between the backend and the browser — never raw video.
- No accounts, no telemetry, no cloud inference pipelines.
Focus Guardian is built on the principle that attention data is personal data. Your sessions are yours.
Contributions are welcome! If you'd like to fix a bug, add a feature, or improve the documentation:
- Fork the repository
- Create a feature branch: git checkout -b feature/your-feature-name
- Commit your changes: git commit -m 'Add your feature'
- Push to the branch: git push origin feature/your-feature-name
- Open a Pull Request
Please open an issue first for major changes so we can discuss the approach.
This project is licensed under the MIT License.
Built with 🧠 for anyone who needs a little help staying in the zone.