StudyBuddy is a full-stack academic productivity platform that helps university students organize their coursework, automatically generate optimized study plans, and discover the best nearby study spots using AI-powered scoring and real-time occupancy data.
graph TD
subgraph Client["Frontend — React 18 + Vite"]
UI[SPA Components]
OCR[Client-Side OCR Engine<br/>Tesseract.js + PDF.js]
Map[Mapbox GL JS<br/>Geospatial Map]
end
subgraph API["Backend — Django 6 + DRF"]
Views[REST API Views]
Parser[Schedule Parser<br/>Hybrid OCR + Gemini Escalation]
Planner[Study Planner<br/>Greedy Scheduling Algorithm]
Auth[Token Authentication]
Spectacular[drf-spectacular<br/>OpenAPI Schema]
end
subgraph Workers["Celery Workers"]
W1["worker<br/>queue: coffeeshops"]
W2["apify-worker<br/>queue: apify<br/>concurrency: 2"]
W3["ai-worker<br/>queue: ai<br/>concurrency: 1"]
W4["besttime-worker<br/>queue: besttime<br/>concurrency: 1"]
end
subgraph Storage["Data Layer"]
DB[(PostgreSQL + PostGIS)]
Redis[(Redis)]
end
subgraph External["External Services"]
Gemini[Google Gemini API<br/>gemini-2.5-flash]
Apify[Apify Crawler<br/>Google Places Reviews]
Ollama[Ollama — Local LLM<br/>Llama 3.2]
BestTime[BestTime.app API<br/>Live Occupancy]
end
UI -->|REST + Token Auth| Views
OCR -->|Extracted Text| Views
Map -->|Mapbox Token| UI
Views --> DB
Views -->|Enqueue Tasks| Redis
Parser -->|Confidence < 0.62| Gemini
Planner --> DB
Redis --> W1
Redis --> W2
Redis --> W3
Redis --> W4
W1 --> DB
W2 --> Apify
W3 --> Ollama
W4 --> BestTime
| Layer | Technology | Role |
|---|---|---|
| Frontend | React 18, Vite, Tailwind CSS 4 | SPA with client-side OCR and interactive map |
| Backend | Django 6.0, Django REST Framework, drf-spectacular | REST API, business logic, OpenAPI docs |
| Database | PostgreSQL 17 + PostGIS 3.4 | Relational data + geospatial indexing (SRID 4326) |
| Async | Celery 5.5 + Redis 7 | Distributed task queues with per-queue concurrency |
| Static | WhiteNoise | Compressed static file serving |
| Containerization | Docker Compose | Multi-service orchestration (8 containers) |
The frontend handles document preprocessing entirely in the browser before sending extracted text to the backend. This avoids uploading raw files to the server and reduces API latency.
flowchart TD
Upload([User uploads PDF or Image]) --> TypeCheck{File type?}
TypeCheck -->|PDF| NativeExtract[Extract native text layer<br/>via PDF.js]
TypeCheck -->|Image| ImagePath[Direct to OCR]
NativeExtract --> CharCheck{Characters ≥ 50<br/>per page?}
CharCheck -->|Yes — digital PDF| SendText([Send text to backend])
CharCheck -->|No — scanned PDF| RenderCanvas[Render pages to Canvas<br/>at 2x scale]
RenderCanvas --> DualPass
ImagePath --> DualPass
DualPass[Dual-pass OCR via Tesseract.js]
DualPass --> PassA[Pass A: Original image]
DualPass --> PassB[Pass B: Enhanced image]
PassB --> Upscale[Dynamic upscale<br/>target 2200px longest edge]
Upscale --> Binarize[Adaptive binarization<br/>threshold = avg luminance − 12]
PassA --> Score[Weighted quality scoring<br/>confidence × 4 + textLen/100<br/>+ lines × 0.5 + words × 0.05]
Binarize --> Score
Score --> Best[Select highest-scoring pass]
Best --> SendText
Key implementation details:
- Dual-pass strategy: Every image is OCR'd twice — once raw, once with preprocessing (dynamic upscaling + adaptive contrast binarization). The pass with the highest composite quality score wins.
- PDF hybrid extraction: Digital PDFs use native text extraction via
pdfjs-dist. Only pages with insufficient text (< 50 chars) are rendered to canvas and run through Tesseract.js. - Preprocessing algorithm: Computes per-pixel luminance (
0.2126R + 0.7152G + 0.0722B), calculates an adaptive threshold from the image's average luminance, and binarizes at 1.35× contrast.
The backend ScheduleParser extracts structured schedule blocks from raw OCR text using a tiered approach:
flowchart TD
Input([OCR text from frontend]) --> LocalParse[Stage 1: Local Regex Parser<br/>Pattern matching for days, times, subjects]
LocalParse --> Confidence[Stage 2: Confidence Scoring<br/>Per-block heuristic 0.0 → 1.0]
Confidence --> Evaluate{Average confidence ≥ 0.62<br/>AND table ratio < 30%<br/>AND block count ≥ 4?}
Evaluate -->|Yes| ReturnLocal([Return local results])
Evaluate -->|No| GeminiCheck{GEMINI_API_KEY<br/>configured?}
GeminiCheck -->|No| FallbackLocal([Return local results + warning])
GeminiCheck -->|Yes| GeminiCall[Stage 3: Gemini Vision API<br/>Structured JSON prompt → gemini-2.5-flash]
GeminiCall --> GeminiOK{Gemini returned<br/>valid blocks?}
GeminiOK -->|Yes| ReturnGemini([Return Gemini results])
GeminiOK -->|No / Rate limited| FallbackLocal
Confidence scoring formula (per block):
time_score(60% weight): 1.0 if both start+end times present, 0.5 if only start, 0.0 if noneday_score(25% weight): 1.0 for weekdays, 0.0 for weekendstitle_score(15% weight): proportional to title length (capped at 20 chars)- Table-origin blocks receive a 0.65× penalty; fragmented rows get an additional 0.6× penalty
Escalation triggers:
- Average confidence <
0.62(configurable viaGEMINI_ESCALATION_CONFIDENCE_THRESHOLD) - Table-heavy layout (≥ 30% of blocks extracted via table heuristics)
- Fewer than 4 blocks extracted (configurable via
GEMINI_ESCALATION_MIN_BLOCKS)
The planner generates optimized study plans by distributing incomplete assignments across a date range while respecting class schedules and avoiding conflicts.
flowchart TD
Start([Generate plan for date range]) --> Categorize[Auto-categorize assignments<br/>by keyword analysis]
Categorize --> Split[Split into two groups]
Split --> Imminent[Imminent: due within plan window<br/>Sort by due date ASC, then priority]
Split --> Flexible[Flexible: no/late due date<br/>Sort by priority DESC, then due date]
Imminent --> Merged[Merged queue: imminent first, then flexible]
Flexible --> Merged
Merged --> DayLoop[For each day in date range]
DayLoop --> ScoreCandidates[Score candidates for today<br/>Score = Urgency × Priority]
ScoreCandidates --> Note["Urgency = 100 / (days_remaining + 1)<br/>No due date → urgency = 1.0"]
Note --> FillDay[Greedy slot filling<br/>up to 180 min/day cap]
FillDay --> FindSlot{Free slot in<br/>14:00–20:00?}
FindSlot -->|Yes| Book[Book 60-min chunk]
FindSlot -->|No| Fallback{Free slot in<br/>09:00–22:00?}
Fallback -->|Yes| Book
Fallback -->|No| SkipDay[Skip to next day]
Book --> ConflictCheck[Verify no overlap with:<br/>• SchoolClass timetable<br/>• Approved TaskBlocks<br/>• Earlier draft blocks in this plan]
ConflictCheck --> DayLoop
Algorithm characteristics:
- Two-tier sorting: Imminent items (deadline within plan window) are scheduled first, flexible items fill remaining capacity
- Urgency-priority scoring:
Score = (100 / (days_remaining + 1)) × priority— assignments due tomorrow score 50× higher than those due in two months - Chunking: Large assignments are broken into 60-minute blocks across multiple days to prevent fatigue
- Cramming mode: When an assignment is due within 1 day, the per-day chunk limit is bypassed (up to the full 180-minute daily cap)
- Conflict avoidance: Every proposed slot is validated against the student's recurring class schedule, existing task blocks, and previously generated draft blocks in the same plan
Background processing is split across four dedicated Celery queues, each tuned for its specific workload:
flowchart LR
API[Django API] -->|task.delay| Redis[(Redis Broker)]
Redis -->|coffeeshops queue| W1[celery-worker<br/>Orchestrator]
Redis -->|apify queue| W2[celery-apify-worker<br/>concurrency=2]
Redis -->|ai queue| W3[celery-ai-worker<br/>concurrency=1]
Redis -->|besttime queue| W4[celery-besttime-worker<br/>concurrency=1]
W1 -->|Enqueues sub-tasks| Redis
W2 -->|HTTP I/O| Apify[Apify Google Places]
W3 -->|Local inference| Ollama[Ollama Llama 3.2]
W4 -->|Rate-limited API| BestTime[BestTime.app]
| Queue | Worker | Concurrency | Bottleneck Type | What It Does |
|---|---|---|---|---|
coffeeshops |
celery-worker |
default | General | Orchestrates the AI profile pipeline, dispatches sub-tasks |
apify |
celery-apify-worker |
2 | I/O-bound (HTTP wait) | Crawls Google Places reviews via Apify actors |
ai |
celery-ai-worker |
1 | CPU/RAM-bound | Scores reviews & generates AI profiles via Ollama (Llama 3.2) |
besttime |
celery-besttime-worker |
1 | Rate-limited API | Fetches live occupancy forecasts from BestTime.app |
Why concurrency=1 on the AI queue? Running local LLM inference (Ollama) is CPU/RAM-intensive. Concurrent inference would cause thread thrashing and OOM kills. Serializing to a single worker ensures stable execution.
AI Profile Pipeline flow:
sequenceDiagram
participant User
participant API as Django API
participant CQ as coffeeshops queue
participant AQ as apify queue
participant AIQ as ai queue
participant DB as PostgreSQL
User->>API: POST /generate-ai-profile/{id}
API->>CQ: process_location_profile_task
CQ->>AQ: fetch_reviews_task (rate_limit=5/m, max_retries=3)
AQ-->>AQ: Crawl Google reviews via Apify
AQ-->>AQ: Merge with in-app user reviews
AQ->>AIQ: score_location_task (reviews payload)
AIQ-->>AIQ: Send reviews to Ollama with structured JSON schema
AIQ-->>AIQ: Parse scores: laptop_friendly, study_friendly, noise_level
AIQ->>DB: Upsert AIAggregateProfile + compute overall_rating
AIQ-->>API: Return profile result
Study locations are stored with PostGIS PointField coordinates (SRID 4326 / WGS 84). The frontend renders locations on an interactive Mapbox GL JS map with:
- Real-time filtering by AI-scored attributes (study-friendly, laptop-friendly, noise level)
- User favorites with custom notes
- Per-location review system with weighted overall ratings
- Live crowdness indicators from BestTime.app
The streak engine (app/utils.py) recalculates on every dashboard visit:
- Tracks consecutive study days using the student's configured timezone
- Accumulates
total_study_hoursfrom completedTaskBlockdurations - Awards achievements based on rule triggers (e.g., "First Step" for 1 completed block, "Academic Beast" for 5 sessions in one day, "Night Owl" for sessions ending after 22:00)
erDiagram
User ||--o| UserProfile : has
User ||--o{ Assignment : creates
User ||--o{ SchoolClass : has
User ||--o{ TaskBlock : schedules
User ||--o{ UserReview : writes
User ||--o{ UserFavPlace : saves
User ||--o{ UserAchievement : earns
User ||--o{ GeneratedPlan : generates
Assignment ||--o{ TaskBlock : "scheduled into"
Assignment ||--o{ DraftTaskBlock : "planned into"
GeneratedPlan ||--o{ DraftTaskBlock : contains
Location ||--o{ UserReview : "reviewed at"
Location ||--o{ UserFavPlace : "favorited"
Location ||--o{ AIAggregateProfile : "AI-scored"
Location ||--o{ AIProfileGenerationJob : "generation tracked"
Location ||--o{ BestTimeCrowdnessJob : "crowdness tracked"
Achievement ||--o{ UserAchievement : "earned by"
├── backend/
│ ├── app/ # UserProfile, Assignment, SchoolClass, TaskBlock,
│ │ # Achievement models; auth views; streak engine
│ ├── coffeeshops/ # Location, UserReview, UserFavPlace,
│ │ │ # AIAggregateProfile models; Celery tasks
│ │ └── services/ # ai_profile_service (Ollama integration)
│ │ # ai_study_recommender (mood/assignment matching)
│ │ # apify_reviews, apify_places, google_places
│ ├── schedule/ # GeneratedPlan, DraftTaskBlock models
│ │ └── services/ # schedule_parser (hybrid OCR + Gemini)
│ │ # planner (greedy scheduling algorithm)
│ │ # assignment_parser (Gemini text extraction)
│ └── core/ # Django settings, Celery config, URL routing
├── frontend/
│ └── assets/
│ ├── javascript/
│ │ ├── components/ # PlacesMap, Schedule, Planner, TaskForm,
│ │ │ # Achievements, AIStudyFinder, etc.
│ │ ├── api/ # REST client (openapi-fetch)
│ │ └── utils/ # ocrService.js (Tesseract.js + PDF.js)
│ └── styles/
├── compose.yaml # 8 services: db, redis, web, vite,
│ # worker, apify-worker, ai-worker, besttime-worker
└── README.md
| Endpoint Group | Routes | Auth | Description |
|---|---|---|---|
| Auth | POST /api/register/, POST /api/login/ |
Public | Registration + token-based login |
| User | GET /api/me/, GET/PUT /api/me/profile/ |
Token | Profile, streak, study hours |
| Assignments | GET/POST /api/schedule/assignments/ |
Token | CRUD for academic tasks |
| Schedule | POST /api/schedule/parse/ |
Token | OCR text → parsed schedule blocks |
| Planner | POST /api/schedule/planner/generate/ |
Token | Generate draft study plan |
| Planner Approval | POST /api/schedule/planner/{id}/approve/ |
Token | Approve draft → create TaskBlocks |
| Locations | GET /api/coffeeshops/map/ |
Public | All locations with coordinates + profiles |
| Reviews | GET/POST /api/coffeeshops/{id}/reviews/ |
Mixed | Per-location reviews (public read, auth write) |
| AI Profile | POST /api/coffeeshops/{id}/generate-ai-profile/ |
Token | Trigger async AI scoring pipeline |
| AI Recommend | POST /api/coffeeshops/ai-recommend/mood/ |
Token | AI-powered study spot by mood |
| Crowdness | POST /api/coffeeshops/{id}/besttime/generate/ |
Token | Trigger BestTime occupancy fetch |
| Achievements | GET /api/achievements/ |
Token | List all achievements with status |
| OpenAPI | GET /api/schema/, GET /api/schema/redoc/ |
Public | Auto-generated spec via drf-spectacular |
Minimum required for backend:
DJANGO_SECRET_KEYDEBUGDJANGO_ALLOWED_HOSTSDATABASE_ENGINEDATABASE_NAMEDATABASE_USERNAMEDATABASE_PASSWORDDATABASE_HOSTDATABASE_PORT
Optional integrations:
GEMINI_API_KEY,GEMINI_MODEL,GEMINI_ESCALATION_CONFIDENCE_THRESHOLDAPIFY_API_TOKEN,APIFY_REVIEWS_ACTOR_ID,APIFY_MAX_REVIEWS_PER_PLACEOLLAMA_HOST,OLLAMA_MODELVITE_MAPBOX_ACCESS_TOKEN(frontend map)
-
Create a
.envfile in the project root:cp .env.example .env
-
Start the stack:
docker compose up -d --build
-
Run migrations:
docker compose exec web python manage.py makemigrations docker compose exec web python manage.py migrate
-
(Optional) create an admin user:
docker compose exec web python manage.py createsuperuser
Useful URLs:
- App/API:
http://localhost:8000/ - Admin:
http://localhost:8000/admin/ - Frontend dev (Vite):
http://localhost:5173/ - OpenAPI:
http://localhost:8000/api/schema/ - ReDoc:
http://localhost:8000/api/schema/redoc/
For detailed Docker troubleshooting and operations, see README.Docker.md.
Prerequisites:
- Python 3.11+
- Node.js 20+
- PostgreSQL + PostGIS extension
- Redis
Backend:
cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txtImportant: local Django commands run from backend/ do not automatically load the root .env file. Run source ../.env (or export variables manually) before using manage.py.
Frontend:
cd frontend
npm install
npm run dev# backend logs
docker compose logs -f web
# service status
docker compose ps
# regenerate OpenAPI schema in container
docker compose exec web python manage.py spectacular --file /app/openapi-schema.yaml --validate
# regenerate frontend API types
docker compose exec vite npm run gen:api# backend tests (54 tests)
docker compose exec web python manage.py testFor local runs, the equivalent command is python manage.py test in backend/ with environment variables already loaded.