An AI-powered web application that transforms PDF resumes into customizable personal portfolios. Folibuddy uses Google Gemini (`gemini-2.5-flash`) to parse resume content, extract structured data, and generate publish-ready portfolio websites.
Live Demo: https://folibuddy.onrender.com/
- PDF Resume Upload → Extracts text and hyperlinks from PDF files
- AI-Powered Parsing → Uses Google Gemini (`gemini-2.5-flash`) to extract projects, experience, research
- Interactive Editor → Users can edit extracted data before generating portfolio
- Portfolio Generation → Creates a static HTML/CSS portfolio website ready for GitHub Pages
- Profile Image Support → Optional profile image upload and integration
- AI Description Generator → Enhance project descriptions using AI
- Tech Stack
- High-Level Architecture
- Complete Workflow
- File Structure & Responsibilities
- Detailed Pipeline Steps
- API Endpoints
- Data Flow Diagram
- Installation & Setup
- Troubleshooting
- Future Enhancements
- FastAPI - Modern, high-performance web framework for building APIs
- Uvicorn - Lightning-fast ASGI server for running FastAPI applications
- Jinja2 - Powerful template engine for HTML rendering
- Requests - HTTP client library for API communication
- Python-Multipart - Multipart form data parsing for file uploads
- pdfplumber - Advanced text extraction from PDF documents
- PyPDF2 - Hyperlink and annotation extraction from LaTeX-generated PDFs
- Google Gemini API - Advanced AI for intelligent resume parsing and content extraction
- SDK Version: `google-genai` 1.61.0+ (released January 30, 2026)
- Model Used: `gemini-2.5-flash` - Latest Google Gemini model for structured data extraction
- HTML5 - Modern semantic markup
- CSS3 - Advanced styling and responsive design
- Vanilla JavaScript - Dynamic interactivity without framework dependencies
- JSON - Lightweight data persistence for portfolio information
┌─────────────────────────────────────────────────────────────────┐
│ USER UPLOADS PDF │
└────────────────────┬────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ STEP 1: PDF EXTRACTION (resume_parser.py) │
│ - Extract text using pdfplumber │
│ - Extract hyperlinks using PyPDF2 │
│ - Combine text + hyperlinks │
└────────────────────┬────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ STEP 2: LLM PARSING (llm_gemini_parser.py) │
│ - Send raw text to Google Gemini API │
│ - Extract: Projects, Research, Experience │
│ - Classify and structure data │
└────────────────────┬────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ STEP 3: DATA STORAGE (portfolio_generator.py) │
│ - Save extracted data to output/portfolio.json │
│ - Single source of truth for all portfolio data │
└────────────────────┬────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ STEP 4: EDITOR (editor.html) │
│ - Display extracted data in editable form │
│ - Allow user to add/edit/remove content │
│ - Upload optional profile image │
│ - Generate project descriptions using AI │
└────────────────────┬────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ STEP 5: PORTFOLIO GENERATION (portfolio.py) │
│ - Load template.html │
│ - Render with Jinja2 using user data │
│ - Copy CSS and assets │
│ - Output to Desktop/Personal Portfolio/ │
└─────────────────────────────────────────────────────────────────┘
- User navigates to `http://127.0.0.1:8000/`
- Lands on `frontend/upload.html` (upload page)
- User selects PDF resume file
- Form submits to `/upload-resume-web` endpoint
- Backend extracts text + hyperlinks from PDF
- Raw text sent to `parse_resume(text)` function
- Regex extractors pull: name, email, phone, skills, links
- LLM extractor (`parse_resume_gemini`) sends text to Google Gemini API
- LLM returns structured JSON with:
  - Projects (title, description, repo)
  - Research (title, description)
  - Experience (company, role, dates, description, skills)
- Extracted data saved to `output/portfolio.json`
- This becomes the single source of truth
- Data includes: name, email, phone, skills, projects, experience, research, links
- User sees `backend/templates/editor.html`
- All extracted fields displayed in editable form
- User can:
  - Edit any text field
  - Add/remove projects
  - Add custom links
  - Upload profile image
  - Generate AI descriptions for projects (via `/generate-description`)
- User clicks "Generate Portfolio"
- Form data is POSTed to the `/generate` endpoint
- Backend:
  - Parses form data
  - Updates `portfolio.json`
  - Calls `generate_portfolio(resume)`
- Portfolio generator:
  - Loads `template.html` (Jinja2 template)
  - Renders with user data
  - Copies CSS files
  - Saves to `Desktop/Personal Portfolio/`
- User receives success message with file path
Folibuddy/
│
├── run.py # Application entry point
├── README.md # Project documentation (this file)
├── PIPELINE_DOCUMENTATION.md # Detailed technical documentation
│
├── backend/ # Core backend logic
│ ├── main.py # FastAPI app & routes
│ ├── resume_parser.py # PDF text extraction
│ ├── llm_project_extractor.py # LLM-based project extraction
│ ├── portfolio.py # Portfolio HTML generation
│ ├── portfolio_generator.py # Data persistence (JSON save/load)
│ ├── llm_generator.py # AI description generator
│ ├── requirements.txt # Python dependencies
│ ├── utils/
│ │ └── formatters.py # Text formatting utilities
│ └── templates/
│ ├── editor.html # Interactive editor page
│ └── template.html # Portfolio HTML template
│
├── frontend/ # Static frontend files
│ ├── upload.html # Initial upload page
│ ├── script.js # Frontend JavaScript
│ └── style.css # Basic styles
│
├── static/ # Static assets
│ ├── template-style.css # Portfolio CSS
│ └── uploads/ # User-uploaded images
│
├── output/ # Generated data
│ └── portfolio.json # Extracted resume data
│
└── test_extraction.py # Diagnostic test script
Purpose: Start the FastAPI server
# Adds project root to Python path
# Starts uvicorn server on http://127.0.0.1:8000
# Enables hot reload for development
Key Details:
- Host: `127.0.0.1`
- Port: `8000`
- Reload: `True` (watches for file changes)
Purpose: Central FastAPI application with all endpoints
- `GET /`
  - Returns `frontend/upload.html`
  - Entry point for users
- `POST /upload-resume-web`
  - Input: PDF file (multipart/form-data)
  - Process:
    - Extract text from PDF
    - Parse resume with LLM
    - Save to `output/portfolio.json`
    - Format descriptions for editor
    - Return editor HTML
  - Output: Rendered `editor.html` with extracted data
- `POST /generate`
  - Input: Form data from editor
  - Process:
    - Parse all form fields (projects, experience, skills, etc.)
    - Handle profile image upload
    - Update `portfolio.json`
    - Generate portfolio HTML
  - Output: Success message + file path
- `POST /generate-description`
  - Input: JSON `{title, repo_url, current_description}`
  - Process: Calls LLM to generate/enhance project description
  - Output: JSON `{description}`
- `GET /portfolio`
  - Input: None
  - Process: Loads `portfolio.json` and renders `template.html`
  - Output: Portfolio preview (HTML)
Purpose: Extract all text and hyperlinks from PDF
- Uses pdfplumber to extract text
- Uses PyPDF2 to extract hyperlinks (annotations)
- Appends hyperlinks to text for later extraction
- Essential for: LaTeX-generated PDFs with embedded links
- Name: scans the first 8 lines of the resume
  - Looks for 2-5 word sequences (typical names)
  - Excludes lines with "email", "phone", etc.
- Email regex: `[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`
- Phone regex: `\b\d{10}\b` (10-digit numbers)
- Skills: matches against a predefined skill vocabulary
  - Includes: Python, C, C++, Java, TensorFlow, React, etc.
- Links:
  - Finds explicit URLs: `https?://...`
  - Infers usernames: "GitHub: username" → `https://github.com/username`
  - Returns structured dict with github, linkedin, leetcode, website, custom
- Main orchestrator for resume parsing
- Calls `extract_projects_with_llm()` for complex parsing
- Returns complete structured data
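The regex-based extractors described above can be sketched roughly as follows. This is an illustration under assumptions, not the project's actual code: the function names (`extract_email`, `extract_phone`, `extract_name`, `infer_github_url`), the exclusion keyword list, and the username pattern are hypothetical. Only the email and phone regular expressions and the "first 8 lines, 2-5 words" heuristic come from the description.

```python
import re
from typing import Optional

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{10}\b")  # bare 10-digit numbers only
# Hypothetical pattern for "GitHub: username"; skips values that are already URLs.
GITHUB_USER_RE = re.compile(r"github\s*:\s*(?!https?://)([A-Za-z0-9-]+)", re.IGNORECASE)
# Assumed keyword list; the real exclusion list is not shown in this README.
NON_NAME_KEYWORDS = ("email", "phone", "@", "http")


def extract_email(text: str) -> Optional[str]:
    match = EMAIL_RE.search(text)
    return match.group(0) if match else None


def extract_phone(text: str) -> Optional[str]:
    match = PHONE_RE.search(text)
    return match.group(0) if match else None


def extract_name(text: str) -> Optional[str]:
    """Return the first of the top 8 lines that looks like a 2-5 word name."""
    for line in text.splitlines()[:8]:
        line = line.strip()
        if any(keyword in line.lower() for keyword in NON_NAME_KEYWORDS):
            continue
        words = line.split()
        # Allow initials such as "Q." by ignoring periods.
        if 2 <= len(words) <= 5 and all(w.replace(".", "").isalpha() for w in words):
            return line
    return None


def infer_github_url(text: str) -> Optional[str]:
    """Turn a 'GitHub: username' mention into a profile URL."""
    match = GITHUB_USER_RE.search(text)
    return f"https://github.com/{match.group(1)}" if match else None


sample = "Jane Q Doe\nEmail: jane.doe@example.com\nPhone: 9876543210\nGitHub: janedoe\n"
print(extract_name(sample), extract_email(sample), extract_phone(sample), infer_github_url(sample))
```

Note that the phone pattern only matches a bare run of ten digits, so numbers written with a country code, spaces, or dashes would need a looser expression.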
Purpose: Extract projects, research, and experience using Google Gemini AI
- Step 1: Initialize Gemini client with API key
- Step 2: Send resume text to Gemini with structured prompt
- Step 3: Request JSON output with specific schema
- Step 4: Handle retry logic for rate limits (503/429 errors)
- Step 5: Parse and validate JSON response
- Step 6: Return structured resume data
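Step 4 (retrying on 503/429) might look something like the sketch below. It is a generic exponential-backoff helper, not the project's implementation: `call_with_retry`, `TransientAPIError`, and its `status_code` attribute are hypothetical stand-ins, and the real `google-genai` SDK raises its own exception types that would need to be mapped onto this pattern.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")
RETRYABLE_STATUS = {429, 503}  # rate limited / service unavailable


class TransientAPIError(Exception):
    """Hypothetical stand-in for an SDK error that carries an HTTP status code."""

    def __init__(self, status_code: int, message: str = "") -> None:
        super().__init__(message or f"API error {status_code}")
        self.status_code = status_code


def call_with_retry(
    request: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying 429/503 failures with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except TransientAPIError as err:
            # Give up on non-retryable errors or when attempts are exhausted.
            if err.status_code not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
            # 1s, 2s, 4s, ... plus up to 0.5s of random jitter.
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.5))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.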
LLM Prompt Structure:
Task: Extract structured information from resume text
Rules:
- Extract ALL projects, experience, education, and skills
- DO NOT invent or hallucinate data
- If a field is missing, use empty string "" or empty array []
- description fields MUST be arrays of strings (bullet points)
- Research papers go in "research", NOT "projects"
- Work experience goes in "experience", NOT "projects"
Output JSON schema:
{
"projects": [{title, description[], technologies[], repo}],
"research": [{title, description[], publication}],
"experience": [{company, role, from, to, description[], skills[]}]
}
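A response following the schema above still needs to be parsed and checked before it is stored. The sketch below shows one possible way to do that. `normalize_llm_output` and `strip_code_fence` are hypothetical names, and the assumption that the model may wrap its JSON in a Markdown code fence is mine, not something this README states.

```python
import json
from typing import Any, Dict, List

FENCE = "`" * 3  # Markdown code-fence marker

# Expected fields per section, with the empty defaults the prompt asks for.
SECTION_FIELDS: Dict[str, Dict[str, Any]] = {
    "projects": {"title": "", "description": [], "technologies": [], "repo": ""},
    "research": {"title": "", "description": [], "publication": ""},
    "experience": {"company": "", "role": "", "from": "", "to": "",
                   "description": [], "skills": []},
}


def strip_code_fence(raw: str) -> str:
    """Remove a surrounding Markdown code fence if the model added one."""
    text = raw.strip()
    if text.startswith(FENCE):
        _, _, text = text.partition("\n")   # drop the opening fence line
        text = text.rsplit(FENCE, 1)[0]     # drop the closing fence
    return text.strip()


def normalize_llm_output(raw: str) -> Dict[str, List[Dict[str, Any]]]:
    """Parse the model's JSON and coerce it into the documented schema."""
    data = json.loads(strip_code_fence(raw))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    result: Dict[str, List[Dict[str, Any]]] = {}
    for section, defaults in SECTION_FIELDS.items():
        cleaned = []
        for item in data.get(section) or []:
            if not isinstance(item, dict):
                continue  # skip malformed entries instead of guessing
            entry: Dict[str, Any] = {}
            for field, default in defaults.items():
                value = item.get(field, default)
                if isinstance(default, list):
                    if isinstance(value, str):
                        # A lone string becomes a one-element bullet list.
                        value = [value] if value.strip() else []
                    elif isinstance(value, list):
                        value = list(value)
                    else:
                        value = []
                elif value is None:
                    value = ""
                entry[field] = value
            cleaned.append(entry)
        result[section] = cleaned
    return result
```

Missing sections come back as empty lists, which matches the prompt's rule of using `""` or `[]` rather than inventing data.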
Purpose: Save/load portfolio data as JSON
- Creates `output/` directory if missing
- Writes `portfolio.json` with UTF-8 encoding
- Pretty-printed with 2-space indent
- Reads `output/portfolio.json`
- Returns None if file doesn't exist
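The save/load behaviour listed above can be sketched with the standard library alone. The function names `save_portfolio` and `load_portfolio` and the `path` parameter are assumptions for illustration. The real module may expose a different interface.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional

OUTPUT_FILE = Path("output") / "portfolio.json"


def save_portfolio(data: Dict[str, Any], path: Path = OUTPUT_FILE) -> Path:
    """Write portfolio data as UTF-8 JSON with a 2-space indent."""
    path.parent.mkdir(parents=True, exist_ok=True)  # create output/ if missing
    with path.open("w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2, ensure_ascii=False)
    return path


def load_portfolio(path: Path = OUTPUT_FILE) -> Optional[Dict[str, Any]]:
    """Return the stored portfolio data, or None if nothing has been saved yet."""
    if not path.exists():
        return None
    with path.open("r", encoding="utf-8") as fh:
        return json.load(fh)
```

`ensure_ascii=False` keeps accented names readable in the file instead of escaping them.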
Data Schema:
{
"name": "string",
"email": "string",
"phone": "string",
"headline": "string",
"about": "string",
"skills": ["string"],
"links": {
"github": "url",
"linkedin": "url",
"custom": [{"label": "", "url": ""}]
},
"projects": [
{
"title": "string",
"description": ["string"],
"repo": "url"
}
],
"experience": [
{
"company": "string",
"role": "string",
"from": "date",
"to": "date",
"description": ["string"],
"skills": ["string"]
}
],
"research": [
{
"title": "string",
"description": ["string"]
}
],
"profile_image": "/uploads/filename.jpg"
}
Purpose: Render final portfolio website
- Step 1: Load Jinja2 template (`template.html`)
- Step 2: Prepare portfolio data with defaults
- Step 3: Render HTML with template engine
- Step 4: Fix CSS paths for static generation
- Step 5: Create output folder (`Desktop/Personal Portfolio/`)
- Step 6: Write `index.html`
- Step 7: Copy `template-style.css`
- Step 8: Copy profile image (if exists)
- Step 9: Generate `README.md` with deployment instructions
Output Structure:
Desktop/Personal Portfolio/
├── index.html # Main portfolio page
├── template-style.css # Styles
├── uploads/
│ └── profile.jpg # User's profile image
└── README.md # Deployment guide
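Steps 4 to 8 of the generator (fixing CSS paths, creating the folder, writing `index.html`, copying the stylesheet and profile image) could be sketched as below. The Jinja2 rendering step is left out, so `html` is assumed to be already rendered. `write_static_site`, `fix_css_paths`, and the `/static/` URL prefix are hypothetical. The actual logic in `portfolio.py` may differ.

```python
import shutil
from pathlib import Path
from typing import Optional

# Assumption: the live app serves the stylesheet from this URL prefix.
STATIC_URL_PREFIX = "/static/"


def fix_css_paths(html: str) -> str:
    """Point the stylesheet link at the copy that sits next to index.html."""
    return html.replace(STATIC_URL_PREFIX + "template-style.css", "template-style.css")


def write_static_site(
    html: str,
    css_file: Path,
    output_dir: Path,
    profile_image: Optional[Path] = None,
) -> Path:
    """Write index.html plus its assets into output_dir and return the index path."""
    output_dir.mkdir(parents=True, exist_ok=True)
    index_path = output_dir / "index.html"
    index_path.write_text(fix_css_paths(html), encoding="utf-8")
    shutil.copy2(css_file, output_dir / css_file.name)
    if profile_image is not None and profile_image.exists():
        uploads = output_dir / "uploads"
        uploads.mkdir(exist_ok=True)
        shutil.copy2(profile_image, uploads / profile_image.name)
    return index_path
```

A call such as `write_static_site(rendered_html, Path("static/template-style.css"), Path.home() / "Desktop" / "Personal Portfolio")` would produce the folder layout shown above, minus the generated `README.md`.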
Purpose: Editable form for all extracted data
Key Features:
- Pre-filled with extracted data
- Dynamic project/experience addition
- Profile image upload
- AI description generator button
- Custom link management
- Real-time form validation
JavaScript Functions:
- `addProject()` - Dynamically add project fields
- `removeProject(index)` - Remove project
- `addCustomLink()` - Add custom link field
- `generateDescription(index)` - Call AI endpoint for descriptions
Purpose: Jinja2 template for final portfolio
Sections:
- Hero Section (name, headline, profile image)
- About Section
- Skills Grid
- Projects Showcase (with repo links)
- Experience Timeline
- Research/Publications
- Contact Links
Template Variables:
- `{{ name }}`
- `{{ headline }}`
- `{{ about }}`
- `{{ skills }}` (loop)
- `{{ projects }}` (loop)
- `{{ experience }}` (loop)
- `{{ research }}` (loop)
- `{{ links }}`
Purpose: Landing page with resume upload form
Form:
- File input (accepts PDF only)
- Submit button
- Posts to `/upload-resume-web`
- Uses `multipart/form-data` encoding
Purpose: Enhance project descriptions using Google Gemini AI
- Accepts `{title, repo_url, current_description}` as JSON
- Fetches GitHub README via the public GitHub API (no auth required)
- Sends project title + README context to Gemini
- Returns 4–6 bullet-point descriptions formatted for the portfolio
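The README lookup mentioned above can be sketched with the GitHub REST endpoint `GET /repos/{owner}/{repo}/readme`. This is an assumption about how the fetch might work, not the project's code: `parse_repo_url`, `readme_api_url`, and `fetch_readme` are hypothetical names, and unauthenticated requests are subject to GitHub's rate limits.

```python
import re
import urllib.error
import urllib.request
from typing import Optional, Tuple

REPO_URL_RE = re.compile(r"^https?://(?:www\.)?github\.com/([^/\s]+)/([^/\s#?]+)")


def parse_repo_url(repo_url: str) -> Optional[Tuple[str, str]]:
    """Split a GitHub repository URL into (owner, repo), or return None."""
    match = REPO_URL_RE.match(repo_url.strip())
    if not match:
        return None
    owner, repo = match.group(1), match.group(2)
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo


def readme_api_url(repo_url: str) -> Optional[str]:
    """Build the REST API URL for the repository's README."""
    parsed = parse_repo_url(repo_url)
    if parsed is None:
        return None
    owner, repo = parsed
    return f"https://api.github.com/repos/{owner}/{repo}/readme"


def fetch_readme(repo_url: str, timeout: float = 10.0) -> str:
    """Return the raw README text, or an empty string if it cannot be fetched."""
    api_url = readme_api_url(repo_url)
    if api_url is None:
        return ""
    request = urllib.request.Request(
        api_url,
        # Ask GitHub for the raw file body instead of base64-encoded JSON.
        headers={"Accept": "application/vnd.github.raw+json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, TimeoutError):
        return ""
```

Returning an empty string on failure lets the description generator fall back to the project title and current description alone.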
| Method | Endpoint | Input | Output | Purpose |
|---|---|---|---|---|
| GET | `/` | None | HTML | Serve upload page |
| POST | `/upload-resume-web` | PDF file | HTML (editor) | Parse resume & show editor |
| POST | `/generate` | Form data | JSON | Generate portfolio |
| POST | `/generate-description` | JSON | JSON | AI description generation |
| GET | `/portfolio` | None | HTML | Preview portfolio |
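As an illustration of calling one of these endpoints from a script, the sketch below builds a `POST /generate-description` request with the standard library. It assumes the server is running locally on port 8000 and that the response body is JSON with a `description` key, as the endpoint summary suggests. `build_description_request` and `request_description` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # local development server


def build_description_request(
    title: str, repo_url: str, current_description: str = ""
) -> urllib.request.Request:
    """Prepare the JSON POST request without sending it."""
    payload = {
        "title": title,
        "repo_url": repo_url,
        "current_description": current_description,
    }
    return urllib.request.Request(
        f"{BASE_URL}/generate-description",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_description(title: str, repo_url: str, timeout: float = 30.0):
    """Send the request and return the 'description' field of the reply."""
    request = build_description_request(title, repo_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["description"]
```

With the server running, `request_description("Folibuddy", "https://github.com/Rakshak05/Folibuddy")` would return the generated description.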
┌─────────┐
│ PDF File│
└────┬────┘
│
▼
┌──────────────────┐
│ resume_parser.py │──┐
└──────────────────┘ │
│ Raw Text
▼
┌───────────────────────────┐
│ llm_gemini_parser.py │
│ (Google Gemini API) │
└────────────┬──────────────┘
│
│ Structured Data
▼
┌────────────────────────┐
│ portfolio_generator.py │
│ (Save JSON) │
└────────────┬───────────┘
│
│ portfolio.json
▼
┌──────────────┐
│ editor.html │
│ (User Edits) │
└──────┬───────┘
│
│ Edited Data
▼
┌──────────────┐
│ portfolio.py │
│ (Jinja2) │
└──────┬───────┘
│
▼
┌────────────────────────┐
│ Desktop/Personal │
│ Portfolio/index.html │
└────────────────────────┘
- Python 3.8+
- Google Gemini API key (Get one at "https://aistudio.google.com/app/apikey")
- Git (optional)
- Clone the repository:
  - `git clone https://github.com/Rakshak05/Folibuddy.git`
  - `cd Folibuddy`
- Install dependencies: `pip install -r requirements.txt`
  - Note: If upgrading from an older version, ensure you have `google-genai>=1.61.0`: `pip install --upgrade google-genai`
- Set up the Gemini API key:
  - Option 1: Export it as an environment variable: `export GEMINI_API_KEY="your-api-key-here"`
  - Option 2: Create a `.env` file in the project root: `echo "GEMINI_API_KEY=your-api-key-here" > .env`
- Start the development server: `python -m uvicorn run:app --host 127.0.0.1 --port 8000 --reload`
  - `run:app` → loads the `app` object from `run.py`
  - `--host 127.0.0.1` → local only (use `0.0.0.0` to expose on network)
  - `--port 8000` → serves on port 8000
  - `--reload` → auto-restarts on file changes (development mode)
- Access the application: open your browser and navigate to `http://127.0.0.1:8000/`
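For the API key step, the two options (environment variable or `.env` file) could be resolved in code roughly as follows. This is a standard-library sketch under assumptions: `get_gemini_api_key` and `read_env_file` are hypothetical names, and the project may instead rely on a package such as `python-dotenv`.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=value lines, ignoring blanks and # comments."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_gemini_api_key(
    env_file: Path = Path(".env"),
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Prefer the environment variable, then fall back to the .env file."""
    environ = os.environ if environ is None else environ
    key = environ.get("GEMINI_API_KEY") or read_env_file(env_file).get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; export it or add it to .env")
    return key
```

Failing early with a clear message makes a missing key easier to diagnose than a later authentication error from the API.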
Cause: Breaking changes in google-genai library version 1.61.0 released on January 30, 2026.
Solution:
- Update the library: `pip install --upgrade "google-genai>=1.61.0"` (the quotes stop the shell from treating `>` as an output redirect)
- Verify installation: `pip show google-genai` (should show version 1.61.0 or higher)
- If your code is from before Feb 2026, pull the latest updates from the repository
Solution:
- Check if your API key is set: `echo $GEMINI_API_KEY`
- Verify API key is valid at Google AI Studio
- Ensure you have internet connection for API calls
Causes:
- PDF doesn't have "PROJECTS" section
- Section is named differently (e.g., "PERSONAL PROJECTS")
- Gemini API failed to parse
Solution:
- Check `llm_project_extractor.py` line 136 regex
- Add alternative keywords
- Run `test_extraction.py` to debug
Causes:
- Image path incorrect in JSON
- Image not copied to output folder
- CSS path issue
Solution:
- Check `output/portfolio.json` → `profile_image` field
- Verify `Desktop/Personal Portfolio/uploads/` contains image
- Check `portfolio.py` line 58-70 (image copy logic)
Solution:
- Ensure "EXPERIENCE" keyword exists in resume
- Check if LLM is classifying as "projects" instead
- Review Gemini prompt in `llm_gemini_parser.py`
Run diagnostic test:
python test_extraction.py
Check what's being extracted:
# Add to resume_parser.py after extract_text_from_pdf
print("DEBUG: Extracted text:")
print(text[:1000]) # First 1000 chars
Check the JSON:
cat output/portfolio.json
# or on Windows
type output\portfolio.json
- Edit: `backend/templates/template.html`
- Edit styles: `static/template-style.css`
- Server auto-reloads on changes
- Multiple Templates: Support for different portfolio styles
- Cover Letter Generator: AI-generated cover letters
- SEO Optimization: Meta tags and structured data
- Dark Mode: Theme toggle
- Export to PDF: Portfolio as PDF resume
- GitHub Deploy: One-click GitHub Pages deployment
- Analytics: Track portfolio views
- Multi-language Support: International resume parsing
- Custom Themes: User-customizable color schemes
- Cloud Storage Integration: Save portfolios to cloud
This project is under active development.
Project: Folibuddy
Repository: github.com/Rakshak05/Folibuddy
Tech Stack: FastAPI, Google Gemini API, Jinja2, pdfplumber, PyPDF2
LLM Model: gemini-2.5-flash
Last Updated: April 14, 2026