PavanYellathakota yellatp

Hey DataGeeks, I'm Pavan 👋

I am a Data Explorer passionate about diving into every field where data is prominent. My journey spans the full spectrum from Market Research & Supply Chain Analytics to designing Databases & ETL Pipelines. I extend this expertise into AI, developing Machine Learning and Deep Learning models with a specific focus on BERT-based Text & Semantic Analysis.

Data Scientist | ML Engineer | Product Analytics

📍 Location : Seattle, WA, USA
📞 Mobile : +1 (929) 278-4589
✉️ Email : pavan.yellathakota.ds@gmail.com
LinkedIn : https://linkedin.com/in/yellatp
GitHub : https://github.com/yellatp

👨‍💻 Professional Summary

Data Scientist with 3+ years of experience developing predictive models and automated data infrastructure. Proven track record in improving search precision, designing quantitative research pipelines, and implementing data-driven solutions for marketing and product growth. Skilled in bridging the gap between data engineering and stakeholder decision-making through statistical validation, A/B testing, and interactive analytics.

🛠️ Technical Skills

Domain	Stack
Languages & Databases
AWS Cloud Data
ML Frameworks
Tools & Visualization

💼 Professional Experience

Alphonso AI, backed by Shipley Center for Innovation | Founding ML Engineer

Potsdam, NY | Jul 2025 – Present

Backend Architecture: Designed a 0→1 Backend Ecosystem using FastAPI and PostgreSQL, orchestrating a scalable microservices bridge between Java-based core services and Python-native ML workloads.
Cost-Efficient Infrastructure: Deployed and managed production services on DigitalOcean VPS to optimize infrastructure overhead; implemented Docker-based containerization to ensure environment parity across R&D and production.
Advanced Retrieval (RAG): Engineered a Multi-Model "Text-to-Query" (TTQ) engine leveraging Gemini (Vertex AI) and DeepSeek APIs to enable dynamic, prompt-driven semantic search across high-dimensional talent data.
Search Optimization: Deployed a multi-stage retrieval pipeline using pgvector for Approximate Nearest Neighbor (ANN) search and CUDA-accelerated Cross-Encoders for high-precision re-ranking (targeting a 38% improvement in Precision@N).
Domain-Aware Recommendation: Developed a sector-specific ranking system using Vectorized Embeddings; shifted logic from generic role-matching to domain-expertise alignment, improving candidate-to-company fit.
Generative Team-Composition: Built a module that translates natural language product descriptions into granular technical requirements and specific candidate matches, bridging the gap for non-technical founders.
System Design & MCP: Led relational schema normalization, API contract definition, and R&D into Model Context Protocol (MCP) for agentic, self-correcting database interactions.
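A minimal sketch of the retrieve-then-rerank pattern named in the Search Optimization bullet, for illustration only and not the production code. It assumes toy 2-D embeddings, an exact cosine-similarity scan standing in for pgvector's ANN index, and a lookup table standing in for a cross-encoder; `retrieve_then_rerank`, the candidate IDs, and all scores are hypothetical.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_then_rerank(
    query_vec: Vector,
    corpus: dict[str, Vector],
    rerank_score: Callable[[str], float],
    shortlist_size: int = 3,
    top_n: int = 2,
) -> list[str]:
    """Stage 1 shortlists by embedding similarity; stage 2 reorders the shortlist."""
    # Stage 1 (stand-in for an ANN index): exact top-k by cosine similarity.
    shortlist = sorted(
        corpus,
        key=lambda doc_id: cosine_similarity(query_vec, corpus[doc_id]),
        reverse=True,
    )[:shortlist_size]
    # Stage 2 (stand-in for a cross-encoder): the costlier scorer sees only the shortlist.
    return sorted(shortlist, key=rerank_score, reverse=True)[:top_n]


# Toy data: hypothetical candidate IDs with 2-D embeddings.
corpus = {
    "cand_a": [1.0, 0.0],
    "cand_b": [0.9, 0.1],
    "cand_c": [0.0, 1.0],
    "cand_d": [0.7, 0.3],
}
# Hypothetical re-ranker scores; a real re-ranker would score (query, document) pairs.
toy_scores = {"cand_a": 0.2, "cand_b": 0.9, "cand_c": 0.8, "cand_d": 0.5}

result = retrieve_then_rerank([1.0, 0.0], corpus, toy_scores.__getitem__)
print(result)  # ['cand_b', 'cand_d']
```

Note that `cand_c` has a high re-rank score but never reaches stage 2, because the first stage filters it out. That is the trade-off of the pattern: the shortlist size bounds both the re-ranking cost and the achievable recall.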

Key Technologies Used

Student Managed Investment Fund, Clarkson University | Graduate Quantitative Researcher

Potsdam, NY | Sep 2024 – Apr 2025

Portfolio Management: Managed a $650K real-capital portfolio, delivering a 51% total return and outperforming the S&P 500 benchmark by 26 percentage points (2,600 bps).
Alternative Data Pipeline: Built a sentiment analysis engine scraping Reddit/YouTube to validate fundamental buy signals, using BERT-based sentiment scoring to overlay quantitative signals on traditional financial metrics.
Automation: Automated the extraction of financial statements from SEC EDGAR using Python & Vertex AI, reducing data collection time by 80% for the analyst team.
Risk Modeling: Developed Monte Carlo simulations and risk-parity models to stress-test overweight positions and quantify potential drawdowns for high-conviction trades.
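To illustrate the kind of Monte Carlo drawdown estimate mentioned in the Risk Modeling bullet, here is a small sketch. It is not the fund's model: it assumes i.i.d. normally distributed daily returns, and the drift, volatility, and path counts are made-up illustrative parameters. `max_drawdown` and `simulate_drawdowns` are hypothetical helper names.

```python
import random


def max_drawdown(path: list[float]) -> float:
    """Largest peak-to-trough decline along a value path, as a fraction of the peak."""
    peak = path[0]
    worst = 0.0
    for value in path:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def simulate_drawdowns(
    n_paths: int,
    n_days: int,
    daily_mean: float,
    daily_vol: float,
    seed: int = 0,
) -> list[float]:
    """Simulate value paths with i.i.d. normal daily returns; return each path's max drawdown."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    drawdowns = []
    for _ in range(n_paths):
        value = 1.0
        path = [value]
        for _ in range(n_days):
            # Floor the daily return at -99% so the position value stays positive.
            value *= 1.0 + max(rng.gauss(daily_mean, daily_vol), -0.99)
            path.append(value)
        drawdowns.append(max_drawdown(path))
    return drawdowns


# Illustrative parameters only: about one trading year, 2% daily volatility.
drawdowns = sorted(
    simulate_drawdowns(n_paths=2000, n_days=252, daily_mean=0.0004, daily_vol=0.02)
)
median_dd = drawdowns[len(drawdowns) // 2]
p95_dd = drawdowns[int(0.95 * len(drawdowns))]
print(f"median max drawdown: {median_dd:.1%}, 95th percentile: {p95_dd:.1%}")
```

Reading off a high percentile of the simulated drawdowns gives a stress figure for a position. A real stress test would likely replace the normal-returns assumption with fatter-tailed or historically bootstrapped returns.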

Key Technologies Used

HAVK Mladost (Elite Athletics Club) | Graduate Data Science Consultant

Potsdam, NY | Oct 2023 – May 2025

Cloud Migration: Architected a centralized data lake on AWS S3, migrating legacy records to a queryable cloud environment and reducing data retrieval latency by 40%.
ETL Optimization: Developed PySpark ETL jobs on AWS Glue to process 1M+ cross-channel events; utilized partition pruning to optimize query costs and speed.
Uplift Modeling: Applied uplift modeling and behavioral clustering to identify high-value fan segments, optimizing marketing spend and merchandise revenue.
Performance Analytics: Developed backend services with FastAPI and built interactive dashboards that delivered real-time performance insights to World Championship coaches.

Key Technologies Used

eAppSys Limited | Business Data Analyst

Hyderabad, India | Jul 2022 – Dec 2022

Forecasting: Developed demand forecasting models (Prophet/SARIMAX) for 1,500+ SKUs, integrating exogenous variables (holidays, promotions) to reduce forecast error (MAPE) by 15%.
Reporting Automation: Designed and deployed automated KPI dashboards in Oracle Analytics Cloud (OAC), saving the procurement team 12+ hours/week of manual reporting time.
ML Workflows: Implemented GxP-compliant ML workflows on Oracle Cloud Infrastructure (OCI) with real-time alerts, achieving 99.9% uptime for critical inventory monitoring.
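For readers unfamiliar with the MAPE metric cited in the Forecasting bullet, a short worked sketch follows. The numbers are invented for illustration and are not project data. The helper name `mape` is hypothetical, and skipping zero actuals is an assumption made here to avoid division by zero.

```python
def mape(actual: list[float], forecast: list[float]) -> float:
    """Mean absolute percentage error, in percent, over periods with non-zero actuals."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    # Assumption: periods with zero actual demand are skipped rather than raising an error.
    ratios = [abs(a - f) / abs(a) for a, f in zip(actual, forecast) if a != 0]
    if not ratios:
        raise ValueError("no non-zero actuals to evaluate")
    return 100.0 * sum(ratios) / len(ratios)


# Invented demand figures for three periods.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]

# Per-period errors are 10%, 10%, and 0%, so the mean is about 6.67%.
print(f"MAPE: {mape(actual, forecast):.2f}%")
```

Lower MAPE means a better forecast, so an improvement shows up as a decrease in this number.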

Key Technologies Used

Kantar GDC India | Data Analyst

Pune, India | Sep 2021 – May 2022

Pipeline Automation: Built automated data pipelines for Tracker and Syndicated Research projects using Python and PySpark, integrating 10M+ survey records from 30+ sources and reducing processing latency by 30%.
Statistical Analysis: Developed sampling approaches and statistical significance testing to ensure data representativeness across Middle East and Central Africa markets.
Consumer Insights: Supported recurring monthly/quarterly client tracking projects by developing regression models and delivering insights for 10+ FMCG and Telecom clients.

Key Technologies Used

🏗️ Some Notable Projects

Project	Description	Tech Stack
Text-Analysis-using-NLP-LDA	NLP project focused on topic modeling and text analysis.	NLP, LDA, Python
Detoxify Telugu	Toxic comment classification for Telugu language.	NLP, Deep Learning
Synthetic Data Generator	Tool to generate synthetic datasets for testing/training.	Python, Data Gen
BingeMax Recommendation Engine	Personalized movie recommendation system.	ML, Recommender Systems
Fintech Sales GAP Analysis	Analyzing sales gaps in fintech products.	Data Analysis, Visualization
KonnectR Fullstack App	Fullstack web application built with Flask.	Flask, Python, Web
PreOwned Cars Price Prediction	ML model to predict prices of used cars.	Regression, Scikit-learn
Fake News Classifier	Identification of fake news articles using ML.	Classification, NLP
Content Strategy Netflix	Data-driven strategy analysis for Netflix content.	Data Science, EDA
Supply Chain Analysis	Optimization and analysis of supply chain data.	Python, Logistics
GenZ Career Preferences	Analysis report on GenZ career trends.	Research, Analytics
Website A/B Testing	Statistical analysis of A/B test results.	Statistics, Python

Last Updated: 2026 by Pavan Yellathakota
