Pavan Yellathakota

Hey DataGeeks, I'm Pavan 👋

I am a Data Explorer passionate about diving into every field where data is prominent. My journey spans the full spectrum from Market Research & Supply Chain Analytics to designing Databases & ETL Pipelines. I extend this expertise into AI, developing Machine Learning and Deep Learning models with a specific focus on BERT-based Text & Semantic Analysis.

Data Scientist | ML Engineer | Product Analytics

📍 Location : Seattle, WA, USA
📞 Mobile : +1 (929) 278-4589
✉️ Email : pavan.yellathakota.ds@gmail.com
🔗 LinkedIn : https://linkedin.com/in/yellatp
💻 GitHub : https://github.com/yellatp



👨‍💻 Professional Summary

Data Scientist with 3+ years of experience developing predictive models and automated data infrastructure. Proven track record in improving search precision, designing quantitative research pipelines, and implementing data-driven solutions for marketing and product growth. Skilled in bridging the gap between data engineering and stakeholder decision-making through statistical validation, A/B testing, and interactive analytics.


🛠️ Technical Skills

| Domain | Stack |
| --- | --- |
| Languages & Databases | Python, Pandas, NumPy, Scikit-learn, SQL, PySpark, PostgreSQL, R |
| AWS Cloud Data | AWS S3, Athena, Glue, SageMaker, Lambda, Redshift |
| ML Frameworks | XGBoost, Hugging Face, OpenAI, NLTK, spaCy |
| Tools & Visualization | Tableau, QuickSight, Power BI, Excel, Git |

💼 Professional Experience

Alphonso AI, backed by Shipley Center for Innovation | Founding ML Engineer

Potsdam, NY | Jul 2025 – Present

  • Backend Architecture: Designed a 0→1 Backend Ecosystem using FastAPI and PostgreSQL, orchestrating a scalable microservices bridge between Java-based core services and Python-native ML workloads.
  • Cost-Efficient Infrastructure: Deployed and managed production services on DigitalOcean VPS to optimize infrastructure overhead; implemented Docker-based containerization to ensure environment parity across R&D and production.
  • Advanced Retrieval (RAG): Engineered a Multi-Model "Text-to-Query" (TTQ) engine leveraging Gemini (Vertex AI) and DeepSeek APIs to enable dynamic, prompt-driven semantic search across high-dimensional talent data.
  • Search Optimization: Deployed a multi-stage retrieval pipeline that uses pgvector for Approximate Nearest Neighbor (ANN) search and CUDA-accelerated Cross-Encoders for high-precision re-ranking (targeting a 38% improvement in Precision@N).
  • Domain-Aware Recommendation: Developed a sector-specific ranking system using Vectorized Embeddings; shifted logic from generic role-matching to domain-expertise alignment, improving candidate-to-company fit.
  • Generative Team-Composition: Built a module that translates natural language product descriptions into granular technical requirements and specific candidate matches, bridging the gap for non-technical founders.
  • System Design & MCP: Led relational schema normalization, API contract definition, and R&D into Model Context Protocol (MCP) for agentic, self-correcting database interactions.

Key Technologies Used
Python FastAPI PostgreSQL Docker pgvector HuggingFace Gemini Vertex AI

Student Managed Investment Fund, Clarkson University | Graduate Quantitative Researcher

Potsdam, NY | Sep 2024 – Apr 2025

  • Portfolio Management: Managed a $650K real-capital portfolio, delivering a 51% total return and outperforming the S&P 500 benchmark by 26 percentage points (2,600 bps).
  • Alternative Data Pipeline: Built a sentiment analysis engine scraping Reddit/YouTube to validate fundamental buy signals, using BERT-based sentiment scoring to overlay quantitative signals on traditional financial metrics.
  • Automation: Automated the extraction of financial statements from SEC EDGAR using Python & Vertex AI, reducing data collection time by 80% for the analyst team.
  • Risk Modeling: Developed Monte Carlo simulations and risk-parity models to stress-test overweight positions and quantify potential drawdowns for high-conviction trades.

Key Technologies Used
Python BERT HuggingFace Vertex AI Pandas

HAVK Mladost (Elite Athletics Club) | Graduate Data Science Consultant

Potsdam, NY | Oct 2023 – May 2025

  • Cloud Migration: Architected a centralized data lake on AWS S3, migrating legacy records to a queryable cloud environment and reducing data retrieval latency by 40%.
  • ETL Optimization: Developed PySpark ETL jobs on AWS Glue to process 1M+ cross-channel events; utilized partition pruning to optimize query costs and speed.
  • Uplift Modeling: Applied uplift modeling and behavioral clustering to identify high-value fan segments, optimizing marketing spend and merchandise revenue.
  • Performance Analytics: Developed backend services with FastAPI and built interactive dashboards that delivered real-time performance insights to World Championship coaches.

Key Technologies Used
AWS S3 Glue PySpark FastAPI Python

eAppSys Limited | Business Data Analyst

Hyderabad, India | Jul 2022 – Dec 2022

  • Forecasting: Developed demand forecasting models (Prophet/SARIMAX) for 1,500+ SKUs, integrating exogenous variables (holidays, promotions) to reduce forecast error (MAPE) by 15%.
  • Reporting Automation: Designed and deployed automated KPI dashboards in Oracle Analytics Cloud (OAC), saving the procurement team 12+ hours/week of manual reporting time.
  • ML Workflows: Implemented GxP-compliant ML workflows on Oracle Cloud Infrastructure (OCI) with real-time alerts, achieving 99.9% uptime for critical inventory monitoring.

Key Technologies Used
Python Prophet SARIMAX Oracle OCI

Kantar GDC India | Data Analyst

Pune, India | Sep 2021 – May 2022

  • Pipeline Automation: Built automated data pipelines for Tracker and Syndicated Research projects using Python and PySpark, integrating 10M+ survey records from 30+ sources and reducing processing latency by 30%.
  • Statistical Analysis: Developed sampling approaches and statistical significance testing to ensure data representativeness across Middle East and Central Africa markets.
  • Consumer Insights: Supported recurring monthly/quarterly client tracking projects by developing regression models and delivering insights for 10+ FMCG and Telecom clients.

Key Technologies Used
Python PySpark Pandas SQL


🏗️ Some Notable Projects

| Project | Description | Tech Stack |
| --- | --- | --- |
| Text-Analysis-using-NLP-LDA | NLP project focused on topic modeling and text analysis. | NLP, LDA, Python |
| Detoxify Telugu | Toxic comment classification for Telugu language. | NLP, Deep Learning |
| Synthetic Data Generator | Tool to generate synthetic datasets for testing/training. | Python, Data Gen |
| BingeMax Recommendation Engine | Personalized movie recommendation system. | ML, Recommender Systems |
| Fintech Sales GAP Analysis | Analyzing sales gaps in fintech products. | Data Analysis, Visualization |
| KonnectR Fullstack App | Fullstack web application built with Flask. | Flask, Python, Web |
| PreOwned Cars Price Prediction | ML model to predict prices of used cars. | Regression, Scikit-learn |
| Fake News Classifier | Identification of fake news articles using ML. | Classification, NLP |
| Content Strategy Netflix | Data-driven strategy analysis for Netflix content. | Data Science, EDA |
| Supply Chain Analysis | Optimization and analysis of supply chain data. | Python, Logistics |
| GenZ Career Preferences | Analysis report on GenZ career trends. | Research, Analytics |
| Website A/B Testing | Statistical analysis of A/B test results. | Statistics, Python |


Last Updated: 2026 by PAVAN YELLATHAKOTA

Pinned

  1. BingeMax-Personalized-Movie-Recommendation-Engine (Public)

    An AI-powered movie recommender using content-based, collaborative, and cosine similarity models. Built with Streamlit + FastAPI.

    Python

  2. detoxify-telugu (Public)

    A Fine-Tuned BERT-Based Language Model for Hate Speech Detection in Telugu & Tenglish

    Python

  3. KonnectR_flask_fullstack_app (Public)

    Developed a full-stack web app simulating R&D collaboration between students, professors, and recruiters with messaging, posting, and analytics modules.

    HTML 1

  4. PreOwnedCars_Price_Prediction_Model_V2.0- (Public)

    An extension of Version 1.0.0 that introduces an additional metric, 'Man_dep_OP' (average_manufacturer_depreciation_per_odometer_price_ratio), for better results.

    Jupyter Notebook

  5. Synthetic-Data-Generator (Public)

    Synthetic Data Generator tool with Streamlit UI

    Python

  6. Text-Analysis-using-NLP-LDA (Public)

    Advanced Text Analysis using NLP: Sentiment Analysis, Named Entity Recognition (NER) & Topic Modeling (LDA)

    Jupyter Notebook