Gowthamch9/README.md

Gowtham Venkat Eathamokkala

Denton, Texas, USA

LinkedIn • Google Scholar • Email • Phone


Research Agenda

When a large language model confidently returns a wrong answer to a high-stakes clinical or financial query, who bears the cost, and how do we prevent it? This question has defined my research trajectory. Working at the intersection of machine learning and real-world data infrastructure, from building financial data pipelines at scale to studying the failure modes of generative AI, I have come to focus on a single, urgent problem: how do we build AI systems that know what they don’t know? My goal is to develop scalable frameworks for uncertainty quantification and trustworthiness in AI systems operating on high-volume, real-world data.


Research Interests

  • Reliability and Trustworthiness of AI Systems
  • Machine Learning and Statistical Learning Theory
  • Scalable Data Systems and Large-Scale Data Analytics
  • Uncertainty Quantification in Predictive Models

Education

University of North Texas • Master of Science in Advanced Data Analytics
May 2024 • GPA 3.636 / 4.0

Gokaraju Rangaraju Institute of Engineering and Technology • B.Tech in Electronics and Communication Engineering
May 2022 • GPA 3.36 / 4.0


Research Experience

Graduate Research Affiliate — Reliability of Large Language Models
Computational Healthcare & BioTechnology Lab, Dr. Mohammed Aledhari, University of North Texas • Jan 2025 – Present

  • Investigating failure modes and uncertainty-aware decision-making processes in large language models (LLMs) to address the critical problem of confident yet inaccurate AI outputs.
  • Designing experimental protocols to evaluate model calibration and reliability across diverse query domains, contributing to the development of trustworthy AI evaluation frameworks.
  • Analyzing patterns of model hallucination and overconfidence, identifying systematic failure categories that inform uncertainty quantification strategies for real-world deployment.
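
Calibration evaluations like the one described above are often summarized with a metric such as expected calibration error (ECE). The sketch below shows that general metric only; it is not the lab's actual protocol, and the function name, the equal-width binning, and the default of 10 bins are assumptions made for illustration.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Return ECE: the bin-weighted mean gap between confidence and accuracy."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one prediction")
    if any(c < 0.0 or c > 1.0 for c in confidences):
        raise ValueError("confidences must lie in [0, 1]")

    # Per-bin accumulators: sample count, summed confidence, number correct.
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    hits = [0] * n_bins

    for conf, ok in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        counts[idx] += 1
        conf_sums[idx] += conf
        hits[idx] += int(ok)

    total = len(confidences)
    ece = 0.0
    for count, conf_sum, hit in zip(counts, conf_sums, hits):
        if count == 0:
            continue  # Empty bins contribute nothing.
        avg_conf = conf_sum / count
        accuracy = hit / count
        ece += (count / total) * abs(avg_conf - accuracy)
    return ece
```

A model that reports 0.9 confidence on four answers but gets only three right would score an ECE of about 0.15 under this definition, which is the kind of overconfidence gap the bullets above refer to.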

Capstone Research — Energy Consumption Forecasting
University of North Texas • Fall 2023 – Spring 2024

  • Built predictive machine learning pipelines to forecast U.S. energy consumption using large-scale time series datasets, implementing preprocessing, feature engineering, and anomaly detection workflows.
  • Evaluated multiple model architectures using RMSE and MSE metrics, revealing how temporal dependencies and distributional shift degrade prediction accuracy in long-horizon forecasting.
  • Produced research-style technical reports and visualizations supporting sustainable energy planning applications.
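
For reference, the two error metrics named above can be computed as follows. This is a minimal sketch with hypothetical function names and toy values, not the capstone's evaluation code.

```python
import math
from typing import Sequence


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error between observed and forecast values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error: the square root of MSE."""
    return math.sqrt(mse(actual, predicted))


# Toy values for illustration only (not real energy data).
observed = [1.0, 2.0, 3.0, 4.0]
forecast = [1.0, 2.0, 3.0, 8.0]
toy_mse = mse(observed, forecast)
toy_rmse = rmse(observed, forecast)
```

RMSE is expressed in the same units as the forecast target, which makes it easier to read than MSE when comparing models on a consumption series.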

Transportation Safety and Spatiotemporal Analytics Project
National Student Data Corps • Jan 2025

  • Conducted end-to-end geospatial and temporal analysis of large-scale NYC traffic collision data, including schema validation, missing-value diagnostics, and bias-aware preprocessing.
  • Applied time-series decomposition, seasonality extraction, anomaly detection, and geospatial hotspot analysis to identify temporal risk patterns and spatial crash clusters.
  • Synthesized findings into a professional research poster presented to the U.S. Department of Transportation Federal Highway Administration.

Independent Research — Structural and Demographic Patterns in Custodial Arrests
University of North Texas • May 2024

  • Performed large-scale exploratory data analysis on Dallas Police Department arrest records using statistical aggregation, categorical encoding, and spatial pattern detection.
  • Uncovered non-intuitive patterns in high-density residential environments and weapon involvement prevalence, generating evidence-based insights for municipal resource allocation policy.

Publications and Manuscripts

  • B. V. Kumar, A. Bharat, G. V. Eathamokkala, Y. S. S. Harsha, A. U. Sree, "Analysis of an IoT based Water Quality Monitoring System," I-SMAC 2022. DOI: 10.1109/I-SMAC55078.2022.9987360.
  • Al-Edhari, A., Eathamokkala, G. V., & Rahouti, M. (2026). Response drift across frontier large language models. Manuscript under review at Nature Machine Intelligence.

Presentations and Posters

  • Spatiotemporal Analysis of New York City Traffic Crash Data — Research Poster, National Student Data Corps; presented to USDOT FHWA, Jan 2025.

Professional Experience

Data Engineer — Vsion Technologies, Austin TX • Sep 2024 – Present

  • Design scalable data pipelines integrating Kafka streaming with PostgreSQL analytical storage layers for large-scale structured financial datasets, identifying computational bottlenecks that motivate research in scalable data systems.
  • Develop optimized relational data models and layered analytical views enabling downstream statistical analysis and machine learning workflows, with emphasis on reproducibility and experimental consistency.
  • Implement feature engineering pipelines and apply query optimization and modular schema design to improve computational efficiency for large-scale ML deployment.

Data Analyst — Zetatek Technologies Pvt Ltd, Hyderabad, India • Jan 2022 – Dec 2022

  • Analyzed operational and financial datasets using SQL Server and SSIS to identify statistical trends and optimize resource allocation strategies across business units.
  • Developed automated analytical dashboards and integrated SQL-based data pipelines for structured reporting and visual analytics communication.

Academic Projects

TriSQL Framework — Text-to-SQL Research Implementation

Independently implemented a three-stage Text-to-SQL framework inspired by the TriSQL architecture (Nature Scientific Reports, 2026), converting plain English questions into executable SQL queries using open-source tools running entirely on local hardware.

  • Designed and built a semantic schema selector using sentence-transformers (all-MiniLM-L6-v2) to filter relevant database tables via cosine similarity, reducing prompt noise and improving generation quality.
  • Developed a two-step structured SQL generator that first identifies required SQL clauses (JOIN, GROUP BY, WHERE) before generating the complete query — improving syntactic correctness over single-prompt approaches.
  • Implemented a complexity-aware refinement stage that classifies generated SQL as Easy, Medium, or Hard and applies tiered error correction including execution feedback loops for hard queries.
  • Evaluated on the Spider benchmark dataset (Yale University) — achieving 70% Execution Accuracy and 100% Executability Rate using SQLCoder via Ollama with no GPU or API costs.
  • Deployed a FastAPI web interface enabling non-technical users to query any SQLite database in plain English and view results directly in a browser.
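
The schema-selection step in the first bullet can be sketched as a cosine-similarity ranking over table descriptions. This is an illustrative sketch, not the repository's code: to stay dependency-free it swaps the sentence-transformers embedding for a toy bag-of-words vector, and `embed`, `select_tables`, the `top_k` parameter, and the sample table descriptions are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_tables(
    question: str, table_docs: Dict[str, str], top_k: int = 2
) -> List[str]:
    """Return up to top_k table names most similar to the question."""
    q_vec = embed(question)
    scored = [
        (cosine_similarity(q_vec, embed(doc)), name)
        for name, doc in table_docs.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop tables with no overlap so they never reach the prompt.
    return [name for score, name in scored[:top_k] if score > 0]


# Hypothetical table descriptions (table name plus column names).
tables = {
    "singer": "singer id name country age",
    "concert": "concert id name theme stadium year",
    "stadium": "stadium id location capacity",
}
selected = select_tables(
    "What is the average age of each singer by country?", tables
)
```

With a real embedding model, `embed` would return dense vectors and the same ranking logic would apply; only the tables that pass the filter would be included in the generation prompt.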

GitHub: https://github.com/Gowthamch9/trisql-framework
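
The two evaluation metrics reported above, execution accuracy and executability rate, can be approximated as shown below. This is a simplified sketch under stated assumptions, not the Spider benchmark's official evaluator or the repository's code: it compares order-insensitive result sets on an in-memory SQLite database, and the helper names, table, and query pairs are hypothetical.

```python
import sqlite3
from typing import List, Optional, Tuple


def run_query(conn: sqlite3.Connection, sql: str) -> Optional[List[tuple]]:
    """Run a query and return its rows in a canonical order, or None on error."""
    try:
        return sorted(conn.execute(sql).fetchall(), key=repr)
    except sqlite3.Error:
        return None


def evaluate(
    conn: sqlite3.Connection, pairs: List[Tuple[str, str]]
) -> Tuple[float, float]:
    """Return (execution_accuracy, executability_rate) for (predicted, gold) pairs."""
    if not pairs:
        raise ValueError("need at least one query pair")
    executable = 0
    matches = 0
    for predicted_sql, gold_sql in pairs:
        predicted_rows = run_query(conn, predicted_sql)
        gold_rows = run_query(conn, gold_sql)
        if predicted_rows is not None:
            executable += 1
            # Count a match only when both queries ran and returned the same rows.
            if predicted_rows == gold_rows:
                matches += 1
    return matches / len(pairs), executable / len(pairs)


# Hypothetical toy database and query pairs for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    INSERT INTO singer VALUES (1, 'Ana', 'US'), (2, 'Bo', 'FR'), (3, 'Cy', 'US');
    """
)
pairs = [
    # Different SQL text, same result: counts as correct.
    ("SELECT COUNT(id) FROM singer WHERE country = 'US'",
     "SELECT COUNT(*) FROM singer WHERE country = 'US'"),
    # Runs, but returns the wrong rows.
    ("SELECT name FROM singer WHERE country = 'FR'",
     "SELECT name FROM singer WHERE country = 'US'"),
    # Syntax error: not executable.
    ("SELEC name FROM singer", "SELECT name FROM singer"),
]
execution_accuracy, executability_rate = evaluate(conn, pairs)
```

Comparing result sets rather than SQL strings is what lets a query with different syntax still count as correct, and it is why executability can be higher than execution accuracy.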

IoT-Based Water Quality Monitoring System (E-Aqua)

  • Engineered an IoT-enabled prototype integrating pH, turbidity, TDS, temperature, and flow sensors with an Arduino-based controller for real-time environmental data acquisition.
  • Implemented wireless data transmission via ESP8266 Wi-Fi module, enabling continuous remote monitoring through a cloud-connected dashboard and mobile interface.
  • Conducted comparative analysis of monitoring technologies, demonstrating a low-cost, scalable system for municipal, aquaculture, and agricultural applications. Published in the IEEE I-SMAC 2022 proceedings.

Technical Skills

  • Programming and Data Science: Python (NumPy, Pandas, scikit-learn, PyTorch, TensorFlow), SQL, PySpark, R
  • Machine Learning and Statistics: Regression, Classification, Clustering, Neural Networks, Time Series Forecasting, PCA, Random Forests, Support Vector Machines, Cross-Validation, Model Evaluation, Uncertainty Quantification
  • Big Data and Cloud Systems: Apache Spark, Kafka, Google Cloud Platform (BigQuery, Dataflow, Pub/Sub, Dataproc, Cloud SQL, Vertex AI), Snowflake
  • Databases: PostgreSQL, MySQL, Microsoft SQL Server
  • Visualization and Tools: Matplotlib, Seaborn, Power BI, Tableau, Git, Jupyter Notebook, LaTeX

Teaching and Mentoring

  • Coding Tutor: Taught Python and Scratch to middle and high school students through project‑based learning (~25 students).
  • Volunteer Data Science Mentor: National Student Data Corps — guided teams on preprocessing, analysis, and poster development.

Certifications and Honors

Certifications:

  • Google Cloud Data Engineering and ML Specialization
  • Google Advanced Data Analytics Professional Certificate
  • Microsoft Power BI for Data Analysts
  • Advanced SQL for Data Engineering

Honors:

  • Selected Participant, National Student Data Corps
  • Graduate GPA 3.636 / 4.0, University of North Texas

Languages

  • English: Professional proficiency
  • Telugu: Native
  • Hindi: Conversational proficiency
  • Tamil: Conversational proficiency

Visual Badges and Quick Links

ORCID

Pinned

  1. Data-Analysis-using-python (Public)

    Python code for data analysis.

    Jupyter Notebook

  2. Machine-Learning-using-Python (Public)

    Foundational machine learning code in Python.

    Jupyter Notebook

  3. MSSQL_Queries (Public)

    TSQL

  4. Power-BI-Projects (Public)

    Dashboards and visualizations built with Power BI.

  5. TDSP-Transportation_Data_Science_Project (Public)

    Transportation Data Science Project: an analysis of New York City car crashes.

    Jupyter Notebook

  6. Pizza-Sales-Excel-Project (Public)

    An Excel workbook with pivot tables, charts, and dashboards built in Microsoft Excel.