bihari-bhau/README.md

🟢  Open to full-time roles  ·  AI/ML Engineering  ·  Backend  ·  LLM Infra  ·  Batch 2025


$ whoami
  Shubham Singh  ·  bihari-bhau  ·  Gurugram, India 🇮🇳

$ cat current.txt
  LLM Post-Training Intern @ Ethara AI
  ↳ Built Kaiju — AI coding agent benchmark pipeline (Commit0 / ICLR 2025)
  ↳ Evaluating GPT-4o, Claude 4.7, Gemini across 100+ Python repos
  ↳ 500+ RLHF samples annotated · 38 eval criteria · 6+ LLMs benchmarked
  ↳ Journey: EEE → Full-Stack → AI/ML Engineering 🚀

⚡ Featured: Kaiju — AI Coding Agent Benchmark

Benchmarks AI agents on reconstructing Python libraries from scratch. Built on Commit0 (arXiv:2412.01769 · ICLR 2025).

  GitHub Repos    →   AST Stripper   →    Stubs     →   AI Agent    →   pytest   →   Score
  (2000+ ⭐          (function bodies    (empty          (Claude /       (pass        (ethara
   80%+ Python)       stripped → ∅)       shells)         GPT-4o / …)    rate)         splits)

| Split | Libraries | Purpose |
|---|---|---|
| ethara | 8 libraries | Full benchmark |
| ethara-lite | 4 libraries | Lightweight eval |
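
The "AST Stripper → Stubs" step of the pipeline above can be sketched with Python's standard `ast` module. This is a minimal illustration, not Kaiju's actual code: the names `BodyStripper` and `strip_function_bodies` are hypothetical, and keeping the docstring while replacing the body with `pass` is an assumption about what an "empty shell" looks like.

```python
import ast


class BodyStripper(ast.NodeTransformer):
    """Replace each function body with its docstring (if any) plus `pass`.

    Hypothetical helper for illustration; the real pipeline may differ.
    """

    def _strip(self, node):
        new_body = []
        # Keep the docstring so an agent still sees the intended behaviour.
        if ast.get_docstring(node, clean=False) is not None:
            new_body.append(node.body[0])
        new_body.append(ast.Pass())
        node.body = new_body
        return node

    def visit_FunctionDef(self, node):
        return self._strip(node)

    def visit_AsyncFunctionDef(self, node):
        return self._strip(node)


def strip_function_bodies(source: str) -> str:
    """Return `source` with every function and method reduced to a stub."""
    tree = BodyStripper().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+


if __name__ == "__main__":
    example = '''
class Stack:
    def push(self, item):
        """Push an item onto the stack."""
        self.items.append(item)
'''
    print(strip_function_bodies(example))
```

Signatures, class structure, and docstrings survive, so the existing pytest suite can still import the module and score whatever an agent fills in.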

🛠 Tech Stack

Languages

Python · TypeScript · JavaScript · Java · SQL

Backend & Frontend

FastAPI · Node.js · PostgreSQL · React · Next.js · TailwindCSS

AI / ML

LLM Eval · RLHF · AST · Pytest

DevOps & Tools

Docker · GitHub Actions · n8n · Supabase


🗂 Projects

| Project | Stack | What it does | Live |
|---|---|---|---|
| 🦖 Kaiju | Python · AST · pytest | AI coding agent benchmarking pipeline (Commit0 / ICLR 2025) | |
| 📊 rlhf-eval | React · FastAPI · PostgreSQL · Docker | Full-stack RLHF dataset builder — pairwise comparisons, JSONL export | |
| 🧰 LLM Toolkit | Next.js · TypeScript · Tailwind · Supabase | Modular toolkit for LLM prompt experiments, evals, and dataset workflows | 🔗 |
| 🎯 Lead Sniper | n8n · LLM · Slack/Discord | GitHub stargazer → enrichment → LLM pitch → auto-delivery | |
| 🌦 Weather-Aware Order Checker | Node.js · OpenWeatherMap API | Order decisions driven by real-time weather via Promise.all | |
| 📚 Bihar Skill Hub | React · HTML · CSS | Ed-tech platform bridging Bihar's skill gap | 🔗 |
| 🍽 Meal-Buddy | Python · Django · FastAPI | Meal planning and suggestion API | |
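
As a rough illustration of the "pairwise comparisons, JSONL export" idea behind rlhf-eval, the sketch below serialises preference records one JSON object per line. The `PairwiseComparison` fields and the `to_jsonl` helper are hypothetical and are not taken from the rlhf-eval codebase.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable


@dataclass
class PairwiseComparison:
    """One annotated preference between two model responses (assumed schema)."""

    prompt: str
    response_a: str
    response_b: str
    preferred: str  # "a" or "b"


def to_jsonl(records: Iterable[PairwiseComparison]) -> str:
    """Serialise records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)


if __name__ == "__main__":
    samples = [
        PairwiseComparison("What is 2 + 2?", "4", "5", "a"),
        PairwiseComparison("Capital of India?", "Mumbai", "New Delhi", "b"),
    ]
    print(to_jsonl(samples))
```

One record per line keeps the export appendable and easy to stream into downstream training or evaluation scripts.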

🧠 Internship Metrics @ Ethara AI

┌────────────────────────────────────────────────────────────────────┐
│  🤖  LLMs Evaluated           →  6+    (GPT-4o, Claude, Gemini…)  │
│  📦  Python Repos Processed   →  100+  (repo_finder.py pipeline)  │
│  ✅  Eval Criteria / Repo      →  38    (fully automated)          │
│  📝  RLHF Samples Annotated   →  500+                             │
│  🏗   Custom Benchmark Splits  →  2     (ethara / ethara-lite)     │
└────────────────────────────────────────────────────────────────────┘

🎓 Background

education = {
    "degree":   "B.Tech — Electrical & Electronics Engineering",
    "college":  "Sershah Engineering College, Bihar",
    "batch":    "2025",
    "training": "Java Full Stack @ JSpiders, Noida",
    "journey":  "EEE → Full-Stack Dev → AI/ML Engineering 🚀",
}

🔗 Connect

LinkedIn GitHub Portfolio Email



"Ship it. Benchmark it. Improve it."

Pinned

  1. shubham-portfolio (Public)

    Personal portfolio — LLM Post-Training Engineer & Full-Stack Developer. Built with React + Vite, deployed on Vercel.

    JavaScript

  2. llm-response-evaluator (Public)

    A Streamlit tool to evaluate and compare LLM responses across 5 RLHF-inspired quality dimensions — Instruction Following, Truthfulness, Prompt Correctness, Writing Quality, and Verbosity.

    Python 1 1

  3. bihar-skill-hub (Public)

    A fully deployed educational platform targeting students in Bihar, featuring 33 courses across 11 skill categories. Built with JWT authentication, course enrollment, user profiles, success stories,…

    CSS

  4. llm-toolkit (Public)

    AI-powered LLM evaluation toolkit - Prompt Quality Scorer & Multi-turn Conversation Analyzer. Built with Next.js + Claude API.

    TypeScript 1