ayushm98/README.md

Hey, I'm Ayush Kumar Malik 👋

Full Stack Engineer | AI Systems

Building production applications and intelligent systems with LLMs, RAG, and modern web technologies

Portfolio • LinkedIn • Email

MS in Computer Science @ Indiana University Bloomington | GPA: 3.9/4.0


🚀 What I'm Working On

  • 🌐 Building production full-stack applications with real-time features
  • 🤖 Developing multi-agent AI systems and intelligent LLM architectures
  • 📊 Deploying RAG pipelines and MLOps infrastructure at scale

💼 Featured Projects

🎾 BRCKT - Production Fantasy Tennis Platform

โญ 1,500+ active users | ๐Ÿ† 27 tournaments hosted

Full-stack platform built in collaboration with Keith Hedges. Real-time match synchronization, bracket management, and AI-powered match analysis. Built with modern monorepo architecture handling complex tournament state management and WebSocket communications.

Next.js Express.js PostgreSQL Socket.io Python Docker


🤖 CodePilot | Live Demo

Multi-agent AI system for autonomous code generation. Four specialized agents (Planner, Coder, Reviewer, Explorer) orchestrate complex coding tasks with sandboxed execution using E2B.

Claude LangGraph Python


📊 ML-Monitor | Live Demo

Production MLOps platform for real-time fraud detection achieving sub-100ms inference latency. Includes automated model retraining with drift detection and comprehensive Grafana monitoring.

FastAPI XGBoost MLflow Prometheus Grafana Docker


🔀 Cascade | Live Demo

Intelligent LLM router with semantic caching achieving 60% cost reduction. Routes queries to optimal models (GPT-4/GPT-3.5) based on complexity classification with 97% accuracy.

PyTorch Qdrant Redis Streamlit
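
A rough sketch of the routing idea behind Cascade, for readers who want to see the shape of it. This is not the project's code: `embed` is a toy bag-of-words stand-in for a real embedding model, `classify_complexity` is a hypothetical keyword heuristic standing in for the trained classifier, and `call_model` is a placeholder for whatever client actually reaches the LLM.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (assumption; a real router would use a learned model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Returns a stored response when a new query is similar enough to an old one."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response)

    def get(self, query):
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, query, response):
        self.entries.append((embed(query), response))


def classify_complexity(query):
    """Hypothetical heuristic: long or reasoning-heavy queries count as complex."""
    keywords = {"why", "explain", "compare", "prove", "design"}
    words = query.lower().split()
    return "complex" if len(words) > 20 or keywords & set(words) else "simple"


def route(query, cache, call_model):
    """Serve from cache when possible, otherwise pick a model tier by complexity."""
    cached = cache.get(query)
    if cached is not None:
        return "cache", cached
    model = "gpt-4" if classify_complexity(query) == "complex" else "gpt-3.5"
    response = call_model(model, query)
    cache.put(query, response)
    return model, response
```

Passing `call_model` in as an argument keeps the sketch testable without network access; a linear scan over cache entries is fine for a demo, while a vector database would take over that lookup at scale.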


๐Ÿ› ๏ธ Tech Stack

Full Stack Development

Next.js React Express.js PostgreSQL Socket.io

AI/ML & LLMs

PyTorch HuggingFace LangChain OpenAI Anthropic

Vector Databases & Caching

ChromaDB FAISS Qdrant Redis

Infrastructure & DevOps

AWS DigitalOcean Docker FastAPI

Languages

Python TypeScript JavaScript SQL


💼 Experience

AI Systems Engineer @ Radical Squares

Jan 2026 - Present | Remote

  • Architected AI-powered ETL platform using OpenAI GPT-4o for automated data pipeline generation, enabling natural language to SQL transformation across PostgreSQL, MySQL, and SQL Server
  • Optimized LLM cost infrastructure through FinOps analysis, identifying 54.6% cost reduction via model downgrading, prompt optimization, and semantic caching strategies
  • Developed AI field mapping service with LangChain and GPT-4o, automatically matching source to target columns with confidence scoring and 40% token reduction through metadata filtering

Full Stack Engineer @ Brckt (Peristyle Labs)

Dec 2025 - Present | Indianapolis, IN (Remote)

  • Built real-time tennis match analysis system using Llama 3.3-70B via Venice.ai API, generating professional head-to-head predictions with streaming responses
  • Developed web scraping infrastructure using Playwright headless browser with anti-detection measures, extracting H2H stats from matchstat.com
  • Implemented TTL caching layer with thread-safe operations and automatic eviction, reducing redundant scraping by caching H2H data for 2 hours
  • Deployed FastAPI backend with Server-Sent Events (SSE) for real-time streaming, Docker containerization, and Caddy reverse proxy
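
For the curious, the TTL caching bullet above can be sketched in a few lines of standard-library Python. This is a minimal illustration under assumptions, not the production code: the class name `TTLCache`, its methods, and the injectable `clock` parameter are all hypothetical, and only the 2-hour default comes from the description.

```python
import threading
import time


class TTLCache:
    """Thread-safe cache whose entries expire after a fixed time-to-live."""

    def __init__(self, ttl_seconds=2 * 60 * 60, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._lock = threading.Lock()
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        with self._lock:
            item = self._store.get(key)
            if item is None:
                return None
            expires_at, value = item
            if self._clock() >= expires_at:
                del self._store[key]  # lazy eviction on read
                return None
            return value

    def set(self, key, value):
        with self._lock:
            self._evict_expired()
            self._store[key] = (self._clock() + self._ttl, value)

    def _evict_expired(self):
        # Caller must already hold the lock.
        now = self._clock()
        for key in [k for k, (exp, _) in self._store.items() if now >= exp]:
            del self._store[key]
```

A scraper would call `get` with something like a player-pair key first and only hit the network on a miss, then `set` the fresh result.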

AI Systems Engineer @ Riverside Global LLC

Jun 2025 - Dec 2025 | Hampton, IL (Remote)

  • Architected production RAG system with 5-stage pipeline: query routing, reformulation, hybrid retrieval (BM25 + semantic), cross-encoder reranking, and GPT-4 generation, reducing document research time by 60%
  • Built hybrid search engine combining Sentence-BERT embeddings with BM25 using Reciprocal Rank Fusion (RRF), achieving 94% retrieval relevance on 10,000+ environmental documents
  • Developed LLM-powered data extraction pipeline using GPT-4 function calling, achieving 95% accuracy and reducing manual extraction from 3 hours to 15 minutes per document
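
Reciprocal Rank Fusion, mentioned in the hybrid search bullet above, is simple enough to show inline. The sketch below is a generic version of the technique rather than the system's actual code: each document scores the sum of 1 / (k + rank) over every ranking it appears in. The function name, the sample document IDs, and k = 60 (a commonly used default) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a BM25 retriever and a semantic retriever.
bm25_hits = ["d1", "d2", "d3"]
semantic_hits = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([bm25_hits, semantic_hits])
```

Because RRF uses only ranks, it needs no score normalization between BM25 and embedding similarity, which is a large part of its appeal for hybrid retrieval.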


Pinned

  1. cascade (Public)

    Intelligent LLM Request Router - Reduce API costs by 60%+ through smart routing and semantic caching

    Python 3

  2. Devon (Public)

    Python

  3. llama-tool-specialist (Public)

    A specialized 8B-parameter Llama model fine-tuned for efficient function calling

    Python

  4. ml-monitor (Public)

    Python

  5. VerbaQuery (Public)

    Industrial-grade RAG system with hybrid retrieval and cross-encoder re-ranking

    Python