ShubhamSarvankar/README.md

Hi, I'm Shubham Sarvankar.

I build stuff.

Projects

These are some selected projects that I've worked on recently.

  • Litmus: S3-compatible object store built for correctness. AWS CLI and boto3 work unmodified. A two-tier chaos suite proves durability invariants hold across SIGKILL and injected OSErrors. 115/325 ceph/s3-tests pass; every failure documented.
  • ITCH 5.0 Parser: From-scratch C++20 limit order book reconstruction for NASDAQ TotalView-ITCH 5.0. Replays a real 263M-message, 7.7 GB production capture end-to-end at 1.16M msg/s (231ns median apply latency). Fixed-point price arithmetic throughout; 105 tests covering feed-gap corruption, crossed books, and shutdown races.
  • Botnet C2 Detection: Graph topology classifier over CTU-13 (13 captures, 7 botnet families). Leave-one-family-out cross-validation with pre-registered thresholds. With Donbot held out as the unseen family, the classifier reaches 0.872 PR-AUC; the cc_dst_only collapse to 0.001 is itself a documented finding.
  • Storm: Real-time messaging platform in TypeScript. AES-256-GCM encryption, single-use JWT refresh tokens with replay detection, 112 tests against real MongoDB and Redis.
  • Streaming Observability Platform: End-to-end log analytics pipeline. NASA HTTP logs (1.9M lines) replay through Kafka → Spark Structured Streaming → Cassandra, served by a layered Spring Boot API. A LangGraph agent answers questions in plain English but never writes a query: it selects from five typed tools, so the LLM can't reach the database directly. The LLM provider is swappable behind one interface, validated on both claude-haiku-4-5 and local qwen2.5:7b (GPU).
  • MedGuard: Cross-platform medical records system with end-to-end encryption and HIPAA-aware audit logging. Mandatory PHI de-identification runs as a Firestore trigger (not optional middleware) using a two-stage Claude pipeline: entity extraction → deterministic placeholder substitution. Per-record AES-256-GCM keys wrapped via envelope encryption; Android local storage requires biometric auth. 355 tests across Cloud Functions, React web, and Android (Kotlin + Compose).
  • PA Legal Case Search: Multi-method retrieval system over 200k+ Pennsylvania legal cases. BM25 baseline, Legal-BERT dual-encoder semantic search, and cross-encoder reranking (BAAI/bge-reranker-large, 560M params). RAG Q&A layer fuses BM25 and dense results via Reciprocal Rank Fusion, then streams answers through Qwen3:8b with case citations. Dense+rerank latency drops from ~3 s on the first query to ~10 ms when cached.
  • Flagged Unfairly: Fairness audit of Detoxify's unbiased toxicity classifier on the Civil Comments dataset (21,293 identity-annotated comments). Black-identity comments are flagged as false positives at 3.02× the baseline rate; counterfactual term-swapping confirms the model reacts to the identity term itself, not surrounding content. Intersectional analysis finds black × jewish FPR at 5.88×. Per-group threshold optimization cuts maximum FPR disparity from 3.07× to 1.20× at a cost of −1.1% F1.
  • Join Society: Full-stack e-commerce platform on the MERN stack. Stripe checkout with coupon support, Cloudinary image uploads, Redis caching, JWT auth via HTTP-only cookies, and an admin dashboard with live analytics (Recharts). Covers the complete user journey from browsing to order history.
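
The Reciprocal Rank Fusion step mentioned under PA Legal Case Search can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the function name, the case IDs, and the smoothing constant k = 60 (a commonly used default for RRF) are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a value taken
    from the project.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a BM25 retriever and a dense retriever.
bm25_hits = ["case_a", "case_b", "case_c"]
dense_hits = ["case_b", "case_d", "case_a"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)
```

Because RRF uses only ranks, BM25 scores and dense similarity scores never need to be normalized onto a common scale before fusing.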

Let’s Connect

Pinned

  1. streaming-observability-platform Public

    Streaming log analytics pipeline (Kafka, Spark, Cassandra) with a Spring Boot API and a LangGraph natural-language query agent. Dockerized; swappable LLM provider (Claude or local Ollama).

    Java

  2. medguard Public

    Cross-platform medical records system with end-to-end encryption, mandatory PHI de-identification, and HIPAA-aware audit logging. Android (Kotlin/Compose) + React web + Firebase Cloud Functions.

    TypeScript

  3. itch-parser Public

    A C++ limit order book reconstruction system for NASDAQ TotalView-ITCH 5.0 binary feeds.

    C++

  4. storm Public

    Production-grade real-time messaging platform. TypeScript, WebSockets, MongoDB, Redis, BullMQ, AES-256-GCM encryption, JWT auth, Docker. 112 tests, 90%+ coverage.

    TypeScript

  5. litmus Public

    S3-compatible object store built for correctness. AWS CLI and boto3 work unmodified. A two-tier chaos suite proves durability invariants hold across SIGKILL and injected OSErrors. 115/325 ceph/s3-t…

    Python

  6. PALawSearch Public

    Forked from qianxusheng/PA-LawSearch

    An Information Retrieval course project that builds a search engine for Pennsylvania law cases from https://case.law./

    Python