Joker2841/README.md


Blog LinkedIn Codeforces Email

Profile Views Followers Open to work


About me

I'm a 2026 CSE graduate from IIT Guwahati. I work across backend systems and applied AI, and I'm most interested in problems where reliability and performance actually matter: storage internals, distributed systems, LLM infrastructure, the layers underneath the products people use.

const sai = {
    role:         "Software Engineer (2026 grad)",
    school:       "IIT Guwahati, Computer Science",
    focus:        ["Storage systems", "LLM evaluation", "Backend"],
    languages:    ["Rust", "C++", "Python", "Java"],
    currently:    "Building Sastran, a crash-safe storage engine",
    competitive:  "Codeforces Expert (max 1803)",
    writing_at:   "tensen.dev",
    open_to:      "Full-time SDE / Backend / Infrastructure roles",
    based_in:     "India",
    availability: "Immediate"
};

The fastest way to know what I work on is to look at what I build. Scroll down.


Right now

Building Interviewing

Prototyping Writing Climbing


Featured projects

🦀 Sastran

Crash-safe storage engine in Rust

Unified key-value and vector storage. LSM-tree for KV, HNSW for ANN search, one durability story across both.

  • WAL with CRC-checked records, fsync ordering
  • Leveled compaction over block-structured SSTables
  • Per-SSTable bloom filters: ~28× speedup on absent-key lookups (2.63 µs → 94 ns)
  • HNSW with crash-consistent snapshot-and-replay recovery
  • Recall@10 > 0.80 after 30% vector deletion via neighbor repair
  • 8-bit scalar quantization: 4× compression at 96.2% recall
  • ~3,000 lines of Rust · ~190 tests · zero unsafe
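
To make the first bullet concrete, here is a minimal Python sketch of CRC-checked log records and replay. It is not Sastran's Rust code: the record layout (4-byte length, 4-byte CRC32, then payload) and the names `encode_record` and `replay` are assumptions chosen for illustration, and fsync ordering is only noted in a comment.

```python
import struct
import zlib

# Hypothetical record header: little-endian payload length + CRC32 of the payload.
HEADER = struct.Struct("<II")


def encode_record(payload: bytes) -> bytes:
    """Frame one log entry as [length][crc32][payload]."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def replay(log: bytes) -> list[bytes]:
    """Return every intact record, stopping at the first torn or corrupt one.

    A real engine would fsync the log before acknowledging a write, so
    anything past the last valid record can be treated as never committed.
    """
    records = []
    offset = 0
    while offset + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        payload = log[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # truncated tail or checksum mismatch: stop replaying
        records.append(payload)
        offset = start + length
    return records
```

Replaying a log whose last record was cut off mid-write returns only the records before it, which is the property that makes recovery after a crash predictable.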

Rust LSM-Tree HNSW WAL Criterion

repo

🧠 DocuMind

Full-stack RAG document assistant

Production-style RAG system supporting semantic Q&A across PDFs, DOCX, HTML, Markdown.

  • FastAPI backend, React frontend, PostgreSQL
  • GPU-accelerated retrieval with BAAI/bge-large-en-v1.5
  • FAISS vector search + custom heuristic filtering
  • Real-time updates over WebSockets
  • Containerized with Docker, PWA support
  • Sub-10ms semantic retrieval latency

FastAPI React PostgreSQL FAISS PyTorch CUDA Docker

repo

🎯 RAG-Aware LLM Routing

B.Tech thesis Β· Prof. Amit Awekar

Cost-aware routing system that dynamically selects between small and large language models per query.

  • Routes 2B vs 30B models on complexity + confidence signals
  • 22 predictive features, cost-sensitive XGBoost
  • Interaction-based signals catch small-model failures
  • ~41% inference-cost reduction
  • 93.3% answer quality preserved
  • Evaluation pipeline over 1,500 queries, 50+ GPU oracle hours
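
The core idea of cost-aware routing can be sketched in a few lines of Python. This is an illustrative decision rule, not the thesis's actual router: `p_small_ok` stands in for a classifier's estimated probability that the small model answers well, and `failure_penalty` is a hypothetical cost assigned to a bad answer.

```python
from dataclasses import dataclass


@dataclass
class RouteDecision:
    model: str            # "small" or "large"
    expected_cost: float  # expected cost of the chosen route


def route(p_small_ok: float, cost_small: float, cost_large: float,
          failure_penalty: float) -> RouteDecision:
    """Pick the cheaper route once the risk of a bad small-model answer is priced in.

    All four inputs are assumptions for this sketch; in practice the
    probability would come from a trained model over per-query features.
    """
    expected_small = cost_small + (1.0 - p_small_ok) * failure_penalty
    if expected_small <= cost_large:
        return RouteDecision("small", expected_small)
    return RouteDecision("large", cost_large)
```

With a high `failure_penalty` the rule sends borderline queries to the large model, which is the lever that trades inference cost against answer quality.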

Python XGBoost PyTorch LLMs RAG

repo

😏 Sarcasm Detection (ACE 2+)

NLP / Deep Learning course project

Fine-tuned DeBERTa-v3 for sarcasm detection with custom affective-feature fusion.

  • 94.77% macro-F1 on 28K news headlines
  • Exceeds COLING 2020 benchmark (+2.56 F1)
  • Custom Sentiment Incongruity Score (SIS)
  • 10-dim Affective Feature Embedding (AFE)
  • Gated Fusion layer, full 4-variant ablation study

PyTorch DeBERTa-v3 HuggingFace NLP

repo

🔎 Redrob: Intelligent Candidate Discovery

Track 01 hackathon · CPU-fast ranking with LLM coherence

Ranks 100,000 candidates against a Senior ML/AI JD via structural signals + LLM-grounded coherence checking. 5-stage pipeline: relevance floor → honeypot gate → JD disqualifier → calibrated scorer × multipliers.

  • LLM re-tasked: qwen2.5:7b (Ollama) shifted from tier-scorer to narrow coherence checker after templated prose inflated tier scores
  • Structural trap detection: tool anachronisms, narrative inconsistencies, domain mismatches
  • Empirical signal validation: dropped fps weight, dup-detection after measuring against full 100K pool
  • Two-pass design with offline LLM precompute and CPU-only ranking
  • NDCG@10 = 0.929, NDCG@50 = 0.978 on hand-built gold set
  • < 5 min for 100K candidates, no network, MIT licensed
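
For readers unfamiliar with the NDCG@k metric quoted above, here is a minimal Python sketch of the standard computation. It assumes linear gain (the relevance grade itself) and that `rels` holds the gold relevance of every candidate in ranked order; Redrob's own evaluation code may differ, for example by using exponential gain.

```python
import math


def dcg_at_k(rels, k):
    """Discounted cumulative gain over the top-k relevance grades."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels[:k]))


def ndcg_at_k(rels, k):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0
```

A ranking that already lists candidates in descending relevance scores exactly 1.0, and any swap that pushes a stronger candidate lower reduces the score.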

Python Ollama LLM-as-judge Empirical eval

repo

📂 More work

A few other things I've built

  • NIDS: Real-time network intrusion detection (1-3K packets/sec via BPF + multi-threaded Python). The project where I debugged Python's GIL bottleneck firsthand. → repo
  • Runway: AI agent that helps you finish commitments before deadlines, not just remember them. Built with Gemini on Google AI Studio. → repo
  • SkyConnect: MERN-stack flight booking platform with REST API, automated notifications (node-cron + Nodemailer), admin panel.
  • Codeforces archive + Gridlock 2.0 hackathon: competitive programming climb from newbie to Expert; Bengaluru Traffic Police × Flipkart traffic demand prediction (~89.1 leaderboard).

All public repos at my repositories page →


How I work

A few principles I've actually learned from building these, not borrowed from somewhere.

Measure before optimizing. The 28× bloom filter speedup in Sastran came from writing the benchmark in Criterion before writing the optimization. The 41% routing cost reduction came from labeling 1,500 queries with GPU oracle data before training a router. In Redrob I dropped several intuitive signals (keyword density, duplicate-description detection, education date-ordering) to zero after measuring they had no ranking value across 100K candidates. I learned this the hard way enough times that it's now reflex.

Learn by building. Rust at depth came from building Sastran, not from a course. HNSW came from reading the Malkov-Yashunin paper and implementing it, not from a library wrapper. The fastest path to understanding something is usually to ship it.

Comfortable being wrong. From competitive programming, my code gets judged objectively against test cases every weekend. Disagreement and error feel like information, not threat.

Honest about scope. Sastran is a single-node engine, not a distributed system. DocuMind is production-style, not battle-tested in production. I'd rather scope something small and finish it well than wave my hands at something I haven't built.


Tech stack

🛠 Languages I reach for first

⚙️ Backend & APIs

💾 Data & storage

🧠 ML & AI

🐧 Systems & DevOps

🎨 Frontend (when needed)


Competitive programming

I've competed on Codeforces since 2022. It's where I first learned to think about problems systematically and to take being wrong as information.

🏆 Codeforces: Expert



| Contest | Global Rank |
| --- | --- |
| Round 1070 | 279 / 13,000+ |
| Round 1044 | 653 / 16,000+ |

CF

🎓 Academic




| Exam | Result |
| --- | --- |
| AP EAPCET 2022 | State Rank 118 |
| Class XII (2022) | 93.2% |
| Class X (2020) | 100% |

GitHub stats

GitHub Stats GitHub Streak

Top Languages

GitHub Trophies

📈 Contribution activity graph

Activity Graph

snake animation


What I read & think about

Reading shapes how I work as much as building does. A short list of what's actually on my shelf, split between systems papers that inform my engineering and the broader books that shape how I think.

📄 Papers that inform my engineering

| Paper | Why it matters to me |
| --- | --- |
| The Log-Structured Merge-Tree (O'Neil et al.) | The foundation behind LevelDB, RocksDB, and Sastran's KV layer |
| Efficient ANN search using HNSW (Malkov & Yashunin) | What Sastran's vector index implements |
| Dynamo: Amazon's Highly Available Key-value Store | For thinking about distribution, replication, and eventual consistency |
| The Google File System (Ghemawat et al.) | Still the cleanest introduction to large-scale block storage |

📚 Engineering books

| Book | Why it matters to me |
| --- | --- |
| Designing Data-Intensive Applications (Kleppmann) | The textbook I wish every backend engineer read |
| Database Internals (Petrov) | Dense but excellent on storage engines specifically |

🧭 Beyond engineering β€” how I think and work

Book Why it matters to me
Thinking, Fast and Slow β€” Kahneman The book that changed how I treat my own intuitions
Deep Work β€” Cal Newport Why I protect long uninterrupted blocks for hard problems
Atomic Habits β€” James Clear Systems over goals β€” the framing that actually changed my daily work
Make It Stick β€” Brown, Roediger, McDaniel Evidence-based learning, written for technical readers
Clear Thinking β€” Shane Parrish On catching the moments where defaults make decisions for you
Man's Search for Meaning β€” Viktor Frankl The book I return to when work feels heavy
Meditations β€” Marcus Aurelius Short, ancient, still the best operating manual I've found for a noisy mind

Writing

I write occasionally at tensen.dev, mostly notes on systems, what I'm learning, and the occasional deep-dive into something I built.

Blog


Let's talk

I'm open to full-time SDE, backend, and infrastructure roles, including AI / LLM platform work. Most interested in companies where the engineering bar is high and the work touches reliability, performance, or data at meaningful scale.

Email LinkedIn Blog GitHub

📍 Available immediately · Bengaluru / Hyderabad / Remote


Pinned

  1. sastran (Public)

    Rust

  2. document-qa-rag-new (Public)

    Upload documents (PDFs, text, Word). Ask questions. Get context-aware answers using RAG (embedding + LLM), optimized for RTX 4050 GPU.

    JavaScript

  3. rag-aware-routing (Public)

    Python

  4. NIDS (Public)

    Python