
Manjit Pokhrel

AI Security Researcher · Adversarial ML · Hardware Security · Compiler Security

Kathmandu, Nepal · MLCommons Contributor
manjitpokhrel.com.np · Email · LinkedIn · ResearchGate · Medium


Research Focus

I study adversarial safety failures in large language models and build systems-level tools for inference optimization and security analysis — from GPU kernels to compiler passes.

Currently investigating:

  • Multilingual safety alignment failures in RLHF-trained LLMs
  • Compiler-level safety degradation — does torch.compile silently break alignment?
  • Cache-timing side-channel attacks on transformer inference
  • Activation-level jailbreak detection via neuron firing patterns
  • Training-free inference sparsity via custom CUDA/Triton kernels

Selected Contributions

NASB — Nepali Adversarial Safety Benchmark

Paper (Zenodo) · Code

First structured adversarial safety benchmark for Nepali-language LLMs.

  • 1,200+ multilingual and code-switched adversarial probes
  • 73.7% safety bypass rate in Nepali vs 0% in English
  • Evaluated across Qwen, Gemma, Llama, and Gemini
  • Findings disclosed to Google AI VRP (triaged) and Meta Whitehat
  • Independently confirmed by Qwen engineer
  • Revealed systemic tokenizer and alignment asymmetries in low-resource languages
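A per-language bypass rate like the one quoted above is the share of adversarial probes that got past the model's safety behaviour. A minimal sketch of that arithmetic follows; the function name, the record layout, and the toy data are all hypothetical and are not taken from the NASB code or its results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def bypass_rates(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the bypass rate in percent for each language.

    `results` holds (language, was_bypassed) pairs, one per probe.
    This record layout is an assumption made for illustration.
    """
    totals = defaultdict(int)
    bypassed = defaultdict(int)
    for language, was_bypassed in results:
        totals[language] += 1
        if was_bypassed:
            bypassed[language] += 1
    return {lang: 100.0 * bypassed[lang] / totals[lang] for lang in totals}


# Toy data only: 3 of 4 Nepali probes bypassed, 0 of 4 English probes.
toy_results = [
    ("ne", True), ("ne", True), ("ne", True), ("ne", False),
    ("en", False), ("en", False), ("en", False), ("en", False),
]
rates = bypass_rates(toy_results)
print(rates)
```

Reporting the rate per language, rather than one pooled number, is what makes an asymmetry between languages visible.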

GhostWeight

PyPI · Code

Training-free activation sparsity framework for LLM inference.

  • Custom CUDA kernels: sparse row-packing at warp/shared-memory level
  • 110.5% inference speedup on RTX 5060 (Blackwell)
  • Zero retraining. Zero fine-tuning. Zero accuracy loss.
  • Currently porting to OpenAI Triton as GhostWeight-Triton (sm_120)
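The general idea behind activation sparsity with row packing can be sketched on the CPU: activations at or below a threshold are dropped, the weight rows tied to the surviving activations are packed contiguously, and the product is computed over the packed rows only. The sketch below is plain Python with hypothetical function names; it is not GhostWeight's API or its CUDA kernel, and it assumes a layout in which `weights[i]` is the row multiplied by activation `x[i]`.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def dense_matvec(weights: Matrix, x: List[float]) -> List[float]:
    """Reference product: out[j] = sum_i x[i] * weights[i][j]."""
    out = [0.0] * len(weights[0])
    for xi, row in zip(x, weights):
        for j, w in enumerate(row):
            out[j] += xi * w
    return out


def sparse_matvec(
    weights: Matrix, x: List[float], threshold: float
) -> Tuple[List[float], int]:
    """Same product, skipping activations with |x[i]| <= threshold.

    Returns the output and the number of rows actually processed.
    """
    # Select the indices of activations that survive the threshold.
    active = [i for i, xi in enumerate(x) if abs(xi) > threshold]
    # Pack the surviving rows and activations contiguously.
    packed_rows = [weights[i] for i in active]
    packed_x = [x[i] for i in active]
    out = [0.0] * len(weights[0])
    for xi, row in zip(packed_x, packed_rows):
        for j, w in enumerate(row):
            out[j] += xi * w
    return out, len(active)


weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
x = [0.0, 2.0, 0.0, -1.0]  # half of the activations are exactly zero

dense_out = dense_matvec(weights, x)
sparse_out, rows_used = sparse_matvec(weights, x, threshold=0.0)
print(dense_out, sparse_out, rows_used)
```

With a threshold of zero only exact zeros are skipped, so the packed result matches the dense one. A positive threshold also drops small non-zero activations, which trades some accuracy for less work.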

Vajra Morphing

Adversarial attack vector I coined. Exploits morphological transformations in Devanagari script to bypass LLM safety filters at the subtoken level — below where safety alignment operates.


The Eleventh Optimization (in progress)

Investigating whether ML compiler optimizations (torch.compile, Triton kernels, TensorRT) silently alter safety alignment in RLHF-trained models. To my knowledge, no prior work addresses this intersection.
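One plausible mechanism, stated here as an assumption and not as a finding of this project, is that fused or reordered kernels change the order of floating-point reductions, so a compiled model's logits can differ slightly from eager execution. The pure-Python sketch below shows only the underlying numerical effect: the same values summed in two orders give different results. The function names are hypothetical and no compiler is involved.

```python
from typing import List


def sum_left_to_right(values: List[float]) -> float:
    """Accumulate sequentially, as a simple unfused loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_pairwise(values: List[float]) -> float:
    """Accumulate as a tree, the way a parallel reduction typically does."""
    if not values:
        return 0.0
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return sum_pairwise(values[:mid]) + sum_pairwise(values[mid:])


values = [0.1, 0.2, 0.3]
sequential = sum_left_to_right(values)
tree = sum_pairwise(values)
print(sequential, tree, sequential == tree)
```

The difference is on the order of one unit in the last place. Whether gaps of that size ever change a model's refusal behaviour is the open question described above; the sketch does not answer it.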


Cache-Timing Attacks on Transformer Attention (in progress)

Extending Flush+Reload side-channel attacks from CNNs (Cache Telepathy, USENIX Security 2020) to transformer attention mechanisms, exploiting input-dependent memory access patterns unique to self-attention.


Publications & Disclosures

Year | Work | Venue
2026 | Lost in Translation: Safety Alignment Failures in Nepali and Code-Switched Variants of Instruction-Tuned LLMs | Zenodo
2026 | GhostWeight: Training-Free Activation Sparsity for LLM Inference | PyPI / GitHub
2026 | Multilingual Alignment Bypass Disclosure | Google AI VRP (Triaged)
2026 | Multilingual Safety Asymmetry Disclosure | Meta Whitehat
2026 | The Eleventh Way In (essay) | Medium

Affiliations

  • MLCommons — Contributor, DMLR & AI Risk and Reliability Working Groups (2026–present)

Technical Stack

Security & Adversarial ML
Jailbreak evaluation · Cross-lingual safety bypass · Tokenizer exploitation · Morphological attacks · Side-channel analysis (cache-timing) · Responsible disclosure

ML Frameworks
PyTorch · HuggingFace Transformers · PEFT · NumPy · scikit-learn · llama.cpp

GPU & Compiler Systems
CUDA 12.8 · CUDA C (warp-level kernels) · OpenAI Triton (sm_120 Blackwell) · MLIR (learning) · Nsight Compute · Roofline analysis

Systems & Hardware
Flush+Reload (C) · Cache-timing attacks · AMD Zen 4 microarchitecture · Operator fusion · Graph breaks

Languages
Python · C · CUDA C · JavaScript · SQL

Infrastructure
Linux · Git · Docker · FastAPI · HuggingFace Spaces · Vercel


Hardware

RTX 5060 8GB (Blackwell, sm_120) · Ryzen 7 7700 (Zen 4) · 16GB DDR5


Education

Kathmandu University
BSc Computer Science (Expected 2029)


Pinned

  1. nepali-finetune (Public)

    Fine-tuning Qwen2.5-1.5B on Nepali text using an RTX 5060 (Blackwell, sm_120)

    Python

  2. GhostWeight (Public)

    Training-free activation sparsity for LLMs. 74% hardware speedup on RTX 5060 (Blackwell) with 5.91% perplexity cost. Zero retraining. Static dead neuron masking + GhostGate threshold activation sur…

    Python 4