Captures token-level expert routing from the MoE layers of Nemotron-Cascade-2-30B-A3B during vLLM inference and writes the results to Parquet.
deep-learning model routing transformers moe mamba mixture-of-experts model-interpretability model-observability llm llms vllm llm-inference nemotron vllm-serve expert-routing moe-routing expert-fingerprinting activation-capture
Updated Apr 7, 2026 - Python
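The core of such a capture is recording, for every token, which experts the router selected and with what gate weights. A minimal sketch of that per-token record, assuming standard top-k softmax gating over router logits (the function name, the k=2 default, and the record layout are illustrative assumptions, not this repository's actual API; in the real tool the rows would be appended to a Parquet file, e.g. via pyarrow):

```python
import numpy as np

def capture_routing(router_logits, k=2):
    """Hypothetical helper: given per-token router logits of shape
    (n_tokens, n_experts), return the top-k expert ids (sorted by
    descending logit) and their softmax-normalized gate weights."""
    # Partition so the k largest logits sit in the last k columns.
    idx = np.argpartition(router_logits, -k, axis=-1)[:, -k:]
    sel = np.take_along_axis(router_logits, idx, axis=-1)
    # Sort the selected experts by descending logit.
    order = np.argsort(-sel, axis=-1)
    idx = np.take_along_axis(idx, order, axis=-1)
    sel = np.take_along_axis(sel, order, axis=-1)
    # Softmax over the selected logits -> normalized gate weights.
    w = np.exp(sel - sel.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return idx, w

# One token routed over four experts: experts 1 and 3 win.
logits = np.array([[0.1, 2.0, -1.0, 1.5]])
ids, weights = capture_routing(logits, k=2)
records = [
    {"token_pos": i, "experts": e.tolist(), "weights": wt.tolist()}
    for i, (e, wt) in enumerate(zip(ids, weights))
]
# Each record is one row of the eventual Parquet output.
```

This per-token row shape (position, expert ids, gate weights) is what makes downstream analyses like expert fingerprinting possible, since it preserves the full routing decision rather than aggregate expert counts.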