GitHub - 1BIMU/1BIMU

🧑‍💻 About Me

🎓 Undergraduate student at Beijing University of Posts and Telecommunications (BUPT), School of Computer Science
🔬 Research interests: RLVR · RLHF · Optimization Algorithms
🌱 Currently exploring the intersection of reinforcement learning and large language model alignment
📍 Beijing, China

🔭 Research Interests

| Area | Description |
| --- | --- |
| RLVR | Reinforcement Learning from Verifiable Rewards: scalable reward signals beyond human feedback |
| RLHF | Reinforcement Learning from Human Feedback: aligning LLMs with human preferences |
| Optimizer | Adaptive optimization methods (AdamW, Muon, Shampoo, etc.) for deep learning |

📌 Pinned Repositories

APO_OFFICAL: [ICML 2026] Official repository for "Anchored Policy Optimization: Mitigating Exploration Collapse via Support-Constrained Rectification" (⭐ 14 · 🍴 1)