[codex] Fix capacity-aware drop selection #2

Merged: Shwai-He merged 1 commit into main from codex/fix-capacity-aware-drop on May 27, 2026.

@Shwai-He
Collaborator

Summary

Fix the capacity-aware MoE routing patch so dropped token-expert assignments remain dropped instead of being restored by fallback top-k indices.

Root Cause

The generic _select_with_capacity helper marked dropped assignments with the sentinel expert id, but then replaced those sentinel ids with the original top-k fallback indices before returning, which effectively undid every drop. Downstream code then gathered weights or scattered logits at those restored indices, so changing expert_capacity could have little or no effect, especially for DeepSeek-style tuple-return gates.
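A minimal pure-Python sketch of this failure mode follows. It assumes greedy first-come capacity assignment (tokens visited in order) and a sentinel equal to num_experts; both are illustrative assumptions, and select_with_capacity and buggy_finalize are hypothetical names, not the PR's _select_with_capacity. The real patch operates on tensors.

```python
from typing import List, Tuple


def select_with_capacity(
    topk_ids: List[List[int]],
    topk_weights: List[List[float]],
    num_experts: int,
    capacity: int,
) -> Tuple[List[List[int]], List[List[float]]]:
    """Hypothetical greedy capacity selection that keeps drops dropped.

    Assumption: the sentinel is num_experts (one past the last real expert);
    the value used by capacity_patch.py may differ.
    """
    sentinel = num_experts
    load = [0] * num_experts
    ids_out, w_out = [], []
    for ids, ws in zip(topk_ids, topk_weights):
        row_ids, row_w = [], []
        for e, w in zip(ids, ws):
            if load[e] < capacity:
                # Expert still has a free slot: keep the assignment.
                load[e] += 1
                row_ids.append(e)
                row_w.append(w)
            else:
                # Expert is full: keep the sentinel id and a zero weight.
                row_ids.append(sentinel)
                row_w.append(0.0)
        ids_out.append(row_ids)
        w_out.append(row_w)
    return ids_out, w_out


def buggy_finalize(
    selected_ids: List[List[int]], fallback_ids: List[List[int]], sentinel: int
) -> List[List[int]]:
    """Hypothetical pre-fix finalization: sentinels are swapped back to top-k ids."""
    return [
        [f if s == sentinel else s for s, f in zip(srow, frow)]
        for srow, frow in zip(selected_ids, fallback_ids)
    ]


topk = [[0, 1], [0, 1], [0, 2]]
ids, weights = select_with_capacity(topk, [[0.6, 0.4], [0.7, 0.3], [0.5, 0.5]], 3, 1)
print(ids)                          # [[0, 1], [3, 3], [3, 2]]
print(buggy_finalize(ids, topk, 3))  # [[0, 1], [0, 1], [0, 2]]
```

With capacity 1 and three experts, the second token loses both experts and the third loses expert 0. buggy_finalize maps every sentinel back to the original top-k id, so the returned routing is identical to uncapped top-k, which is why varying the capacity changed little.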

Changes

  • Add a shared capacity-selection finalization helper that preserves sentinel ids and zero weights for dropped assignments.
  • Fix tuple-return gates, such as DeepSeek, to gather with safe indices and assign zero weight to sentinel entries.
  • Fix logits-return gates, such as Mixtral/Qwen/OLMoE-like modules, to scatter only valid selected logits while keeping a fallback logit only when a row would otherwise be all masked.
  • Keep the top-level capacity_aware/capacity_patch.py mirror in sync with the runtime lm_eval implementation.
  • Add focused tests for score drop, overselect width, tuple gate drop weights, and logits gate masking.
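A hypothetical sketch of the two gate-side fixes, in the same simplified list-based style. It assumes dropped assignments carry a sentinel id equal to num_experts and that the fallback for an all-masked row is the row's first original top-k expert; the PR states neither detail. tuple_gate_weights and logits_gate_mask are illustrative names, not functions from capacity_patch.py.

```python
import math
from typing import List

NEG_INF = -math.inf


def tuple_gate_weights(
    scores: List[List[float]], selected_ids: List[List[int]], sentinel: int
) -> List[List[float]]:
    """Per-assignment weights for a DeepSeek-style tuple-return gate (hypothetical)."""
    weights = []
    for row_scores, row_ids in zip(scores, selected_ids):
        row = []
        for e in row_ids:
            valid = e != sentinel
            safe = e if valid else 0          # sentinel is out of range, so look up index 0
            gathered = row_scores[safe]       # mirrors a vectorized gather with safe indices
            row.append(gathered if valid else 0.0)  # dropped assignments get zero weight
        weights.append(row)
    return weights


def logits_gate_mask(
    logits: List[List[float]],
    selected_ids: List[List[int]],
    fallback_ids: List[List[int]],
    sentinel: int,
) -> List[List[float]]:
    """Masked router logits for a Mixtral/Qwen/OLMoE-like logits-return gate (hypothetical)."""
    out = []
    for row_logits, row_ids, row_fb in zip(logits, selected_ids, fallback_ids):
        masked = [NEG_INF] * len(row_logits)
        valid = [e for e in row_ids if e != sentinel]
        for e in valid:                       # scatter only valid selected logits
            masked[e] = row_logits[e]
        if not valid:                         # row would be all -inf: keep one fallback logit
            fb = row_fb[0]                    # assumption: first original top-k expert
            masked[fb] = row_logits[fb]
        out.append(masked)
    return out


sel = [[0, 1], [3, 3], [3, 2]]  # sentinel 3 marks dropped assignments
print(tuple_gate_weights([[0.5, 0.3, 0.2], [0.6, 0.3, 0.1], [0.4, 0.1, 0.5]], sel, 3))
# [[0.5, 0.3], [0.0, 0.0], [0.0, 0.5]]
```

Keeping a single fallback logit in a fully dropped row avoids a softmax over an all -inf row, which would produce NaN; every other dropped assignment stays at -inf.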

Validation

  • pytest tests/test_capacity_aware_patch.py
  • python -m py_compile lm_eval/capacity_aware/capacity_patch.py ../capacity_aware/capacity_patch.py tests/test_capacity_aware_patch.py

Shwai-He marked this pull request as ready for review on May 27, 2026 20:18.
Shwai-He merged commit dc16df0 into main on May 27, 2026.