Conversation
ghstack-source-id: 3b810e7 Pull-Request: InternLM#1484
|
|
||
| # Like Black, automatically detect the appropriate line ending. | ||
| line-ending = "auto" | ||
| 111 |
There was a problem hiding this comment.
Claude: Critical: This line (111) appears to be a test artifact from the first commit. It will break TOML parsing since it's not valid TOML syntax. This must be removed before merging.
| 111 |
| router_logits: dict[str, torch.Tensor] | None = None | ||
| balancing_loss: torch.Tensor | None = None | ||
| z_loss: torch.Tensor | None = None | ||
| tokens_per_expert_global: torch.Tensor |
There was a problem hiding this comment.
Claude: Critical: tokens_per_expert_global is declared as required (no default value), but the original TypedDict had it as NotRequired[torch.Tensor].
In _micro_batch_forward (line ~463), tokens_per_expert_global is only added to the output dict inside the if all_router_logits: block. If that condition is false, constructing MoEModelOutputs(**output) will raise a Pydantic ValidationError because this required field is missing.
This should be optional to match the original behavior:
| tokens_per_expert_global: torch.Tensor | |
| tokens_per_expert_global: torch.Tensor | None = None |
| def free(self): | ||
| super().free() | ||
| self.router_logits = None |
There was a problem hiding this comment.
Claude: Warning: MoEModelOutputs.free() only clears router_logits but not balancing_loss, z_loss, or tokens_per_expert_global. For consistency with ModelOutputs.free() (which clears all its tensor fields), all MoE-specific tensor fields should also be freed:
| def free(self): | |
| super().free() | |
| self.router_logits = None | |
| def free(self): | |
| super().free() | |
| self.router_logits = None | |
| self.balancing_loss = None | |
| self.z_loss = None | |
| self.tokens_per_expert_global = None |
| # TODO: Only for avoid BC. Should be removed later. | ||
| def __contains__(self, key): |
There was a problem hiding this comment.
Claude: Warning: __contains__ uses model_fields_set which only tracks fields explicitly passed during construction. This works correctly for the current code patterns (where fields are conditionally added to the output dict before construction), but note this is semantically different from checking hasattr or checking if a value is non-None.
For example, MoEModelOutputs(loss=loss, extra_info=info, tokens_per_expert_global=tpe, balancing_loss=None) would report "balancing_loss" in output as True (because it was explicitly passed), even though the value is None. This may be surprising to callers who expect in to indicate a meaningful value is present. Consider documenting this behavior or checking getattr(self, key) is not None instead.
|
|
||
| # TODO: yanhuida, should be removed | ||
| if not isinstance(extra_info, ModelForwardExtraLogInfo): | ||
| extra_info = ModelForwardExtraLogInfo(extra_info) |
There was a problem hiding this comment.
Claude: Warning: This TODO-guarded workaround wraps extra_info in ModelForwardExtraLogInfo if it isn't already one. Two concerns:
- This introduces a circular-style dependency:
loss/base_loss_ctx.pynow imports frommodel/utils/misc.py. Ifextra_infocan arrive as a plain dict here, the root cause should be fixed at the source (whereextra_infois produced), not patched at the consumer. - The
# TODO: yanhuida, should be removedcomment doesn't explain when or under what condition this should be removed, making it easy to forget. Consider adding a reference to a tracking issue.
| types: [submitted] | ||
| pull_request_target: | ||
| types: [opened, synchronize] |
There was a problem hiding this comment.
Claude: Warning — Security: pull_request_target runs in the context of the base branch and has access to repository secrets, unlike the safer pull_request event. While the Claude Code action uses restricted allowed_tools, the pull_request_target trigger is a known attack vector for PRs from untrusted forks — the PR author could craft malicious content in the PR title, body, or diff that influences the Claude agent's behavior in a context where secrets are available.
Consider whether pull_request (which does NOT have access to secrets) is sufficient for code review, or add explicit if: github.event.pull_request.head.repo.full_name == github.repository guards to limit this to same-repo PRs only.
| prompt: | | ||
| REPO: ${{ github.repository }} | ||
| PR NUMBER: ${{ github.event.pull_request.number }} | ||
| PR NUMBER: ${{ github.event.pull_request.number || github.event.issue.number}} |
There was a problem hiding this comment.
Claude: Nit: Missing space before closing }}:
| PR NUMBER: ${{ github.event.pull_request.number || github.event.issue.number}} | |
| PR NUMBER: ${{ github.event.pull_request.number || github.event.issue.number }} |
|
Claude: ## Summary This PR converts IssuesCritical
Warning
Nit
VerdictREQUEST_CHANGES |
No description provided.