# Test pr2 #1506
base: `main` · Changes from all commits
**Changed file: GitHub Actions workflow**

```diff
@@ -9,6 +9,9 @@ on:
     types: [opened, assigned]
   pull_request_review:
     types: [submitted]
+  pull_request_target:
+    types: [opened, synchronize]
+    branches: [main]

 jobs:
   claude:
```

```diff
@@ -38,7 +41,7 @@ jobs:
       # Prompt A workaround for claude code action bug of `Fork` PR
       prompt: |
         REPO: ${{ github.repository }}
-        PR NUMBER: ${{ github.event.pull_request.number }}
+        PR NUMBER: ${{ github.event.pull_request.number || github.event.issue.number}}

         Please review this pull request.
```

> **Claude (review comment):** Nit: missing space before the closing `}}`.
>
> Suggested change:
>
> ```suggestion
>         PR NUMBER: ${{ github.event.pull_request.number || github.event.issue.number }}
> ```
**Changed file: Ruff configuration (TOML)**

```diff
@@ -130,3 +130,4 @@ skip-magic-trailing-comma = false

 # Like Black, automatically detect the appropriate line ending.
 line-ending = "auto"
+111
```

> **Claude (review comment):** Critical: this line (`111`) appears to be accidentally committed content. A bare `111` is not valid TOML and will break parsing of the configuration file.
>
> Suggested change (delete the stray line):
>
> ```suggestion
> ```
**Changed file: loss module (Python)**

```diff
@@ -12,6 +12,7 @@
 from typing_extensions import Self

 from xtuner.v1.loss.utils import sp_split
+from xtuner.v1.model.utils.misc import ModelForwardExtraLogInfo

 from .chunk_loss import ChunkLoss
```

```diff
@@ -195,6 +196,10 @@ def forward(
         else:
             loss, (logits, extra_info) = self.chunk_mode(hidden_states, head_weight, head_bias, self.loss_kwargs)

+        # TODO: yanhuida, should be removed
+        if not isinstance(extra_info, ModelForwardExtraLogInfo):
+            extra_info = ModelForwardExtraLogInfo(extra_info)
+
         extra_info["local_base_loss"] = loss.detach().clone()

         # Step 2.c in the loss calculation: reduce the loss over all ranks using all_reduce with autograd support
```

> **Claude (review comment on lines +198 to +201):** Warning: this TODO-guarded workaround wraps `extra_info` in `ModelForwardExtraLogInfo` whenever it is not already an instance. Per the TODO, it should be removed once all code paths return the proper type.
**Changed file: model outputs (Python)**

```diff
@@ -194,11 +194,25 @@ def layers_type(self) -> list[Literal["full_attention", "sliding_attention"]]:
     ]


-class ModelOutputs(TypedDict):
-    hidden_states: NotRequired[list[torch.Tensor]]
-    logits: NotRequired[torch.Tensor]
+class ModelOutputs(PydanticBaseModel):
+    model_config = ConfigDict(arbitrary_types_allowed=True)
+
+    hidden_states: list[torch.Tensor] | None = None
+    logits: torch.Tensor | None = None
     loss: torch.Tensor
-    extra_info: ModelForwardExtraLogInfo
+    extra_info: ModelForwardExtraLogInfo | None = None
+
+    def free(self):
+        self.hidden_states = None
+        self.logits = None
+        self.extra_info = None
+
+    # TODO: Only for avoid BC. Should be removed later.
+    def __getitem__(self, key):
+        return getattr(self, key)
+
+    # TODO: Only for avoid BC. Should be removed later.
+    def __contains__(self, key):
+        return key in self.model_fields_set


 def _is_float8_available():
```

> **Claude (review comment on lines +213 to +214):** Warning: `__contains__` delegates to `model_fields_set`, which only contains fields that were explicitly assigned. For example, a field that merely holds its default value will report `key in output` as `False` even though `output[key]` succeeds, which diverges from the previous `TypedDict` (plain dict) membership semantics.
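To see the membership semantics that comment warns about without pulling in pydantic or torch, here is a hypothetical stand-in class (all names are invented for illustration; `_fields_set` mimics pydantic's `model_fields_set`, which records only explicitly assigned fields):

```python
class Outputs:
    """Toy stand-in for a pydantic model with BC shims like the PR's."""

    _defaults = {"logits": None, "loss": None}

    def __init__(self, **kwargs):
        # Mirrors pydantic's model_fields_set: only fields the caller passed.
        self._fields_set = set(kwargs)
        for name, default in self._defaults.items():
            setattr(self, name, kwargs.get(name, default))

    def __getitem__(self, key):          # BC shim, like the PR's __getitem__
        return getattr(self, key)

    def __contains__(self, key):         # BC shim, like the PR's __contains__
        return key in self._fields_set


out = Outputs(loss=1.25)
print("loss" in out)     # True  -- explicitly passed at construction
print("logits" in out)   # False -- holds its default, so not "set"
print(out["logits"])     # None  -- yet item access still succeeds
```

This is exactly the divergence from dict semantics: with a plain `TypedDict` instance, `"logits" in output` and `output["logits"]` succeed or fail together.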
**Changed file: MoE model (Python)**

```diff
@@ -79,10 +79,14 @@
 class MoEModelOutputs(ModelOutputs):
-    router_logits: NotRequired[dict[str, torch.Tensor]]
-    balancing_loss: NotRequired[torch.Tensor]
-    z_loss: NotRequired[torch.Tensor]
-    tokens_per_expert_global: NotRequired[torch.Tensor]
+    router_logits: dict[str, torch.Tensor] | None = None
+    balancing_loss: torch.Tensor | None = None
+    z_loss: torch.Tensor | None = None
+    tokens_per_expert_global: torch.Tensor
+
+    def free(self):
+        super().free()
+        self.router_logits = None


 class BalancingLossConfig(PydanticBaseModel):
```

> **Claude (review comment):** Critical: in `MoEModelOutputs`, `tokens_per_expert_global` was `NotRequired` in the `TypedDict` but is now a required field with no default, so any code path that constructs `MoEModelOutputs` without it will fail validation. This should be optional to match the original behavior:
>
> ```suggestion
>     tokens_per_expert_global: torch.Tensor | None = None
> ```

> **Claude (review comment on lines +87 to +89):** Warning: `free()` clears `router_logits` but leaves the other tensor fields introduced here (`balancing_loss`, `z_loss`, `tokens_per_expert_global`) in place, so those tensors are not released.
>
> Suggested change:
>
> ```suggestion
>     def free(self):
>         super().free()
>         self.router_logits = None
>         self.balancing_loss = None
>         self.z_loss = None
> ```

```diff
@@ -482,7 +486,7 @@ def _micro_batch_forward(
         output["router_logits"] = router_logits_dict

-        return MoEModelOutputs(**output, logits=logits)  # type: ignore[typeddict-item]
+        return MoEModelOutputs(**output, logits=logits)

     def _forward(
         self,
```

```diff
@@ -583,7 +587,7 @@ def _forward(
         else:
             output["router_logits"] = None

-        return MoEModelOutputs(**output)  # type: ignore[typeddict-item]
+        return MoEModelOutputs(**output)

     def build_embeddings(self, config: MoEConfig):
         return nn.Embedding(config.vocab_size, config.hidden_size, config.pad_token_id)
```
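The `free()` gap can be sketched with a minimal, torch-free stand-in (class names and fields here are invented stand-ins for the real tensor-holding classes): releasing only `router_logits` leaves the auxiliary-loss fields referenced, while the variant the review suggests clears them all:

```python
class ModelOutputsSketch:
    """Hypothetical stand-in for the base ModelOutputs class."""

    def __init__(self, loss, logits=None, extra_info=None):
        self.loss = loss
        self.logits = logits
        self.extra_info = extra_info

    def free(self):
        # The base class drops its own large buffers.
        self.logits = None
        self.extra_info = None


class MoEModelOutputsSketch(ModelOutputsSketch):
    """Hypothetical stand-in for MoEModelOutputs with the suggested fix."""

    def __init__(self, loss, router_logits=None, balancing_loss=None, z_loss=None):
        super().__init__(loss)
        self.router_logits = router_logits
        self.balancing_loss = balancing_loss
        self.z_loss = z_loss

    def free(self):
        super().free()
        self.router_logits = None
        # The review's point: without the two lines below, balancing_loss and
        # z_loss would survive free() and keep their tensors referenced.
        self.balancing_loss = None
        self.z_loss = None


out = MoEModelOutputsSketch(loss=0.5, balancing_loss=0.01, z_loss=0.02)
out.free()
assert out.router_logits is None
assert out.balancing_loss is None and out.z_loss is None
```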
> **Claude (review comment):** Warning — Security: `pull_request_target` runs in the context of the base branch and has access to repository secrets, unlike the safer `pull_request` event. While the Claude Code action uses restricted `allowed_tools`, the `pull_request_target` trigger is a known attack vector for PRs from untrusted forks: the PR author could craft malicious content in the PR title, body, or diff that influences the Claude agent's behavior in a context where secrets are available.
>
> Consider whether `pull_request` (which does NOT have access to secrets) is sufficient for code review, or add an explicit `if: github.event.pull_request.head.repo.full_name == github.repository` guard to limit this trigger to same-repo PRs only.
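A sketch of where that suggested guard could live, assuming the `jobs.claude` layout shown in the first diff; everything here besides the `if:` line is taken from the workflow diff above:

```yaml
jobs:
  claude:
    # Sketch only: restrict the job to same-repo PRs so pull_request_target
    # never runs the agent against untrusted fork content with secrets in scope.
    if: github.event.pull_request.head.repo.full_name == github.repository
    # ... rest of the job as in the existing workflow ...
```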