Conversation

@shuningjin
Collaborator

@shuningjin shuningjin commented Dec 30, 2025

Description

Main change
onboard test for deepseek2-16b; plan to run the full workflow in XLML on v5p-8

  • end_to_end/tpu/deepseek/v2-16b/test_deepseek.sh: ckpt conversion, logit check, train, finetune, decode

onboard test for deepseek3-671b; plan to run step 2 in XLML on v5p-128

  • end_to_end/tpu/deepseek/v3-671b/1_test_deepseek.sh
  • end_to_end/tpu/deepseek/v3-671b/2_test_deepseek.sh: logit check (scan), decode (unscan). Note we leave out training and finetuning, as they are covered by the ubench nightly test. (A hypothetical sketch of these two steps follows below.)
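
For context, a rough sketch of what the two step-2 invocations could look like. This is a hypothetical illustration only: the module paths, flag names, tolerance values, and checkpoint layout below are assumptions based on common MaxText conventions, not copied from the script.

# Hypothetical logit check against the golden data, using the scanned checkpoint.
python3 -m MaxText.tests.forward_pass_logit_checker MaxText/configs/base.yml \
  model_name=deepseek3-671b \
  load_parameters_path=gs://maxtext-deepseek/deepseek3-671b/scanned/0/items \
  --atol=0.2 --rtol=0.2

# Hypothetical decode with the unscanned checkpoint.
python3 -m MaxText.decode MaxText/configs/base.yml \
  model_name=deepseek3-671b \
  load_parameters_path=gs://maxtext-deepseek/deepseek3-671b/unscanned/0/items \
  scan_layers=false per_device_batch_size=1 prompt="I love to"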

XLML PR: GoogleCloudPlatform/ml-auto-solutions#1112

FIXES: b/423057893

Other change

  • Minor format change to other test scripts.
  • max_logging.log behavior changed with PR#2873; multiple scripts now need the absl logging verbosity set to keep their logs visible (a minimal sketch follows this list).
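
A minimal sketch of the verbosity fix, assuming absl-py is installed and the MaxText package is importable; the import paths are assumptions, not taken from the PR diff.

python3 - <<'EOF'
import absl.logging

# After PR#2873, max_logging routes through absl logging, whose default
# verbosity can hide INFO-level messages; raise it explicitly.
absl.logging.set_verbosity(absl.logging.INFO)

from MaxText import max_logging
max_logging.log("this message is now visible at INFO verbosity")
EOF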

Tests

Preparation

  • Checkpoints uploaded:
    • gs://maxtext-deepseek/deepseek2-16b (hf-bf16)
    • gs://maxtext-deepseek/deepseek3-671b (hf-bf16, scanned, unscanned)
  • Golden logits uploaded:
    • gs://maxtext-test-assets/golden_data_deepseek2-16b.jsonl
    • gs://maxtext-test-assets/golden_data_deepseek3-671b.jsonl

bash end_to_end/tpu/deepseek/v3-671b/2_test_deepseek.sh

bash end_to_end/tpu/deepseek/v2-16b/test_deepseek.sh

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@codecov

codecov bot commented Dec 30, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.


@github-actions

🤖 Hi @shuningjin, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.


@github-actions github-actions bot left a comment


📋 Review Summary

This Pull Request introduces new test scripts for DeepSeek models (v2-16b and v3-671b) and refactors existing test scripts for GPT-OSS, Llama4, and Mixtral. The changes aim to onboard DeepSeek models, improve consistency in script practices, and adjust logging verbosity.

🔍 General Feedback

  • The refactoring of test scripts into modular 1_test_deepseek.sh and 2_test_deepseek.sh for DeepSeek v3 is a positive step for organization and clarity.
  • Consistent application of absl.logging.set_verbosity(absl.logging.INFO) across relevant Python files is good for debugging and monitoring.
  • The general updates to how BASE_OUTPUT_PATH is handled in the shell scripts improve robustness.
  • Consider reviewing the commented-out training and fine-tuning steps in the DeepSeek test scripts to ensure they are intentionally disabled and to add explanatory comments if needed.
  • Ensure all golden logit paths are robust and not dependent on transient local paths or specific dates.

Collaborator

@RissyRan RissyRan left a comment


Great work! Also thanks for keeping them up-to-date! A few minor comments

idx=$(date +%Y-%m-%d)

# By default, we'll use "llama4-17b-16e"
if [ -z "${MODEL_VARIATION}" ]; then
Collaborator


Do we need to set this from XLML? Or could we directly export MODEL_VARIATION="llama4-17b-16e" without the if/else?
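
For illustration, the two options being compared; a sketch only, where the variable name and default come from the diff above and everything else is assumed:

# Current pattern: fall back to a default, so XLML (or a user) can
# override it via the environment before invoking the script.
if [ -z "${MODEL_VARIATION}" ]; then
  export MODEL_VARIATION="llama4-17b-16e"
fi

# Equivalent one-liner using bash parameter expansion.
export MODEL_VARIATION="${MODEL_VARIATION:-llama4-17b-16e}"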

Collaborator Author


For other models, we have separate test scripts for each variant. For llama4, this serves as the unified script. Do we need to account for Maverick as well?

Collaborator


I see. I think we should do something similar to the GPT-OSS models: pass the variant type instead of duplicating the scripts. No need for it to be in this PR.

Collaborator Author


Different variants may need customization in tests, for instance:

  • forward logit check criteria
  • parallelism, as variants are typically run on different devices (e.g., v5p-8 vs. v5p-128)

I would prefer keeping disentangled scripts for different variants when possible.

Actually, for the llama4 case, the logit criteria are set for Scout; they might not work for Maverick. Need to follow up on this. (One hypothetical way to parameterize this is sketched below.)
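
A hypothetical sketch of carrying per-variant logit-check criteria in a single script. The variant names, tolerance values, and variable names below are all placeholders, not taken from the actual scripts:

# Hypothetical per-variant tolerances for the forward-pass logit check.
case "${MODEL_VARIATION}" in
  llama4-17b-16e)  ATOL=0.2; RTOL=0.2 ;;  # Scout (placeholder values)
  llama4-17b-128e) ATOL=0.5; RTOL=0.5 ;;  # Maverick (would need tuning)
  *) echo "unknown variant: ${MODEL_VARIATION}" >&2; exit 1 ;;
esac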

Collaborator

@RissyRan RissyRan left a comment


Thanks for the updates!


@shuningjin
Collaborator Author

For the DeepSeek scripts, updated the MoE strategy to tokamax_gmm for training and decoding. Tests pass:

Collaborator

@parambole parambole left a comment


LGTM. Thank you for making these changes. I have left a nit.

@shuningjin shuningjin force-pushed the shuningjin-xlml-ds branch 2 times, most recently from c018d3a to 79960df on January 2, 2026 22:20
@copybara-service copybara-service bot merged commit 98a3d4c into main Jan 2, 2026
26 checks passed
@copybara-service copybara-service bot deleted the shuningjin-xlml-ds branch January 2, 2026 23:27