
[Fix]: $HOME in launcher eagle example#1365

Merged
h-guo18 merged 2 commits intomainfrom
haoguo/fix-example0428
May 1, 2026


@h-guo18 commented Apr 28, 2026

What does this PR do?

Type of change: Bug fix

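Fixes a launcher example bug reported by @cjluo-nv.

Before the fix: task_1 in tools/launcher/examples/Qwen/Qwen3-8B/hf_online_eagle3.yaml fails.
Reason: the task sets HOME=/tmp inside the container, so enroot looks for registry credentials in $HOME/.config/enroot/.credentials (that is, /tmp/.config/enroot/.credentials), does not find them, and authenticates as <anonymous>; the manifest fetch then fails with 401 Unauthorized: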

GpuFreq=control_disabled
pyxis: importing docker image: nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc10
Apr 28 13:35:59.491365 2515157 slurmstepd   0x155552c3b780: error: pyxis: child 2515158 failed with error code: 1
Apr 28 13:35:59.491415 2515157 slurmstepd   0x155552c3b780: error: pyxis: failed to import docker image
Apr 28 13:35:59.491433 2515157 slurmstepd   0x155552c3b780: error: pyxis: printing enroot log file:
Apr 28 13:35:59.491453 2515157 slurmstepd   0x155552c3b780: error: pyxis:     [INFO] Querying registry for permission grant
Apr 28 13:35:59.491469 2515157 slurmstepd   0x155552c3b780: error: pyxis:     [INFO] Authenticating with user: <anonymous>
Apr 28 13:35:59.491483 2515157 slurmstepd   0x155552c3b780: error: pyxis:     [INFO] Authentication succeeded
Apr 28 13:35:59.491499 2515157 slurmstepd   0x155552c3b780: error: pyxis:     [INFO] Fetching image manifest list
Apr 28 13:35:59.491512 2515157 slurmstepd   0x155552c3b780: error: pyxis:     [INFO] Fetching image manifest
Apr 28 13:35:59.491524 2515157 slurmstepd   0x155552c3b780: error: pyxis:     [ERROR] URL https://registry-1.docker.io/v2/nvcr.io/nvidia/tensorrt-llm/release/manifests/1.3.0rc10 returned error code: 401 Unauthorized
Apr 28 13:35:59.491564 2515157 slurmstepd   0x155552c3b780: error: pyxis: couldn't start container
Apr 28 13:35:59.491579 2515157 slurmstepd   0x155552c3b780: error: spank: required plugin spank_pyxis.so: task_init() failed with rc=-1
Apr 28 13:35:59.491593 2515157 slurmstepd   0x155552c3b780: error: Failed to invoke spank plugin stack
Apr 28 13:35:59.515523 2515146 slurmstepd   0x155552c3b780: error: pyxis: child 2515240 failed with error code: 1

After the fix:

GpuFreq=control_disabled
pyxis: importing docker image: nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc10
pyxis: imported docker image: nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc10
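A minimal Python sketch of why the HOME override breaks the import, assuming enroot resolves its config directory as ${XDG_CONFIG_HOME:-$HOME/.config}/enroot unless ENROOT_CONFIG_PATH is set, and reads registry credentials from $ENROOT_CONFIG_PATH/.credentials (our reading of the enroot configuration docs). The helper enroot_credentials_path and the /home/alice home directory are hypothetical; the sketch only models path resolution, not enroot itself:

```python
import posixpath
from pathlib import PurePosixPath


def enroot_credentials_path(env: dict[str, str]) -> PurePosixPath:
    """Model where enroot would look for registry credentials.

    Assumption: ENROOT_CONFIG_PATH defaults to
    ${XDG_CONFIG_HOME:-$HOME/.config}/enroot, and credentials are read
    from $ENROOT_CONFIG_PATH/.credentials.
    """
    config_path = env.get("ENROOT_CONFIG_PATH")
    if not config_path:
        # Fall back to the XDG config dir, which itself defaults to $HOME/.config.
        xdg_config = env.get("XDG_CONFIG_HOME") or posixpath.join(env["HOME"], ".config")
        config_path = posixpath.join(xdg_config, "enroot")
    return PurePosixPath(config_path) / ".credentials"


# Hypothetical login environment vs. the per-task HOME override this PR removes.
user_env = {"HOME": "/home/alice"}
task_env = {**user_env, "HOME": "/tmp"}

print(enroot_credentials_path(user_env))  # /home/alice/.config/enroot/.credentials
print(enroot_credentials_path(task_env))  # /tmp/.config/enroot/.credentials
```

With HOME=/tmp the lookup moves to a directory where no credentials file exists, which is consistent with the <anonymous> authentication in the failing log.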

Testing

Before your PR is "Ready for review"

Make sure you read and follow the Contributor guidelines and that your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

  • Chores
    • Updated example pipeline to use the standardized dataset example path.
    • Removed unnecessary per-task overrides of the process home and cache directory to simplify environment setup.
    • Preserved required model checkpoint environment setting for the relevant task so model resolution continues to work.

Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
copy-pr-bot Bot commented Apr 28, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai Bot commented Apr 28, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: fb452087-b043-4994-af5f-8d25a966b405

📥 Commits

Reviewing files that changed from the base of the PR and between b25681e and 776f71b.

📒 Files selected for processing (1)
  • tools/launcher/examples/Qwen/Qwen3-8B/hf_online_eagle3.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • tools/launcher/examples/Qwen/Qwen3-8B/hf_online_eagle3.yaml

📝 Walkthrough

Switches task_0 dataset input from a speculative-decoding prepare-input path to the generic examples/dataset path, and removes per-task environment overrides (HOME=/tmp and TORCHINDUCTOR_CACHE_DIR=/tmp/torch_cache) from task_1 and task_2; task_2 still sets HF_MODEL_CKPT from <<global_vars.hf_model>>.
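To make the task_1/task_2 environment change concrete, here is a hypothetical Python model of the per-task environment entries as plain dicts. The real YAML schema of hf_online_eagle3.yaml is not shown in this PR, so the dict layout, the helper drop_overrides, and the assumption that these are the only entries in each task are illustrative:

```python
# Hypothetical in-memory view of the per-task environment entries;
# the actual YAML layout of hf_online_eagle3.yaml may differ.
REMOVED_OVERRIDES = {"HOME", "TORCHINDUCTOR_CACHE_DIR"}


def drop_overrides(task_env: dict[str, str]) -> dict[str, str]:
    """Return a copy of task_env without the HOME and Inductor cache overrides."""
    return {k: v for k, v in task_env.items() if k not in REMOVED_OVERRIDES}


before = {
    "task_1": {"HOME": "/tmp", "TORCHINDUCTOR_CACHE_DIR": "/tmp/torch_cache"},
    "task_2": {
        "HOME": "/tmp",
        "TORCHINDUCTOR_CACHE_DIR": "/tmp/torch_cache",
        "HF_MODEL_CKPT": "<<global_vars.hf_model>>",
    },
}
after = {name: drop_overrides(env) for name, env in before.items()}
print(after)
# {'task_1': {}, 'task_2': {'HF_MODEL_CKPT': '<<global_vars.hf_model>>'}}
```

Only HF_MODEL_CKPT survives, so task_2 can still resolve the model checkpoint while HOME and the TorchInductor cache directory fall back to the container's defaults.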

Changes

  • YAML Pipeline Configuration (tools/launcher/examples/Qwen/Qwen3-8B/hf_online_eagle3.yaml): updates task_0 to use examples/dataset/.../example_data_config.yaml instead of the speculative-decoding prepare-input path; removes the per-task environment entries HOME=/tmp and TORCHINDUCTOR_CACHE_DIR=/tmp/torch_cache from task_1 and task_2; task_2 still sets HF_MODEL_CKPT from <<global_vars.hf_model>>.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): The title '[Fix]: $HOME in launcher eagle example' directly addresses the bug fix described in the PR objectives: removing the HOME=/tmp environment setting that caused enroot credential access failures.
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Linked Issues check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Security Anti-Patterns (✅ Passed): PR contains only YAML configuration changes to the launcher example pipeline. No Python code modified, no dependencies added, and none of the six security anti-patterns are present.


@h-guo18 h-guo18 marked this pull request as ready for review April 28, 2026 20:45
@h-guo18 h-guo18 requested a review from cjluo-nv April 28, 2026 20:45
@h-guo18 h-guo18 changed the title [Fix] Eagle Example in launcher [Fix] $HOME in launcher eagle example Apr 28, 2026
@h-guo18 h-guo18 changed the title [Fix] $HOME in launcher eagle example [Fix]: $HOME in launcher eagle example Apr 28, 2026
codecov Bot commented Apr 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.84%. Comparing base (8eec6d4) to head (776f71b).
⚠️ Report is 11 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1365      +/-   ##
==========================================
- Coverage   76.93%   76.84%   -0.09%     
==========================================
  Files         471      471              
  Lines       50404    50537     +133     
==========================================
+ Hits        38776    38835      +59     
- Misses      11628    11702      +74     
Flag          Coverage Δ
regression    14.91% <ø> (+0.21%) ⬆️
unit          52.78% <ø> (+0.04%) ⬆️
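The project percentages in the coverage diff above follow directly from its Hits and Lines rows. A quick Python check, assuming coverage is simply hits / lines (the diff lists no partially covered lines, so none are modeled here):

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage as a percentage of tracked lines."""
    return 100.0 * hits / lines


# Misses are the tracked lines that were not hit.
assert 50404 - 38776 == 11628  # main
assert 50537 - 38835 == 11702  # PR head

base = coverage_pct(38776, 50404)  # main (8eec6d4)
head = coverage_pct(38835, 50537)  # PR head (776f71b)
print(f"{base:.2f}% -> {head:.2f}% ({head - base:+.2f}%)")
# 76.93% -> 76.84% (-0.09%)
```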

Flags with carried forward coverage won't be shown.


Comment thread tools/launcher/examples/Qwen/Qwen3-8B/hf_online_eagle3.yaml Outdated
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
@h-guo18 h-guo18 merged commit 9d2e608 into main May 1, 2026
39 checks passed
@h-guo18 h-guo18 deleted the haoguo/fix-example0428 branch May 1, 2026 23:31
github-actions Bot commented May 1, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-05-01 23:32 UTC
