intellistream/ascend-runtime-manager
ascend-runtime-manager

Lightweight manager for Ascend runtime setup and diagnostics.

Why

This repository isolates system-level Ascend dependency management from runtime repos. vllm-hust can call this manager so end users keep a single install entrypoint.

Commands

  • hust-ascend-manager doctor
  • hust-ascend-manager doctor --json
  • hust-ascend-manager env --shell
  • hust-ascend-manager env --install-hook
  • hust-ascend-manager setup --manifest manifests/euleros-910b.json --dry-run
  • hust-ascend-manager setup --manifest manifests/euleros-910b.json --install-python-stack
  • hust-ascend-manager setup --manifest manifests/euleros-910b.json --apply-system
  • hust-ascend-manager runtime check --repo /home/shuhao/vllm-hust
  • hust-ascend-manager runtime repair --repo /home/shuhao/vllm-hust
  • hust-ascend-manager runtime repair --repo /home/shuhao/vllm-hust --install-plugin
  • hust-ascend-manager launch Qwen/Qwen2.5-1.5B-Instruct
  • hust-ascend-manager container install --host-workspace-root /home/shuhao
  • hust-ascend-manager container shell --host-workspace-root /home/shuhao
  • hust-ascend-manager container install --non-interactive --host-workspace-root /home/shuhao
  • hust-ascend-manager container exec --host-workspace-root /home/shuhao -- python -c 'import torch; import torch_npu; print(torch.npu.device_count())'
  • hust-ascend-manager container ssh-deploy --host-workspace-root /home/shuhao --ssh-user shuhao --ssh-port 2222
  • hust-ascend-manager container ssh-enable --host-workspace-root /home/shuhao --ssh-user shuhao --ssh-port 2222

The default euleros-910b manifest includes:

  • conda config --add channels https://repo.huaweicloud.com/ascend/repos/conda/
  • conda install ascend-cann-toolkit==8.5.0
  • conda install ascend-cann-910b-ops==8.5.0
  • conda install ascend-cann-nnal==8.5.0

When a system step declares requires_group: HwHiAiUser, the manager automatically runs it via sg HwHiAiUser -c ... when needed.
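The group-escalation rule above can be sketched as follows. This is a minimal illustration, not the manager's actual implementation; only the requires_group field and the sg invocation shape come from this README, and the helper name is hypothetical:

```python
import shlex

def build_step_command(cmd, requires_group=None, current_groups=()):
    """Wrap a manifest step in `sg <group> -c '...'` when the step declares
    requires_group and the current session does not already have that group."""
    if requires_group and requires_group not in current_groups:
        return ["sg", requires_group, "-c", shlex.join(cmd)]
    return list(cmd)
```

For example, build_step_command(["npu-smi", "info"], "HwHiAiUser") yields ["sg", "HwHiAiUser", "-c", "npu-smi info"], while a caller already in HwHiAiUser gets the command back unchanged.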

env --shell is the source of truth for Ascend runtime exports. Runtime repos should consume this output instead of carrying duplicated shell logic.

env --install-hook persists that same source of truth into the active conda environment by writing etc/conda/activate.d/hust-ascend-manager.sh and etc/conda/deactivate.d/hust-ascend-manager.sh. After that, conda activate reapplies the manager-generated Ascend environment automatically, so bare commands like python -c 'import torch_npu' or vllm --help no longer depend on manually sourcing set_env.sh or on ad hoc shell wrappers.

runtime is the source of truth for repairing a broken vllm-hust Python environment from adjacent runtime repos such as vllm-hust-workstation. It checks whether the active Python can import torch, transformers, tokenizers, huggingface_hub, and vllm.entrypoints.cli.main under a clean PYTHONNOUSERSITE=1 environment. runtime repair then reconciles the common runtime deps, force-reinstalls the Hugging Face stack, installs build deps from requirements/build.txt without replacing the active torch wheel twice, and rebuilds editable vllm-hust against the currently selected Python runtime. When you pass --install-plugin, the same command also installs and verifies the Ascend platform plugin. It prefers a sibling local repo such as vllm-ascend-hust or vllm-ascend when present, and falls back to the PyPI package spec from --plugin-package or the default vllm-ascend-hust.
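The import check described above can be sketched as a subprocess probe under PYTHONNOUSERSITE=1, so stray user-site packages cannot mask a broken environment. A hedged sketch; the helper name is hypothetical:

```python
import os
import subprocess
import sys

def can_import(module, python=sys.executable):
    """Return True if `module` imports cleanly in a fresh interpreter with
    user site-packages excluded, so the result reflects the env itself."""
    env = dict(os.environ, PYTHONNOUSERSITE="1")
    proc = subprocess.run(
        [python, "-c", f"import {module}"],
        env=env,
        capture_output=True,
    )
    return proc.returncode == 0
```

Running this probe for torch, transformers, tokenizers, huggingface_hub, and vllm.entrypoints.cli.main mirrors the checks runtime check performs.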

What runtime repair covers:

  • broken or incomplete Python runtime deps in the active env
  • mismatched transformers / tokenizers / huggingface_hub installs
  • missing build tools from requirements/build.txt such as cmake and ninja
  • stale local vllm/*.so artifacts that need an editable reinstall against the current torch wheel
  • optional Ascend platform plugin install and entry-point verification via --install-plugin

What still remains machine-specific or manual:

  • NVIDIA / Ascend driver packages and kernel modules on the host
  • CANN / NNAL / ATB system layout problems that require doctor / setup
  • model weights, Hugging Face reachability, mirror policy, and local cache completeness
  • systemd user-session availability and public ingress plumbing such as Cloudflare Tunnel
  • any repo-local changes that require a different torch major/minor than the manager default

The manager also normalizes non-standard Ascend installs, for example when the host only has directories like /usr/local/Ascend/ascend-toolkit.bak.8.1/latest instead of the canonical /usr/local/Ascend/ascend-toolkit/latest symlink. doctor verifies whether torch_npu can be imported under the manager-generated environment, and launch always runs with that normalized environment.

doctor also detects a broken host OPP legacy-kernel layout where kernel/config/ascend910_93/ops_legacy/*.json points at kernel/ascend910_93/<op>/... but the installed files only exist under kernel/ascend910_93/ops_legacy/<op>/.... When this happens, even basic torch_npu operators such as torch.zeros() can fail before vLLM starts; the correct fix is to repair or reinstall the host CANN ops package. As a practical workaround, env --shell and launch auto-generate a user-space OPP overlay under ~/.cache/hust-ascend-manager/opp-overlays/ and point ASCEND_OPP_PATH at that overlay when this broken layout is detected.

launch also enables a prefill compatibility mode by default on Ascend: it injects --no-enable-prefix-caching and --no-enable-chunked-prefill unless you already passed explicit prefill flags yourself. This is a pragmatic workaround for known npu_fused_infer_attention_score dimension crashes on some model/runtime combinations. To opt out, pass --no-prefill-compat-mode.
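The flag-injection rule can be sketched as follows. The flag names come from this README; the helper name and argument handling are illustrative assumptions:

```python
# Any explicit prefill flag from the user disables the injected defaults.
PREFILL_FLAGS = {
    "--enable-prefix-caching", "--no-enable-prefix-caching",
    "--enable-chunked-prefill", "--no-enable-chunked-prefill",
}

def apply_prefill_compat(args, compat_mode=True):
    """Append the compatibility defaults unless the caller already passed
    an explicit prefill flag or opted out with --no-prefill-compat-mode."""
    if not compat_mode or any(a in PREFILL_FLAGS for a in args):
        return list(args)
    return list(args) + [
        "--no-enable-prefix-caching",
        "--no-enable-chunked-prefill",
    ]
```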

container is the source of truth for the official Huawei Ascend container workflow. container install is the one-click path: it pulls the configured image when needed, mounts Ascend devices and driver paths from the host, mounts your workspace into /workspace, and creates or starts a persistent container. When --image is omitted, the manager defaults to the v0.9.1-dev image family, probes the host for an A2/910B vs A3 recommendation, and interactively confirms the official variant (v0.9.1-dev, -a3, -openeuler, or -a3-openeuler). Use --non-interactive to skip prompts in automation.

Use container shell to enter that environment later without rebuilding the mount list, and container exec -- ... to run one-off checks or launches. If you want a single-command deployment for direct SSH access into the container, run container ssh-deploy: it creates or starts the container, installs openssh-server inside it when needed, configures a dedicated SSH port, and copies your mounted authorized_keys into the container user home. container ssh-enable remains available when the container is already running and you only want to refresh the in-container SSH setup.
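The mount list container install assembles can be sketched roughly like this. The image name, device nodes, and driver paths below are typical Ascend container defaults, not values read from this manager's source, and the function name is hypothetical:

```python
def build_container_create_cmd(image, workspace_root, name="hust-ascend"):
    """Assemble a `docker run` command with Ascend device nodes, read-only
    host driver paths, and the workspace mounted at /workspace."""
    cmd = ["docker", "run", "-d", "--name", name, "--network", "host"]
    # Typical Ascend device nodes (assumed; the real list is host-dependent).
    for dev in ("/dev/davinci0", "/dev/davinci_manager",
                "/dev/devmm_svm", "/dev/hisi_hdc"):
        cmd += ["--device", dev]
    # Typical host driver/tool paths mounted read-only (assumed).
    for path in ("/usr/local/Ascend/driver", "/usr/local/dcmi",
                 "/usr/local/bin/npu-smi"):
        cmd += ["-v", f"{path}:{path}:ro"]
    cmd += ["-v", f"{workspace_root}:/workspace", image, "sleep", "infinity"]
    return cmd
```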

The design follows upstream vLLM's plugin philosophy: hardware-specific setup and runtime adaptation should live outside the upstream core runtime path.

Install

cd /home/shuhao/ascend-runtime-manager
python -m pip install -e .

Or install from PyPI (recommended for teammates):

python -m pip install --upgrade hust-ascend-manager

Publish

Local publish with token:

cd /home/shuhao/ascend-runtime-manager
PYPI_TOKEN=pypi-xxxxx bash scripts/publish_pypi.sh

CI publish:

  • set repository secret PYPI_TOKEN
  • push a tag like v0.1.0 or run workflow dispatch

Notes

  • setup --apply-system executes commands from the manifest and may require sudo.
  • Use setup --non-interactive when calling the manager from automation. It fails fast instead of hanging on an interactive sg or sudo password prompt.
  • setup --install-python-stack auto-probes https://pypi.tuna.tsinghua.edu.cn/simple when PIP_INDEX_URL is unset, and falls back to the default upstream index when the mirror is unreachable.
  • Tune large wheel downloads with HUST_ASCEND_MANAGER_PIP_RETRIES, HUST_ASCEND_MANAGER_PIP_TIMEOUT, HUST_ASCEND_MANAGER_PIP_RESUME_RETRIES, HUST_ASCEND_MANAGER_PIP_INDEX_URL, and HUST_ASCEND_MANAGER_PIP_EXTRA_INDEX_URL. Set HUST_ASCEND_MANAGER_DISABLE_PYPI_MIRROR_AUTOSET=1 to disable automatic mirror selection.
  • container uses docker directly when available, otherwise falls back to sudo -n docker.
  • container ssh-deploy is the one-click path for direct SSH-to-container access.
  • container ssh-enable defaults to host port 2222, user shuhao, and authorized_keys source /workspace/.ssh/authorized_keys.
  • Keep binary payloads out of this repository. Use internal mirrors/artifact stores.
  • If your account was newly added to HwHiAiUser, re-logging in is still recommended so the group membership fully takes effect.
  • setup is intentionally tolerant of a partially broken initial Ascend install: it can still reconcile the Python stack and planned CANN steps even when doctor cannot yet build a complete runtime env.
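The pip tuning variables from the notes above can be translated into pip arguments roughly as follows. The mapping is an illustrative subset, not the manager's real code; only the variable names come from this README:

```python
import os

def pip_network_args(env=os.environ):
    """Map HUST_ASCEND_MANAGER_PIP_* variables onto standard pip options."""
    args = []
    if env.get("HUST_ASCEND_MANAGER_PIP_RETRIES"):
        args += ["--retries", env["HUST_ASCEND_MANAGER_PIP_RETRIES"]]
    if env.get("HUST_ASCEND_MANAGER_PIP_TIMEOUT"):
        args += ["--timeout", env["HUST_ASCEND_MANAGER_PIP_TIMEOUT"]]
    if env.get("HUST_ASCEND_MANAGER_PIP_INDEX_URL"):
        args += ["--index-url", env["HUST_ASCEND_MANAGER_PIP_INDEX_URL"]]
    if env.get("HUST_ASCEND_MANAGER_PIP_EXTRA_INDEX_URL"):
        args += ["--extra-index-url", env["HUST_ASCEND_MANAGER_PIP_EXTRA_INDEX_URL"]]
    return args
```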

About

Ascend runtime doctor/setup manager for vllm-hust deployments
