Lightweight manager for Ascend runtime setup and diagnostics.
This repository isolates system-level Ascend dependency management from runtime repos.
vllm-hust can call this manager so end users keep a single install entrypoint.
Typical commands:

```shell
hust-ascend-manager doctor
hust-ascend-manager doctor --json
hust-ascend-manager env --shell
hust-ascend-manager env --install-hook
hust-ascend-manager setup --manifest manifests/euleros-910b.json --dry-run
hust-ascend-manager setup --manifest manifests/euleros-910b.json --install-python-stack
hust-ascend-manager setup --manifest manifests/euleros-910b.json --apply-system
hust-ascend-manager runtime check --repo /home/shuhao/vllm-hust
hust-ascend-manager runtime repair --repo /home/shuhao/vllm-hust
hust-ascend-manager runtime repair --repo /home/shuhao/vllm-hust --install-plugin
hust-ascend-manager launch Qwen/Qwen2.5-1.5B-Instruct
hust-ascend-manager container install --host-workspace-root /home/shuhao
hust-ascend-manager container shell --host-workspace-root /home/shuhao
hust-ascend-manager container install --non-interactive --host-workspace-root /home/shuhao
hust-ascend-manager container exec --host-workspace-root /home/shuhao -- python -c 'import torch; import torch_npu; print(torch.npu.device_count())'
hust-ascend-manager container ssh-deploy --host-workspace-root /home/shuhao --ssh-user shuhao --ssh-port 2222
hust-ascend-manager container ssh-enable --host-workspace-root /home/shuhao --ssh-user shuhao --ssh-port 2222
```
Default euleros-910b manifest includes:
```shell
conda config --add channels https://repo.huaweicloud.com/ascend/repos/conda/
conda install ascend-cann-toolkit==8.5.0
conda install ascend-cann-910b-ops==8.5.0
conda install ascend-cann-nnal==8.5.0
```
When a system step declares `requires_group: HwHiAiUser`, the manager runs it via
`sg HwHiAiUser -c ...` automatically when needed.
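The group gating above can be sketched roughly like this (the helper name and membership check are hypothetical; the real manager also handles sudo and non-interactive failure modes):

```python
import shlex


def wrap_with_group(cmd: list, required_group: str, current_groups: set) -> list:
    """Run cmd via `sg <group> -c '...'` when the caller lacks the group.

    Hypothetical sketch of the README's requires_group behavior.
    """
    if required_group in current_groups:
        return list(cmd)  # already a member, run directly
    # sg takes the whole command as a single shell string
    return ["sg", required_group, "-c", shlex.join(cmd)]
```

For example, `wrap_with_group(["npu-smi", "info"], "HwHiAiUser", {"users"})` yields an `sg`-wrapped argv, while a caller already in `HwHiAiUser` gets the command back unchanged.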
`env --shell` is the source of truth for Ascend runtime exports. Runtime repos
should consume this output instead of carrying duplicated shell logic.
`env --install-hook` persists that same source of truth into the active conda
environment by writing `etc/conda/activate.d/hust-ascend-manager.sh` and
`etc/conda/deactivate.d/hust-ascend-manager.sh`. After that, `conda activate`
reapplies the manager-generated Ascend environment automatically, so bare
commands like `python -c 'import torch_npu'` or `vllm --help` no longer depend
on manually sourcing `set_env.sh` or on ad hoc shell wrappers.
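A minimal sketch of what writing those hook files could look like (the function name is an assumption, and the real hooks carry whatever `env --shell` emits, not a caller-supplied dict):

```python
import os
import tempfile


def install_conda_hook(conda_prefix: str, exports: dict) -> tuple:
    """Write conda activate.d/deactivate.d hooks that apply/unset Ascend exports.

    Hypothetical sketch: file names follow the README; contents are simplified.
    """
    act_dir = os.path.join(conda_prefix, "etc", "conda", "activate.d")
    deact_dir = os.path.join(conda_prefix, "etc", "conda", "deactivate.d")
    os.makedirs(act_dir, exist_ok=True)
    os.makedirs(deact_dir, exist_ok=True)
    act_path = os.path.join(act_dir, "hust-ascend-manager.sh")
    deact_path = os.path.join(deact_dir, "hust-ascend-manager.sh")
    with open(act_path, "w") as f:
        for key, value in exports.items():
            f.write(f'export {key}="{value}"\n')
    with open(deact_path, "w") as f:
        for key in exports:
            f.write(f"unset {key}\n")
    return act_path, deact_path


# Demo against a throwaway prefix instead of a real conda env
demo_prefix = tempfile.mkdtemp()
act, deact = install_conda_hook(
    demo_prefix, {"ASCEND_HOME_PATH": "/usr/local/Ascend/ascend-toolkit/latest"}
)
```

conda sources every `*.sh` under `activate.d`/`deactivate.d` on activation and deactivation, which is what makes the re-application automatic.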
`runtime` is the source of truth for repairing a broken vllm-hust Python
environment from adjacent runtime repos such as vllm-hust-workstation.
`runtime check` verifies that the active Python can import `torch`, `transformers`,
`tokenizers`, `huggingface_hub`, and `vllm.entrypoints.cli.main` under a clean
`PYTHONNOUSERSITE=1` environment. `runtime repair` then reconciles the common
runtime deps, force-reinstalls the Hugging Face stack, installs build deps from
`requirements/build.txt` without replacing the active torch wheel twice, and
rebuilds editable vllm-hust against the currently selected Python runtime.
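The import probe can be illustrated with a small helper (the name is hypothetical; the real check covers the module list above and reports details):

```python
import os
import subprocess
import sys


def probe_import(module: str) -> bool:
    """Return True if `module` imports in a clean child interpreter.

    Sketch of the runtime-check idea: PYTHONNOUSERSITE=1 keeps user
    site-packages from masking a broken environment.
    """
    env = dict(os.environ, PYTHONNOUSERSITE="1")
    proc = subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        env=env,
        capture_output=True,
    )
    return proc.returncode == 0
```

Running the probe in a subprocess rather than in-process matters: it sees exactly what a fresh `python` would see, including activation-hook state.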
When you pass `--install-plugin`, the same command also installs and verifies
the Ascend platform plugin. It prefers a sibling local repo such as
vllm-ascend-hust or vllm-ascend when present, and falls back to the PyPI
package spec from `--plugin-package` or the default `vllm-ascend-hust`.
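The source-preference logic might look roughly like this (function name and detection heuristic are assumptions):

```python
import tempfile
from pathlib import Path


def resolve_plugin_source(repo_root: Path, plugin_package: str = "vllm-ascend-hust"):
    """Pick where the Ascend plugin is installed from.

    Hypothetical sketch: prefer a sibling local checkout, else the PyPI spec.
    """
    for name in ("vllm-ascend-hust", "vllm-ascend"):
        candidate = repo_root.parent / name
        if (candidate / "pyproject.toml").exists() or (candidate / "setup.py").exists():
            return str(candidate)  # local path, suitable for editable install
    return plugin_package  # plain pip package spec for the index


# Demo with temp dirs: no sibling checkout, then a sibling vllm-ascend appears
_root = Path(tempfile.mkdtemp())
_repo = _root / "vllm-hust"
_repo.mkdir()
source_without_sibling = resolve_plugin_source(_repo)
(_root / "vllm-ascend").mkdir()
(_root / "vllm-ascend" / "pyproject.toml").touch()
source_with_sibling = resolve_plugin_source(_repo)
```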
What runtime repair covers:
- broken or incomplete Python runtime deps in the active env
- mismatched `transformers`/`tokenizers`/`huggingface_hub` installs
- missing build tools from `requirements/build.txt` such as `cmake` and `ninja`
- stale local `vllm/*.so` artifacts that need an editable reinstall against the current torch wheel
- optional Ascend platform plugin install and entry-point verification via `--install-plugin`
What still remains machine-specific or manual:
- NVIDIA / Ascend driver packages and kernel modules on the host
- CANN / NNAL / ATB system layout problems that require `doctor`/`setup`
- model weights, Hugging Face reachability, mirror policy, and local cache completeness
- systemd user-session availability and public ingress plumbing such as Cloudflare Tunnel
- any repo-local changes that require a different torch major/minor than the manager default
The manager also normalizes non-standard Ascend installs, for example when the
host only has directories like `/usr/local/Ascend/ascend-toolkit.bak.8.1/latest`
instead of the canonical `/usr/local/Ascend/ascend-toolkit/latest` symlink.
`doctor` verifies whether `torch_npu` can be imported under the manager-generated
environment, and `launch` always runs with that normalized environment.
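A sketch of that normalization step, assuming a simple glob-based fallback (the helper name is hypothetical):

```python
import glob
import os
import tempfile


def resolve_toolkit_root(ascend_root="/usr/local/Ascend"):
    """Locate a usable ascend-toolkit root, preferring the canonical symlink.

    Sketch: fall back to renamed directories such as ascend-toolkit.bak.8.1
    when the canonical layout is missing.
    """
    canonical = os.path.join(ascend_root, "ascend-toolkit", "latest")
    if os.path.isdir(canonical):
        return canonical
    # Non-standard installs: any ascend-toolkit* sibling with a latest dir
    for candidate in sorted(glob.glob(os.path.join(ascend_root, "ascend-toolkit*"))):
        latest = os.path.join(candidate, "latest")
        if os.path.isdir(latest):
            return latest
    return None


# Demo on a throwaway root that only has a renamed toolkit directory
_demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(_demo_root, "ascend-toolkit.bak.8.1", "latest"))
resolved = resolve_toolkit_root(_demo_root)
```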
`doctor` also detects a broken host OPP legacy-kernel layout where
`kernel/config/ascend910_93/ops_legacy/*.json` points at
`kernel/ascend910_93/<op>/...` but the installed files only exist under
`kernel/ascend910_93/ops_legacy/<op>/...`. When this happens, even basic
torch_npu operators such as `torch.zeros()` can fail before vLLM starts; the
correct fix is to repair or reinstall the host CANN ops package. As a practical
workaround, `env --shell` and `launch` now auto-generate a user-space OPP
overlay under `~/.cache/hust-ascend-manager/opp-overlays/` and point
`ASCEND_OPP_PATH` at that overlay when this broken layout is detected.
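The mismatch detection can be approximated like this (a simplification: the real check reads the declared paths from the `ops_legacy` JSON configs rather than listing directories):

```python
import os
import tempfile


def legacy_kernel_mismatch(opp_root, soc="ascend910_93"):
    """List ops installed only under ops_legacy/ but expected at kernel/<soc>/<op>.

    Hypothetical sketch of the doctor check described above.
    """
    kernel_root = os.path.join(opp_root, "kernel")
    legacy_root = os.path.join(kernel_root, soc, "ops_legacy")
    mismatched = []
    if not os.path.isdir(legacy_root):
        return mismatched
    for op in sorted(os.listdir(legacy_root)):
        declared = os.path.join(kernel_root, soc, op)
        if not os.path.exists(declared):
            mismatched.append(op)  # declared location missing on disk
    return mismatched


# Demo: an op installed only under ops_legacy/ triggers the mismatch
_opp = tempfile.mkdtemp()
os.makedirs(os.path.join(_opp, "kernel", "ascend910_93", "ops_legacy", "matmul_v3"))
broken_ops = legacy_kernel_mismatch(_opp)
```

The overlay workaround then only needs to re-link each mismatched op at its declared path under the user-space copy.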
`launch` also enables a prefill compatibility mode by default on Ascend: it
injects `--no-enable-prefix-caching` and `--no-enable-chunked-prefill` unless
you already passed explicit prefill flags yourself. This is a pragmatic
workaround for known `npu_fused_infer_attention_score` dimension crashes on some
model/runtime combinations. To opt out, pass `--no-prefill-compat-mode`.
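The flag-injection rule can be sketched as follows (the helper name is hypothetical; the flag names are the ones listed above):

```python
def apply_prefill_compat(args):
    """Inject conservative prefill flags unless the user already chose any.

    Sketch of launch's default compatibility behavior.
    """
    explicit = {
        "--enable-prefix-caching", "--no-enable-prefix-caching",
        "--enable-chunked-prefill", "--no-enable-chunked-prefill",
    }
    if any(arg in explicit for arg in args):
        return list(args)  # respect the user's explicit choice
    return list(args) + ["--no-enable-prefix-caching", "--no-enable-chunked-prefill"]
```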
`container` is the source of truth for the official Huawei Ascend container
workflow. `container install` is the one-click path: it pulls the configured
image when needed, mounts Ascend devices and driver paths from the host, mounts
your workspace into `/workspace`, and creates or starts a persistent container.
When `--image` is omitted, the manager defaults to the `v0.9.1-dev` image
family, probes the host for an A2/910B vs A3 recommendation, and interactively
confirms the official variant (`v0.9.1-dev`, `-a3`, `-openeuler`, or
`-a3-openeuler`). Use `--non-interactive` to skip prompts in automation.
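The variant selection can be sketched as a small mapping (helper name and probe inputs are assumptions; the real flow probes the host and confirms interactively):

```python
def pick_image_tag(chip_family, openeuler=False, base="v0.9.1-dev"):
    """Map the probed NPU family to an official image variant tag.

    Hypothetical sketch: suffixes follow the variants listed above.
    """
    tag = base
    if chip_family == "a3":
        tag += "-a3"        # A3 hosts get the -a3 variant
    if openeuler:
        tag += "-openeuler"  # openEuler base image when requested
    return tag
```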
Use `container shell` to enter that environment later without rebuilding the
mount list, and `container exec -- ...` to run one-off checks or launches.
If you want a single-command deployment for direct SSH access into the
container, run `container ssh-deploy`. It creates or starts the container,
installs `openssh-server` inside it when needed, configures a dedicated SSH
port, and copies your mounted `authorized_keys` into the container user home.
`container ssh-enable` remains available when the container is already running
and you only want to refresh the in-container SSH setup.
The design follows upstream vLLM's plugin philosophy: hardware-specific setup and runtime adaptation should live outside the upstream core runtime path.
Install from source:

```shell
cd /home/shuhao/ascend-runtime-manager
python -m pip install -e .
```

Or install from PyPI (recommended for teammates):

```shell
python -m pip install --upgrade hust-ascend-manager
```

Local publish with token:

```shell
cd /home/shuhao/ascend-runtime-manager
PYPI_TOKEN=pypi-xxxxx bash scripts/publish_pypi.sh
```

CI publish:

- set repository secret `PYPI_TOKEN`
- push a tag like `v0.1.0` or run workflow dispatch

Notes:

- `setup --apply-system` executes commands from the manifest and may require sudo.
- Use `setup --non-interactive` when calling the manager from automation. It will fail fast instead of hanging on an interactive `sg` or `sudo` password prompt.
- `setup --install-python-stack` now auto-probes `https://pypi.tuna.tsinghua.edu.cn/simple` when `PIP_INDEX_URL` is unset, and falls back to the default upstream index when the mirror is unreachable.
- Tune large wheel downloads with `HUST_ASCEND_MANAGER_PIP_RETRIES`, `HUST_ASCEND_MANAGER_PIP_TIMEOUT`, `HUST_ASCEND_MANAGER_PIP_RESUME_RETRIES`, `HUST_ASCEND_MANAGER_PIP_INDEX_URL`, and `HUST_ASCEND_MANAGER_PIP_EXTRA_INDEX_URL`. Set `HUST_ASCEND_MANAGER_DISABLE_PYPI_MIRROR_AUTOSET=1` to disable automatic mirror selection.
- `container` uses `docker` directly when available, otherwise falls back to `sudo -n docker`.
- `container ssh-deploy` is the one-click path for direct SSH-to-container access.
- `container ssh-enable` defaults to host port `2222`, user `shuhao`, and `authorized_keys` source `/workspace/.ssh/authorized_keys`.
- Keep binary payloads out of this repository. Use internal mirrors/artifact stores.
- If your account was newly added to `HwHiAiUser`, re-login is still recommended.
- `setup` is intentionally tolerant of a partially broken initial Ascend install: it can still reconcile the Python stack and planned CANN steps even when `doctor` cannot yet build a complete runtime env.