Add CLAUDE.md #956
# CLAUDE.md

NVIDIA Model Optimizer (ModelOpt) is an open-source library for model optimization techniques,
including quantization, pruning, distillation, sparsity, and speculative decoding, to accelerate
inference. It is primarily a Python codebase with optional C++/CUDA extensions, supporting
PyTorch, ONNX, and Hugging Face/Megatron models.

> If a `CLAUDE.local.md` file exists alongside this file, read and respect it — it contains
> developer-specific overrides that supplement this shared guidance.

## Rules (Read First)

**CRITICAL (YOU MUST):**

- Add the NVIDIA Apache 2.0 license header to ALL new Python/C++/CUDA files (see `LICENSE_HEADER`)
- Commit with `git commit -s -S` (DCO sign-off + cryptographic signing required). Never attribute
  AI tools in the sign-off line
- `pre-commit` hooks run on commit — if files are modified by hooks, re-stage and commit again
- PRs require CODEOWNERS review (auto-assigned based on `.github/CODEOWNERS`)
- After rebasing, always re-run tests locally before pushing
- All code must follow the security guidelines in `SECURITY.md` — violations are blocked as pre-merge errors
- For contribution guidelines, commit conventions, and PR requirements, see `CONTRIBUTING.md`

## Common Commands

| Task | Command |
|------|---------|
| Install (editable + dev) | `pip install -e ".[dev]"` |
| CPU unit tests | `python -m pytest tests/unit` |
| GPU unit tests | `python -m pytest tests/gpu` |
| Megatron GPU tests | `python -m pytest tests/gpu_megatron` |
| TRT-LLM GPU tests | `python -m pytest tests/gpu_trtllm` |
| Pattern match | `pytest tests/unit -k "test_quantize"` |
| Lint + format (all files) | `pre-commit run --all-files` |
| Lint (diff only) | `pre-commit run --from-ref origin/main --to-ref HEAD` |
| Run via tox (CPU unit) | `tox -e py312-torch210-tf_latest-unit` |
| Build docs | `tox -e build-docs` |
| Build wheel | `tox -e build-wheel` |

## Architecture

ModelOpt is organized into three top-level namespaces:

| Namespace | Path | Role |
|-----------|------|------|
| `modelopt.torch` | `modelopt/torch/` | Core PyTorch optimization library |
| `modelopt.onnx` | `modelopt/onnx/` | ONNX model quantization and export |
| `modelopt.deploy` | `modelopt/deploy/` | Deployment utilities for LLMs |

### `modelopt.torch` Sub-packages

| Sub-package | Path | Role |
|-------------|------|------|
| `opt` | `modelopt/torch/opt/` | Core optimization infrastructure (modes, config, state dicts) |
| `quantization` | `modelopt/torch/quantization/` | PTQ, QAT, and quantization-aware algorithms |
| `prune` | `modelopt/torch/prune/` | Structured and unstructured pruning |
| `distill` | `modelopt/torch/distill/` | Knowledge distillation |
| `sparsity` | `modelopt/torch/sparsity/` | Weight and activation sparsity |
| `speculative` | `modelopt/torch/speculative/` | Speculative decoding (Medusa, EAGLE, etc.) |
| `nas` | `modelopt/torch/nas/` | Neural architecture search |
| `export` | `modelopt/torch/export/` | Checkpoint export for TRT-LLM / Megatron |
| `peft` | `modelopt/torch/peft/` | QLoRA and PEFT integration |
| `_deploy` | `modelopt/torch/_deploy/` | Internal deployment utilities |
| `utils` | `modelopt/torch/utils/` | Shared utilities and plugin infrastructure |

### Core Abstraction: Modes

A **mode** is the unit of model optimization in ModelOpt. Each algorithm (quantization, pruning,
etc.) is implemented as one or more modes. Modes are recorded in the model's `modelopt_state` so
optimization workflows can be composed, saved, and restored.
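
To make the mode abstraction concrete, here is a toy sketch in plain Python (NOT the real ModelOpt
API — the `ToyModel`, `apply_mode`, and `restore` names are invented for illustration) showing how
applied modes can accumulate in a state record that is later replayed:

```python
from dataclasses import dataclass, field

@dataclass
class ToyModel:
    layers: list
    # Record of every optimization mode applied to this model.
    modelopt_state: list = field(default_factory=list)

def apply_mode(model: ToyModel, mode: str, config: dict) -> ToyModel:
    """Apply an optimization mode and record it in modelopt_state."""
    model.modelopt_state.append({"mode": mode, "config": config})
    return model

def restore(model: ToyModel, state: list) -> ToyModel:
    """Re-apply a previously recorded sequence of modes to a fresh model."""
    for entry in state:
        apply_mode(model, entry["mode"], entry["config"])
    return model

m = ToyModel(layers=["linear1", "linear2"])
apply_mode(m, "quantize", {"bits": 8})
apply_mode(m, "prune", {"sparsity": 0.5})
saved_state = list(m.modelopt_state)

# A fresh model restored from the saved state ends up with the same mode sequence.
fresh = restore(ToyModel(layers=["linear1", "linear2"]), saved_state)
print([e["mode"] for e in fresh.modelopt_state])  # → ['quantize', 'prune']
```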

## Key Files

| File | Role |
|------|------|
| `modelopt/torch/opt/mode.py` | Base class for all optimization modes |
| `modelopt/torch/opt/config.py` | Configuration system for modes |
| `modelopt/torch/opt/conversion.py` | `apply_mode()` / `restore()` entry points |
| `modelopt/torch/quantization/__init__.py` | PTQ/QAT public API |
| `modelopt/torch/export/unified_export_hf.py` | Unified HF checkpoint export |
| `modelopt/torch/export/model_config_export.py` | TRT-LLM model config export |
| `modelopt/deploy/llm/` | LLM deployment utilities |
| `pyproject.toml` | Optional dependency groups (`[onnx]`, `[hf]`, `[all]`, `[dev]`); ruff, mypy, pytest, bandit, and coverage config |
| `.pre-commit-config.yaml` | Pre-commit hooks (ruff, mypy, clang-format, license headers) |
| `tox.ini` | Test environment definitions |
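
The optional dependency groups mentioned for `pyproject.toml` typically take a shape like the
following TOML fragment; the package lists here are hypothetical placeholders, and the
authoritative group contents live in the repository's `pyproject.toml`:

```toml
# Illustrative shape only — see pyproject.toml for the real group contents.
[project.optional-dependencies]
onnx = ["onnx", "onnxruntime"]
hf = ["transformers"]
all = ["nvidia-modelopt[onnx,hf]"]
dev = ["nvidia-modelopt[all]", "pre-commit", "pytest"]
```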

## Design Patterns

| Pattern | Key Points |
|---------|------------|
| **Mode composition** | Optimization algorithms are composed as sequences of modes, each recorded in `modelopt_state` |
| **Plugin system** | Optional integrations (HuggingFace, Megatron, etc.) loaded lazily via `import_plugin()` |
| **Optional dependencies** | Features gated by install extras (`[onnx]`, `[hf]`, `[all]`); avoid hard imports at module level |
| **Config dataclasses** | Each mode has a typed config; use Pydantic or dataclass conventions |
| **State dict** | Models carry `modelopt_state` for checkpoint save/restore across optimization steps |
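
The lazy plugin pattern can be sketched as follows. This is a minimal stand-alone illustration of
the idea behind `import_plugin()` — the helper body and `_PLUGINS` cache are invented here, not
the actual ModelOpt implementation, and the stdlib `json` module stands in for an optional
integration such as `transformers`:

```python
import importlib

# Cache of loaded (or failed) optional integrations.
_PLUGINS = {}

def import_plugin(name: str, module: str):
    """Import an optional integration only when first requested."""
    if name not in _PLUGINS:
        try:
            _PLUGINS[name] = importlib.import_module(module)
        except ImportError:
            # Optional dependency not installed: record None, don't crash.
            _PLUGINS[name] = None
    return _PLUGINS[name]

# Stdlib "json" stands in for an optional integration:
plugin = import_plugin("hf", "json")
print(plugin is not None)  # → True

# A missing optional dependency is tolerated:
missing = import_plugin("megatron", "no_such_module_xyz")
print(missing is None)  # → True
```

Keeping the `importlib.import_module` call inside the helper, rather than at module top level, is what lets the "avoid hard imports at module level" rule hold for optional extras.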

## CI / Testing

| Layer | Location | Notes |
|-------|----------|-------|
| CPU unit tests | `tests/unit/` | Fast, no GPU needed; run in pre-merge CI |
| GPU unit tests | `tests/gpu/` | Requires CUDA GPU |
| Megatron GPU tests | `tests/gpu_megatron/` | Requires Megatron-Core + GPU |
| TRT-LLM GPU tests | `tests/gpu_trtllm/` | Requires TensorRT-LLM + GPU |
| Example/integration tests | `tests/examples/` | Integration tests for examples; see `tests/examples/README.md` |
| Pre-commit / lint | `.pre-commit-config.yaml` | ruff, mypy, clang-format, license headers, bandit |
| Coverage | `pyproject.toml` | 70% minimum on `modelopt/*` |
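
A 70% coverage floor of this kind is usually enforced through a coverage.py configuration block in
`pyproject.toml`; a hedged sketch of what that gate could look like (the exact keys in the
repository may differ):

```toml
# Illustrative coverage gate matching the 70% minimum noted above.
[tool.coverage.run]
source = ["modelopt"]

[tool.coverage.report]
fail_under = 70
```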