Merged
1 change: 1 addition & 0 deletions .github/workflows/unit_tests.yml
@@ -99,6 +99,7 @@ jobs:
 - {nox_session: "unit-3.10(torch_211, tf_latest)", python_version: "3.10"}
 - {nox_session: "unit-3.11(torch_211, tf_latest)", python_version: "3.11"}
 - {nox_session: "unit-3.13(torch_211, tf_latest)", python_version: "3.13"}
+- {nox_session: "unit-3.14(torch_211, tf_latest)", python_version: "3.14"}
 - {nox_session: "unit-3.12(torch_28, tf_latest)", python_version: "3.12"}
 - {nox_session: "unit-3.12(torch_29, tf_latest)", python_version: "3.12"}
 - {nox_session: "unit-3.12(torch_210, tf_latest)", python_version: "3.12"}
3 changes: 2 additions & 1 deletion CHANGELOG.rst
@@ -21,7 +21,7 @@ Changelog
 - Add ``--cast_mxfp4_to_nvfp4`` flag to ``examples/llm_ptq/hf_ptq.py`` for closed-form, bit-exact MXFP4 → NVFP4 weight conversion. Supports the GPT-OSS family (``openai/gpt-oss-20b``, ``openai/gpt-oss-120b``). See `examples/llm_ptq/README.md <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/llm_ptq#mxfp4--nvfp4-cast-for-gpt-oss>`__ for usage.
 - DeepSeek PTQ (``examples/deepseek/ptq.py``) now defaults to native top-k calibration with post-hoc per-layer peer-max sync of expert ``input_quantizer.amax``; the all-experts path is preserved behind ``--calib_all_experts``.

-0.44 (2026-05-xx)
+0.44 (2026-05-18)
 ^^^^^^^^^^^^^^^^^

 **New Features**
@@ -60,6 +60,7 @@ Changelog
 - Bump minimum required PyTorch version to 2.8.
 - [Experimental] Add support for transformers>=5.0, including generic PTQ and unified HF checkpoint export for fused MoE expert modules (Mixtral, Qwen2-MoE, Qwen3-MoE, Qwen3.5-MoE, DeepSeek-V3, Jamba, OLMoE, etc.).
 - Improve ``megatron_preprocess_data``: add ``--reasoning_content`` support for Nemotron v3 datasets, eliminate intermediate JSONL for HuggingFace datasets, return output file prefixes from the Python API, add gzip input support (``.jsonl.gz``), add ``--strip_newlines`` flag for plain-text pretraining data, add ``--hf_streaming`` for very large datasets (only consumed rows downloaded), and auto-shuffle when ``--hf_max_samples_per_split`` is set to avoid biased sampling.
+- Add installation support for Python 3.14. Only basic unit tests are verified for now. Production usage still defaults to Python 3.12. Python 3.10 support will be dropped in the next release.

 0.43 (2026-04-16)
 ^^^^^^^^^^^^^^^^^
2 changes: 1 addition & 1 deletion docs/source/getting_started/_installation_for_Linux.rst
@@ -12,7 +12,7 @@ Latest Model Optimizer (``nvidia-modelopt``) currently has the following system
 +-------------------------+-----------------------------+
 | Architecture            | x86_64, aarch64 (SBSA)      |
 +-------------------------+-----------------------------+
-| Python                  | >=3.10,<3.14                |
+| Python                  | >=3.10,<3.15                |
 +-------------------------+-----------------------------+
 | CUDA                    | 12.x, 13.x                  |
 +-------------------------+-----------------------------+
2 changes: 1 addition & 1 deletion noxfile.py
@@ -52,7 +52,7 @@ def _cov_args():


 # ─── CPU unit tests ───────────────────────────────────────────────────────────
-@nox.session(python=["3.10", "3.11", "3.12", "3.13"])
+@nox.session(python=["3.10", "3.11", "3.12", "3.13", "3.14"])
 @nox.parametrize("tf_ver", [nox.param(k, id=k) for k in TRANSFORMERS_VERSIONS])
 @nox.parametrize("torch_ver", [nox.param(k, id=k) for k in TORCH_VERSIONS])
 def unit(session, torch_ver, tf_ver):
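For context on the noxfile change: session IDs like ``unit-3.14(torch_211, tf_latest)`` in the CI matrix are generated by nox from the stacked ``@nox.parametrize`` decorators. A minimal standalone sketch of how those IDs are composed; the version lists below are illustrative stand-ins, not the repo's real ``TORCH_VERSIONS``/``TRANSFORMERS_VERSIONS``:

```python
# Sketch (not the actual noxfile): reproduce the session-ID naming scheme
# "unit-<python>(<torch_id>, <tf_id>)" used in unit_tests.yml.
PYTHON_VERSIONS = ["3.10", "3.11", "3.12", "3.13", "3.14"]
TORCH_VERSIONS = ["torch_28", "torch_29", "torch_210", "torch_211"]  # illustrative
TRANSFORMERS_VERSIONS = ["tf_latest"]  # illustrative


def session_ids():
    """Yield one session ID per (python, torch, transformers) combination."""
    for py in PYTHON_VERSIONS:
        for torch_ver in TORCH_VERSIONS:
            for tf_ver in TRANSFORMERS_VERSIONS:
                yield f"unit-{py}({torch_ver}, {tf_ver})"
```

Adding "3.14" to the ``python=[...]`` list is all it takes for nox to emit the new ``unit-3.14(...)`` sessions; the workflow matrix entry above only maps one of them onto a runner.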
8 changes: 6 additions & 2 deletions pyproject.toml
@@ -19,7 +19,7 @@ description = "Nvidia Model Optimizer: a unified model optimization and deployme
 readme = { text = "Checkout https://github.com/nvidia/Model-Optimizer for more information.", content-type = "text/markdown" }
 license = "Apache-2.0"
 license-files = ["LICENSE_HEADER"]
-requires-python = ">=3.10,<3.14"
+requires-python = ">=3.10,<3.15"
 authors = [{ name = "NVIDIA Corporation" }]
 classifiers = [
     "Programming Language :: Python :: 3",
@@ -227,7 +227,11 @@ extend-ignore = [
     "SIM",
     "UP",
 ] # TODO: Disabled for now, will enable later, once all puzzletron code is migrated
-"modelopt/torch/kernels/quantization/gemm/*" = ["N803", "N806", "E731"] # triton style
+"modelopt/torch/kernels/quantization/gemm/*" = [
+    "N803",
+    "N806",
+    "E731",
+] # triton style
 "modelopt/torch/kernels/sparsity/attention/*" = [
     "N803",
     "N806",
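The widened ``requires-python = ">=3.10,<3.15"`` bound maps to a simple runtime check. A hedged sketch of what the new range means in practice; ``is_supported`` and the bound constants are ours for illustration, not a modelopt API:

```python
import sys

# Runtime meaning of requires-python = ">=3.10,<3.15":
# 3.14 is now the newest supported minor; 3.15 remains excluded.
LOWER = (3, 10)  # inclusive
UPPER = (3, 15)  # exclusive


def is_supported(version_info=sys.version_info):
    """Return True if the interpreter's (major, minor) falls in the supported range."""
    return LOWER <= (version_info[0], version_info[1]) < UPPER
```

Note that pip enforces this bound at install time from the wheel metadata, so the check above is only a mental model of the constraint.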