
Added fallback to load extra cudnn dlls in the site packages #1369

Open
hthadicherla wants to merge 4 commits into main from hthadicherla/fix-cudnn-load-function

Conversation

@hthadicherla
Contributor

@hthadicherla hthadicherla commented Apr 29, 2026

What does this PR do?

Type of change: Bug fix

cuDNN 9.21 added a new DLL dependency, cudnn_engines_tensor_ir64_9.dll, which ort.preload_dlls() has not been updated to load on Windows. As a result, loading cuDNN fails when only the nvidia-cudnn-cu12>9.20 package is installed.
This PR adds code to load any extra DLLs from the site-packages folder that the preload function misses.
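The discovery half of that fallback can be sketched with stdlib tooling alone. The following is an illustrative reconstruction, not the PR's exact code: the helper name find_cudnn_bin_dir mirrors the PR's _find_cudnn_bin_dir, and it simply returns None when no cuDNN wheel is installed.

```python
from importlib.metadata import PackageNotFoundError, distribution
from pathlib import Path


def find_cudnn_bin_dir():
    """Locate the directory holding cudnn*.dll inside an installed nvidia-cudnn wheel.

    Returns the directory as a string, or None if no matching wheel (or no DLL)
    is found, e.g. on Linux or in an environment without cuDNN installed.
    """
    for pkg_name in ("nvidia-cudnn-cu12", "nvidia-cudnn-cu13"):
        try:
            dist = distribution(pkg_name)
        except PackageNotFoundError:
            continue  # wheel not installed; try the next package name
        for f in dist.files or []:
            name = f.name.lower()
            if name.startswith("cudnn") and name.endswith(".dll"):
                # locate_file resolves the record entry to an absolute path
                return str(Path(dist.locate_file(f)).parent)
    return None


print(find_cudnn_bin_dir())  # None unless a Windows cuDNN wheel is installed
```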

Summary by CodeRabbit

  • Bug Fixes
    • Improved Windows cuDNN detection and loading for ONNX Runtime with CUDA by scanning installed cuDNN packages and attempting to load any missing DLLs to reduce startup failures.
    • Enhanced logging and diagnostics: preload output is now surfaced as warnings and individual DLL load successes/failures are logged to aid troubleshooting.

…er that ort.preload_dlls() missed

Signed-off-by: Hrishith Thadicherla <hthadicherla@nvidia.com>
@hthadicherla hthadicherla requested a review from a team as a code owner April 29, 2026 11:31
@coderabbitai
Contributor

coderabbitai Bot commented Apr 29, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f08d01f8-39aa-4f87-9206-bc7186f5736e

📥 Commits

Reviewing files that changed from the base of the PR and between 76436e4 and 0fa6624.

📒 Files selected for processing (1)
  • modelopt/onnx/quantization/ort_utils.py

📝 Walkthrough


Windows-specific cuDNN detection in ort_utils.py was expanded: _check_for_libcudnn() now treats onnxruntime.preload_dlls() output as warnings, uses a Windows-specific preload failure key, and on successful preload enumerates nvidia-cudnn-cu* site-packages for cudnn*.dll, skipping already-loaded modules and attempting ctypes.CDLL loads while logging results.

Changes

Cohort / File(s) Summary
cuDNN Windows support & loading
modelopt/onnx/quantization/ort_utils.py
Extended _check_for_libcudnn() to log onnxruntime.preload_dlls() output at warning level, check Windows-specific failure sentinel (cudnn64_9), locate nvidia-cudnn-cu* site-packages and list cudnn*.dll, skip DLLs already present via GetModuleHandleW, and attempt ctypes.CDLL loads for remaining DLLs while logging loaded and failed DLLs.
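The "skip DLLs already present" step above hinges on GetModuleHandleW, which looks up a module already mapped into the current process without loading anything new. A minimal, self-contained sketch of that check (function name is illustrative; it degrades to False on non-Windows platforms, where there is no DLL concept):

```python
import ctypes
import platform


def is_dll_loaded(dll_name: str) -> bool:
    """Return True if a DLL with this name is already mapped into the process."""
    if platform.system() != "Windows":
        return False  # ctypes.windll / GetModuleHandleW only exist on Windows
    import ctypes.wintypes

    get_module_handle_w = ctypes.windll.kernel32.GetModuleHandleW
    get_module_handle_w.argtypes = [ctypes.wintypes.LPCWSTR]
    get_module_handle_w.restype = ctypes.wintypes.HMODULE
    # A non-NULL handle means the module is already loaded; the call does not
    # increment the module's reference count, so it is a pure query.
    return bool(get_module_handle_w(dll_name))


print(is_dll_loaded("cudnn64_9.dll"))
```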

Sequence Diagram(s)

sequenceDiagram
    participant Python as PythonProcess (ort_utils)
    participant ORT as ONNXRuntime
    participant FS as FileSystem (site-packages)
    participant WinAPI as WindowsAPI (GetModuleHandleW)
    participant CT as ctypes.CDLL

    Python->>ORT: call preload_dlls()
    ORT-->>Python: preload_result (dict)
    alt preload_result contains Windows failure key (cudnn64_9)
        Python->>Python: log warning (preload reported failure)
    else preload appears successful
        Python->>FS: scan site-packages for nvidia-cudnn-cu* -> list cudnn*.dll
        loop for each cudnn DLL
            Python->>WinAPI: GetModuleHandleW(dll_name)
            alt module already loaded
                WinAPI-->>Python: handle present -> skip DLL
            else not loaded
                Python->>CT: ctypes.CDLL(full_dll_path)
                alt load success
                    CT-->>Python: success -> log loaded DLL
                else load failure
                    CT-->>Python: error -> log failure
                end
            end
        end
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 75.00%, which is below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.
✅ Passed checks (5 passed)
  • Description Check ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed — The title clearly and specifically describes the main change: adding a fallback mechanism to load extra cuDNN DLLs from site packages on Windows.
  • Linked Issues check ✅ Passed — Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check ✅ Passed — Check skipped because no linked issues were found for this pull request.
  • Security Anti-Patterns ✅ Passed — No security anti-patterns detected: no unsafe torch.load/numpy.load calls, no eval/exec, no trust_remote_code=True, no nosec comments. DLL loading uses safe pre-installed system library paths.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch hthadicherla/fix-cudnn-load-function

Review rate limit: 9/10 reviews remaining, refill in 6 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/onnx/quantization/ort_utils.py`:
- Around line 154-155: The docstring incorrectly states that _find_cudnn_bin_dir
is the "caller" of this function; update the docstring to correctly describe the
relationship (this function is the caller/helper that invokes
_find_cudnn_bin_dir or vice versa as appropriate). Locate the docstring in
ort_utils.py around the function that references _find_cudnn_bin_dir and change
the wording to state that this function calls (or is a helper/child of)
_find_cudnn_bin_dir, or simply describe their relationship accurately (e.g.,
"this function calls _find_cudnn_bin_dir" or "helper/callee" vs "caller"),
ensuring function names _find_cudnn_bin_dir and the current function are
mentioned for clarity.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: fc851a14-a70d-4dff-a6fb-0235d3765f77

📥 Commits

Reviewing files that changed from the base of the PR and between 9bb917d and 5192b20.

📒 Files selected for processing (1)
  • modelopt/onnx/quantization/ort_utils.py

Comment thread modelopt/onnx/quantization/ort_utils.py Outdated
@github-actions
Contributor

github-actions Bot commented Apr 29, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1369/

Built to branch gh-pages at 2026-04-30 15:52 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

…r in linux since this fix is only for windows

Signed-off-by: Hrishith Thadicherla <hthadicherla@nvidia.com>
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/onnx/quantization/ort_utils.py`:
- Around line 147-201: The current logic in _load_extra_cudnn_dlls (and the
caller _check_for_libcudnn) swallows exceptions from ort.preload_dlls() and
treats partial extra-DLL load failures as success; change the flow so that if
ort.preload_dlls() raises we log and immediately return False (or re-raise), and
after attempting to load extra cudnn DLLs, if any DLLs failed to load (the
failed list is non-empty) return False (or raise an appropriate error) instead
of allowing a success return; update callers of
_load_extra_cudnn_dlls/_check_for_libcudnn to respect the boolean failure and
propagate failure to the caller so that partial/failed cuDNN loads are not
reported as available.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9ff98f02-7045-4bea-8a6b-df1776d8d3d3

📥 Commits

Reviewing files that changed from the base of the PR and between 5192b20 and 7e56229.

📒 Files selected for processing (1)
  • modelopt/onnx/quantization/ort_utils.py

Comment on lines +147 to +201
def _load_extra_cudnn_dlls():
    """Load any cuDNN DLLs from site-packages that ORT's preload_dlls() missed.

    TEMPORARY WORKAROUND: This function exists because ort.preload_dlls() has a
    hardcoded list of cuDNN sub-libraries which may be incomplete for newer cuDNN
    versions (e.g. cuDNN 9.21 added cudnn_engines_tensor_ir64_9.dll, cuDNN 9.20
    added cudnn_cnn64_9.dll). Once ort.preload_dlls() is fixed upstream to
    dynamically discover all cuDNN DLLs, this function and its caller
    (_find_cudnn_bin_dir) should be removed.

    This scans the nvidia-cudnn bin directory and loads any cudnn*.dll not already
    loaded in the process.
    """
    import ctypes
    import ctypes.wintypes

    cudnn_bin_dir = _find_cudnn_bin_dir()
    if not cudnn_bin_dir:
        logger.debug(
            "nvidia-cudnn bin directory not found in site-packages, skipping extra DLL load"
        )
        return

    dll_files = sorted(glob.glob(os.path.join(cudnn_bin_dir, "cudnn*.dll")))
    if not dll_files:
        return

    get_module_handle_w = ctypes.windll.kernel32.GetModuleHandleW  # type: ignore[attr-defined]
    get_module_handle_w.argtypes = [ctypes.wintypes.LPCWSTR]
    get_module_handle_w.restype = ctypes.wintypes.HMODULE

    loaded = []
    skipped = []
    failed = []
    for dll_path in dll_files:
        dll_name = os.path.basename(dll_path)
        if get_module_handle_w(dll_name):
            skipped.append(dll_name)
            continue
        try:
            ctypes.CDLL(dll_path)
            loaded.append(dll_name)
        except OSError as e:
            failed.append(dll_name)
            logger.warning(f"Failed to load {dll_name} from site-packages: {e}")

    if skipped:
        logger.debug(f"Already loaded (skipped): {skipped}")
    if loaded:
        logger.info(
            f"Loaded {len(loaded)} extra cuDNN DLLs that ort.preload_dlls() missed: {loaded}"
        )
    if failed:
        logger.warning(f"Failed to load {len(failed)} cuDNN DLLs: {failed}")
Contributor

@coderabbitai coderabbitai Bot Apr 29, 2026


⚠️ Potential issue | 🟠 Major

Preload failure can still return success (True).

If ort.preload_dlls() raises, the exception is only logged and the flow can still hit the success branch and return True. Also, extra-DLL load failures are not propagated to the caller, so _check_for_libcudnn() may report success while cuDNN is still partially unavailable.

Proposed fix
-def _load_extra_cudnn_dlls():
+def _load_extra_cudnn_dlls() -> list[str]:
@@
-        return
+        return []
@@
-        return
+        return []
@@
     if failed:
         logger.warning(f"Failed to load {len(failed)} cuDNN DLLs: {failed}")
+    return failed
@@
-            try:
+            preload_exception: Exception | None = None
+            try:
                 with redirect_stdout(captured), redirect_stderr(captured):
                     ort.preload_dlls()
             except Exception as e:
+                preload_exception = e
                 logger.warning(f"onnxruntime.preload_dlls() raised an exception: {e}")
@@
-            if f"Failed to load {core_cudnn_dll}" in preload_output:
+            preload_failed = preload_exception is not None or (
+                f"Failed to load {core_cudnn_dll}" in preload_output
+            )
+            extra_failed: list[str] = []
+            if not preload_failed and platform.system() == "Windows":
+                extra_failed = _load_extra_cudnn_dlls()
+                preload_failed = len(extra_failed) > 0
+
+            if preload_failed:
                 logger.error(
                     f"onnxruntime.preload_dlls() was called but {core_cudnn_dll} failed to load. "
                     "cuDNN DLLs were NOT successfully loaded from site-packages."
                 )
             else:
-                if platform.system() == "Windows":
-                    _load_extra_cudnn_dlls()
                 logger.info(
                     "onnxruntime.preload_dlls() succeeded — CUDA/cuDNN DLLs loaded"
                     " from site-packages. Verify version compatibility at"
                     " https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements."
                 )
                 return True

Also applies to: 229-252

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/ort_utils.py` around lines 147 - 201, The current
logic in _load_extra_cudnn_dlls (and the caller _check_for_libcudnn) swallows
exceptions from ort.preload_dlls() and treats partial extra-DLL load failures as
success; change the flow so that if ort.preload_dlls() raises we log and
immediately return False (or re-raise), and after attempting to load extra cudnn
DLLs, if any DLLs failed to load (the failed list is non-empty) return False (or
raise an appropriate error) instead of allowing a success return; update callers
of _load_extra_cudnn_dlls/_check_for_libcudnn to respect the boolean failure and
propagate failure to the caller so that partial/failed cuDNN loads are not
reported as available.

Contributor Author


@coderabbitai This seems like a non-issue, since there is no real way to log failure through ort.preload_dlls(). We just check whether the function loads the main cudnn64 DLL and log a failure only if it is not loaded. The extra-DLL step then tries to load all DLLs in the venv nvidia folder; there should be no error logging there.

Contributor


@hthadicherla, that's a fair clarification — the best-effort philosophy for extra DLL loading makes sense, and I agree that failures in _load_extra_cudnn_dlls() should remain warnings rather than propagating as hard errors.

One narrow concern still stands though: if ort.preload_dlls() raises an exception, captured is left empty, so the "Failed to load cudnn64_9" check silently passes (empty string never contains the pattern), _load_extra_cudnn_dlls() is called, and the function returns True — even though the main preload attempt actually failed with an exception. The exception is already being caught and logged as a warning, so this isn't a silent failure, but the function still surfaces as "success" to its caller.

A minimal guard for just that case (without changing the best-effort DLL logic at all):

preload_exception = None
try:
    with redirect_stdout(captured), redirect_stderr(captured):
        ort.preload_dlls()
except Exception as e:
    preload_exception = e
    logger.warning(f"onnxruntime.preload_dlls() raised an exception: {e}")

preload_output = captured.getvalue()
if preload_output:
    logger.warning(f"preload_dlls() output:\n{preload_output}")

core_cudnn_dll = "cudnn64_9" if platform.system() == "Windows" else "libcudnn_adv"
if preload_exception is not None or f"Failed to load {core_cudnn_dll}" in preload_output:
    logger.error(...)
else:
    if platform.system() == "Windows":
        _load_extra_cudnn_dlls()  # best-effort, failures stay as warnings
    logger.info("onnxruntime.preload_dlls() succeeded ...")
    return True

That keeps _load_extra_cudnn_dlls() fully best-effort while correctly treating a raised exception as a load failure. Happy to drop this entirely if you consider the exception path acceptable in practice.


🧠 Learnings used
Learnt from: kevalmorabia97
Repo: NVIDIA/Model-Optimizer PR: 974
File: modelopt/torch/puzzletron/pruning/pruning_utils.py:195-201
Timestamp: 2026-03-06T14:26:21.439Z
Learning: In NVIDIA/Model-Optimizer, for PyTorch >= 2.6, torch.load() calls without an explicit weights_only argument are safe. Do not flag bare torch.load(...) as a security issue in files under the modelopt package (e.g., modelopt/torch/puzzletron/pruning/pruning_utils.py) as long as the PyTorch version constraint is maintained. If supporting PyTorch < 2.6, require an explicit weights_only argument to torch.load() to avoid potential issues.

@hthadicherla hthadicherla requested a review from ajrasane April 29, 2026 11:45
@codecov

codecov Bot commented Apr 29, 2026

Codecov Report

❌ Patch coverage is 15.09434% with 45 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.34%. Comparing base (077e29a) to head (0fa6624).
⚠️ Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
modelopt/onnx/quantization/ort_utils.py 15.09% 45 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1369       +/-   ##
===========================================
- Coverage   76.48%   66.34%   -10.15%     
===========================================
  Files         471      471               
  Lines       50487    51002      +515     
===========================================
- Hits        38617    33837     -4780     
- Misses      11870    17165     +5295     
Flag coverage:
  • examples: 41.55% <7.54%> (+10.80%) ⬆️
  • gpu: 26.79% <0.00%> (-33.36%) ⬇️
  • unit: 52.73% <15.09%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@kevalmorabia97 kevalmorabia97 added the cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label Apr 29, 2026
Comment thread modelopt/onnx/quantization/ort_utils.py Outdated
"""Locate the nvidia cudnn bin directory inside site-packages."""
from importlib.metadata import PackageNotFoundError, distribution

for pkg_name in ("nvidia-cudnn-cu12", "nvidia-cudnn-cu11", "nvidia-cudnn-cu13"):
Collaborator


We dont support Cuda11. Lets keep 12 and 13 only

Suggested change
for pkg_name in ("nvidia-cudnn-cu12", "nvidia-cudnn-cu11", "nvidia-cudnn-cu13"):
for pkg_name in ("nvidia-cudnn-cu12", "nvidia-cudnn-cu13"):

Contributor Author


Done

Comment thread modelopt/onnx/quantization/ort_utils.py Outdated

def _find_cudnn_bin_dir():
    """Locate the nvidia cudnn bin directory inside site-packages."""
    from importlib.metadata import PackageNotFoundError, distribution
Collaborator


pls move this import to the top of the file

Contributor Author


Done. Made commit for this

Signed-off-by: Hrishith Thadicherla <hthadicherla@nvidia.com>
@hthadicherla hthadicherla requested a review from gcunhase April 30, 2026 11:04
Contributor

@ajrasane ajrasane left a comment


LGTM. Left two minor comments. PTAL.

Comment thread modelopt/onnx/quantization/ort_utils.py Outdated

    dll_files = sorted(glob.glob(os.path.join(cudnn_bin_dir, "cudnn*.dll")))
    if not dll_files:
        return
Contributor


nit: could you add a logger.debug statement that no cudnn dll files were found?

Signed-off-by: Hrishith Thadicherla <hthadicherla@nvidia.com>
@hthadicherla
Contributor Author

Hey @kevalmorabia97, as you can see, this test https://github.com/NVIDIA/Model-Optimizer/actions/runs/25175182312/job/73807528625?pr=1369 [gpu-tests (gpu, 60, pytorch:26.03-py3)] is being cancelled at the 1-hour mark regardless of pass or fail.

Can you increase the timeout for this test in CI/CD ?
