
Added fallback to load extra cudnn dlls in the site packages #1369

Open
hthadicherla wants to merge 4 commits into main from hthadicherla/fix-cudnn-load-function

Conversation

@hthadicherla
Contributor

@hthadicherla hthadicherla commented Apr 29, 2026

What does this PR do?

Type of change: Bug fix

cuDNN 9.21 added a new DLL dependency, cudnn_engines_tensor_ir64_9.dll, which ort.preload_dlls() has not been updated to load on Windows. As a result, loading cuDNN fails when only the nvidia-cudnn-cu12>9.20 package is installed.
This PR adds code to load any extra DLLs from the site-packages folder that the preload function misses.
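The discovery half of that fallback can be sketched with stdlib tooling alone. The following is an illustrative reconstruction, not the PR's exact code: the helper name find_cudnn_bin_dir mirrors the PR's _find_cudnn_bin_dir, and it simply returns None when no cuDNN wheel is installed.

```python
from importlib.metadata import PackageNotFoundError, distribution
from pathlib import Path


def find_cudnn_bin_dir():
    """Locate the directory holding cudnn*.dll inside an installed nvidia-cudnn wheel.

    Returns the directory as a string, or None if no matching wheel (or no DLL)
    is found, e.g. on Linux or in an environment without cuDNN installed.
    """
    for pkg_name in ("nvidia-cudnn-cu12", "nvidia-cudnn-cu13"):
        try:
            dist = distribution(pkg_name)
        except PackageNotFoundError:
            continue  # wheel not installed; try the next package name
        for f in dist.files or []:
            name = f.name.lower()
            if name.startswith("cudnn") and name.endswith(".dll"):
                # locate_file resolves the record entry to an absolute path
                return str(Path(dist.locate_file(f)).parent)
    return None


print(find_cudnn_bin_dir())  # None unless a Windows cuDNN wheel is installed
```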

Summary by CodeRabbit

  • Bug Fixes
    • Improved Windows cuDNN detection and loading for ONNX Runtime with CUDA by scanning installed cuDNN packages and attempting to load any missing DLLs to reduce startup failures.
    • Enhanced logging and diagnostics: preload output is now surfaced as warnings and individual DLL load successes/failures are logged to aid troubleshooting.

…er that ort.preload_dlls() missed

Signed-off-by: Hrishith Thadicherla <hthadicherla@nvidia.com>
@hthadicherla hthadicherla requested a review from a team as a code owner April 29, 2026 11:31
@coderabbitai
Contributor

coderabbitai Bot commented Apr 29, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f08d01f8-39aa-4f87-9206-bc7186f5736e

📥 Commits

Reviewing files that changed from the base of the PR and between 76436e4 and 0fa6624.

📒 Files selected for processing (1)
  • modelopt/onnx/quantization/ort_utils.py

📝 Walkthrough


Windows-specific cuDNN detection in ort_utils.py was expanded: _check_for_libcudnn() now treats onnxruntime.preload_dlls() output as warnings, uses a Windows-specific preload failure key, and on successful preload enumerates nvidia-cudnn-cu* site-packages for cudnn*.dll, skipping already-loaded modules and attempting ctypes.CDLL loads while logging results.

Changes

Cohort / File(s) Summary
cuDNN Windows support & loading
modelopt/onnx/quantization/ort_utils.py
Extended _check_for_libcudnn() to log onnxruntime.preload_dlls() output at warning level, check Windows-specific failure sentinel (cudnn64_9), locate nvidia-cudnn-cu* site-packages and list cudnn*.dll, skip DLLs already present via GetModuleHandleW, and attempt ctypes.CDLL loads for remaining DLLs while logging loaded and failed DLLs.
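The "skip DLLs already present" step above hinges on GetModuleHandleW, which looks up a module already mapped into the current process without loading anything new. A minimal, self-contained sketch of that check (function name is illustrative; it degrades to False on non-Windows platforms, where there is no DLL concept):

```python
import ctypes
import platform


def is_dll_loaded(dll_name: str) -> bool:
    """Return True if a DLL with this name is already mapped into the process."""
    if platform.system() != "Windows":
        return False  # ctypes.windll / GetModuleHandleW only exist on Windows
    import ctypes.wintypes

    get_module_handle_w = ctypes.windll.kernel32.GetModuleHandleW
    get_module_handle_w.argtypes = [ctypes.wintypes.LPCWSTR]
    get_module_handle_w.restype = ctypes.wintypes.HMODULE
    # A non-NULL handle means the module is already loaded; the call does not
    # increment the module's reference count, so it is a pure query.
    return bool(get_module_handle_w(dll_name))


print(is_dll_loaded("cudnn64_9.dll"))
```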

Sequence Diagram(s)

sequenceDiagram
    participant Python as PythonProcess (ort_utils)
    participant ORT as ONNXRuntime
    participant FS as FileSystem (site-packages)
    participant WinAPI as WindowsAPI (GetModuleHandleW)
    participant CT as ctypes.CDLL

    Python->>ORT: call preload_dlls()
    ORT-->>Python: preload_result (dict)
    alt preload_result contains Windows failure key (cudnn64_9)
        Python->>Python: log warning (preload reported failure)
    else preload appears successful
        Python->>FS: scan site-packages for nvidia-cudnn-cu* -> list cudnn*.dll
        loop for each cudnn DLL
            Python->>WinAPI: GetModuleHandleW(dll_name)
            alt module already loaded
                WinAPI-->>Python: handle present -> skip DLL
            else not loaded
                Python->>CT: ctypes.CDLL(full_dll_path)
                alt load success
                    CT-->>Python: success -> log loaded DLL
                else load failure
                    CT-->>Python: error -> log failure
                end
            end
        end
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 75.00%, which is below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.
✅ Passed checks (5 passed)
  • Description Check ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed — The title clearly and specifically describes the main change: adding a fallback mechanism to load extra cuDNN DLLs from site packages on Windows.
  • Linked Issues check ✅ Passed — Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check ✅ Passed — Check skipped because no linked issues were found for this pull request.
  • Security Anti-Patterns ✅ Passed — No security anti-patterns detected: no unsafe torch.load/numpy.load calls, no eval/exec, no trust_remote_code=True, no nosec comments. DLL loading uses safe pre-installed system library paths.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch hthadicherla/fix-cudnn-load-function

Review rate limit: 9/10 reviews remaining, refill in 6 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/onnx/quantization/ort_utils.py`:
- Around line 154-155: The docstring incorrectly states that _find_cudnn_bin_dir
is the "caller" of this function; update the docstring to correctly describe the
relationship (this function is the caller/helper that invokes
_find_cudnn_bin_dir or vice versa as appropriate). Locate the docstring in
ort_utils.py around the function that references _find_cudnn_bin_dir and change
the wording to state that this function calls (or is a helper/child of)
_find_cudnn_bin_dir, or simply describe their relationship accurately (e.g.,
"this function calls _find_cudnn_bin_dir" or "helper/callee" vs "caller"),
ensuring function names _find_cudnn_bin_dir and the current function are
mentioned for clarity.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: fc851a14-a70d-4dff-a6fb-0235d3765f77

📥 Commits

Reviewing files that changed from the base of the PR and between 9bb917d and 5192b20.

📒 Files selected for processing (1)
  • modelopt/onnx/quantization/ort_utils.py

Comment thread modelopt/onnx/quantization/ort_utils.py Outdated
@github-actions
Contributor

github-actions Bot commented Apr 29, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1369/

Built to branch gh-pages at 2026-04-30 15:52 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

…r in linux since this fix is only for windows

Signed-off-by: Hrishith Thadicherla <hthadicherla@nvidia.com>
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/onnx/quantization/ort_utils.py`:
- Around line 147-201: The current logic in _load_extra_cudnn_dlls (and the
caller _check_for_libcudnn) swallows exceptions from ort.preload_dlls() and
treats partial extra-DLL load failures as success; change the flow so that if
ort.preload_dlls() raises we log and immediately return False (or re-raise), and
after attempting to load extra cudnn DLLs, if any DLLs failed to load (the
failed list is non-empty) return False (or raise an appropriate error) instead
of allowing a success return; update callers of
_load_extra_cudnn_dlls/_check_for_libcudnn to respect the boolean failure and
propagate failure to the caller so that partial/failed cuDNN loads are not
reported as available.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9ff98f02-7045-4bea-8a6b-df1776d8d3d3

📥 Commits

Reviewing files that changed from the base of the PR and between 5192b20 and 7e56229.

📒 Files selected for processing (1)
  • modelopt/onnx/quantization/ort_utils.py

Comment on lines +147 to +201
def _load_extra_cudnn_dlls():
    """Load any cuDNN DLLs from site-packages that ORT's preload_dlls() missed.

    TEMPORARY WORKAROUND: This function exists because ort.preload_dlls() has a
    hardcoded list of cuDNN sub-libraries which may be incomplete for newer cuDNN
    versions (e.g. cuDNN 9.21 added cudnn_engines_tensor_ir64_9.dll, cuDNN 9.20
    added cudnn_cnn64_9.dll). Once ort.preload_dlls() is fixed upstream to
    dynamically discover all cuDNN DLLs, this function and its caller
    (_find_cudnn_bin_dir) should be removed.

    This scans the nvidia-cudnn bin directory and loads any cudnn*.dll not already
    loaded in the process.
    """
    import ctypes
    import ctypes.wintypes

    cudnn_bin_dir = _find_cudnn_bin_dir()
    if not cudnn_bin_dir:
        logger.debug(
            "nvidia-cudnn bin directory not found in site-packages, skipping extra DLL load"
        )
        return

    dll_files = sorted(glob.glob(os.path.join(cudnn_bin_dir, "cudnn*.dll")))
    if not dll_files:
        return

    get_module_handle_w = ctypes.windll.kernel32.GetModuleHandleW  # type: ignore[attr-defined]
    get_module_handle_w.argtypes = [ctypes.wintypes.LPCWSTR]
    get_module_handle_w.restype = ctypes.wintypes.HMODULE

    loaded = []
    skipped = []
    failed = []
    for dll_path in dll_files:
        dll_name = os.path.basename(dll_path)
        if get_module_handle_w(dll_name):
            skipped.append(dll_name)
            continue
        try:
            ctypes.CDLL(dll_path)
            loaded.append(dll_name)
        except OSError as e:
            failed.append(dll_name)
            logger.warning(f"Failed to load {dll_name} from site-packages: {e}")

    if skipped:
        logger.debug(f"Already loaded (skipped): {skipped}")
    if loaded:
        logger.info(
            f"Loaded {len(loaded)} extra cuDNN DLLs that ort.preload_dlls() missed: {loaded}"
        )
    if failed:
        logger.warning(f"Failed to load {len(failed)} cuDNN DLLs: {failed}")
Contributor

@coderabbitai coderabbitai Bot Apr 29, 2026


⚠️ Potential issue | 🟠 Major

Preload failure can still return success (True).

If ort.preload_dlls() raises, the exception is only logged and the flow can still hit the success branch and return True. Also, extra-DLL load failures are not propagated to the caller, so _check_for_libcudnn() may report success while cuDNN is still partially unavailable.

Proposed fix
-def _load_extra_cudnn_dlls():
+def _load_extra_cudnn_dlls() -> list[str]:
@@
-        return
+        return []
@@
-        return
+        return []
@@
     if failed:
         logger.warning(f"Failed to load {len(failed)} cuDNN DLLs: {failed}")
+    return failed
@@
-            try:
+            preload_exception: Exception | None = None
+            try:
                 with redirect_stdout(captured), redirect_stderr(captured):
                     ort.preload_dlls()
             except Exception as e:
+                preload_exception = e
                 logger.warning(f"onnxruntime.preload_dlls() raised an exception: {e}")
@@
-            if f"Failed to load {core_cudnn_dll}" in preload_output:
+            preload_failed = preload_exception is not None or (
+                f"Failed to load {core_cudnn_dll}" in preload_output
+            )
+            extra_failed: list[str] = []
+            if not preload_failed and platform.system() == "Windows":
+                extra_failed = _load_extra_cudnn_dlls()
+                preload_failed = len(extra_failed) > 0
+
+            if preload_failed:
                 logger.error(
                     f"onnxruntime.preload_dlls() was called but {core_cudnn_dll} failed to load. "
                     "cuDNN DLLs were NOT successfully loaded from site-packages."
                 )
             else:
-                if platform.system() == "Windows":
-                    _load_extra_cudnn_dlls()
                 logger.info(
                     "onnxruntime.preload_dlls() succeeded — CUDA/cuDNN DLLs loaded"
                     " from site-packages. Verify version compatibility at"
                     " https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements."
                 )
                 return True

Also applies to: 229-252

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/ort_utils.py` around lines 147 - 201, The current
logic in _load_extra_cudnn_dlls (and the caller _check_for_libcudnn) swallows
exceptions from ort.preload_dlls() and treats partial extra-DLL load failures as
success; change the flow so that if ort.preload_dlls() raises we log and
immediately return False (or re-raise), and after attempting to load extra cudnn
DLLs, if any DLLs failed to load (the failed list is non-empty) return False (or
raise an appropriate error) instead of allowing a success return; update callers
of _load_extra_cudnn_dlls/_check_for_libcudnn to respect the boolean failure and
propagate failure to the caller so that partial/failed cuDNN loads are not
reported as available.

Contributor Author


@coderabbitai This seems like a non-issue, since there is no real way to log failure through ort.preload_dlls(). We just check whether the function loads the main cudnn64 DLL and log a failure only if it is not loaded. The extra-DLL step then tries to load all DLLs in the venv nvidia folder; there should be no error logging there.

Contributor


@hthadicherla, that's a fair clarification — the best-effort philosophy for extra DLL loading makes sense, and I agree that failures in _load_extra_cudnn_dlls() should remain warnings rather than propagating as hard errors.

One narrow concern still stands though: if ort.preload_dlls() raises an exception, captured is left empty, so the "Failed to load cudnn64_9" check silently passes (empty string never contains the pattern), _load_extra_cudnn_dlls() is called, and the function returns True — even though the main preload attempt actually failed with an exception. The exception is already being caught and logged as a warning, so this isn't a silent failure, but the function still surfaces as "success" to its caller.

A minimal guard for just that case (without changing the best-effort DLL logic at all):

preload_exception = None
try:
    with redirect_stdout(captured), redirect_stderr(captured):
        ort.preload_dlls()
except Exception as e:
    preload_exception = e
    logger.warning(f"onnxruntime.preload_dlls() raised an exception: {e}")

preload_output = captured.getvalue()
if preload_output:
    logger.warning(f"preload_dlls() output:\n{preload_output}")

core_cudnn_dll = "cudnn64_9" if platform.system() == "Windows" else "libcudnn_adv"
if preload_exception is not None or f"Failed to load {core_cudnn_dll}" in preload_output:
    logger.error(...)
else:
    if platform.system() == "Windows":
        _load_extra_cudnn_dlls()  # best-effort, failures stay as warnings
    logger.info("onnxruntime.preload_dlls() succeeded ...")
    return True

That keeps _load_extra_cudnn_dlls() fully best-effort while correctly treating a raised exception as a load failure. Happy to drop this entirely if you consider the exception path acceptable in practice.


🧠 Learnings used
Learnt from: kevalmorabia97
Repo: NVIDIA/Model-Optimizer PR: 974
File: modelopt/torch/puzzletron/pruning/pruning_utils.py:195-201
Timestamp: 2026-03-06T14:26:21.439Z
Learning: In NVIDIA/Model-Optimizer, for PyTorch >= 2.6, torch.load() calls without an explicit weights_only argument are safe. Do not flag bare torch.load(...) as a security issue in files under the modelopt package (e.g., modelopt/torch/puzzletron/pruning/pruning_utils.py) as long as the PyTorch version constraint is maintained. If supporting PyTorch < 2.6, require an explicit weights_only argument to torch.load() to avoid potential issues.

@hthadicherla hthadicherla requested a review from ajrasane April 29, 2026 11:45
@codecov

codecov Bot commented Apr 29, 2026

Codecov Report

❌ Patch coverage is 15.09434% with 45 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.34%. Comparing base (077e29a) to head (0fa6624).
⚠️ Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
modelopt/onnx/quantization/ort_utils.py 15.09% 45 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1369       +/-   ##
===========================================
- Coverage   76.48%   66.34%   -10.15%     
===========================================
  Files         471      471               
  Lines       50487    51002      +515     
===========================================
- Hits        38617    33837     -4780     
- Misses      11870    17165     +5295     
Flag coverage:
  • examples: 41.55% <7.54%> (+10.80%) ⬆️
  • gpu: 26.79% <0.00%> (-33.36%) ⬇️
  • unit: 52.73% <15.09%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@kevalmorabia97 kevalmorabia97 added the cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label Apr 29, 2026
Comment thread modelopt/onnx/quantization/ort_utils.py Outdated
"""Locate the nvidia cudnn bin directory inside site-packages."""
from importlib.metadata import PackageNotFoundError, distribution

for pkg_name in ("nvidia-cudnn-cu12", "nvidia-cudnn-cu11", "nvidia-cudnn-cu13"):
Collaborator


We dont support Cuda11. Lets keep 12 and 13 only

Suggested change
for pkg_name in ("nvidia-cudnn-cu12", "nvidia-cudnn-cu11", "nvidia-cudnn-cu13"):
for pkg_name in ("nvidia-cudnn-cu12", "nvidia-cudnn-cu13"):

Contributor Author


Done

Comment thread modelopt/onnx/quantization/ort_utils.py Outdated

def _find_cudnn_bin_dir():
    """Locate the nvidia cudnn bin directory inside site-packages."""
    from importlib.metadata import PackageNotFoundError, distribution
Collaborator


pls move this import to the top of the file

Contributor Author


Done. Made commit for this

Signed-off-by: Hrishith Thadicherla <hthadicherla@nvidia.com>
@hthadicherla hthadicherla requested a review from gcunhase April 30, 2026 11:04
Contributor

@ajrasane ajrasane left a comment


LGTM. Left two minor comments. PTAL.

Comment thread modelopt/onnx/quantization/ort_utils.py Outdated

    dll_files = sorted(glob.glob(os.path.join(cudnn_bin_dir, "cudnn*.dll")))
    if not dll_files:
        return
Contributor


nit: could you add a logger.debug statement that no cudnn dll files were found?

Signed-off-by: Hrishith Thadicherla <hthadicherla@nvidia.com>
@hthadicherla
Contributor Author

Hey @kevalmorabia97, as you can see, this test https://github.com/NVIDIA/Model-Optimizer/actions/runs/25175182312/job/73807528625?pr=1369 [gpu-tests (gpu, 60, pytorch:26.03-py3)] is being cancelled at the 1-hour mark regardless of pass or fail.

Can you increase the timeout for this test in CI/CD ?
