Skip to content

Move TRT-RTX runtime controls to runtime context managers (v3, for review)#3

Draft
tp5uiuc wants to merge 6 commits into
feat/trtrtx-cpp-runtime-v2from
feat/trtrtx-runtime-ctx-managers
Draft

Move TRT-RTX runtime controls to runtime context managers (v3, for review)#3
tp5uiuc wants to merge 6 commits into
feat/trtrtx-cpp-runtime-v2from
feat/trtrtx-runtime-ctx-managers

Conversation

@tp5uiuc
Copy link
Copy Markdown
Owner

@tp5uiuc tp5uiuc commented Jun 3, 2026

Rewrites the v2 design (PR #2 base branch) to move cuda_graph_strategy, dynamic_shapes_kernel_specialization_strategy, runtime_cache from CompilationSettings / serialized engine slots to runtime context managers per pytorch#4310.

Summary

  • New RuntimeSettings dataclass on both Python and C++ sides; RuntimeCacheHandle registered as a torchbind class for shared-cache semantics.
  • Three new CMs in torch_tensorrt.runtime: runtime_config (pool API), runtime_cache (shared cache), plus per-knob sugars. All accept a list of modules.
  • New runtime_settings= kwarg on compile() / cross_compile_for_windows() / convert_module() for compile-time hints (1 context-create cost, no enter/exit recreate).
  • Per-engine update_runtime_settings(rs) with fast-path equality check; rebuilds IRuntimeConfig + recreates execution context on diff.
  • SerializedInfoIndex drops 4 RTX slots; SERIALIZATION_LEN back to 12.

Tests

  • New test_004_runtime_settings.py (12 tests) covering data model, compile-time hint, CM restore, multi-target, dispatch.
  • test_000_runtime_cache.py, test_001_dynamic_shapes_kernel_strategy.py, test_001_cuda_graph_strategy.py migrated to the new API.

Status

  • Pre-commit clean (SKIP=mypy for the pre-existing _TRTEngine.py errors tracked separately).
  • RTX wheel build succeeds; test_004 all 12 pass; Python-runtime half of the three other test files passes.
  • C++-engine path crashes inside libtensorrt_rtx.so.1 at cuda_engine->getStreamableWeightsSize() -- I confirmed this is a pre-existing environmental issue on the test node (the same crash occurs with a known-good pre-built v2 wheel installed in the same env), not a regression from this refactor.

…anagers

Replaces the v2 design that packed three runtime-mode controls
(``cuda_graph_strategy``, ``dynamic_shapes_kernel_specialization_strategy``,
``runtime_cache``) into ``CompilationSettings`` and the serialized engine
tuple. Per pytorch#4310, these are runtime mode controls -- not
engine properties -- and shouldn't pin at compile time or round-trip
through serialization.

Highlights:

* New ``RuntimeSettings`` dataclass on both Python and C++ sides
  (``py/torch_tensorrt/runtime/_runtime_settings.py``,
  ``core/runtime/RuntimeSettings.h``). Three fields:
  ``dynamic_shapes_kernel_specialization_strategy``,
  ``cuda_graph_strategy``, ``runtime_cache``. The cache field accepts
  ``None``, a path string (engine creates an implicit handle, saves on
  ``__del__``, mirrors old ``runtime_cache_path=`` behavior), or a
  ``RuntimeCacheHandle`` (shared cache, lifecycle owned by the
  ``runtime_cache()`` CM).
* New ``RuntimeCacheHandle`` registered as a torchbind class
  (``torch.classes.tensorrt.RuntimeCacheHandle``) so the same C++
  ``IRuntimeCache`` shared_ptr crosses the Python/C++ boundary.
* New per-engine ``update_runtime_settings`` API on both ``TRTEngine``
  flavors. Fast-paths on settings equality; eagerly rebuilds
  ``IRuntimeConfig`` + recreates execution context on diff.
* Three new context managers in ``torch_tensorrt.runtime``:
  ``runtime_config(target_or_targets, **kw)`` (the pool API; also
  yields the target so ``with runtime_config(model, ...) as m:``
  works), ``runtime_cache(target, path)`` (shared cache CM), and the
  per-knob sugars ``set_cuda_graph_strategy`` /
  ``set_dynamic_shapes_kernel_strategy``. All three accept a list of
  modules for multi-target use; the cache CM yields the
  ``RuntimeCacheHandle`` for inspection or explicit ``save()``.
* New ``runtime_settings=`` kwarg on ``compile()``,
  ``cross_compile_for_windows()``, and ``convert_module()`` so callers
  can prime the engine with the right values up front. Compile-time
  hint avoids the enter/exit recreate cost.
* ``CompilationSettings`` loses the three fields; the compiler entry
  points drop the three kwargs. ``SerializedInfoIndex`` drops the four
  RTX-related slots; ``SERIALIZATION_LEN`` returns to 12. Engines
  saved with the old 16-slot layout will raise the existing
  layout-mismatch error on load.
* Three existing test files migrated to the new API; new
  ``tests/py/dynamo/runtime/test_004_runtime_settings.py`` covers the
  data model, compile-time hint, runtime CM restore semantics,
  multi-target form, and dispatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to C++ style guidelines

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to Python style guidelines

Two follow-up bugs exposed by the cross-runtime test parameterization on
the C++ engine path:

1. ``torch.classes.tensorrt.Engine.update_runtime_settings(...)`` rejected
   Python ``None`` for the ``RuntimeCacheHandle`` argument because TorchBind
   does not auto-convert ``None`` to a null ``c10::intrusive_ptr``. Switch
   the signature to ``c10::optional<c10::intrusive_ptr<RuntimeCacheHandle>>``
   so the default ``runtime_cache=None`` case round-trips cleanly.

2. ``RuntimeSettings(runtime_cache="/some/path")`` only auto-saved to disk
   on engine destruction for the Python runtime (via ``_TRTEngine.__del__``).
   The C++ engine had no equivalent saver and the IRuntimeCache it
   materialized internally wasn't accessible from Python.

   Make the cpp path symmetric:
   - Expose ``serialize() -> at::Tensor`` / ``deserialize(at::Tensor)`` /
     ``has_cache()`` on the torchbind ``RuntimeCacheHandle`` class. ``at::Tensor``
     of uint8 is used instead of ``std::string`` because TorchBind forces
     ``std::string`` through Python ``str`` (UTF-8) and serialized cache bytes
     are not valid UTF-8.
   - In ``TorchTensorRTModule.setup_engine`` (cpp branch), pre-materialize a
     torchbind handle when ``runtime_cache`` is a path string, store it on
     the module, and substitute it into ``_runtime_settings`` so the dispatch
     passes the same handle through.
   - Add ``_load_cpp_implicit_cache`` / ``_save_cpp_implicit_cache`` and a
     module ``__del__`` that mirrors the Python ``_TRTEngine`` saver, with
     ``filelock`` + atomic-rename semantics.
   - Teach ``_to_torchbind_handle`` to pass an already-torchbind
     ``torch.ScriptObject`` through unchanged.

All cpp + python runtime tests pass on TRT-RTX 1.5: test_004 (12/12),
test_000 (10/10), test_001 dynamic_shapes (14/14), test_001 cuda_graph
(13/13).
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to C++ style guidelines

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to Python style guidelines

Comment thread py/torch_tensorrt/runtime/_cuda_graph_strategy.py Outdated
Comment thread py/torch_tensorrt/runtime/_dynamic_shapes_kernel_strategy.py Outdated
Comment thread py/torch_tensorrt/runtime/_runtime_settings.py Outdated
runtime_cache: Optional[Union[str, "RuntimeCacheHandle"]] = None # noqa: F821

def __post_init__(self) -> None:
if (
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _runtime_settings is only expected to be used in TRT-RTX. For regular TRT, let's issue a warning here saying this has no effect.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 4bfa982. RuntimeSettings.__post_init__ now emits a one-shot UserWarning on non-RTX builds (gated by a module-level _NON_RTX_WARNING_EMITTED so it does not spam):

global _NON_RTX_WARNING_EMITTED
if not ENABLED_FEATURES.tensorrt_rtx and not _NON_RTX_WARNING_EMITTED:
    warnings.warn(
        "RuntimeSettings is only honored on TRT-RTX builds; "
        "values will be ignored on standard TRT.",
        UserWarning,
        stacklevel=2,
    )
    _NON_RTX_WARNING_EMITTED = True

Comment thread py/torch_tensorrt/dynamo/conversion/_conversion.py Outdated
…timeCacheHandle lifecycle

Structural cleanup on top of the v3 work (no observable behavior change).

C++ side
--------
``RuntimeSettings`` migrates from a ``TRTEngine`` member to a
``TRTRuntimeConfig`` member -- the value-type now lives with its primary
consumer (the IRuntimeConfig builder). ``TRTRuntimeConfig`` gains
``set_settings()`` (the diff-and-invalidate primitive) and turns the
static ``uses_internal_capture`` / ``is_monolithic_capturable`` helpers
into instance methods so callers do not need to pass settings around.
``TRTEngine::runtime_settings()`` forwards through.

Python side
-----------
Introduces a Python ``TRTRuntimeConfig`` class mirroring the C++ struct.
``_TRTEngine`` drops its three legacy fields (``runtime_config``,
``runtime_settings``, ``_implicit_cache_handle``) for a single
``self._trt_runtime_config`` member; ``_create_execution_context`` /
``update_runtime_settings`` / ``_is_monolithic_capturable`` /
``_enable_rtx_native_cudagraphs`` all delegate. Every
``ENABLED_FEATURES.tensorrt_rtx`` branch related to runtime-mode controls
is absorbed into the shim, so engine and module call sites stay uniform
across TRT and TRT-RTX builds.

Following the project's grouping convention,
``py/torch_tensorrt/runtime/_runtime_settings.py`` is merged into
``_runtime_config.py``; that file now holds ``RuntimeSettings``, the new
``TRTRuntimeConfig``, the existing ``runtime_config()`` CM, and its
factory. Imports across the tree are repointed.

RuntimeCacheHandle ownership model
----------------------------------
Save-on-destruction moves from the two engine-side ``__del__`` paths
(``_TRTEngine.close()`` for Python runtime, ``TorchTensorRTModule.__del__``
for cpp runtime) onto ``RuntimeCacheHandle.__del__`` itself, gated by a
new ``autosave_on_del`` flag. The flag is set by ownership context:

* Engine-implicit handles (created from a path-string compile-time hint)
  get ``autosave_on_del=True`` -- no other Python object holds them, so
  the destructor is the only save opportunity.
* The ``runtime_cache(target, path)`` CM uses ``autosave_on_del=False``
  on the handle it constructs; its ``__exit__`` saves explicitly.
* Hand-built handles default to ``autosave_on_del=False`` so save timing
  stays under the user's control.

The handle additionally accepts a ``torchbind_handle`` sibling so the
same Python object can wrap either a ``trt.IRuntimeCache`` (Python rt)
or a ``torch.classes.tensorrt.RuntimeCacheHandle`` (cpp rt); ``save`` /
``load`` source bytes from whichever is populated. The cpp-runtime
helpers on ``TorchTensorRTModule`` (``_load_cpp_implicit_cache``,
``_save_cpp_implicit_cache``, ``__del__``) and the duplicate save logic
in ``_TRTEngine.close()`` are removed; both runtimes funnel through the
single ``RuntimeCacheHandle.__del__`` path.

Tests
-----
test_000 grows two new tests asserting the new contract:
* ``test_cm_does_not_double_save_on_rc_gc`` -- only one save fires per
  CM block even after ``rc`` is GC'd.
* ``test_user_built_handle_no_autosave_by_default`` -- hand-built
  handles do not autosave on GC.

All 51 runtime tests pass on the refactored design (test_004 12/12,
test_000 12/12, test_001 ds 14/14, test_001 cg 13/13).
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to C++ style guidelines

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to Python style guidelines

@tp5uiuc tp5uiuc marked this pull request as draft June 4, 2026 02:22
Five follow-up changes responding to PR review comments:

* **Fold strategy sugar into ``_runtime_config.py``.** Delete
  ``_dynamic_shapes_kernel_strategy.py`` and ``_cuda_graph_strategy.py``;
  ``set_dynamic_shapes_kernel_strategy`` / ``set_cuda_graph_strategy``
  now live alongside the ``runtime_config`` CM they delegate to.
  ``torch_tensorrt/runtime/__init__.py`` re-exports them from the
  consolidated module.

* **Hoist ``RuntimeSettings`` defaults into ``_defaults.py``.** Three
  new constants (``DYNAMIC_SHAPES_KERNEL_SPECIALIZATION_STRATEGY``,
  ``CUDA_GRAPH_STRATEGY``, ``RUNTIME_CACHE_PATH``) mirror the
  compilation-settings pattern. ``RUNTIME_CACHE_PATH`` defaults to a
  per-user temp file similar to ``ENGINE_CACHE_DIR``, so users get a
  disk-backed runtime cache without explicit opt-in; override via
  ``RuntimeSettings(runtime_cache="/path")`` or the ``runtime_cache``
  CM. Test_000 and test_004 updated to reflect the new default.

* **Warn on non-RTX ``RuntimeSettings`` construction.** ``__post_init__``
  now emits a one-shot ``UserWarning`` on regular TRT builds (gated by
  ``ENABLED_FEATURES.tensorrt_rtx``) so users see that the settings have
  no effect.

* **Drop ``TYPE_CHECKING`` string forward-refs for ``RuntimeSettings``.**
  Direct top-level imports across ``_compiler.py``, ``_conversion.py``,
  ``_TRTEngine.py`` and ``_TorchTensorRTModule.py``; bare
  ``Optional[RuntimeSettings]`` annotations everywhere. Deferred imports
  inside ``__init__`` / ``__setstate__`` removed.

All 51 runtime tests pass (test_004 12/12, test_000 12/12,
test_001 ds 14/14, test_001 cg 13/13).
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to C++ style guidelines

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to Python style guidelines

Comment thread core/runtime/register_jit_hooks.cpp Outdated
Comment on lines +27 to +45
// Expose the underlying IRuntimeCache bytes to Python so the Python-
// side save/load logic can persist them under filelock. Returns an
// empty uint8 tensor if the cache hasn't been materialized yet.
//
// We return ``at::Tensor`` rather than ``std::string`` because TorchBind
// forces ``std::string`` to round-trip through Python ``str`` (UTF-8)
// and serialized cache bytes are not valid UTF-8.
.def(
"serialize",
[](const c10::intrusive_ptr<RuntimeCacheHandle>& self) -> at::Tensor {
#ifdef TRT_MAJOR_RTX
auto opts = at::TensorOptions().dtype(at::kByte);
if (!self->cache) {
return at::empty({0}, opts);
}
auto host_mem = make_trt(self->cache->serialize());
if (!host_mem) {
return at::empty({0}, opts);
}
Copy link
Copy Markdown
Owner Author

@tp5uiuc tp5uiuc Jun 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods "serialize", "deserialize", "has_cache" should be implemented elsewhere (in its cpp file). If the concern is introducing at:: symbols we can work around it. This file should only be used for registration.

Another smell is using #ifdef here, these should be hidden in the implementation.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 111cdb2eb. serialize / deserialize / has_cache are now member functions of RuntimeCacheHandle declared in RuntimeSettings.h and implemented in RuntimeSettings.cpp. All #ifdef TRT_MAJOR_RTX lives inside those method bodies; the registration in register_jit_hooks.cpp is preprocessor-free for this class:

torch::class_<RuntimeCacheHandle>("tensorrt", "RuntimeCacheHandle")
    .def(torch::init<std::string>())
    .def_readwrite("path", &RuntimeCacheHandle::path)
    .def("serialize", &RuntimeCacheHandle::serialize)
    .def("deserialize", &RuntimeCacheHandle::deserialize)
    .def("has_cache", &RuntimeCacheHandle::has_cache);

[](const c10::intrusive_ptr<TRTEngine>& self,
std::string const& dynamic_shapes_kernel_specialization_strategy,
std::string const& cuda_graph_strategy,
c10::optional<c10::intrusive_ptr<RuntimeCacheHandle>> runtime_cache) -> void {
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not possible to implement this as a property with getter and setter because of this c10::optional<c10::intrusive_ptr<RuntimeCacheHandle>> runtime_cache?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible — the c10::optional<c10::intrusive_ptr<RuntimeCacheHandle>> signature is fine for torchbind def_property (device_memory_budget immediately below in this same registration is a property on TRTEngine, for a comparable point).

The reason update_runtime_settings is a single bundled setter is that RuntimeSettings is the unit of context invalidation: changing any one of the three fields ends up calling recreate_execution_context once. Splitting into three individual properties would cause three sequential context-recreates on the engine-setup path (where all three are set together via _dispatch_runtime_settings_to_engine). The diff-check inside TRTRuntimeConfig::set_settings would catch no-op repeats, but consecutive changing writes would each trigger a recreate.

If you would rather have property syntax I can split it, but the bundled form keeps setup tight. WDYT?

Mirror ``TRTRuntimeConfig.set_settings`` (Python runtime) on the cpp
runtime path. Previously the cpp side dropped the C++ engine's
intrusive_ptr on settings change but left ``self._implicit_cache_handle``
on the ``TorchTensorRTModule`` pointing at the *old* wrapper -- the new
cache had no Python autosave companion and never wrote to disk.

Factor the path-string-to-torchbind-handle materialization into
``TorchTensorRTModule._materialize_cpp_implicit_handle``. Called from
``setup_engine`` and ``_dispatch_runtime_settings_to_engine`` (cpp
branch); synchronously saves the prior wrapper before swap, replaces
``self._implicit_cache_handle`` with the new one, then runs ``load()``
after the C++ engine has attached the IRuntimeCache.

Test: ``test_set_runtime_settings_saves_prior_cache_on_swap`` (parametrized
over both runtimes). Compiles with path A; swaps to path B; asserts A is
written synchronously at swap time and B is written on ``del compiled``.
The walk-to-inner-module is wrapped in a helper so the loop variable
doesn't outlive the call and keep the inner TRT module alive past
``del compiled`` (which would suppress the post-del autosave).

All 53 tests pass (test_004 12/12, test_000 14/14, test_001 ds 14/14,
test_001 cg 13/13).
Comment thread core/runtime/RuntimeSettings.cpp Outdated
Comment on lines +12 to +13
return dynamic_shapes_kernel_specialization_strategy == other.dynamic_shapes_kernel_specialization_strategy &&
cuda_graph_strategy == other.cuda_graph_strategy && runtime_cache.get() == other.runtime_cache.get();
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we use std::tie(a,b,c) == std::tie(other.a, other.b, other.c)?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 111cdb2eb. Note: runtime_cache.get() returns a temporary T* so the raw pointers are hoisted into locals first to satisfy std::ties lvalue requirement.

auto* this_cache = runtime_cache.get();
auto* other_cache = other.runtime_cache.get();
return std::tie(dynamic_shapes_kernel_specialization_strategy, cuda_graph_strategy, this_cache) ==
       std::tie(other.dynamic_shapes_kernel_specialization_strategy, other.cuda_graph_strategy, other_cache);

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to C++ style guidelines

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to Python style guidelines

Comment thread core/runtime/RuntimeSettings.cpp Outdated
Comment on lines +17 to +24
RuntimeSettings result = *this;
result.dynamic_shapes_kernel_specialization_strategy = override.dynamic_shapes_kernel_specialization_strategy;
result.cuda_graph_strategy = override.cuda_graph_strategy;
if (override.runtime_cache) {
result.runtime_cache = override.runtime_cache;
}
return result;
}
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this at all. An easier way of doing this is to define a swap method and implement it that way (similar to copy-and-swap idiom) for example

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped merge() entirely in 111cdb2eb (see also your follow-up comment below making the same point about member-wise assignment). Direct field assignment is the idiom now — no merge() helper.

Comment thread core/runtime/register_jit_hooks.cpp Outdated
Comment on lines +25 to +26
.def("path", &RuntimeCacheHandle::path)
.def("set_path", &RuntimeCacheHandle::set_path)
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

path can be a property.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 111cdb2eb. path is now a public field on the (now struct) RuntimeCacheHandle and registered as def_readwrite("path", &RuntimeCacheHandle::path). The old path() getter and set_path() setter methods are gone.

Comment thread core/runtime/RuntimeSettings.h Outdated
Comment on lines +40 to +41
private:
std::string path_;
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does making path private get us at the end of day? its not like if path changes the runtime cache immediately changes. better to make it public and replace the entire class RuntimeCacheHandle with a struct.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 111cdb2eb — RuntimeCacheHandle is now a struct with a public path field. The cache shared_ptr is also a public field. Agreed there were no encapsulation invariants worth preserving.

Comment thread core/runtime/RuntimeSettings.h Outdated
// For non-default detection on the strategy strings we always overlay; the cache
// pointer is overlaid iff `override.runtime_cache` is non-null.
RuntimeSettings merge(RuntimeSettings const& override) const;

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the purpose of this again? Can't we do

auto another_runtime_settings = old.runtime_settings;
another_runtime_settings.dynamic_shapes_kernel_specialization_strategy = "something_else";
old.runtime_settings = another_runtime_settings; // or std::swap(old.runtime_settings, another_runtime_settings);

instead? Why add anothr function if we are going to member-wise swap anyway?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped merge() in 111cdb2. Direct field assignment / std::swap on the struct is exactly the right pattern — no helper required. (For context: Python RuntimeSettings is @dataclass(frozen=True), so on the Python side RuntimeSettings.merge(**kwargs) exists as the dataclass-replace idiom; that is unrelated and stays.)

const std::string& serialized_metadata,
const ResourceAllocationStrategy resource_allocation_strategy,
TRTRuntimeConfig runtime_cfg)
RuntimeSettings runtime_settings)
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this constructor be ever called from the Python side so that Runtimesettings get passed in and context gets created only once?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Today: no. This is the "first" constructor overload (the public one that accepts a RuntimeSettings directly) — but the only torchbind init is the vector<string> one at L87 of register_jit_hooks.cpp, which goes through the deserialization constructor at L114 of TRTEngine.cpp, passing RuntimeSettings{} (defaults). The Python side then calls engine.update_runtime_settings(...) right after construction, which recreates the context once. So construction → recreate-1 (with defaults) → settings dispatch → recreate-2 (with the real settings).

To get it down to one recreate, I would add a second torch::init<...> overload that accepts the strategy strings + the cache handle alongside serialized_info, route through this RuntimeSettings-aware constructor, and skip the post-construction dispatch in Python. Doable in this PR if you want — but the savings are small (one createExecutionContext per engine) and the dispatch path stays as-is for live runtime_config CM changes anyway.

Comment on lines +51 to +52
std::string dynamic_shapes_kernel_specialization_strategy = "lazy";
std::string cuda_graph_strategy = "disabled";
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to store strings? Can't we store ints here since they will be converted to enumerations anyway? This way we just static cast them to get the enumeration

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tradeoff:

  • Ints: type-safe; static_cast<nvinfer1::DynamicShapesKernelSpecializationStrategy>(settings_.foo) is a no-op; one validation layer.
  • Strings: human-readable to_str() (logged via operator<<) is useful for support reports / debug logs; identical Python-and-C++ value-form for the canonical setting; reasonably grep-able through the codebase.

Today the layout is "strings everywhere, two validation points": Python RuntimeSettings.__post_init__ validates against the string→int map, then TRTRuntimeConfig::ensure_initialized re-validates inside to_trt_ds_strategy / to_trt_cg_strategy. Moving to int internally means: keep the Python-facing API as strings, convert once at the torchbind boundary (update_runtime_settings lambda), store ints, skip the second validation. The cost is that to_str() needs a reverse-lookup table to print "lazy" / "whole_graph_capture".

Happy to push that through if you prefer ints — I held off here because the current form keeps logs human-readable on both sides of the boundary. WDYT?

…Handle

C++-side cleanup spurred by review comments on #3:

- Convert ``RuntimeCacheHandle`` from a class with a private ``path_``
  field + accessor methods (``path()`` / ``set_path()``) to a struct
  with a public ``path`` field. Re-register the torchbind binding via
  ``.def_readwrite("path", &RuntimeCacheHandle::path)``.
- Move the bodies of ``serialize``, ``deserialize``, and ``has_cache``
  out of the JIT-binding registration file
  (``register_jit_hooks.cpp``) and into member functions implemented in
  ``RuntimeSettings.cpp``. The ``#ifdef TRT_MAJOR_RTX`` guards live
  inside those impls; the registration file is preprocessor-free for
  these bindings.
- Use ``std::tie`` in ``RuntimeSettings::operator==`` for cleaner
  field-wise comparison (raw ``intrusive_ptr::get()`` results hoisted
  to lvalues to satisfy ``std::tie``'s reference requirement).
- Drop ``RuntimeSettings::merge``. C++ ``RuntimeSettings`` is now
  value-typed end-to-end; direct field assignment is the idiom. No
  callers used ``merge`` outside its own definition.

No behavior change. Python-side ``RuntimeCacheHandle`` wrapper and the
runtime test suite are unaffected.
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to C++ style guidelines

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to Python style guidelines

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant