
feat(mediainfo): add MediaInfoContext for reusable library loading #1

Merged
bakgio merged 2 commits into main from feature/media-info-context on Apr 11, 2026

bakgio (Owner) commented Apr 11, 2026

Why

Every v0.1.0 parse call paid a full dlopen → 10-symbol resolve → MediaInfo_New → version probe → configure → parse → MediaInfo_Delete → dlclose cycle. For single-file use this is invisible; for batch workloads (asset scanners, directory walks, import pipelines) the load overhead dominates and is pure waste.

This PR adds a reusable MediaInfoContext that loads the MediaInfo shared library once and reuses it across many parse calls. Existing free functions are transparently routed through a lazily initialized process-wide default context so current users get the performance win without a single line of code change. The feature is purely additive — no public API signature changed and every existing test passes unmodified.

What

[core] src/mediainfo.rs

  • New public pub struct MediaInfoContext (Clone + Send + Sync) with the full 16-method parse surface mirrored from MediaInfo (parse, parse_media_info, parse_path, parse_from_reader, parse_input, their _with_options variants, plus the two parse_to_string / parse_reader_to_string raw-text helpers).
  • Three constructors: new() (default search order), with_library_file(path), with_library_search_dir(dir).
  • Accessors: library_version() -> LibVersion, library_version_string() -> &str, library_file() -> Option<&Path>, can_parse() -> bool.
  • Private internal refactor:
    • New struct LoadedLibrary<'a> carrying a borrowed &'a Arc<MediaInfoLib> + cached version info.
    • load_library split into load_library_full(library_file, library_search_dir) -> Result<(Arc<MediaInfoLib>, String, LibVersion)>.
    • parse_to_string_internal_unlocked, parse_to_string_from_url_unlocked, parse_reader_to_string_internal_unlocked, and parse_url_via_http now take &LoadedLibrary<'_> as their first parameter instead of loading the library themselves.
  • Process-wide default context: static DEFAULT_CONTEXT: OnceLock<RwLock<Option<Arc<MediaInfoContext>>>>, lazy-initialized on first use, resettable.
  • New pub fn MediaInfo::reset_default_context() escape hatch to drop and rebuild the cached context.
  • Each locked wrapper (parse_to_string_internal, parse_to_string_from_url, parse_reader_to_string_internal) now checks whether options.library_file/options.library_search_dir is set:
    • If yes → fresh per-call load_library_full (v0.1.0 path preserved verbatim).
    • If no → borrow the shared library from the default context.
  • ParseOptions::mediainfo_options rustdoc gained a "Thread safety" subsection explaining how custom options interact with the process-wide parse lock.
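
The lazily initialized, resettable default context described above can be sketched with the standard library alone. This is a minimal illustration, not the crate's actual code: `SketchContext`, `load_context`, `LOAD_COUNT`, and `FAIL_LOADS` are hypothetical stand-ins, and the simulated load replaces the real `dlopen` path. It also shows one way to keep load failures out of the cache so that a later call retries.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock, RwLock};

/// Hypothetical stand-in for the real context; it only records a version string.
#[derive(Debug)]
pub struct SketchContext {
    pub version: String,
}

/// Counts successful (simulated) library loads.
pub static LOAD_COUNT: AtomicUsize = AtomicUsize::new(0);
/// When true, the simulated load fails (models a missing library).
pub static FAIL_LOADS: AtomicBool = AtomicBool::new(false);

/// Simulated expensive load; real code would open the shared library here.
fn load_context() -> Result<Arc<SketchContext>, String> {
    if FAIL_LOADS.load(Ordering::SeqCst) {
        return Err("library not found".to_string());
    }
    LOAD_COUNT.fetch_add(1, Ordering::SeqCst);
    Ok(Arc::new(SketchContext { version: "sketch".to_string() }))
}

/// Same shape as the static named in the PR description.
static DEFAULT_CONTEXT: OnceLock<RwLock<Option<Arc<SketchContext>>>> = OnceLock::new();

fn slot() -> &'static RwLock<Option<Arc<SketchContext>>> {
    DEFAULT_CONTEXT.get_or_init(|| RwLock::new(None))
}

/// Returns the cached context, loading it on first use.
/// A failed load leaves the slot empty, so the next call retries.
pub fn default_context() -> Result<Arc<SketchContext>, String> {
    {
        // Fast path: shared read lock, no load.
        let guard = slot().read().unwrap();
        if let Some(ctx) = guard.as_ref() {
            return Ok(Arc::clone(ctx));
        }
    }
    // Slow path: take the write lock and re-check, because another
    // thread may have filled the slot in the meantime.
    let mut guard = slot().write().unwrap();
    if let Some(ctx) = guard.as_ref() {
        return Ok(Arc::clone(ctx));
    }
    let ctx = load_context()?; // on error, nothing is stored
    *guard = Some(Arc::clone(&ctx));
    Ok(ctx)
}

/// Drops the cached context; the next call to `default_context` reloads.
pub fn reset_default_context() {
    *slot().write().unwrap() = None;
}
```

Storing `Option<Arc<_>>` rather than a `Result` in the slot is what makes both the reset and the retry-after-failure behavior cheap in this sketch: the error value never has to be cloned or kept.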

[errors] src/error.rs

  • New variant MediaInfoError::LibraryMismatch { context: PathBuf, requested: PathBuf } plus a library_mismatch(context, requested) constructor helper.
  • Returned when a context parse call is given a library_file / library_search_dir override that would require a different library than the one the context already loaded. Matching library_file is allowed (tautology); library_search_dir always triggers a mismatch because a re-resolve is conceptually a different load.
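
The mismatch rule above can be sketched as a small, self-contained check. Everything here is illustrative: `SketchError` and `check_override` are hypothetical names, the `Display` wording is an assumption, and the sketch assumes the context always knows the path it loaded (the real `library_file()` accessor returns an `Option`). Reporting the search directory as `requested` is also an assumption.

```rust
use std::fmt;
use std::path::{Path, PathBuf};

/// Hypothetical error type mirroring the shape of the new variant.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    LibraryMismatch { context: PathBuf, requested: PathBuf },
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Message wording is an assumption, not the crate's format.
            SketchError::LibraryMismatch { context, requested } => write!(
                f,
                "context loaded {} but the call requested {}",
                context.display(),
                requested.display()
            ),
        }
    }
}

impl std::error::Error for SketchError {}

/// Checks per-call overrides against the library a context already loaded.
pub fn check_override(
    loaded: &Path,
    library_file: Option<&Path>,
    library_search_dir: Option<&Path>,
) -> Result<(), SketchError> {
    // A search-dir override always counts as a mismatch, because
    // re-resolving the library is treated as a different load.
    if let Some(dir) = library_search_dir {
        return Err(SketchError::LibraryMismatch {
            context: loaded.to_path_buf(),
            requested: dir.to_path_buf(),
        });
    }
    match library_file {
        // A different file than the one already loaded is a mismatch.
        Some(file) if file != loaded => Err(SketchError::LibraryMismatch {
            context: loaded.to_path_buf(),
            requested: file.to_path_buf(),
        }),
        // No override, or the same file (the tautology case), is accepted.
        _ => Ok(()),
    }
}
```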

[api] src/lib.rs

  • Re-export MediaInfoContext from the crate root alongside the existing types.
  • Crate module docs updated with a MediaInfoContext feature bullet and a pointer to the new batch_parse example.

[tests] 20 new tests, total count 203 → 223

  • tests/context_tests.rs (new, 14 tests): basic reuse, equivalence with the free function path, version cache, shared-across-threads stress (16 workers × 5 parses), library mismatch (file + search-dir), matching-file tautology, reader input, raw JSON output, custom options isolation between calls, default context reuse, reset_default_context recovery, free function with explicit library override, and pre-built MediaInfoInput dispatch.
  • tests/end_to_end_tests.rs: adds test_context_url_parse (URL via shared context using the existing tiny_http mock) and test_thread_safety_context (100-thread stress with a shared Arc<MediaInfoContext>). Original test_thread_safety is untouched.
  • tests/error_unit_tests.rs: 2 new tests covering the LibraryMismatch Display format and variant matching.
  • 2 new doctests on MediaInfoContext and MediaInfoError::library_mismatch.
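
The shared-across-threads stress test listed above (16 workers × 5 parses) follows a common pattern that can be sketched without the crate. `SketchContext`, its `parse` method, and `run_workers` are hypothetical stand-ins; the real test presumably calls the actual parse methods on a shared `MediaInfoContext`.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a loaded context. It is immutable after
/// construction, so it is automatically Send + Sync.
pub struct SketchContext {
    version: String,
}

impl SketchContext {
    pub fn new() -> Self {
        SketchContext { version: "sketch".to_string() }
    }

    /// Stand-in for a parse call: returns a small report string.
    pub fn parse(&self, name: &str) -> String {
        format!("{name} parsed by {}", self.version)
    }
}

/// Spawns `workers` threads that share one context and each run
/// `parses_per_worker` parses. Returns the number of non-empty reports.
pub fn run_workers(workers: usize, parses_per_worker: usize) -> usize {
    let ctx = Arc::new(SketchContext::new());
    let handles: Vec<_> = (0..workers)
        .map(|w| {
            // Each worker gets its own handle to the same context.
            let ctx = Arc::clone(&ctx);
            thread::spawn(move || {
                (0..parses_per_worker)
                    .filter(|i| !ctx.parse(&format!("file-{w}-{i}")).is_empty())
                    .count()
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .sum()
}
```

Calling `run_workers(16, 5)` mirrors the worker and parse counts of the stress test; every parse goes through the single shared context.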

[bench] benches/parse_overhead.rs (new)

  • Criterion benchmark with three variants:
    1. free_fn/parse_media_info_path — exercises the default-context fast path.
    2. context/parse_media_info_path — explicit MediaInfoContext reuse.
    3. free_fn/parse_media_info_path_uncached — pins library_file in ParseOptions to bypass the default context and reproduce the v0.1.0 fresh-load path, giving a proper A/B baseline without needing a git worktree.
  • criterion = "0.5" added to [dev-dependencies]; new [[bench]] entry in Cargo.toml.

[examples] examples/batch_parse.rs (new)

  • Short, argv-driven example mirroring the style of parse_path.rs. Shows MediaInfoContext::new()? + loop, printing the loaded library version and a per-file track count.

[docs] README.md, CHANGELOG.md

  • README.md: new "Reusable context" bullet in the features list, examples pointer updated to mention batch parsing, install snippet bumped "0.1.0" → "0.2.0".
  • CHANGELOG.md: new 0.2.0 entry with Added / Changed sections.

[ci] .github/workflows/ci.yml

  • New semver job that installs cargo-semver-checks --locked and runs cargo semver-checks -p rsmediainfo. Sets RS_MEDIAINFO_SKIP_DOWNLOAD=1 so it does not fetch MediaInfo.dll during the check.
  • semver added to the release job's needs: list so a semver regression blocks a release.

[version] Cargo.toml, README.md, .github/ISSUE_TEMPLATE/bug_report.yml

  • Crate version bumped 0.1.0 → 0.2.0 in Cargo.toml.
  • Version placeholder in the bug-report issue template bumped to match.

bakgio added 2 commits April 11, 2026 15:28
# Why

- Every v0.1.0 parse call paid a full dlopen, 10-symbol resolve, handle
  creation, version probe, and dlclose cycle. For batch workloads
  (asset scanners, directory walks, import pipelines) the load
  overhead was pure waste and dominated the work.
- A reusable context lets callers amortize the one-time library load
  cost across many parses while keeping every existing public API
  signature and the global parse lock intact.

# What

- Add `MediaInfoContext` public type that loads the MediaInfo shared
  library once and reuses it across all parse entry points (path,
  reader, URL, pre-built `MediaInfoInput`, structured + raw-text
  output). `Clone + Send + Sync`; wrap in `Arc` to share across
  worker pools.
- Constructors: `new`, `with_library_file`, `with_library_search_dir`.
  Accessors: `library_version`, `library_version_string`,
  `library_file`, `can_parse`.
- Refactor the parse pipeline: introduce private `LoadedLibrary<'a>`
  and split `load_library` into `load_library_full` so the
  `*_internal_unlocked` functions accept an already-loaded library
  instead of re-loading on every call.
- Route free `MediaInfo::parse*` functions through a lazily
  initialized process-wide default context
  (`OnceLock<RwLock<Option<Arc<MediaInfoContext>>>>`) whenever no
  `library_file`/`library_search_dir` override is set. Callers that
  pin an explicit path still go through the per-call load path,
  preserving v0.1.0 behavior verbatim.
- Add `MediaInfo::reset_default_context()` as an escape hatch to drop
  the cached context and force a fresh load on the next parse.
- Add `MediaInfoError::LibraryMismatch { context, requested }` +
  `library_mismatch` constructor, returned when a context parse call
  is given a conflicting `library_file`/`library_search_dir`
  override.
- Ship 20 new tests: 14 in `tests/context_tests.rs` covering reuse,
  thread safety, mismatch guard, reader/JSON/custom-options/reset
  paths; 2 in `tests/end_to_end_tests.rs` (URL via context, 100-thread
  shared-context stress); 2 in `tests/error_unit_tests.rs`; 2 new
  doctests. Total test count: 203 -> 223.
- Add `benches/parse_overhead.rs` with criterion benchmarks: free
  function, context, and a forced-uncached variant that pins
  `library_file` to reproduce the v0.1.0 fresh-load path without
  needing a git worktree.
- Add `examples/batch_parse.rs` mirroring the existing example style.
- Wire a new `semver` job (`cargo semver-checks -p rsmediainfo`) into
  the CI pipeline and the release gate so future version bumps are
  gated on API compatibility.
- Bump crate version 0.1.0 -> 0.2.0 in `Cargo.toml`, `README.md`
  install snippet, and the bug-report issue template placeholder.
  Update `CHANGELOG.md` with the 0.2.0 entry.

# Notes

- Additive only. No existing public API signature changed; every
  existing test passes unmodified.
- Observable behavior change: the library now stays mapped into the
  process after the first successful parse instead of being dlclosed
  at the end of every call. This matches other language wrappers and
  is the whole reason the reuse path is faster. The OS reclaims the
  mapping on process exit.
- Default-context error handling: load failures are not cached, so a
  retry after fixing the environment will work without needing
  `reset_default_context()`. This sidesteps the `std::io::Error`
  not-Clone problem entirely.
- Windows benchmark numbers are within noise because the OS loader
  keeps `MediaInfo.dll` mapped across `LoadLibrary`/`FreeLibrary`
  pairs; on Linux, where `dlclose` is eager, the context path is
  expected to be significantly faster than the uncached path.
- `cargo-semver-checks` classifies 0.1.0 -> 0.2.0 as a major change
  (0.x semver) and skips all 252 lint checks, which is the expected
  "no further update required" outcome.
# Why

- MediaInfo_New / MediaInfo_Delete are not thread-safe on every
  libmediainfo build, so the throwaway probe handle used by
  load_library_full could race against an in-flight parse on another
  thread.

# What

- Take the global parse lock inside load_library_full for the
  duration of the version probe so both handle creation and
  destruction happen in the locked region.
- Reorder parse_to_string / parse_to_string_from_url /
  parse_reader_to_string so the library is resolved before the parse
  lock is acquired, preventing the inner probe from re-entering and
  deadlocking the same lock.
- Document the new locking contract on load_library_full.

# Notes

- No API changes. Pure concurrency fix.
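
The lock ordering introduced by the second commit can be sketched as follows. This is a simplified model under stated assumptions: `PARSE_LOCK`, `Library`, and the function bodies are hypothetical stand-ins for the crate's internals, and `std::sync::Mutex` is used because it is not re-entrant, which is the property that makes the ordering matter.

```rust
use std::sync::{Arc, Mutex};

/// Stand-in for the process-wide parse lock.
static PARSE_LOCK: Mutex<()> = Mutex::new(());

/// Hypothetical stand-in for a loaded library plus its cached version.
pub struct Library {
    pub version: String,
}

/// Loads the library and probes its version while holding the parse lock,
/// so probe-handle creation and destruction cannot race with a parse.
/// Contract: the caller must NOT already hold `PARSE_LOCK`; locking a
/// non-re-entrant mutex twice on one thread would deadlock or panic.
fn load_library_full() -> Arc<Library> {
    let _guard = PARSE_LOCK.lock().unwrap();
    // The throwaway probe handle would be created and deleted here.
    Arc::new(Library { version: "sketch".to_string() })
    // `_guard` is released when the function returns.
}

/// Resolves the library first, then takes the parse lock for the parse.
pub fn parse_to_string(input: &str) -> String {
    // Step 1: resolve the library; this briefly takes and releases the lock.
    let lib = load_library_full();
    // Step 2: only now acquire the parse lock for the parse itself.
    let _guard = PARSE_LOCK.lock().unwrap();
    format!("{input} via {}", lib.version)
}
```

Swapping the two steps in `parse_to_string` would make the inner probe try to lock a mutex the same thread already holds, which is the deadlock the reordering avoids.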
bakgio merged commit 592a29b into main on Apr 11, 2026
11 checks passed
bakgio self-assigned this on Apr 11, 2026