
feat(dl): Windows support via ggml-registry CPU routing#2

Merged
fiorelorenzo merged 3 commits into main from feat/windows-dl on Jun 9, 2026
@fiorelorenzo (Owner)

Problem

parakeet.cpp's dynamic-backends (GGML_BACKEND_DL) build worked on macOS + Linux but was deferred on Windows. Root cause: src/backend.cpp and src/model_loader.cpp call ggml CPU-backend functions directly by symbol (ggml_backend_cpu_init, ggml_backend_is_cpu, ggml_backend_cpu_set_n_threads, ggml_backend_cpu_buffer_from_ptr). Under DL the CPU backend is a separate loadable module, so those symbols are not in the linked core. macOS/Linux papered over this with -undefined dynamic_lookup / --allow-shlib-undefined plus an RTLD_GLOBAL ggml patch. MSVC, by contrast, requires every symbol to be resolved when the DLL is linked, so the link failed with LNK2019 and the Windows DL build could not be produced.

Fix

Extend the vendored patch (parakeet-cpp-sys/patches/parakeet/0001-backend-dl.patch) so parakeet.cpp no longer references the CPU-module symbols directly. Instead, CPU access is routed through the ggml device registry, which lives in ggml-base and is therefore always linked. This is the same portable DL pattern llama.cpp uses:

| Direct CPU symbol | Registry-based replacement |
|---|---|
| `ggml_backend_cpu_init()` | `ggml_backend_dev_init(ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_CPU), NULL)` |
| `ggml_backend_is_cpu(b)` | `ggml_backend_dev_type(ggml_backend_get_device(b)) == GGML_BACKEND_DEVICE_TYPE_CPU` |
| `ggml_backend_cpu_set_n_threads(b, n)` | `ggml_backend_reg_get_proc_address(reg, "ggml_backend_set_n_threads")`, cast to `ggml_backend_set_n_threads_t` and called if non-null |
| `ggml_backend_cpu_buffer_from_ptr(p, sz)` | `ggml_backend_dev_buffer_from_host_ptr(dev, p, sz, SIZE_MAX)` (the CPU device maps this to the same zero-copy from_ptr buffer) |
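The third row is the least obvious. A name-based lookup returns an untyped address, and the caller casts it to a typed function pointer and calls it only if the lookup succeeded. The patch does this in C++ against ggml. Purely as an illustration of that lookup, cast and call shape, here is a Rust sketch. `Registry`, `Backend`, `cpu_set_n_threads` and `set_n_threads` are hypothetical stand-ins, not ggml or parakeet APIs, and nothing here calls ggml.

```rust
use std::collections::HashMap;
use std::ffi::c_void;
use std::ptr;

/// Hypothetical stand-in for an opaque backend handle (not ggml's type).
#[repr(C)]
pub struct Backend {
    pub n_threads: i32,
}

/// Typed signature expected behind the looked-up name, modelled on the
/// (backend, n_threads) -> void shape of ggml_backend_set_n_threads_t.
pub type SetNThreadsFn = unsafe extern "C" fn(*mut Backend, i32);

/// Hypothetical CPU-module implementation that a registry would expose.
pub unsafe extern "C" fn cpu_set_n_threads(backend: *mut Backend, n: i32) {
    unsafe { (*backend).n_threads = n };
}

/// Hypothetical registry mapping exported names to type-erased addresses.
/// Like a get_proc_address call, a miss yields a null pointer.
#[derive(Default)]
pub struct Registry {
    procs: HashMap<&'static str, *mut c_void>,
}

impl Registry {
    pub fn register(&mut self, name: &'static str, addr: *mut c_void) {
        self.procs.insert(name, addr);
    }

    pub fn get_proc_address(&self, name: &str) -> *mut c_void {
        self.procs.get(name).copied().unwrap_or(ptr::null_mut())
    }
}

/// Looks up "ggml_backend_set_n_threads" by name and calls it only if it
/// is present. Returns whether the call happened.
pub fn set_n_threads(reg: &Registry, backend: &mut Backend, n: i32) -> bool {
    let addr = reg.get_proc_address("ggml_backend_set_n_threads");
    if addr.is_null() {
        // This backend exposes no thread control: skip rather than fail.
        return false;
    }
    // SAFETY: only functions with SetNThreadsFn's signature are registered
    // under this name, and data/function pointers have the same size here.
    let f: SetNThreadsFn = unsafe { std::mem::transmute::<*mut c_void, SetNThreadsFn>(addr) };
    unsafe { f(backend as *mut Backend, n) };
    true
}
```

A null lookup result means "this backend has no thread setting", not an error. That is what lets one call site serve backends that do not export the symbol.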

Unified static + DL (no #ifdef). The registry is populated in both modes: static via GGML_USE_CPU (CPU device registered at startup), DL via the existing ggml_backend_load_all in global_backend(), which runs before any Backend is constructed. So one path covers both. Drops the now-unused ggml-cpu.h include.
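A toy model shows why one path is enough. The CPU device may be registered up front (static) or by a runtime load-all step (DL). Either way, callers only ask the registry for a device by type, and only ask a backend for its device's type. The types and functions below (`DeviceRegistry`, `load_all`, `dev_by_type`, `dev_init`, `backend_is_cpu`, `init_cpu_backend`) are hypothetical Rust analogues of the ggml calls in the table, not bindings to them.

```rust
/// Hypothetical device categories; only the CPU/GPU distinction matters here.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DeviceType {
    Cpu,
    Gpu,
}

#[derive(Clone, Debug)]
pub struct Device {
    pub name: &'static str,
    pub kind: DeviceType,
}

/// Hypothetical backend instance that remembers which device created it.
#[derive(Debug)]
pub struct Backend {
    pub device: Device,
}

#[derive(Default)]
pub struct DeviceRegistry {
    devices: Vec<Device>,
}

impl DeviceRegistry {
    /// Static-style population: devices compiled into the binary register up front.
    pub fn register(&mut self, dev: Device) {
        self.devices.push(dev);
    }

    /// DL-style population: `discovered` stands in for whatever modules a
    /// runtime load-all step finds; each one is registered the same way.
    pub fn load_all(&mut self, discovered: &[Device]) {
        for dev in discovered {
            self.register(dev.clone());
        }
    }

    /// First device of the requested type, if any (analogue of dev_by_type).
    pub fn dev_by_type(&self, kind: DeviceType) -> Option<&Device> {
        self.devices.iter().find(|d| d.kind == kind)
    }
}

/// Analogue of dev_init: create a backend bound to the given device.
pub fn dev_init(dev: &Device) -> Backend {
    Backend { device: dev.clone() }
}

/// Analogue of the is_cpu replacement: ask the backend's device for its type.
pub fn backend_is_cpu(backend: &Backend) -> bool {
    backend.device.kind == DeviceType::Cpu
}

/// Caller code: identical whether the registry was filled statically or by load_all.
pub fn init_cpu_backend(reg: &DeviceRegistry) -> Option<Backend> {
    reg.dev_by_type(DeviceType::Cpu).map(dev_init)
}
```

The caller never names a CPU-specific symbol. That is what removes the link-time dependency the original direct calls created.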

CI

Added the windows-latest / dl / --features dynamic-backends,vulkan matrix leg and removed the "Windows DL deferred" note.

Validation (macOS, local)

  • cargo build -p parakeet-cpp (static) — pass
  • cargo build -p parakeet-cpp --features dynamic-backends (DL) — pass
  • cargo test -p parakeet-cpp --features dynamic-backends --test dl_metal with a real model — pass (resolved backend (DL): MTL0, real transcribe succeeds)
  • Static test suite (integration + prefix + ABI) — pass

The submodule pointer is unchanged; only the patch file + CI workflow are committed. Windows (and Linux) DL are validated by CI in this PR.

Under GGML_BACKEND_DL the CPU backend is a dlopen'd module, so the
parakeet/ggml core can no longer reference ggml_backend_cpu_init /
_is_cpu / _set_n_threads / cpu_buffer_from_ptr directly. macOS/Linux
papered over this with -undefined dynamic_lookup / --allow-shlib-undefined
+ an RTLD_GLOBAL ggml patch, but MSVC has no equivalent and rejects the
unresolved symbols at DLL link time (LNK2019), so Windows DL was deferred.

Extend the vendored parakeet patch to obtain CPU access via the ggml
device registry (in ggml-base, always linked) instead:

- ggml_backend_cpu_init()        -> ggml_backend_dev_init(
                                      ggml_backend_dev_by_type(CPU), NULL)
- ggml_backend_is_cpu(b)         -> ggml_backend_dev_type(
                                      ggml_backend_get_device(b)) == CPU
- ggml_backend_cpu_set_n_threads -> ggml_backend_reg_get_proc_address(
                                      reg, "ggml_backend_set_n_threads")
                                      (the llama.cpp DL pattern)
- ggml_backend_cpu_buffer_from_ptr -> ggml_backend_dev_buffer_from_host_ptr
                                      (maps to the same zero-copy CPU buffer)

Single path for both static and DL: the registry is populated in both
modes (static via GGML_USE_CPU; DL via the existing ggml_backend_load_all
in global_backend(), which runs before any Backend is constructed), so no
#ifdef GGML_BACKEND_DL is needed. Drops the now-unused ggml-cpu.h include.

Add the windows-latest DL leg to CI and remove the "deferred" note.

Validated on macOS: static build, DL build, and the DL Metal test
(dl_backend_is_metal -> MTL0, real transcribe) all pass.

parakeet.cpp has no install() rule, so the parakeet library only exists in
the cmake build tree, never the install prefix. Under DL on Windows the
SHARED parakeet target produces parakeet.dll + an MSVC import parakeet.lib;
ggml redirects DLLs to <build>/bin (CMAKE_RUNTIME_OUTPUT_DIRECTORY, a
dir-scoped var that doesn't reach the parent parakeet scope) and the import
.lib lands in a generator-dependent spot the fixed lib_dirs list didn't
cover, so linking the consumer test exe failed with LNK1181: cannot open
input file 'parakeet.lib'. (The library-only `cargo build` passed because
an rlib is never linked; only `cargo test`, which builds executables,
surfaced it.)

Walk the whole build tree and add every dir holding a linkable artifact
(.lib/.a/.dll/.so/.dylib) to the link-search path, so parakeet.lib is found
regardless of generator/platform layout. The install `lib/` dir stays first
in search order, so the linked ggml/ggml-base resolve to the install copies;
the extra dirs only add the otherwise-uninstalled parakeet library.
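Here is a minimal sketch of such a walk. It assumes the walk lives in a Cargo build script that emits `cargo:rustc-link-search` directives. The function names are hypothetical, and the real build script may be organized differently. The sketch records each directory that directly contains a file with one of the listed extensions. It emits the install `lib/` dir first so that dir comes first in search order.

```rust
use std::collections::BTreeSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Extensions treated as linkable artifacts, per the commit message.
const LINKABLE_EXTS: &[&str] = &["lib", "a", "dll", "so", "dylib"];

fn is_linkable(path: &Path) -> bool {
    path.extension()
        .and_then(|e| e.to_str())
        .map_or(false, |e| LINKABLE_EXTS.iter().any(|x| x.eq_ignore_ascii_case(e)))
}

/// Recursively collects every directory under `root` that directly holds at
/// least one linkable artifact. Symlinked directories are not followed, which
/// avoids cycles. The BTreeSet keeps the output order deterministic.
fn collect_link_dirs(root: &Path, out: &mut BTreeSet<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        let path = entry.path();
        if entry.file_type()?.is_dir() {
            collect_link_dirs(&path, out)?;
        } else if is_linkable(&path) {
            out.insert(root.to_path_buf());
        }
    }
    Ok(())
}

/// Ordered cargo link-search directives: the install `lib/` dir first, then
/// every build-tree dir found by the walk (skipping a repeat of install_lib).
fn link_search_directives(install_lib: &Path, build_root: &Path) -> io::Result<Vec<String>> {
    let mut dirs = BTreeSet::new();
    collect_link_dirs(build_root, &mut dirs)?;
    let mut lines = vec![format!("cargo:rustc-link-search=native={}", install_lib.display())];
    for d in dirs {
        if d.as_path() != install_lib {
            lines.push(format!("cargo:rustc-link-search=native={}", d.display()));
        }
    }
    Ok(lines)
}
```

In a build script, each returned line would simply be printed with `println!`. The extension check is literal, so versioned sonames such as `libfoo.so.1` (extension `1`) are not matched.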

macOS static + DL builds and the DL Metal test still pass.

The parakeet C API (parakeet_capi_*) is plain `extern "C"` with no
__declspec(dllexport) or GGML_API-style export macro. On Windows a SHARED
DLL with no exported symbols produces NO import library, so the consumer
link failed with `LNK1181: cannot open input file 'parakeet.lib'` — the
file was never created (the previous build-tree search couldn't find what
doesn't exist).

Set WINDOWS_EXPORT_ALL_SYMBOLS on the SHARED parakeet target so CMake
auto-generates a .def exporting every public symbol and MSVC emits the
import lib. No-op on ELF/Mach-O (default-export), so macOS static + DL
builds and the DL Metal test are unaffected (verified locally).

Pairs with the build-tree link-search walk: this creates parakeet.lib,
that finds it.
fiorelorenzo merged commit 53e2981 into main on Jun 9, 2026
6 checks passed
fiorelorenzo deleted the feat/windows-dl branch on June 9, 2026, 21:36