feat(dl): Windows support via ggml-registry CPU routing #2
Merged
Under GGML_BACKEND_DL the CPU backend is a dlopen'd module, so the
parakeet/ggml core can no longer reference ggml_backend_cpu_init /
_is_cpu / _set_n_threads / cpu_buffer_from_ptr directly. macOS/Linux
papered over this with -undefined dynamic_lookup / --allow-shlib-undefined
+ an RTLD_GLOBAL ggml patch, but MSVC has no equivalent and rejects the
unresolved symbols at DLL link time (LNK2019), so Windows DL was deferred.
Extend the vendored parakeet patch to obtain CPU access via the ggml
device registry (in ggml-base, always linked) instead:
- ggml_backend_cpu_init() -> ggml_backend_dev_init(
    ggml_backend_dev_by_type(CPU), NULL)
- ggml_backend_is_cpu(b) -> ggml_backend_dev_type(
    ggml_backend_get_device(b)) == CPU
- ggml_backend_cpu_set_n_threads -> ggml_backend_reg_get_proc_address(
    reg, "ggml_backend_set_n_threads")
    (the llama.cpp DL pattern)
- ggml_backend_cpu_buffer_from_ptr -> ggml_backend_dev_buffer_from_host_ptr
    (maps to the same zero-copy CPU buffer)
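The set_n_threads mapping in the list above is an optional-symbol lookup: ask the registry for an entry point by name and call it only if one comes back. The sketch below models that control flow in plain Rust and does not bind to ggml. `Backend`, `Registry`, `cpu_set_n_threads`, and `try_set_n_threads` are hypothetical stand-ins introduced for illustration; only the looked-up name "ggml_backend_set_n_threads" comes from the commit message.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a backend handle (not a ggml type).
struct Backend {
    n_threads: usize,
}

/// Plays the role that ggml_backend_set_n_threads_t plays in the patch.
type SetNThreads = fn(&mut Backend, usize);

/// Hypothetical stand-in for a backend registry: a table of named entry points.
struct Registry {
    procs: HashMap<&'static str, SetNThreads>,
}

impl Registry {
    /// Models a proc-address lookup: None when the module lacks the symbol.
    fn get_proc_address(&self, name: &str) -> Option<SetNThreads> {
        self.procs.get(name).copied()
    }
}

/// Hypothetical entry point that a CPU module would register.
fn cpu_set_n_threads(b: &mut Backend, n: usize) {
    b.n_threads = n;
}

/// Call the entry point only if the registry exposes it.
/// Returns true when the call was made.
fn try_set_n_threads(reg: &Registry, b: &mut Backend, n: usize) -> bool {
    match reg.get_proc_address("ggml_backend_set_n_threads") {
        Some(set_fn) => {
            set_fn(b, n);
            true
        }
        None => false,
    }
}

fn main() {
    let mut procs: HashMap<&'static str, SetNThreads> = HashMap::new();
    procs.insert("ggml_backend_set_n_threads", cpu_set_n_threads);
    let reg = Registry { procs };

    let mut backend = Backend { n_threads: 1 };
    let applied = try_set_n_threads(&reg, &mut backend, 4);
    println!("applied={} n_threads={}", applied, backend.n_threads);
}
```

Because the lookup is by name at run time, the caller needs no link-time reference to the CPU module, which is the property the MSVC build depends on.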
Single path for both static and DL: the registry is populated in both
modes (static via GGML_USE_CPU; DL via the existing ggml_backend_load_all
in global_backend(), which runs before any Backend is constructed), so no
#ifdef GGML_BACKEND_DL is needed. Drops the now-unused ggml-cpu.h include.
Add the windows-latest DL leg to CI and remove the "deferred" note.
Validated on macOS: static build, DL build, and the DL Metal test
(dl_backend_is_metal -> MTL0, real transcribe) all pass.
parakeet.cpp has no install() rule, so the parakeet library only exists in the CMake build tree, never in the install prefix. Under DL on Windows the SHARED parakeet target produces parakeet.dll plus an MSVC import library, parakeet.lib. ggml redirects DLLs to <build>/bin (CMAKE_RUNTIME_OUTPUT_DIRECTORY, a directory-scoped variable that doesn't reach the parent parakeet scope), and the import .lib lands in a generator-dependent location that the fixed lib_dirs list didn't cover, so the consumer test exe link failed with LNK1181: cannot open input file 'parakeet.lib'. (The library-only `cargo build` passed because producing an rlib doesn't invoke the linker; only `cargo test`, which builds executables, surfaced it.)
Walk the whole build tree and add every dir holding a linkable artifact (.lib/.a/.dll/.so/.dylib) to the link-search path, so parakeet.lib is found regardless of generator/platform layout. The install `lib/` dir stays first in search order, so the linked ggml/ggml-base resolve to the install copies; the extra dirs only add the otherwise-uninstalled parakeet library.
macOS static + DL builds and the DL Metal test still pass.
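The build-tree walk described above could look roughly like the following in a Cargo build script. This is a minimal sketch using only the standard library, not the code from this PR. The function names `collect_link_dirs` and `link_search_order` and the `lib` / `build` directory layout under `OUT_DIR` are assumptions made for illustration.

```rust
use std::collections::BTreeSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Extensions treated as linkable artifacts (the list from the commit message).
const LINKABLE: [&str; 5] = ["lib", "a", "dll", "so", "dylib"];

/// Recursively collect every directory under `root` that directly contains
/// at least one file with a linkable extension.
fn collect_link_dirs(root: &Path, out: &mut BTreeSet<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        let path = entry.path();
        // file_type() does not follow symlinks, so symlink cycles are not walked.
        if entry.file_type()?.is_dir() {
            collect_link_dirs(&path, out)?;
        } else if path
            .extension()
            .and_then(|ext| ext.to_str())
            .map_or(false, |ext| LINKABLE.contains(&ext))
        {
            out.insert(root.to_path_buf());
        }
    }
    Ok(())
}

/// Install `lib/` first, then the build-tree directories, without duplicates.
fn link_search_order(install_lib: &Path, build_tree: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = BTreeSet::new();
    if build_tree.is_dir() {
        collect_link_dirs(build_tree, &mut found)?;
    }
    let mut order = vec![install_lib.to_path_buf()];
    order.extend(found.into_iter().filter(|dir| dir.as_path() != install_lib));
    Ok(order)
}

fn main() -> io::Result<()> {
    // Assumed layout: <OUT_DIR>/lib is the install dir, <OUT_DIR>/build the build tree.
    let out_dir = PathBuf::from(std::env::var("OUT_DIR").unwrap_or_else(|_| ".".into()));
    for dir in link_search_order(&out_dir.join("lib"), &out_dir.join("build"))? {
        println!("cargo:rustc-link-search=native={}", dir.display());
    }
    Ok(())
}
```

Emitting the install directory first keeps the search order described above, so libraries present in both places resolve to the install copies.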
The parakeet C API (parakeet_capi_*) is plain `extern "C"` with no __declspec(dllexport) or GGML_API-style export macro. On Windows a SHARED DLL with no exported symbols produces NO import library, so the consumer link failed with `LNK1181: cannot open input file 'parakeet.lib'`: the file was never created (the previous build-tree search couldn't find what doesn't exist).
Set WINDOWS_EXPORT_ALL_SYMBOLS on the SHARED parakeet target so CMake auto-generates a .def exporting every public symbol and MSVC emits the import lib. No-op on ELF/Mach-O (default-export), so macOS static + DL builds and the DL Metal test are unaffected (verified locally).
Pairs with the build-tree link-search walk: this creates parakeet.lib, that finds it.
Problem
parakeet.cpp's dynamic-backends (GGML_BACKEND_DL) build worked on macOS + Linux but was deferred on Windows. Root cause: src/backend.cpp and src/model_loader.cpp call ggml CPU-backend functions directly by symbol (ggml_backend_cpu_init, ggml_backend_is_cpu, ggml_backend_cpu_set_n_threads, ggml_backend_cpu_buffer_from_ptr). Under DL the CPU backend is a separate loadable module, so those symbols are not in the linked core. macOS/Linux papered over it (-undefined dynamic_lookup / --allow-shlib-undefined + an RTLD_GLOBAL ggml patch). MSVC requires all symbols resolved at DLL link time -> LNK2019 -> Windows DL could not build.
Fix
Extend the vendored patch (parakeet-cpp-sys/patches/parakeet/0001-backend-dl.patch) so parakeet.cpp no longer references the CPU-module symbols directly. CPU access is routed through the ggml device registry (which lives in ggml-base, always linked), the portable DL pattern llama.cpp uses:
- ggml_backend_cpu_init() -> ggml_backend_dev_init(ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_CPU), NULL)
- ggml_backend_is_cpu(b) -> ggml_backend_dev_type(ggml_backend_get_device(b)) == GGML_BACKEND_DEVICE_TYPE_CPU
- ggml_backend_cpu_set_n_threads(b, n) -> ggml_backend_reg_get_proc_address(reg, "ggml_backend_set_n_threads") cast to ggml_backend_set_n_threads_t, called if non-null
- ggml_backend_cpu_buffer_from_ptr(p, sz) -> ggml_backend_dev_buffer_from_host_ptr(dev, p, sz, SIZE_MAX) (CPU device maps this to the same zero-copy from_ptr)
Unified static + DL (no #ifdef). The registry is populated in both modes: static via GGML_USE_CPU (CPU device registered at startup), DL via the existing ggml_backend_load_all in global_backend(), which runs before any Backend is constructed. So one path covers both. Drops the now-unused ggml-cpu.h include.
CI
Added the windows-latest / dl / --features dynamic-backends,vulkan matrix leg and removed the "Windows DL deferred" note.
Validation (macOS, local)
- cargo build -p parakeet-cpp (static): pass
- cargo build -p parakeet-cpp --features dynamic-backends (DL): pass
- cargo test -p parakeet-cpp --features dynamic-backends --test dl_metal with a real model: pass (resolved backend (DL): MTL0, real transcribe succeeds)
The submodule pointer is unchanged; only the patch file + CI workflow are committed. Windows (and Linux) DL are validated by CI in this PR.