
feat(q3tts): add qwen3-tts llama-server runtime #14

Open
muggle-stack wants to merge 1 commit into spacemit-com:spacemit-mtmd from muggle-stack:feat/q3tts-llama-server-runtime

Conversation

@muggle-stack


Summary

  • Add Qwen3-TTS runtime tooling under tools/speech/backends/qwen3_tts.
  • Wire an OpenAI-compatible /v1/audio/speech path through the common speech backend layout.
  • Register Qwen3-TTS side tensors and SpacemiT talker/CP runtime kernel paths for Q8/Q4 execution.
  • Add default 24 kHz audio post-processing and guard against no-EOS truncated segment responses.
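The no-EOS guard in the last bullet can be pictured as a check that runs before per-segment audio is merged. The following is a minimal sketch under stated assumptions: `segment_result`, `speech_response`, and `merge_segments` are hypothetical names that do not come from the PR, and only the HTTP 500 outcome is taken from the validation notes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-segment result; field names are illustrative only.
struct segment_result {
    std::vector<int16_t> pcm;   // decoded 16-bit samples for this segment
    bool saw_eos = false;       // true if generation ended with an EOS token
};

// Hypothetical response carrier for the speech endpoint.
struct speech_response {
    int status = 200;           // HTTP status code
    std::string error;          // set when status != 200
    std::vector<int16_t> pcm;   // merged samples when status == 200
};

// Merge segments in order, refusing to return partial audio if any
// segment stopped without reaching EOS.
speech_response merge_segments(const std::vector<segment_result> & segments) {
    speech_response res;
    for (std::size_t i = 0; i < segments.size(); ++i) {
        if (!segments[i].saw_eos) {
            res.status = 500;
            res.error  = "segment " + std::to_string(i) + " ended without EOS";
            res.pcm.clear();    // drop anything merged so far
            return res;
        }
        res.pcm.insert(res.pcm.end(), segments[i].pcm.begin(), segments[i].pcm.end());
    }
    return res;
}
```

Failing the whole request, rather than returning the segments that did finish, keeps a client from mistaking a cut-off WAV for a complete one.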

Validation

  • git diff --check
  • K3 native build: cmake --build build-q3tts-spacemit --target llama-server q3tts-runner talker_driver.headmain -j8
  • K3 speech smoke: /v1/audio/speech returned 24 kHz PCM WAV with X-Speech-Backend: qwen3-tts
  • K3 regression: Chinese, English, and mixed Chinese/English short/long text cases generated successfully
  • No-EOS guard check: forcing old long-segment behavior returns HTTP 500 instead of a partial truncated WAV
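For readers who want to check the smoke-test output by hand, the sketch below builds a canonical 44-byte RIFF/WAVE header around PCM samples at 24 kHz. It assumes mono, 16-bit samples; the PR text only states "24 kHz PCM WAV", so the channel count and bit depth are assumptions, and `make_wav` and `put_le` are hypothetical helper names.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Append `bytes` low-order bytes of `value` in little-endian order.
static void put_le(std::string & out, uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i) {
        out.push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
    }
}

// Wrap mono 16-bit PCM samples (assumed layout) in a 44-byte WAV header.
std::string make_wav(const std::vector<int16_t> & samples, uint32_t sample_rate = 24000) {
    const uint32_t channels   = 1;   // assumption: mono
    const uint32_t bits       = 16;  // assumption: 16-bit samples
    const uint32_t data_bytes = static_cast<uint32_t>(samples.size() * sizeof(int16_t));
    const uint32_t byte_rate  = sample_rate * channels * bits / 8;

    std::string out;
    out += "RIFF";
    put_le(out, 36 + data_bytes, 4);      // RIFF chunk size
    out += "WAVE";
    out += "fmt ";
    put_le(out, 16, 4);                   // fmt chunk size for PCM
    put_le(out, 1, 2);                    // format tag 1 = PCM
    put_le(out, channels, 2);
    put_le(out, sample_rate, 4);
    put_le(out, byte_rate, 4);
    put_le(out, channels * bits / 8, 2);  // block align
    put_le(out, bits, 2);
    out += "data";
    put_le(out, data_bytes, 4);
    for (int16_t s : samples) {
        put_le(out, static_cast<uint16_t>(s), 2);
    }
    return out;
}
```

With this layout the sample rate sits at byte offset 24, so a 24 kHz file carries the bytes `C0 5D 00 00` there.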

Regression Results

zh_short     audio 3.760s   wall 3.487s   RTF 0.93   segments 1
zh_long      audio 17.520s  wall 14.949s  RTF 0.85   segments 3
en_short     audio 3.920s   wall 3.611s   RTF 0.92   segments 1
en_long      audio 16.140s  wall 14.300s  RTF 0.89   segments 3
mixed_short  audio 6.720s   wall 6.095s   RTF 0.91   segments 1
mixed_long   audio 20.000s  wall 18.162s  RTF 0.91   segments 4
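The RTF column is consistent with wall time divided by audio duration (for example, 3.487 / 3.760 rounds to 0.93), so values below 1.0 mean synthesis ran faster than playback. A small sketch of that arithmetic, with hypothetical function names:

```cpp
#include <cmath>

// Real-time factor: wall-clock synthesis time divided by audio duration.
double real_time_factor(double wall_seconds, double audio_seconds) {
    return wall_seconds / audio_seconds;
}

// Round to two decimals, matching the precision used in the table.
double round2(double x) {
    return std::round(x * 100.0) / 100.0;
}
```

Calling `round2(real_time_factor(14.949, 17.520))` reproduces the `zh_long` row's 0.85.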

- Add Qwen3-TTS runtime tools and OpenAI-compatible speech endpoint wiring.
- Register Qwen3-TTS side tensors and safe mixed-language stdin splitting.
- Add SpacemiT talker and CP runtime kernel paths for Q8/Q4 execution.
- Clean 24 kHz audio output with default runtime post-processing.
- Move Qwen3-TTS under the common speech backend layout.

Co-authored-by: codex <codex@openai.com>

Copilot AI left a comment


Pull request overview

This PR adds an initial speech synthesis backend integration for llama-server, exposing an OpenAI-compatible /v1/audio/speech endpoint backed by a new Qwen3-TTS runtime and tooling layout under tools/speech/. It also registers Qwen3-TTS side tensors and makes several performance and graph-reuse changes that the runtime needs to run efficiently, especially on SpacemiT/RISC-V targets.

Changes:

  • Add tools/speech/ + qwen3_tts backend with runner/tools, runtime packaging, and reference prompt conversion utilities.
  • Wire /v1/audio/speech (and /audio/speech) into llama-server, with a server_speech_service abstraction and a Qwen3-TTS backend implementation.
  • Add Qwen3 model-side tensor registrations + embed-only mode and integrate SpacemiT CPU kernel/perf toggles used by the runtime.
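The split between a service wrapper and an abstract backend described above might look roughly like the sketch below. Every type and method name here is hypothetical and chosen for illustration; the PR's actual interfaces in server-speech.h and server-speech-backend.h may differ, and `dummy_backend` exists only so the sketch runs without a TTS engine.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical request/result types; the PR's real fields may differ.
struct speech_request { std::string input; std::string voice; };
struct speech_result  { int status = 200; std::vector<unsigned char> wav; };

// Abstract backend: one implementation per TTS engine.
class speech_backend {
public:
    virtual ~speech_backend() = default;
    virtual std::string name() const = 0;
    virtual speech_result synthesize(const speech_request & req) = 0;
};

// Stand-in backend used only to make this sketch runnable.
class dummy_backend : public speech_backend {
public:
    std::string name() const override { return "dummy"; }
    speech_result synthesize(const speech_request & req) override {
        speech_result res;
        res.wav.assign(req.input.begin(), req.input.end()); // echoes bytes, not real audio
        return res;
    }
};

// Service wrapper: selects a backend by name and forwards requests to it.
class speech_service {
public:
    explicit speech_service(const std::string & backend_name) {
        if (backend_name == "dummy") {
            backend.reset(new dummy_backend());
        }
    }
    bool ready() const { return backend != nullptr; }
    speech_result handle(const speech_request & req) {
        if (!backend) {
            speech_result res;
            res.status = 500; // illustrative choice for "no backend configured"
            return res;
        }
        return backend->synthesize(req);
    }
private:
    std::unique_ptr<speech_backend> backend;
};
```

Keeping route handlers dependent only on the wrapper means a second engine could be added without touching the HTTP layer.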

Reviewed changes

Copilot reviewed 42 out of 42 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| tools/speech/README.md | Documents speech backend layout and current backends |
| tools/speech/CMakeLists.txt | Adds speech backend subdir + install target |
| tools/speech/backends/qwen3_tts/tools/q3tts_run_main.cpp | CLI entrypoint for q3tts runner |
| tools/speech/backends/qwen3_tts/tools/q3tts_ref_to_bin.cpp | Reference WAV/text to speaker/prompt bin converter |
| tools/speech/backends/qwen3_tts/tools/q3tts_cp_kernel_bench.cpp | CP kernel microbenchmark for SpacemiT/RVV |
| tools/speech/backends/qwen3_tts/src/talker_driver.c | In-process talker+CP driver emitting codec frames |
| tools/speech/backends/qwen3_tts/src/kernels/heads_pool.h | RVV GEMV pool for CP lm-heads |
| tools/speech/backends/qwen3_tts/README.md | Backend build/run documentation |
| tools/speech/backends/qwen3_tts/include/qwen3_tts/qwen3_tts_runtime.h | Runtime public header (CLI) |
| tools/speech/backends/qwen3_tts/include/qwen3_tts/q3tts_codec_ort.h | ONNX Runtime codec decoder pool |
| tools/speech/backends/qwen3_tts/include/qwen3_tts/q3tts_audio_sdk.h | Optional ALSA segment playback helper |
| tools/speech/backends/qwen3_tts/CMakeLists.txt | Builds/installs q3tts runtime + tools |
| tools/speech/backends/qwen3_tts/cmake/talker_driver.qwen3tts-k3.in | Script wrapper for talker_driver defaults |
| tools/speech/backends/qwen3_tts/cmake/q3tts-run.in | Script wrapper for end-to-end runner |
| tools/server/server.cpp | Registers speech routes and proxy plumbing |
| tools/server/server-speech.h | Speech service API surface for server |
| tools/server/server-speech.cpp | Backend selection + service wrapper implementation |
| tools/server/server-speech-qwen3-tts.h | Qwen3-TTS speech backend interface |
| tools/server/server-speech-qwen3-tts.cpp | Qwen3-TTS backend: runner process mgmt + WAV merge |
| tools/server/server-speech-backend.h | Abstract speech backend interface |
| tools/server/server-context.h | Adds post_speech_oai route slot |
| tools/server/server-context.cpp | Implements /audio/speech handler and backend init |
| tools/server/CMakeLists.txt | Adds speech sources to server-context target |
| tools/CMakeLists.txt | Adds tools/speech subdir behind build option |
| src/models/qwen3.cpp | Adds Q3TTS tensors, SWIGLU gate_up support, embed-only mode |
| src/llama-quant.cpp | Prevents quantizing q3tts.* tensors |
| src/llama-model.h | Adds ffn_gate_up tensor pointer in layer struct |
| src/llama-model.cpp | Avoids layer buft assignment for TENSOR_SKIP tensors |
| src/llama-context.h | Adds graph reuse + threadpool caching fields |
| src/llama-context.cpp | Adds ctx pad env, 2-way graph cache, threadpool/n_threads caching |
| src/llama-arch.h | Adds tensor IDs for gate_up and Q3TTS tensors |
| src/llama-arch.cpp | Maps new tensor IDs to names/infos |
| ggml/src/ggml-cpu/spacemit/ime2_kernels.cpp | Adds env toggle + new m1 n64 + m2 i8i4_hp kernels |
| ggml/src/ggml-cpu/spacemit/ime.cpp | Adds env toggles and SWIGLU-down fusion path for SpacemiT |
| ggml/src/ggml-cpu/spacemit/ime_env.cpp | Fixes env-string empty checks and logging call |
| ggml/src/ggml-cpu/ggml-cpu.c | Adds SWIGLU-down fusion support and fusion skip behavior |
| ggml/CMakeLists.txt | Adds GGML_RV_ZBA option |
| common/common.h | Enables media_backend/smt_config_dir fields for speech builds |
| common/arg.cpp | Enables media backend args under speech builds; expands examples |
| CMakeLists.txt | Adds LLAMA_BUILD_SPEECH option + LLAMA_BUILD_Q3TTS alias/define |
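The src/llama-quant.cpp entry says `q3tts.*` tensors are kept out of quantization. One plausible way to express such a rule is a name-prefix check, sketched below. The helper names are hypothetical and the PR's actual condition may be written differently; `q3tts.example` in the usage note is a made-up tensor name.

```cpp
#include <string>

// True for tensor names that start with "q3tts.".
// Illustrative helper only; not taken from the PR.
bool is_q3tts_side_tensor(const std::string & name) {
    static const std::string prefix = "q3tts.";
    return name.compare(0, prefix.size(), prefix) == 0;
}

// A quantizer loop could consult this before picking a target type.
bool should_quantize(const std::string & name) {
    return !is_q3tts_side_tensor(name);
}
```

Under this sketch, `should_quantize("q3tts.example")` is false, while an ordinary layer weight such as `blk.0.attn_q.weight` is still eligible.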


Comment on lines +1 to +5
add_subdirectory(backends/qwen3_tts)

add_custom_target(speech-install
    DEPENDS q3tts-install
)
Comment on lines +529 to +536
std::vector<char *> argv;
argv.reserve(args.size() + 1);
for (auto & arg : args) {
    argv.push_back(arg.data());
}
argv.push_back(nullptr);
execv(argv[0], argv.data());
_exit(127);
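The excerpt above builds a NULL-terminated `argv` from a vector of strings and then calls `execv`. For context, here is a self-contained sketch of the same pattern on a POSIX system, with the `argv` vector built before `fork()`. `make_argv` and `run_and_wait` are hypothetical names, and this is not the PR's process-management code.

```cpp
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Build a NULL-terminated argv view over `args`.
// The pointers alias the strings, so `args` must outlive the result.
std::vector<char *> make_argv(std::vector<std::string> & args) {
    std::vector<char *> argv;
    argv.reserve(args.size() + 1);
    for (auto & arg : args) {
        argv.push_back(&arg[0]); // same pointer as arg.data() in C++17
    }
    argv.push_back(nullptr);
    return argv;
}

// Fork, exec args[0] in the child, and return the child's exit code (or -1).
int run_and_wait(std::vector<std::string> args) {
    if (args.empty()) {
        return -1;
    }
    std::vector<char *> argv = make_argv(args); // built before fork()
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        execv(argv[0], argv.data());
        _exit(127); // reached only if execv fails
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Preparing `argv` in the parent means the child performs no heap allocation between `fork()` and `execv()`, which is the usual precaution when the parent process is multithreaded.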
Comment on lines +473 to +479
void qwen3_tts_backend::ensure_started() {
    std::lock_guard<std::mutex> lock(start_mutex);
    if (child_pid > 0 && !child_closed) {
        return;
    }
    start_process();
}
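The lazy-start behavior in `ensure_started()` can be exercised in isolation with a stand-in class. In this sketch the member names `start_mutex`, `child_pid`, and `child_closed` mirror the excerpt, while `lazy_process`, `mark_closed`, `starts`, and the fake pid are hypothetical additions used only to make the behavior observable.

```cpp
#include <mutex>

// Minimal stand-in for a backend that lazily launches a helper process.
class lazy_process {
public:
    // Start on first use; later calls are no-ops while the child is alive.
    void ensure_started() {
        std::lock_guard<std::mutex> lock(start_mutex);
        if (child_pid > 0 && !child_closed) {
            return;
        }
        start_process();
    }

    // Simulate the child exiting so the next call restarts it.
    void mark_closed() {
        std::lock_guard<std::mutex> lock(start_mutex);
        child_closed = true;
    }

    int starts() const { return start_count; }

private:
    void start_process() {
        child_pid    = 1000 + start_count; // fake pid; a real backend would fork/exec here
        child_closed = false;
        ++start_count;
    }

    std::mutex start_mutex;
    int  child_pid    = -1;
    bool child_closed = false;
    int  start_count  = 0;
};
```

Two back-to-back calls to `ensure_started()` start the process once; after `mark_closed()`, the next call starts it again.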
Comment on lines +30 to +38
if(LLAMA_BUILD_SPEECH)
    target_sources(${TARGET} PRIVATE
        server-speech-backend.h
        server-speech.cpp
        server-speech.h
        server-speech-qwen3-tts.cpp
        server-speech-qwen3-tts.h
    )
endif()
Comment on lines +1 to +7
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <pthread.h>
#include <vector>
Comment on lines +147 to +171
add_executable(talker_driver.headmain ${CMAKE_CURRENT_SOURCE_DIR}/src/talker_driver.c)
target_compile_definitions(talker_driver.headmain PRIVATE _GNU_SOURCE)
target_compile_options(talker_driver.headmain PRIVATE
    $<$<C_COMPILER_ID:GNU>:-O2>
    $<$<C_COMPILER_ID:GNU>:-fno-tree-vectorize>
)
if(CMAKE_SYSTEM_PROCESSOR MATCHES "^(riscv)")
    target_compile_options(talker_driver.headmain PRIVATE
        $<$<C_COMPILER_ID:GNU>:-march=rv64gcv_zfh_zvfh_zba_zicbop_zihintpause>
        $<$<C_COMPILER_ID:GNU>:-mabi=lp64d>
    )
endif()
target_include_directories(talker_driver.headmain PRIVATE
    ${Q3TTS_INCLUDE_DIRS}
    ${CMAKE_CURRENT_SOURCE_DIR}/src/kernels
)
target_link_libraries(talker_driver.headmain PRIVATE
    llama
    ggml-cpu
    ggml-base
    ggml
    pthread
    m
)

Comment on lines +172 to +185
add_executable(q3tts-cp-kernel-bench ${CMAKE_CURRENT_SOURCE_DIR}/tools/q3tts_cp_kernel_bench.cpp)
target_include_directories(q3tts-cp-kernel-bench PRIVATE
    ${CMAKE_SOURCE_DIR}/ggml/src
    ${CMAKE_SOURCE_DIR}/ggml/src/ggml-cpu
)
target_link_libraries(q3tts-cp-kernel-bench PRIVATE
    ggml-cpu
    ggml-base
    ggml
    pthread
    m
)
target_compile_features(q3tts-cp-kernel-bench PRIVATE cxx_std_17)

Comment on lines +186 to +195
configure_file(${CMAKE_CURRENT_SOURCE_DIR}/cmake/q3tts-run.in
    ${Q3TTS_SCRIPT_OUTPUT_DIR}/q3tts-run @ONLY)
configure_file(${CMAKE_CURRENT_SOURCE_DIR}/cmake/talker_driver.qwen3tts-k3.in
    ${Q3TTS_SCRIPT_OUTPUT_DIR}/talker_driver.qwen3tts-k3 @ONLY)

execute_process(COMMAND chmod +x
    ${Q3TTS_SCRIPT_OUTPUT_DIR}/q3tts-run
    ${Q3TTS_SCRIPT_OUTPUT_DIR}/talker_driver.qwen3tts-k3
)


Labels

build, documentation, ggml, model, server


3 participants