feat: Support multiple input formats w/ ffmpeg #7
Merged
Overview
Today the server only accepts WAV uploads. Anything else (MP3, OGG, WebM, FLAC, M4A, AAC, Opus, ...) is rejected with a
"not yet implemented" error, which is awkward for a Whisper-compatible API since most OpenAI clients upload compressed audio by default.
This PR adds an optional ffmpeg-backed fallback: if the upload is not a WAV file, the server transcodes it to 16 kHz mono WAV
on the fly and then runs the normal pipeline. The format is detected from the file's magic bytes rather than the filename, so
clients that upload without an extension still work.
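Detection only needs the first few bytes of the upload. Below is a minimal Go sketch built around a hypothetical `sniffAudioFormat` helper; the PR does not show its detection code, so the name and return values are illustrative. The checks use the standard container signatures. Only the `wav` result matters for routing, and everything else goes to ffmpeg, so the other labels are mainly useful for logging. Opus has no case of its own because it normally arrives inside an Ogg or WebM container.

```go
package main

import (
	"bytes"
	"fmt"
)

// sniffAudioFormat guesses the audio container from the leading bytes of an
// upload. Hypothetical helper: names and labels are illustrative only.
func sniffAudioFormat(b []byte) string {
	switch {
	// WAV: "RIFF" chunk header with form type "WAVE" at offset 8.
	case len(b) >= 12 && bytes.Equal(b[0:4], []byte("RIFF")) && bytes.Equal(b[8:12], []byte("WAVE")):
		return "wav"
	case bytes.HasPrefix(b, []byte("OggS")):
		return "ogg" // Ogg container (Vorbis or Opus)
	case bytes.HasPrefix(b, []byte("fLaC")):
		return "flac"
	case bytes.HasPrefix(b, []byte{0x1A, 0x45, 0xDF, 0xA3}):
		return "webm" // EBML header, shared by WebM and Matroska
	case len(b) >= 8 && bytes.Equal(b[4:8], []byte("ftyp")):
		return "m4a" // ISO BMFF "ftyp" box (MP4/M4A)
	case bytes.HasPrefix(b, []byte("ID3")):
		return "mp3" // MP3 with a leading ID3v2 tag
	// ADTS AAC: 12-bit sync word 0xFFF followed by layer bits 00.
	case len(b) >= 2 && b[0] == 0xFF && b[1]&0xF6 == 0xF0:
		return "aac"
	// Raw MPEG audio frame: 11-bit sync and a non-zero layer field (heuristic).
	case len(b) >= 2 && b[0] == 0xFF && b[1]&0xE0 == 0xE0 && b[1]&0x06 != 0:
		return "mp3"
	default:
		return "unknown"
	}
}

func main() {
	header := []byte("RIFF\x24\x00\x00\x00WAVEfmt ")
	fmt.Println(sniffAudioFormat(header)) // wav
}
```

The frame-sync checks for bare AAC and MP3 are heuristics. Since any non-WAV result is handed to ffmpeg anyway, a misclassification there only affects log labels, not correctness.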
How it works
• WAV input is still parsed in pure Go — no external process involved. That is the fast path.
• Non-WAV input is handed to a small ffmpegConverter that shells out to the system ffmpeg binary with a timeout and
captured stderr.
• At startup we run a single exec.LookPath("ffmpeg"). If ffmpeg is not installed, conversion is disabled cleanly: the
server boots, logs a warning, and any non-WAV upload gets a clear 400 invalid_request_error instead of a 500.
• Each conversion uses os.CreateTemp for input and output files, so concurrent requests never share paths. This keeps
the guarantees of the existing worker pool intact.