
feat: Support multiple input formats w/ ffmpeg #7

Merged: achetronic merged 1 commit into master from feat/ffmpeg-audio-conversion on Apr 23, 2026

@achetronic
Owner

Overview

Today the server only accepts WAV uploads. Anything else (MP3, OGG, WebM, FLAC, M4A, AAC, Opus, ...) is rejected with a
"not yet implemented" error, which is awkward for a Whisper-compatible API since most OpenAI clients upload compressed audio by default.

This PR adds an optional ffmpeg-backed fallback: if the upload is not a WAV, the server transcodes it to 16 kHz mono WAV
on the fly and then runs the normal pipeline. Format is detected by reading the file's magic bytes, not the filename, so
clients that upload without an extension still work.
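As a rough illustration of content-based detection, the sketch below inspects the first bytes of an upload. It is not the code from this PR: the function name `sniffFormat`, its return values, and the exact set of signatures checked are assumptions made for the example. The signatures themselves are the well-known container markers.

```go
package main

import "bytes"

// sniffFormat guesses the audio container from the leading bytes of an upload.
// Hypothetical helper: the name, labels, and signature list are illustrative.
func sniffFormat(head []byte) string {
	switch {
	case len(head) >= 12 &&
		bytes.Equal(head[0:4], []byte("RIFF")) &&
		bytes.Equal(head[8:12], []byte("WAVE")):
		return "wav" // RIFF container with a WAVE form type
	case bytes.HasPrefix(head, []byte("fLaC")):
		return "flac"
	case bytes.HasPrefix(head, []byte("OggS")):
		return "ogg" // Ogg pages carry Vorbis and Opus streams
	case bytes.HasPrefix(head, []byte{0x1A, 0x45, 0xDF, 0xA3}):
		return "webm" // EBML header, shared by WebM and Matroska
	case bytes.HasPrefix(head, []byte("ID3")):
		return "mp3" // MP3 file starting with an ID3v2 tag
	case len(head) >= 8 && bytes.Equal(head[4:8], []byte("ftyp")):
		return "mp4" // ISO base media file, which covers M4A
	case len(head) >= 2 && head[0] == 0xFF && head[1]&0xE0 == 0xE0:
		return "mpeg-audio" // MPEG audio frame sync, also matches ADTS AAC
	default:
		return "unknown"
	}
}
```

In a design like the one described here, only the `"wav"` result would stay on the pure-Go path; every other value would be a candidate for the ffmpeg fallback.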

How it works

• WAV input is still parsed in pure Go, with no external process involved. That is the fast path.
• Non-WAV input is handed to a small ffmpegConverter that shells out to the system ffmpeg binary with a timeout and captured stderr.
• At startup we call exec.LookPath("ffmpeg") once. If ffmpeg is not installed, conversion is disabled cleanly: the server boots, logs a warning, and any non-WAV upload gets a clear 400 invalid_request_error instead of a 500.
• Each conversion uses os.CreateTemp for its input and output files, so concurrent requests never share paths. This keeps the guarantees of the existing worker pool intact.
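The points above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the merged implementation: the field names, `newFFmpegConverter`, `enabled`, `toWAV`, and `errConversionDisabled` are hypothetical, and the ffmpeg arguments are one plausible way to request 16 kHz mono WAV output.

```go
package main

import (
	"bytes"
	"context"
	"errors"
	"fmt"
	"os"
	"os/exec"
	"time"
)

// errConversionDisabled is a hypothetical sentinel that a handler could map to
// a 400 invalid_request_error response.
var errConversionDisabled = errors.New("audio conversion disabled: ffmpeg not found")

// ffmpegConverter is an illustrative shape; the real type may differ.
type ffmpegConverter struct {
	binPath string        // empty when ffmpeg was not found at startup
	timeout time.Duration // upper bound for a single conversion
}

// newFFmpegConverter looks up ffmpeg once, so requests never repeat the search.
func newFFmpegConverter(timeout time.Duration) *ffmpegConverter {
	path, err := exec.LookPath("ffmpeg")
	if err != nil {
		path = "" // conversion stays disabled; the server can still boot
	}
	return &ffmpegConverter{binPath: path, timeout: timeout}
}

func (c *ffmpegConverter) enabled() bool { return c.binPath != "" }

// toWAV transcodes input to 16 kHz mono WAV using per-request temp files.
// A real handler would probably derive the context from the HTTP request.
func (c *ffmpegConverter) toWAV(input []byte) ([]byte, error) {
	if !c.enabled() {
		return nil, errConversionDisabled
	}
	in, err := os.CreateTemp("", "upload-*")
	if err != nil {
		return nil, err
	}
	defer os.Remove(in.Name())
	if _, err := in.Write(input); err != nil {
		in.Close()
		return nil, err
	}
	if err := in.Close(); err != nil {
		return nil, err
	}
	out, err := os.CreateTemp("", "converted-*.wav")
	if err != nil {
		return nil, err
	}
	out.Close()
	defer os.Remove(out.Name())

	ctx, cancel := context.WithTimeout(context.Background(), c.timeout)
	defer cancel()

	var stderr bytes.Buffer
	cmd := exec.CommandContext(ctx, c.binPath,
		"-y", "-i", in.Name(), // overwrite the pre-created output file
		"-ac", "1", "-ar", "16000", // mono, 16 kHz
		"-f", "wav", out.Name())
	cmd.Stderr = &stderr // keep ffmpeg diagnostics for the error message
	if err := cmd.Run(); err != nil {
		return nil, fmt.Errorf("ffmpeg failed: %w: %s", err, stderr.String())
	}
	return os.ReadFile(out.Name())
}
```

Because `os.CreateTemp` picks a unique name on every call, two requests converting at the same time never touch the same files, and the deferred `os.Remove` calls clean up on both the success and the error path.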

@achetronic achetronic self-assigned this Apr 23, 2026
@achetronic achetronic added the enhancement New feature or request label Apr 23, 2026
@achetronic achetronic merged commit 52c7b93 into master Apr 23, 2026
2 checks passed
@achetronic achetronic deleted the feat/ffmpeg-audio-conversion branch April 24, 2026 15:48