[media3ext] FFmpeg video renderer leaves Surface BufferQueue connected to NATIVE_WINDOW_API_CPU — breaks subsequent MediaCodec.configure() on the same Surface #190

@samaudi


Affected: io.github.anilbeesetti:nextlib-media3ext:1.10.0-0.12.1 (and all earlier versions; the same code is on main).

Summary

After NextLib's FFmpeg video renderer renders frames to an app-provided Android Surface, the Surface's underlying BufferQueue stays connected to NATIVE_WINDOW_API_CPU. The connection is never explicitly torn down. When the app subsequently attaches a MediaCodec-based renderer (or any other consumer that needs NATIVE_WINDOW_API_MEDIA) to the same Surface, MediaCodec.configure() fails:

BufferQueueProducer: … nativeWindowConnect returned error: Invalid argument (-22)
MediaCodec: configure failed with err 0xffffffea
MediaCodecRenderer: Failed to initialize decoder: c2.qti.avc.decoder
java.lang.IllegalArgumentException: at android.media.MediaCodec.native_configure(Native Method)
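
The two numbers in this log are the same error. As a quick check, assuming the usual Android convention that native status codes are negated errno values, reinterpreting 0xffffffea as a signed 32-bit integer gives -22, i.e. -EINVAL. The helper name decode_status is hypothetical and exists only for this illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret the unsigned hex value that MediaCodec
// logs as the signed status code returned by the native layer.
inline std::int32_t decode_status(std::uint32_t logged) {
    std::int32_t status = 0;
    std::memcpy(&status, &logged, sizeof(status));  // bit-for-bit copy
    return status;
}
```

So "err 0xffffffea" from MediaCodec and "Invalid argument (-22)" from BufferQueueProducer describe one failure, not two.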

Same class of bug as androidx/media#1802 (filed against Media3, still open) — the extension renderer doesn't release its ANativeWindow API connection on the consumer Surface. That issue's owner explicitly asked how to do it from native code; the answer is in this issue.

Repro

In a Media3 ExoPlayer setup using NextLib's FFmpeg video renderer (NextRenderersFactory with extension renderers enabled, or any equivalent wiring), play a video that the FFmpeg software renderer handles (e.g. an exotic codec, or any codec when software decoding is preferred). Then switch (e.g. by changing the media item) to a video that the platform MediaCodec video renderer handles, on the same output Surface (the typical setup with one SurfaceView / PlayerView). The MediaCodec renderer fails in configure() with the errors shown above. Not device- or Media3-version-specific.

Root cause

media3ext/src/main/cpp/ffvideo.cpp:

  • ffmpegRenderFrame (called per output frame) uses ANativeWindow_lock + ANativeWindow_unlockAndPost to write the decoded YUV into the Surface. The very first ANativeWindow_lock call implicitly connects the underlying BufferQueue to NATIVE_WINDOW_API_CPU (documented Android behaviour: Surface::lock() calls connect(NATIVE_WINDOW_API_CPU) if not already connected).
  • MaybeAcquireNativeWindow caches the ANativeWindow* across frames (only releases & re-acquires when the consumer Surface changes).
  • On decoder shutdown (ffmpegRelease → ~JniContext), only ANativeWindow_release is called on the cached window. That decrements the producer-side wrapper's refcount, but does not call native_window_api_disconnect. The underlying app Surface is still alive (it's owned by the app's SurfaceView/TextureView, not by NextLib), so the BufferQueue's connect(CPU) state persists.

A subsequent MediaCodec.configure(format, surface, …) calls native_window_api_connect(NATIVE_WINDOW_API_MEDIA), which the BufferQueue rejects with BAD_VALUE (-22) because it's still connected to NATIVE_WINDOW_API_CPU.

Symmetric counterpart works fine: MediaCodec → FFmpeg never crashes, because MediaCodec's release() cleanly calls native_window_api_disconnect(NATIVE_WINDOW_API_MEDIA). The asymmetry is purely on the FFmpeg side.
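
The sequence above can be reproduced off-device with a toy model. This is a minimal sketch, assuming only what this issue describes: a queue accepts one producer API at a time and rejects a second connect with -EINVAL. ToyBufferQueue, Api and switch_to_media_codec are hypothetical names, not Android or NextLib types, and the real BufferQueueProducer has considerably more state.

```cpp
#include <cassert>
#include <cerrno>

// Simplified stand-in for a BufferQueue's producer-side connection state.
enum class Api { kNone, kCpu, kMedia };

class ToyBufferQueue {
public:
    // Only one producer API may be connected at a time.
    int connect(Api api) {
        if (api == Api::kNone || connected_ != Api::kNone) return -EINVAL;
        connected_ = api;
        return 0;
    }
    // Disconnect succeeds only for the API that is currently connected.
    int disconnect(Api api) {
        if (api == Api::kNone || connected_ != api) return -EINVAL;
        connected_ = Api::kNone;
        return 0;
    }

private:
    Api connected_ = Api::kNone;
};

// Models FFmpeg rendering followed by MediaCodec.configure() on one Surface.
// Returns what the MEDIA connect attempt returns in this model.
inline int switch_to_media_codec(bool disconnect_on_release) {
    ToyBufferQueue queue;
    queue.connect(Api::kCpu);           // first ANativeWindow_lock
    if (disconnect_on_release) {
        queue.disconnect(Api::kCpu);    // the fix proposed in this issue
    }
    // ANativeWindow_release alone leaves the queue's state untouched.
    return queue.connect(Api::kMedia);  // MediaCodec.configure()
}
```

In this model switch_to_media_codec(false) returns -EINVAL, matching the log, and switch_to_media_codec(true) returns 0.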

Suggested fix

Call native_window_api_disconnect(window, NATIVE_WINDOW_API_CPU) before each ANativeWindow_release of a window that has been used as a CPU producer (i.e. that has had ANativeWindow_lock called on it at least once). Two locations in ffvideo.cpp:

  1. MaybeAcquireNativeWindow, when the cached window is replaced because the consumer Surface changed.
  2. JniContext destructor (fires on ffmpegRelease), when the cached window is released on decoder shutdown.

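native_window_api_disconnect is declared in the platform header <system/window.h>. As far as I can tell the public NDK header <android/native_window.h> does not declare it, so whether it can be used from an NDK build needs to be verified. A small bool connected_as_cpu flag in JniContext is enough to track whether to call disconnect (set on first successful ANativeWindow_lock, cleared on release/disconnect).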

Approximate sketch:

// In JniContext:
struct JniContext {
    ANativeWindow* native_window = nullptr;
    bool connected_as_cpu = false;   // NEW
    // ...
    ~JniContext() {
        if (native_window) {
            if (connected_as_cpu) {
                native_window_api_disconnect(native_window, NATIVE_WINDOW_API_CPU);
            }
            ANativeWindow_release(native_window);
        }
    }
};

// In ffmpegRenderFrame, after a successful lock:
int result = ANativeWindow_lock(jniContext->native_window, &native_window_buffer, nullptr);
if (result == 0) {
    jniContext->connected_as_cpu = true;
    // ... render ...
    ANativeWindow_unlockAndPost(jniContext->native_window);
}

// In MaybeAcquireNativeWindow, before releasing the old window.
// Written as if this is a JniContext member function; if it is a free
// function, prefix both fields with jniContext-> instead.
if (native_window) {
    if (connected_as_cpu) {
        native_window_api_disconnect(native_window, NATIVE_WINDOW_API_CPU);
        connected_as_cpu = false;
    }
    ANativeWindow_release(native_window);
}
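
An alternative to repeating the flag check at both call sites is a small owner type that always disconnects before it releases. The following is only a sketch under stated assumptions: CpuWindowHolder, FakeWindow and FakeOps are hypothetical names, and on Android the Ops functions would presumably forward to native_window_api_disconnect and ANativeWindow_release. The fake operations only record call order so the sketch runs off-device.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical owner: pairs the CPU disconnect with the release so the two
// calls cannot get out of order or be forgotten at one of the call sites.
template <typename Window, typename Ops>
class CpuWindowHolder {
public:
    CpuWindowHolder() = default;
    CpuWindowHolder(const CpuWindowHolder&) = delete;
    CpuWindowHolder& operator=(const CpuWindowHolder&) = delete;
    ~CpuWindowHolder() { reset(nullptr); }

    // Swap in a new window (or nullptr), tearing down the old one first.
    void reset(Window* next) {
        if (window_ != nullptr) {
            if (connected_as_cpu_) {
                Ops::disconnect_cpu(window_);
            }
            Ops::release(window_);
        }
        window_ = next;
        connected_as_cpu_ = false;
    }

    // Call after the first successful lock on the current window.
    void mark_locked() {
        if (window_ != nullptr) connected_as_cpu_ = true;
    }

    Window* get() const { return window_; }

private:
    Window* window_ = nullptr;
    bool connected_as_cpu_ = false;
};

// Stand-ins for the real window type and calls; they only log what happened.
struct FakeWindow { int id; };
struct FakeOps {
    static std::vector<std::string>& log() {
        static std::vector<std::string> calls;
        return calls;
    }
    static void disconnect_cpu(FakeWindow* w) {
        log().push_back("disconnect:" + std::to_string(w->id));
    }
    static void release(FakeWindow* w) {
        log().push_back("release:" + std::to_string(w->id));
    }
};
```

With this shape, a Surface change maps to reset(new_window) and decoder shutdown maps to the destructor, so both locations listed under "Suggested fix" go through the same code path.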

App-level workaround (we currently ship)

We recreate the SurfaceView (Compose key(token)) on FFmpeg→MediaCodec transitions so MediaCodec attaches to a brand-new BufferQueue. Works, but it's heavy (UI surface lifecycle churn) and triggers an unrelated Media3 compose-layout race that we then had to work around separately. A proper fix at the renderer is much cleaner — would benefit every NextLib user mixing the FFmpeg fallback with the platform MediaCodec renderer (i.e. any PREFER_HARDWARE + FFmpeg-fallback configuration).

Severity

Not device-/Android-version-specific. Affects the standard PREFER_HARDWARE + FFmpeg-fallback video player setup whenever a stream switch crosses the renderer boundary.
