Add GPU-accelerated AV1 encode/decode (VA-API, NVENC/NVDEC, Vulkan Video) #217
Summary
Add hardware-accelerated AV1 video encoding and decoding to StreamKit, leveraging GPU capabilities via VA-API, NVIDIA NVENC/NVDEC, and (future) Vulkan Video. This builds on the CPU AV1 foundation (#216) and the GPU compositing work (#194) to move toward a zero-copy GPU media pipeline.
Motivation
- CPU AV1 encoding (rav1e) at real-time speeds trades significant quality; HW encoders deliver better quality at the same latency
- GPU compositing (GPU-accelerated compositing via wgpu #194) already keeps frames on GPU — HW codecs reduce CPU↔GPU roundtrips
- AV1 now has HW encode support across all three major GPU vendors: Intel (Arc+), AMD (RDNA 3+), NVIDIA (RTX 40xx+)
- The ultimate goal is a zero-copy pipeline: GPU decode → GPU composite → GPU encode, with no CPU copies
Prerequisites
- Add CPU AV1 codec support (rav1e encoder + rav1d decoder) + codec negotiation refactor #216 — provides the `av1` feature, the `Av1EncoderNode`/`Av1DecoderNode` interfaces, and MoQ codec negotiation
- GPU-accelerated compositing via wgpu #194 — provides the GPU texture pipeline and wgpu device management
HW Acceleration Options
Option A: VA-API via cros-codecs / cros-libva
| Property | Detail |
|---|---|
| Crate | cros-codecs v0.0.6 (BSD-3-Clause) |
| AV1 decode | Supported (VA-API) |
| AV1 encode | Supported (VA-API) |
| VP9 decode/encode | Also supported |
| HW support | Intel, AMD, NVIDIA on Linux (via Mesa/intel-media-driver) |
| Maturity | Solid — developed for ChromeOS/crosvm, 546k downloads |
| wgpu interop | No native interop — requires DMA-BUF → Vulkan VkImage → wgpu texture bridge (custom work) |
Option B: NVIDIA NVENC/NVDEC (native SDK bindings)
| Property | Detail |
|---|---|
| SDK | NVIDIA Video Codec SDK |
| AV1 encode | Supported (RTX 40xx+, Ada Lovelace) |
| AV1 decode | Supported (RTX 30xx+, Ampere) |
| Rust bindings | Would need custom FFI bindings or use existing crates like nvidia-video-codec-sdk |
| HW support | NVIDIA GPUs only |
| Performance | Best-in-class for NVIDIA hardware, lower latency than VA-API on NVIDIA |
| wgpu interop | CUDA↔Vulkan interop is possible but complex |
Option C: Vulkan Video via vk-video
| Property | Detail |
|---|---|
| Crate | vk-video v0.2.1 (MIT) |
| AV1 support | Not yet — currently H.264 only |
| wgpu interop | Native! — built on wgpu, decoded frames are wgpu::Textures |
| Maturity | Very early (1.2k downloads), developed by Software Mansion for Smelter |
| Vulkan spec | VK_KHR_video_decode_av1 ratified Feb 2024, VK_KHR_video_encode_av1 ratified Nov 2024 |
| Potential | Best long-term option for zero-copy pipeline, but needs AV1 support added |
Proposed Phased Approach
Phase 2a: VA-API HW acceleration (most practical today)
```rust
// crates/nodes/src/video/vaapi.rs (new)

/// VA-API accelerated AV1 encoder using cros-codecs.
/// Input: I420/NV12 raw frames. Output: AV1 encoded packets.
/// Uses cros_codecs::encoder for VA-API AV1 encoding.
pub struct VaapiAv1EncoderNode { config: VaapiAv1EncoderConfig }

/// VA-API accelerated AV1 decoder using cros-codecs.
/// Input: AV1 encoded packets.
/// Output: NV12 frames (VA surface → mapped to CPU buffer).
pub struct VaapiAv1DecoderNode { config: VaapiAv1DecoderConfig }

/// Runtime capability detection: checks available VA-API
/// profiles/entrypoints for AV1 encode/decode.
pub fn probe_vaapi_capabilities() -> VaapiCapabilities { ... }
```
- Feature: `vaapi = ["dep:cros-codecs"]`
- Fallback: auto-detect VA-API at runtime, fall back to CPU rav1e/rav1d if unavailable
- Works on Intel/AMD/NVIDIA on Linux
- Docker: requires the `--gpus` flag + VA-API drivers
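The auto-detect-and-fall-back behavior could be wired up roughly as follows. This is a minimal sketch: the `VaapiCapabilities` fields and the `EncoderBackend` enum are illustrative names, not existing StreamKit types.

```rust
/// What probe_vaapi_capabilities() might report at runtime
/// (hypothetical fields for this sketch).
#[derive(Debug, Default)]
pub struct VaapiCapabilities {
    pub av1_encode: bool,
    pub av1_decode: bool,
}

/// Which encoder implementation the runtime ends up using.
#[derive(Debug, PartialEq)]
pub enum EncoderBackend {
    Vaapi,
    CpuRav1e,
}

/// Prefer VA-API when the driver exposes an AV1 encode entrypoint;
/// otherwise fall back to the CPU rav1e path from #216.
pub fn select_encoder_backend(caps: &VaapiCapabilities) -> EncoderBackend {
    if caps.av1_encode {
        EncoderBackend::Vaapi
    } else {
        EncoderBackend::CpuRav1e
    }
}
```

The same shape would apply on the decode side (VA-API AV1 decode vs rav1d), keeping backend selection a single, testable decision point.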
Phase 2b: NVIDIA NVENC/NVDEC (optional, for best NVIDIA performance)
- Feature: `nvenc = ["dep:nvidia-video-codec-sdk"]` (or custom FFI bindings)
- Only worth pursuing if VA-API performance on NVIDIA is insufficient
- NVENC typically has lower latency and better rate control than VA-API on NVIDIA
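With both the `vaapi` and `nvenc` features in play, backend preference could be driven by compile-time features plus runtime probing. A sketch, using the feature names from this proposal (the helper function itself is hypothetical):

```rust
// Illustrative only: backend preference order, best first. NVENC is tried
// first when compiled in, then VA-API, then the CPU rav1e fallback, which
// is always available and never depends on hardware.
pub fn backend_priority() -> Vec<&'static str> {
    let mut order = Vec::new();
    if cfg!(feature = "nvenc") {
        order.push("nvenc");
    }
    if cfg!(feature = "vaapi") {
        order.push("vaapi");
    }
    order.push("cpu-rav1e");
    order
}
```

At startup the runtime would walk this list and take the first backend whose capability probe succeeds.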
Phase 2c: Vulkan Video zero-copy pipeline (future, when ecosystem matures)
```
AV1 packets → Vulkan Video decode → wgpu::Texture (NV12)
            → GPU compositor (wgpu, already merged) → wgpu::Texture (output)
            → Vulkan Video encode → AV1 packets
```
- No CPU roundtrips — frames never leave GPU memory
- Requires the `vk-video` crate to add AV1 support, OR building Vulkan Video AV1 directly using `ash` + wgpu Vulkan interop
- The wgpu `Device` and `Queue` are shared between the compositor and the codec
- This is the ultimate performance target
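The node wiring that diagram implies could be sketched as below. All types here are stand-ins: `GpuTexture` and the three stage structs are hypothetical, stage signatures are simplified to texture → texture (real decode consumes packets, real encode emits them), and in the real pipeline the handles would be `wgpu::Texture`s on the shared `Device`/`Queue`.

```rust
/// Stand-in for a wgpu::Texture: an opaque handle to GPU-resident memory.
#[derive(Clone, Debug)]
pub struct GpuTexture {
    pub id: u64,
}

/// A pipeline stage that consumes and produces GPU textures,
/// never mapping frame data back to CPU memory.
pub trait GpuStage {
    fn run(&self, input: GpuTexture) -> GpuTexture;
}

struct VulkanVideoDecode; // AV1 packets → NV12 texture
struct WgpuCompositor;    // composite on the shared wgpu Device
struct VulkanVideoEncode; // texture → AV1 packets

impl GpuStage for VulkanVideoDecode {
    fn run(&self, input: GpuTexture) -> GpuTexture {
        GpuTexture { id: input.id + 1 } // decoded frame stays on GPU
    }
}

impl GpuStage for WgpuCompositor {
    fn run(&self, input: GpuTexture) -> GpuTexture {
        GpuTexture { id: input.id + 1 } // composited output stays on GPU
    }
}

impl GpuStage for VulkanVideoEncode {
    fn run(&self, input: GpuTexture) -> GpuTexture {
        GpuTexture { id: input.id + 1 } // encoder reads the texture in place
    }
}

/// Chain the stages: only a small texture handle is passed along,
/// so no frame data crosses the CPU↔GPU boundary.
pub fn run_pipeline(frame: GpuTexture) -> GpuTexture {
    let stages: Vec<Box<dyn GpuStage>> = vec![
        Box::new(VulkanVideoDecode),
        Box::new(WgpuCompositor),
        Box::new(VulkanVideoEncode),
    ];
    stages.into_iter().fold(frame, |tex, stage| stage.run(tex))
}
```

The point of the sketch is the ownership model: the compositor and codec stages share one device, so "passing a frame" means passing a handle, not copying pixels.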
Tasks
- Evaluate VA-API AV1 encode/decode quality and latency with `cros-codecs` (PoC)
- Implement `VaapiAv1EncoderNode` and `VaapiAv1DecoderNode`
- Add runtime GPU capability detection and CPU fallback
- Evaluate NVIDIA NVENC/NVDEC bindings (if VA-API perf on NVIDIA is insufficient)
- Update Docker GPU images for VA-API/NVENC support
- Benchmark HW vs CPU encode/decode (latency, quality, throughput)
- Track `vk-video` AV1 progress for future zero-copy integration
- Update docs
Open Questions
- VA-API vs NVENC priority — Start with VA-API (cross-vendor) or NVENC (best NVIDIA perf)?
- DMA-BUF → wgpu bridge — Build custom interop for Phase 2a, or wait for Vulkan Video (Phase 2c)?
- Auto-selection — Should the runtime automatically choose HW > CPU, or let the user configure?
- Docker story — Separate `Dockerfile.gpu-vaapi` / `Dockerfile.gpu-nvenc`, or unified with runtime detection?