Add GPU-accelerated AV1 encode/decode (VA-API, NVENC/NVDEC, Vulkan Video)

## Summary

Add hardware-accelerated AV1 video encoding and decoding to StreamKit, leveraging GPU capabilities via VA-API, NVIDIA NVENC/NVDEC, and (future) Vulkan Video. This builds on the CPU AV1 foundation (#216) and the GPU compositing work (#194) to move toward a zero-copy GPU media pipeline.

## Motivation

- CPU AV1 encoding (rav1e) at real-time speeds trades significant quality; HW encoders deliver better quality at the same latency
- GPU compositing (#194) already keeps frames on GPU — HW codecs reduce CPU↔GPU roundtrips
- AV1 has universal HW encode support: Intel (Arc+), AMD (RDNA 3+), NVIDIA (RTX 40xx+)
- The ultimate goal is a zero-copy pipeline: `GPU decode → GPU composite → GPU encode` with no CPU copies

## Prerequisites

- #216 — CPU AV1 support + codec negotiation refactor (provides the `av1` feature, `Av1EncoderNode`/`Av1DecoderNode` interfaces, and MoQ codec negotiation)
- #194 — GPU compositing via wgpu (provides the GPU texture pipeline and wgpu device management)

## HW Acceleration Options

### Option A: VA-API via `cros-codecs` / `cros-libva`

| Property | Detail |
|----------|--------|
| **Crate** | [`cros-codecs`](https://crates.io/crates/cros-codecs) v0.0.6 (BSD-3-Clause) |
| **AV1 decode** | Supported (VA-API) |
| **AV1 encode** | Supported (VA-API) |
| **VP9 decode/encode** | Also supported |
| **HW support** | Intel, AMD, NVIDIA on Linux (via Mesa/intel-media-driver) |
| **Maturity** | Solid — developed for ChromeOS/crosvm, 546k downloads |
| **wgpu interop** | No native interop — requires DMA-BUF → Vulkan VkImage → wgpu texture bridge (custom work) |

### Option B: NVIDIA NVENC/NVDEC (native SDK bindings)

| Property | Detail |
|----------|--------|
| **SDK** | NVIDIA Video Codec SDK |
| **AV1 encode** | Supported (RTX 40xx+, Ada Lovelace) |
| **AV1 decode** | Supported (RTX 30xx+, Ampere) |
| **Rust bindings** | Would need custom FFI bindings or use existing crates like `nvidia-video-codec-sdk` |
| **HW support** | NVIDIA GPUs only |
| **Performance** | Best-in-class for NVIDIA hardware, lower latency than VA-API on NVIDIA |
| **wgpu interop** | CUDA↔Vulkan interop is possible but complex |

### Option C: Vulkan Video via `vk-video`

| Property | Detail |
|----------|--------|
| **Crate** | [`vk-video`](https://crates.io/crates/vk-video) v0.2.1 (MIT) |
| **AV1 support** | **Not yet** — currently H.264 only |
| **wgpu interop** | **Native!** — built on wgpu, decoded frames are `wgpu::Texture`s |
| **Maturity** | Very early (1.2k downloads), developed by Software Mansion for [Smelter](https://github.com/software-mansion/smelter) |
| **Vulkan spec** | `VK_KHR_video_decode_av1` ratified Feb 2024, `VK_KHR_video_encode_av1` ratified Nov 2024 |
| **Potential** | Best long-term option for zero-copy pipeline, but needs AV1 support added |

## Proposed Phased Approach

### Phase 2a: VA-API HW acceleration (most practical today)

```rust
// crates/nodes/src/video/vaapi.rs (new)

/// VA-API accelerated AV1 encoder using cros-codecs.
pub struct VaapiAv1EncoderNode { config: VaapiAv1EncoderConfig }
// Input: I420/NV12 raw frames
// Output: AV1 encoded packets
// Uses cros_codecs::encoder for VA-API AV1 encoding

/// VA-API accelerated AV1 decoder using cros-codecs.  
pub struct VaapiAv1DecoderNode { config: VaapiAv1DecoderConfig }
// Input: AV1 encoded packets
// Output: NV12 frames (VA surface → mapped to CPU buffer)

/// Runtime capability detection
pub fn probe_vaapi_capabilities() -> VaapiCapabilities { ... }
// Checks available VA-API profiles/entrypoints for AV1 encode/decode
```

- Feature: `vaapi = ["dep:cros-codecs"]`
- Fallback: auto-detect VA-API at runtime, fall back to CPU rav1e/rav1d if unavailable
- Works on Intel/AMD/NVIDIA on Linux
- Docker: requires `--gpus` flag + VA-API drivers

### Phase 2b: NVIDIA NVENC/NVDEC (optional, for best NVIDIA performance)

- Feature: `nvenc = ["dep:nvidia-video-codec-sdk"]` (or custom FFI bindings)
- Only worth pursuing if VA-API performance on NVIDIA is insufficient
- NVENC typically has lower latency and better rate control than VA-API on NVIDIA

### Phase 2c: Vulkan Video zero-copy pipeline (future, when ecosystem matures)

```
AV1 packets → Vulkan Video decode → wgpu::Texture (NV12)
  → GPU compositor (wgpu, already merged) → wgpu::Texture (output)
  → Vulkan Video encode → AV1 packets
```

- No CPU roundtrips — frames never leave GPU memory
- Requires `vk-video` crate to add AV1 support, OR building Vulkan Video AV1 directly using `ash` + wgpu Vulkan interop
- The wgpu `Device` and `Queue` are shared between compositor and codec
- This is the ultimate performance target

## Tasks

- [ ] Evaluate VA-API AV1 encode/decode quality and latency with `cros-codecs` (PoC)
- [ ] Implement `VaapiAv1EncoderNode` and `VaapiAv1DecoderNode`
- [ ] Add runtime GPU capability detection and CPU fallback
- [ ] Evaluate NVIDIA NVENC/NVDEC bindings (if VA-API perf on NVIDIA is insufficient)
- [ ] Update Docker GPU images for VA-API/NVENC support
- [ ] Benchmark HW vs CPU encode/decode (latency, quality, throughput)
- [ ] Track `vk-video` AV1 progress for future zero-copy integration
- [ ] Update docs

## Open Questions

1. **VA-API vs NVENC priority** — Start with VA-API (cross-vendor) or NVENC (best NVIDIA perf)?
2. **DMA-BUF → wgpu bridge** — Build custom interop for Phase 2a, or wait for Vulkan Video (Phase 2c)?
3. **Auto-selection** — Should the runtime automatically choose HW > CPU, or let the user configure?
4. **Docker story** — Separate `Dockerfile.gpu-vaapi` / `Dockerfile.gpu-nvenc`, or unified with runtime detection?

Property	Detail
Crate	`vk-video` v0.2.1 (MIT)
AV1 support	Not yet — currently H.264 only
wgpu interop	Native! — built on wgpu, decoded frames are `wgpu::Texture`s
Maturity	Very early (1.2k downloads), developed by Software Mansion for Smelter
Vulkan spec	`VK_KHR_video_decode_av1` ratified Feb 2024, `VK_KHR_video_encode_av1` ratified Nov 2024
Potential	Best long-term option for zero-copy pipeline, but needs AV1 support added

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add GPU-accelerated AV1 encode/decode (VA-API, NVENC/NVDEC, Vulkan Video) #217

Summary

Motivation

Prerequisites

HW Acceleration Options

Option A: VA-API via `cros-codecs` / `cros-libva`

Option B: NVIDIA NVENC/NVDEC (native SDK bindings)

Option C: Vulkan Video via `vk-video`

Proposed Phased Approach

Phase 2a: VA-API HW acceleration (most practical today)

Phase 2b: NVIDIA NVENC/NVDEC (optional, for best NVIDIA performance)

Phase 2c: Vulkan Video zero-copy pipeline (future, when ecosystem matures)

Tasks

Open Questions

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Property	Detail
Crate	`cros-codecs` v0.0.6 (BSD-3-Clause)
AV1 decode	Supported (VA-API)
AV1 encode	Supported (VA-API)
VP9 decode/encode	Also supported
HW support	Intel, AMD, NVIDIA on Linux (via Mesa/intel-media-driver)
Maturity	Solid — developed for ChromeOS/crosvm, 546k downloads
wgpu interop	No native interop — requires DMA-BUF → Vulkan VkImage → wgpu texture bridge (custom work)

Property	Detail
SDK	NVIDIA Video Codec SDK
AV1 encode	Supported (RTX 40xx+, Ada Lovelace)
AV1 decode	Supported (RTX 30xx+, Ampere)
Rust bindings	Would need custom FFI bindings or use existing crates like `nvidia-video-codec-sdk`
HW support	NVIDIA GPUs only
Performance	Best-in-class for NVIDIA hardware, lower latency than VA-API on NVIDIA
wgpu interop	CUDA↔Vulkan interop is possible but complex

Add GPU-accelerated AV1 encode/decode (VA-API, NVENC/NVDEC, Vulkan Video) #217

Description

Summary

Motivation

Prerequisites

HW Acceleration Options

Option A: VA-API via cros-codecs / cros-libva

Option B: NVIDIA NVENC/NVDEC (native SDK bindings)

Option C: Vulkan Video via vk-video

Proposed Phased Approach

Phase 2a: VA-API HW acceleration (most practical today)

Phase 2b: NVIDIA NVENC/NVDEC (optional, for best NVIDIA performance)

Phase 2c: Vulkan Video zero-copy pipeline (future, when ecosystem matures)

Tasks

Open Questions

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

Option A: VA-API via `cros-codecs` / `cros-libva`

Option C: Vulkan Video via `vk-video`