refactor: remove parallel GPU coordinator, fix streaming backpressure by ChrisLundquist · Pull Request #122 · ChrisLundquist/libpz

ChrisLundquist · 2026-03-12T06:25:56Z

Summary

Remove GPU coordinator from parallel scheduler (~900 lines deleted) — it serialized entropy encoding on one thread, bottlenecking at 28 MiB/s. The parallel path is now CPU-only (6+ GiB/s).
Fix one-way backpressure ratchet in streaming GPU path — the coordinator now decrements pressure by batch_len on batch completion. Previously, once pressure exceeded the limit no try_send was attempted, so pressure could never decrease, permanently locking out GPU. GPU routing went from ~16% → ~62% on the mozilla corpus.
Honor Backend::WebGpu in compress_with_options — multi-block GPU requests now route through compress_stream internally (in-memory Cursor I/O), which has the GPU coordinator with adaptive backpressure. Output uses framed format, transparently decompressible by decompress().
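
A minimal sketch of the two-way pressure accounting described in the backpressure fix above. This is not the libpz implementation: the `Pressure` type, its methods, and the `limit` field are hypothetical names chosen for illustration. Only the idea of decrementing by `batch_len` on batch completion comes from this PR.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical in-flight counter for GPU-bound blocks (not libpz's actual type).
pub struct Pressure {
    in_flight: AtomicUsize,
    limit: usize,
}

impl Pressure {
    pub fn new(limit: usize) -> Self {
        Self { in_flight: AtomicUsize::new(0), limit }
    }

    /// Producer side: reserve room for `batch_len` blocks.
    /// Returns false when the limit would be exceeded, so the caller
    /// can fall back to the CPU for this batch.
    pub fn try_acquire(&self, batch_len: usize) -> bool {
        let mut cur = self.in_flight.load(Ordering::Acquire);
        loop {
            if cur.saturating_add(batch_len) > self.limit {
                return false;
            }
            match self.in_flight.compare_exchange_weak(
                cur,
                cur + batch_len,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return true,
                Err(actual) => cur = actual,
            }
        }
    }

    /// Coordinator side: called once per completed batch.
    /// Without this step the counter only ever grows (the one-way ratchet).
    pub fn release(&self, batch_len: usize) {
        let _ = self.in_flight.fetch_update(Ordering::AcqRel, Ordering::Acquire, |cur| {
            Some(cur.saturating_sub(batch_len))
        });
    }

    pub fn current(&self) -> usize {
        self.in_flight.load(Ordering::Acquire)
    }
}
```

In this sketch the release is unconditional on completion, so a refused batch never prevents the counter from draining later.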

Additional cleanup

Flatten UnifiedTask from single-variant enum to type alias (usize, usize)
Simplify complete_task_lifecycle (remove dead next_task parameter, always None)
Remove dead GPU telemetry methods and atomic fields from LocalSchedulerStats
Add criterion benchmark (gpu_parallel_vs_streaming) comparing all four paths
Update docs: CLAUDE.md, DESIGN.md, gpu-strategy.md, pipeline-architecture.md
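
For readers unfamiliar with the `UnifiedTask` change, a rough before/after sketch. The variant name `Work` and the helper functions are hypothetical, and the meaning of the two `usize` fields is not stated in this PR, so the sketch treats them as opaque.

```rust
/// Before (assumed shape): an enum with a single variant.
/// The variant name `Work` is made up for this example.
pub enum UnifiedTaskBefore {
    Work(usize, usize),
}

/// After: a plain type alias, as described in this PR.
pub type UnifiedTask = (usize, usize);

/// With the enum, every consumer needs a one-arm match.
pub fn unpack_before(task: UnifiedTaskBefore) -> (usize, usize) {
    match task {
        UnifiedTaskBefore::Work(a, b) => (a, b),
    }
}

/// With the alias, ordinary tuple destructuring is enough.
pub fn unpack_after(task: UnifiedTask) -> (usize, usize) {
    let (a, b) = task;
    (a, b)
}
```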

Architecture after this PR

Path	GPU handling
Single block (input ≤ block_size)	`compress_block` respects `Backend::WebGpu` directly
Single thread (threads = 1)	Sequential `compress_block` respects `Backend::WebGpu` directly
Multi-thread + CPU	`compress_parallel` — CPU-only unified scheduler
Multi-thread + GPU	Routes through `compress_stream` — GPU coordinator with adaptive backpressure
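
The table can be read as a dispatch function. The sketch below is illustrative only: `choose_path`, the `Path` enum, and the `Backend::Cpu` variant are assumptions (only `Backend::WebGpu` is named in this PR), and the order in which the conditions are checked is a guess based on the table's row order.

```rust
/// `WebGpu` is named in the PR; `Cpu` is an assumed counterpart.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Cpu,
    WebGpu,
}

/// Hypothetical labels for the four rows of the table.
#[derive(Debug, PartialEq, Eq)]
pub enum Path {
    /// `compress_block`, respects the backend directly.
    SingleBlock,
    /// Sequential `compress_block` calls, respects the backend directly.
    Sequential,
    /// `compress_parallel`, CPU-only unified scheduler.
    ParallelCpu,
    /// `compress_stream`, GPU coordinator with adaptive backpressure.
    StreamGpu,
}

pub fn choose_path(input_len: usize, block_size: usize, threads: usize, backend: Backend) -> Path {
    if input_len <= block_size {
        Path::SingleBlock
    } else if threads == 1 {
        Path::Sequential
    } else if backend == Backend::WebGpu {
        Path::StreamGpu
    } else {
        Path::ParallelCpu
    }
}
```

Only the last two branches changed in spirit with this PR: multi-thread GPU requests no longer enter the parallel scheduler.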

Test plan

All 694 tests pass (692 unit + 2 doc), clippy clean
Pre-commit hook passes (fmt, clippy, build, test)
Verified backpressure fix with instrumented streaming runs on Silesia corpus
Criterion benchmark confirms parallel GPU path no longer bottlenecks (was 568ms → now 2.97ms, uses CPU-only path)
Verify GPU streaming throughput on target hardware with cargo bench -- gpu_par_vs_stream

🤖 Generated with Claude Code

…ng backpressure

The parallel GPU coordinator serialized entropy encoding on one thread, bottlenecking at 28 MiB/s. The streaming path's GPU coordinator with adaptive backpressure is the correct architecture for GPU acceleration.

Changes:
- Remove GPU coordinator from parallel.rs (~900 lines): GpuRequest enum, StageGpu/FusedGpu task variants, gpu_fused_span, pressure_inc/dec, should_route_block_to_gpu_stage0, complete_gpu_stage, and all GPU channel/routing logic. Parallel scheduler is now CPU-only.
- Fix one-way backpressure ratchet in streaming.rs: coordinator now decrements pressure by batch_len on completion, preventing permanent GPU lockout (routing went from ~16% to ~62% on mozilla corpus).
- Honor Backend::WebGpu in compress_with_options: multi-block GPU requests route through compress_stream with in-memory Cursor I/O, producing framed-format output (transparently decompressible).
- Flatten UnifiedTask from single-variant enum to type alias.
- Simplify complete_task_lifecycle (remove dead next_task parameter).
- Remove dead GPU telemetry methods and atomic fields from LocalSchedulerStats; retain public UnifiedSchedulerStats fields for API stability.
- Add criterion benchmark comparing parallel vs streaming GPU paths.
- Update CLAUDE.md, DESIGN.md, gpu-strategy.md, pipeline-architecture.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

ChrisLundquist force-pushed the claude/reverent-wright branch from 469c061 to 4387364 Compare March 12, 2026 06:29

ChrisLundquist merged commit 3287348 into master Mar 12, 2026
4 checks passed

ChrisLundquist merged 1 commit into master from claude/reverent-wright
