refactor: remove parallel GPU coordinator, fix streaming backpressure #122

Merged
ChrisLundquist merged 1 commit into master from claude/reverent-wright
Mar 12, 2026

Conversation

@ChrisLundquist
Owner

Summary

  • Remove GPU coordinator from parallel scheduler (~900 lines deleted) — it serialized entropy encoding on one thread, bottlenecking at 28 MiB/s. The parallel path is now CPU-only (6+ GiB/s).
  • Fix one-way backpressure ratchet in streaming GPU path — the coordinator now decrements pressure by batch_len on batch completion. Previously, once pressure exceeded the limit no try_send was attempted, so pressure could never decrease, permanently locking out GPU. GPU routing went from ~16% → ~62% on the mozilla corpus.
  • Honor Backend::WebGpu in compress_with_options — multi-block GPU requests now route through compress_stream internally (in-memory Cursor I/O), which has the GPU coordinator with adaptive backpressure. Output uses framed format, transparently decompressible by decompress().
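
The backpressure fix in the second bullet can be pictured with a small sketch. This is not the code in streaming.rs: the `Pressure` type, its method names, and the use of an atomic counter are assumptions made for illustration. The point it shows is that the decrement happens on batch completion, independent of whether any new send was attempted, so the counter can always recover after hitting the limit.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical in-flight pressure gauge (illustrative names, not the project's API).
pub struct Pressure {
    in_flight: AtomicUsize,
    limit: usize,
}

impl Pressure {
    pub fn new(limit: usize) -> Self {
        Self { in_flight: AtomicUsize::new(0), limit }
    }

    /// Producer side: returns true if one more block may be routed to the GPU.
    /// When the limit is reached the caller falls back to the CPU path.
    pub fn try_acquire(&self) -> bool {
        self.in_flight
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |cur| {
                if cur < self.limit { Some(cur + 1) } else { None }
            })
            .is_ok()
    }

    /// Coordinator side: called once per finished batch. Because this does not
    /// depend on a new send being attempted, pressure can fall below the limit
    /// again instead of ratcheting upward forever.
    pub fn release_batch(&self, batch_len: usize) {
        let _ = self
            .in_flight
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |cur| {
                // Saturate rather than underflow if a batch is over-reported.
                Some(cur.saturating_sub(batch_len))
            });
    }

    /// Current number of blocks counted as in flight.
    pub fn current(&self) -> usize {
        self.in_flight.load(Ordering::Acquire)
    }
}
```

With a limit of 2, two acquisitions succeed and a third is refused; after `release_batch(2)` the gauge reads 0 and GPU routing is possible again.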

Additional cleanup

  • Flatten UnifiedTask from single-variant enum to type alias (usize, usize)
  • Simplify complete_task_lifecycle (remove dead next_task parameter, always None)
  • Remove dead GPU telemetry methods and atomic fields from LocalSchedulerStats
  • Add criterion benchmark (gpu_parallel_vs_streaming) comparing all four paths
  • Update docs: CLAUDE.md, DESIGN.md, gpu-strategy.md, pipeline-architecture.md
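
As a rough illustration of the `UnifiedTask` flattening, the sketch below contrasts a single-variant enum with the `(usize, usize)` alias named above. The `LegacyTask` enum, its `Stage` variant name, and the reading of the two fields as a stage index and a block index are all assumptions; only the alias itself comes from this PR.

```rust
/// Assumed pre-refactor shape: an enum left with a single variant.
/// The names `LegacyTask` and `Stage` are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LegacyTask {
    Stage(usize, usize),
}

/// Post-refactor shape named in the PR: a plain type alias.
pub type UnifiedTask = (usize, usize);

/// With only one variant the conversion is total, so the enum tag
/// carried no information beyond the tuple itself.
pub fn flatten(task: LegacyTask) -> UnifiedTask {
    match task {
        LegacyTask::Stage(a, b) => (a, b),
    }
}

/// Consumers destructure the tuple directly instead of matching on a variant.
/// Field meanings (stage, block) are an assumption for this example.
pub fn describe(task: UnifiedTask) -> String {
    let (stage, block) = task;
    format!("stage {stage}, block {block}")
}
```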

Architecture after this PR

| Path | GPU handling |
|---|---|
| Single block (input ≤ block_size) | compress_block respects Backend::WebGpu directly |
| Single thread (threads = 1) | Sequential compress_block respects Backend::WebGpu directly |
| Multi-thread + CPU | compress_parallel: CPU-only unified scheduler |
| Multi-thread + GPU | Routes through compress_stream: GPU coordinator with adaptive backpressure |
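
The four rows above amount to a small dispatch decision, which can be sketched as follows. This is a sketch, not the body of `compress_with_options`: the `Path` enum, the `choose_path` function, the `Backend::Cpu` variant, and the order in which the conditions are checked are assumptions. Only `Backend::WebGpu` and the four paths themselves come from this PR.

```rust
/// `WebGpu` is named in the PR; `Cpu` is an assumed name for the non-GPU backend.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Cpu,
    WebGpu,
}

/// Hypothetical labels for the four rows of the architecture table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Path {
    /// compress_block, backend honored directly.
    SingleBlock,
    /// Sequential compress_block calls, backend honored directly.
    Sequential,
    /// compress_parallel, CPU-only unified scheduler.
    ParallelCpu,
    /// compress_stream with the GPU coordinator and adaptive backpressure.
    StreamGpu,
}

/// Picks a path from input size, block size, thread count, and backend.
/// The precedence of the checks is an assumption.
pub fn choose_path(input_len: usize, block_size: usize, threads: usize, backend: Backend) -> Path {
    if input_len <= block_size {
        Path::SingleBlock
    } else if threads == 1 {
        Path::Sequential
    } else if backend == Backend::WebGpu {
        Path::StreamGpu
    } else {
        Path::ParallelCpu
    }
}
```

Under these assumptions, a multi-block, multi-thread request with `Backend::WebGpu` is the only case that reaches the streaming path; every other multi-thread request stays on the CPU-only scheduler.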

Test plan

  • All 694 tests pass (692 unit + 2 doc), clippy clean
  • Pre-commit hook passes (fmt, clippy, build, test)
  • Verified backpressure fix with instrumented streaming runs on Silesia corpus
  • Criterion benchmark confirms the parallel path no longer bottlenecks when GPU is requested (568 ms before, 2.97 ms now, because it uses the CPU-only path)
  • Verify GPU streaming throughput on target hardware with cargo bench -- gpu_par_vs_stream

🤖 Generated with Claude Code

…ng backpressure

The parallel GPU coordinator serialized entropy encoding on one thread,
bottlenecking at 28 MiB/s. The streaming path's GPU coordinator with
adaptive backpressure is the correct architecture for GPU acceleration.

Changes:
- Remove GPU coordinator from parallel.rs (~900 lines): GpuRequest enum,
  StageGpu/FusedGpu task variants, gpu_fused_span, pressure_inc/dec,
  should_route_block_to_gpu_stage0, complete_gpu_stage, and all GPU
  channel/routing logic. Parallel scheduler is now CPU-only.
- Fix one-way backpressure ratchet in streaming.rs: coordinator now
  decrements pressure by batch_len on completion, preventing permanent
  GPU lockout (routing went from ~16% to ~62% on mozilla corpus).
- Honor Backend::WebGpu in compress_with_options: multi-block GPU
  requests route through compress_stream with in-memory Cursor I/O,
  producing framed-format output (transparently decompressible).
- Flatten UnifiedTask from single-variant enum to type alias.
- Simplify complete_task_lifecycle (remove dead next_task parameter).
- Remove dead GPU telemetry methods and atomic fields from
  LocalSchedulerStats; retain public UnifiedSchedulerStats fields for
  API stability.
- Add criterion benchmark comparing parallel vs streaming GPU paths.
- Update CLAUDE.md, DESIGN.md, gpu-strategy.md, pipeline-architecture.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ChrisLundquist force-pushed the claude/reverent-wright branch from 469c061 to 4387364 on March 12, 2026 06:29
@ChrisLundquist merged commit 3287348 into master on Mar 12, 2026
4 checks passed