Skip to content

persist: rip lgbytes/lgalloc out of blob and arrow paths #36307

Open
antiguru wants to merge 1 commit into MaterializeInc:main from antiguru:lgbytes-rip

@antiguru
Member

Lgalloc is disabled in practice, so the memcpy from SDK `Bytes` (azure) or arrow `Buffer`s into freshly-allocated `MetricsRegion`s falls back to the heap and is pure overhead.
This PR streams SDK `Bytes` straight into `SegmentedBytes` for the azure get path and stops reallocating arrow buffers in the parquet decode path.
The parquet null-buffer workaround stays (now in `rebuild_data`); it's independent of lgalloc.
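
As a rough illustration of why the copy is overhead, the sketch below contrasts the two shapes using only the standard library. It is not the code from this PR: `Arc<[u8]>` stands in for the SDK's refcounted `Bytes`, and `Segmented`, `collect_with_copy`, and `collect_zero_copy` are hypothetical names, with `Segmented` only loosely modeled on `SegmentedBytes`.

```rust
use std::sync::Arc;

/// Stand-in for a refcounted SDK chunk (assumption: the real code uses `bytes::Bytes`).
type Chunk = Arc<[u8]>;

/// Hypothetical, simplified analogue of `SegmentedBytes`: an ordered list of
/// refcounted segments that is never flattened into a single allocation.
#[derive(Default)]
struct Segmented {
    segments: Vec<Chunk>,
}

impl Segmented {
    /// Appends a chunk by moving the handle; no payload bytes are copied.
    fn push(&mut self, chunk: Chunk) {
        self.segments.push(chunk);
    }

    /// Total payload length across all segments.
    fn len(&self) -> usize {
        self.segments.iter().map(|s| s.len()).sum()
    }
}

/// Old shape: copy every chunk into a freshly allocated buffer.
fn collect_with_copy(chunks: &[Chunk]) -> Vec<Vec<u8>> {
    chunks.iter().map(|c| c.to_vec()).collect()
}

/// New shape: hand the refcounted chunks over unchanged.
fn collect_zero_copy(chunks: impl IntoIterator<Item = Chunk>) -> Segmented {
    let mut out = Segmented::default();
    for chunk in chunks {
        out.push(chunk);
    }
    out
}
```

In the zero-copy shape each segment still points at the allocation the SDK handed back, so the per-read cost is one refcount handle per chunk rather than one allocation plus a memcpy per chunk.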

Follow-up to #36305, which did the same for S3.

Motivation

Drops a redundant copy and allocation from the azure blob fetch and parquet decode paths, reducing CPU and allocator pressure on persist reads.

Tips for reviewer

The behavior changes are in `fetch_chunk` (`src/persist/src/azure.rs`) and `rebuild_data`/`realloc_buffer` (`src/persist/src/indexed/columnar/arrow.rs`).
Everything else is the dead-code removal that follows from those changes: the `mz_ore::lgbytes` module is deleted, `LgBytesMetrics` and `MetricsRegion` go away, and the now-unused `cfg`/`is_cc_active` plumbing in `ColumnarMetrics`, `AzureBlobConfig`, and `BlobConfig::try_from` is dropped.

Checklist

  • This PR has adequate test coverage / QA is not needed.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

Lgalloc is disabled in practice, so the memcpy from SDK `Bytes` (azure)
or arrow `Buffer`s into freshly-allocated `MetricsRegion`s falls back to
the heap and is pure overhead. Stream SDK `Bytes` straight into
`SegmentedBytes` for azure and stop reallocating arrow buffers. Keep
the parquet null-buffer workaround in `realloc_data` (now `rebuild_data`)
since it's independent of lgalloc.

Removes:
* `LgBytesMetrics` and `MetricsRegion` from `mz_ore::lgbytes` (delete the
  whole module).
* `S3BlobMetrics::lgbytes`.
* `ColumnarMetrics::{lgbytes_arrow, cfg, is_cc_active}` and the
  associated constructor params.
* `BlobConfig::try_from`'s `cfg` param, `AzureBlobConfig`'s `cfg` field,
  and the `persist_enable_arrow_lgalloc_{cc,noncc}_sizes` dyncfgs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
antiguru requested a review from a team as a code owner on April 28, 2026, 19:10