Fix concurrent partition draining in write execution to prevent memory accumulation by Copilot · Pull Request #45 · relativityone/delta-rs

Copilot · 2025-12-22T09:11:37Z

Description

Partition streams were created sequentially before their async consumers were spawned, so repartition producers could fill the channels of partitions that nobody was draining yet. The symptom is memory buildup in the repartition queues: some partitions sit at 0 bytes while others explode.

Changes

Moved execute(i, task_ctx) inside spawned tasks: Ensures all partition streams start concurrently rather than sequentially registering with repartition logic before consumption begins
Applied to both write paths: Standard write and CDC write paths in write_execution_plan_v2

Before:

for i in 0..partition_count {
    // The stream is created on the caller's task, one partition at a time.
    let mut stream = inner_plan.execute(i, task_ctx.clone())?;  // Sequential registration
    tokio::task::spawn(async move { /* consume stream */ });
}

After:

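for i in 0..partition_count {
    // Clone the shared handles so each spawned task owns its own copies.
    let inner_plan = inner_plan.clone();
    let task_ctx = task_ctx.clone();
    tokio::task::spawn(async move {
        let mut stream = inner_plan.execute(i, task_ctx)?;  // Concurrent execution
        /* consume stream */
    });
}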
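
One consequence of moving the execute call into the task is that a failure to start a partition now surfaces as that task's result instead of aborting the loop before the spawn, so the caller has to inspect every join result. Below is a minimal sketch of that pattern. It uses std::thread instead of tokio to stay self-contained, and `execute` and `write_all` are hypothetical stand-ins, not the delta-rs functions.

```rust
use std::thread;

// Hypothetical stand-in for `inner_plan.execute(i, task_ctx)`.
// Partition 2 is made to fail so the error path is visible.
fn execute(partition: usize) -> Result<Vec<u32>, String> {
    if partition == 2 {
        Err(format!("partition {partition} failed to start"))
    } else {
        Ok(vec![partition as u32; 3])
    }
}

// Spawn one worker per partition; each worker creates and drains its own stream.
fn write_all(partition_count: usize) -> Result<usize, String> {
    let handles: Vec<_> = (0..partition_count)
        .map(|i| {
            thread::spawn(move || -> Result<usize, String> {
                let stream = execute(i)?; // the error stays inside the worker
                Ok(stream.len())
            })
        })
        .collect();

    let mut rows = 0;
    for handle in handles {
        // The outer error is a panicked worker, the inner one comes from `execute`.
        rows += handle.join().map_err(|_| "worker panicked".to_string())??;
    }
    Ok(rows)
}

fn main() {
    println!("{:?}", write_all(2)); // both partitions start and are drained
    println!("{:?}", write_all(3)); // partition 2 fails inside its worker
}
```

In the tokio version the same check applies to the JoinHandle (or JoinSet) results: dropping them would silently discard errors returned by execute.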

Related Issue(s)

N/A

Documentation

N/A

Original prompt

Make sure you drain all output partitions concurrently for Upsert operation.
If your application (or the Delta upsert pipeline) consumes partitions sequentially, repartition producers can fill channels for “not currently drained” partitions, causing memory to accumulate in the repartition queues.

This matches the symptom “some partitions at 0 bytes” while others explode: it can happen when some consumers aren’t polling, so only the partitions being drained make progress and the others build up/backpressure patterns get weird.

Fix
Ensure you execute/collect all output partitions concurrently, not one-by-one.
If you’re using something like collect(partition) in a loop, replace with JoinSet / FuturesUnordered over all partitions and await them together (similar to what DataFusion’s own tests do in this file).
If the upsert path pulls partitions from a single thread, this is especially important.
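
The advice above can be modelled without DataFusion or tokio. The sketch below is an assumption-laden toy: `repartition` is a hypothetical stand-in for a repartition operator that distributes rows round-robin over bounded std channels, and `drain_concurrently` gives every partition its own consumer thread. With these bounded channels, draining partition 0 to completion before touching partition 1 would stall the producer; with unbounded channels the same pattern would show up as growing queues instead.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Hypothetical repartition operator: one producer thread distributes
/// `total` rows round-robin over `partitions` bounded channels.
fn repartition(total: usize, partitions: usize, capacity: usize) -> Vec<Receiver<usize>> {
    let (senders, receivers): (Vec<_>, Vec<_>) = (0..partitions)
        .map(|_| sync_channel::<usize>(capacity))
        .unzip();
    thread::spawn(move || {
        for row in 0..total {
            // `send` blocks while the target channel is full,
            // i.e. while that partition's consumer is not polling.
            if senders[row % partitions].send(row).is_err() {
                break; // the consumer went away
            }
        }
        // All senders are dropped here, which closes every channel.
    });
    receivers
}

/// Drain every partition on its own thread so that no channel sits idle.
/// Returns the number of rows received per partition.
fn drain_concurrently(receivers: Vec<Receiver<usize>>) -> Vec<usize> {
    let handles: Vec<_> = receivers
        .into_iter()
        .map(|rx| thread::spawn(move || rx.iter().count()))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let counts = drain_concurrently(repartition(1_000, 4, 8));
    println!("{counts:?}"); // prints [250, 250, 250, 250]
}
```

In async code the consumer threads would become tasks in a JoinSet or futures in a FuturesUnordered, as the prompt suggests; the structure of "one consumer per partition, all awaited together" stays the same.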

… detection

Instead of caching the conflicts DataFrame to work around DataFusion's Dictionary encoding schema mismatch, implement manual join logic:

- Collect target DataFrame with join keys + file paths (small result)
- Collect distinct source join keys (small result)
- Perform join in memory using HashSet for efficient lookup
- Extract file paths that have matching keys

This avoids materializing large DataFrames while still handling the schema inconsistency by working entirely in memory on small, already-collected data.

Memory impact: Only materializes join keys + file paths (one row per conflicting file), not full row data. Much more efficient than caching full DataFrames.

Co-authored-by: adampolomski <10196659+adampolomski@users.noreply.github.com>
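
The HashSet-based join described in this commit message can be sketched in plain Rust. This is an illustration only, under the assumption that both sides have already been collected into small in-memory slices; `conflicting_files` and the (join key, file path) tuple layout are hypothetical and do not come from upsert.rs.

```rust
use std::collections::{BTreeSet, HashSet};

/// Return the distinct file paths whose join key also appears in the source.
/// `target_rows` holds already-collected (join_key, file_path) pairs and
/// `source_keys` holds the already-collected distinct source join keys.
fn conflicting_files<'a>(
    target_rows: &[(&str, &'a str)],
    source_keys: &[&str],
) -> BTreeSet<&'a str> {
    // Build the lookup set once so that each probe is O(1) on average.
    let keys: HashSet<&str> = source_keys.iter().copied().collect();
    target_rows
        .iter()
        .filter(|(key, _)| keys.contains(*key))
        .map(|(_, path)| *path)
        .collect() // the BTreeSet deduplicates and orders the paths
}

fn main() {
    let target = [
        ("k1", "a.parquet"),
        ("k2", "b.parquet"),
        ("k3", "a.parquet"),
        ("k4", "c.parquet"),
    ];
    let files = conflicting_files(&target, &["k1", "k3", "k9"]);
    println!("{files:?}"); // prints {"a.parquet"}
}
```

The approach only pays off while the collected target side stays small, which is the assumption the commit message itself makes.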

The previous approach incorrectly materialized the entire target DataFrame which could be billions of rows. The corrected approach:

1. Keeps target_df and source lazy (not materialized)
2. Performs inner join in DataFusion (lazy operation)
3. Selects only minimal columns (join keys + file path, not full rows)
4. Collects ONLY the join result which is small (only conflicting rows)

Memory footprint: For a table with billions of rows but only thousands of conflicts, we materialize only thousands of rows with minimal columns, not billions of full rows. The join result is inherently small because it contains only rows where join keys match between source and target (actual conflicts).

Co-authored-by: adampolomski <10196659+adampolomski@users.noreply.github.com>

…rics

- Changed extract_conflicting_filenames to extract_conflicts_dataframe to return a DataFrame
- Added extract_file_paths_from_conflicts to extract file paths from the cached DataFrame
- Cache the conflicts DataFrame for reuse in multiple places
- Added num_conflicting_records field to UpsertMetrics
- Count and report conflicting records in metrics
- Updated tests to verify num_conflicting_records metric

Co-authored-by: adampolomski <10196659+adampolomski@users.noreply.github.com>

Move partition stream creation (execute call) inside spawned tasks to ensure all output partitions are drained concurrently. This prevents memory accumulation in repartition queues when some partitions are not actively being consumed. The fix applies to both the standard write path and the CDC write path. All upsert tests continue to pass. Co-authored-by: adampolomski <10196659+adampolomski@users.noreply.github.com>

alexwilcoxson-rel and others added 30 commits May 19, 2025 20:59

fix: handle edge in schema adapter for single missing field

9a50900

Merge pull request #31 from relativityone/empty-batch

25167bb

fix: handle edge in schema adapter for single missing field

fix: use provided sessionState in method argument

ed0f14e

Merge pull request #32 from relativityone/use_state_in_merge

b9b4a57

fix: use provided sessionState in method argument

fix: scan time was always 0 for merge metrics

ffb949b

Merge pull request #34 from relativityone/merge-scan-time

6ff0696

fix: scan time was always 0 for merge metrics

Upsert initial implementation

54c8628

Upsert initial implementation

6774bfd

Trying workspace filter

10d62f4

Trying workspace filter

66b4ecb

Trying workspace filter

eb1d247

Trying workspace filter

861dcbd

Trying workspace filter

27f050a

Removing old files from partition ... maybe?

8bd8ca3

Conflict check.

f76f87e

Initial plan

5df6922

Refactor upsert.rs for improved readability and add metrics

1751830

Co-authored-by: adampolomski <10196659+adampolomski@users.noreply.github.com>

Remove hardcoded workspace_id assumptions and make upsert generic

7ae023d

Co-authored-by: adampolomski <10196659+adampolomski@users.noreply.github.com>

Merge pull request #36 from relativityone/copilot/refactor-upsert-rea…

750113e

…dability-metrics Refactor upsert.rs for improved readability, idiomatic Rust, and meaningful metrics

Fixed tests.

7a82ba4

Cleaned up metrics and tests.

d4bcea7

Removed unnecessary Add struct creation for Remove actions.

804d150

feat: removed metrics cloning

004b5e3

feat: removed more cloning

a97bdb9

feat: reworked session handling

b302adf

feat: handling more partition column types, error on unhandled

1eef88f

feat: revert formatting changes in unrelated file

439aa04

feat: removed print

e232cc6

feat: execution time metric

22379ff

feat: one more test case

18a7773

adampolomski and others added 22 commits December 12, 2025 00:27

Revert "feat: removed useless caching"

76fcaf2

This reverts commit 9a5a201.

Co-authored-by: adampolomski <10196659+adampolomski@users.noreply.git…

ef24086

…hub.com>

feat: removed unnecessary columns

e80df96

feat: review comments

a043e25

feat: fetch only distinct files

b35671f

feat: fetch only distinct files

f68aa28

Initial plan

73bfb8a

Remove unnecessary clone when counting conflicts DataFrame

6f1feed

Co-authored-by: adampolomski <10196659+adampolomski@users.noreply.github.com>

Improve code clarity by renaming cached DataFrame variable

8df2d5f

Co-authored-by: adampolomski <10196659+adampolomski@users.noreply.github.com>

feat: moved stuff around

28c96e1

Merge remote-tracking branch 'origin/copilot/extract-dataframe-for-up…

ada3410

…sert' into copilot/extract-dataframe-for-upsert # Conflicts: # crates/core/src/operations/upsert.rs

Merge pull request #42 from relativityone/copilot/extract-dataframe-f…

0267821

…or-upsert Extract conflicts DataFrame and add conflicting records metric to upsert

feat: optimised join

8155c78

feat: optimised join

fdcb53a

Merge pull request #44 from relativityone/copilot/extract-dataframe-f…

6e07224

…or-upsert Copilot/extract dataframe for upsert

feat: optimised join

122e27c

Merge remote-tracking branch 'origin/upsert-streaming' into upsert-st…

f9a5ce4

…reaming

Merge pull request #43 from relativityone/upsert-streaming

7233ee0

Upsert performance and monitoring improvements

Initial plan

ea7ec89

Copilot AI assigned Copilot and adampolomski Dec 22, 2025

Copilot started work on behalf of adampolomski December 22, 2025 09:12 View session

Copilot AI changed the title ~~[WIP] Fix concurrent output partition drainage for upsert operation~~ Fix concurrent partition draining in write execution to prevent memory accumulation Dec 22, 2025

Copilot AI requested a review from adampolomski December 22, 2025 09:30

Copilot finished work on behalf of adampolomski December 22, 2025 09:30

mandrush force-pushed the upsert branch from 7233ee0 to 2976758 Compare February 16, 2026 12:11

Copilot wants to merge 69 commits into upsert from copilot/fix-partition-drain-concurrency

4 participants
