Add code review for PR #4 (multipass performance optimizations)#5
Draft
Copilot wants to merge 2 commits into
Draft
Add code review for PR #4 (multipass performance optimizations)#5Copilot wants to merge 2 commits into
Copilot wants to merge 2 commits into
Conversation
Not up to standards ⛔🔴 Issues
|
| Category | Results |
|---|---|
| BestPractice | 1 minor |
| CodeStyle | 7 minor |
TIP This summary will be updated as you push new changes. Give us feedback
There was a problem hiding this comment.
Copilot wasn't able to review any files in this pull request.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Agent-Logs-Url: https://github.com/KaHIP/FREIGHT/sessions/70ec7d09-80a2-4157-ba10-86a0c3d6872c Co-authored-by: schulzchristian <5701578+schulzchristian@users.noreply.github.com>
Copilot
AI
changed the title
[WIP] Review the latest pull request and add comments
Add code review for PR #4 (multipass performance optimizations)
Apr 2, 2026
schulzchristian
added a commit
that referenced
this pull request
Apr 2, 2026
… freight_graphs ~4.3x speedup for multipass streaming partitioning while preserving bit-identical partition output. Applied to both hypergraph and graph streaming partitioners. I/O optimizations: - Cache binary input across multipass iterations (ram_stream) - Skip readFirstLineStream on subsequent ram_stream passes - Use mmap with MAP_POPULATE for zero-copy file loading (macOS compat) - Bulk-read + parse from char buffer with pre-reserved vectors - Skip per-node gettimeofday syscalls in inner loop Evaluation optimizations (hypergraphs): - Use cached data for evaluation instead of re-reading file - Replace unordered_set with flat bitset for connectivity tracking - Skip unused metric per mode (cut-only or connectivity-only) - Specialize bitset check for k<=64 (single word) - Cheap CUT_NET count for first-pass cut evaluation - Compute first-pass connectivity incrementally (sequential post-pass) - Use local scratch vector for inter-pass eval to avoid save/restore - Skip evaluation when partition has converged Processing optimizations: - Bypass OMP critical sections when single-threaded - Eliminate valid_neighboring_nets vector for ram_stream (hypergraphs) - Prefetch stream_edges_assign entries in hot loops (hypergraphs) - Remove -fno-builtin-malloc flags - Use memset for all_blocks_to_keys initialization - Skip dead computations for default config - Add PGO support to CMakeLists.txt (GCC/Clang only) Safety fixes (from Copilot review #5): - Add #pragma omp atomic for nodes_moved in parallel loop - Move first-pass bitset computation out of parallel region - Check open() return before fstat() in mmap code - Add default initializers for evaluate_cut/evaluate_connectivity - Guard PGO CMake flags with compiler ID check
5 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
PR #4 introduces ~4.3x multipass speedup via mmap-based I/O caching, bitset connectivity, and reduced per-node overhead. This PR documents a full review identifying correctness and safety issues in those changes.
Critical: Race conditions in parallel loops (
freight.cpp)nodes_movedunsynchronized write:Fix:
#pragma omp atomicorstd::atomic<LongNodeID>.first_pass_bits/first_pass_connectivityconcurrent RMW:Multiple threads can race on the same
net_idxslot (bit-OR) and on the shareddoubleaccumulator. Fix:std::atomic<uint64_t>for bitset words and#pragma omp atomicfor the accumulator, or accumulate connectivity in a post-parallel pass over the bitset.Medium issues
open()unchecked beforefstat()(bothgraph_io_stream.h):fd == -1is silently passed tofstat, leavingst.st_sizeindeterminate before themmapcall.vertex_partitioning.h): both branches contain the same call — only the pragma differs; the duplication should be collapsed.CMakeLists.txt):-fprofile-generate/-fprofile-useneed aCMAKE_CXX_COMPILER_ID MATCHES "GNU|Clang"guard.evaluate_cut/evaluate_connectivitylack default initializers inpartition_config.h— UB if the struct is used beforestandard()runs..size()call in prefetch guard (graph_io_stream.cpp): guard usesline_numbers.size()instead of the pre-computedline_size.best_cut/best_con/best_qap/best_pinsset but never read — the intended skip of the final re-evaluation is incomplete.Minor
config.remaining_stream_nodes--inreadNodeOnePass_netllacks a comment explaining why it is safe in all modes.cursor = curr_nodealias is redundant; usecurr_nodedirectly.Warning
Firewall rules blocked me from connecting to one or more addresses (expand for details)
I tried to connect to the following addresses, but was blocked by firewall rules:
https://api.github.com/graphql/usr/bin/gh gh auth status(http block)https://api.github.com/repos/KaHIP/FREIGHT/pulls/4/usr/bin/curl curl -s -H Authorization: token ****** REDACTED(http block)If you need me to access, download, or install something from one of these locations, you can either:
Original prompt