Small C++23 + OpenMP experiments repo with two executables:
- omp_1: OpenMP scheduling / barrier behavior demos (omp for, nowait, manual partitioning)
- omp_2: 1D stencil-style update showing:
  - a deliberately racy in-place OpenMP version (nondeterministic),
  - two correct deterministic versions: next+swap and ghost in-place.
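omp_1's demos are not reproduced here. The sketch below illustrates two of the ideas named above: manual partitioning and omp for with nowait. It assumes nothing about how omp_1 itself is written, and chunk_range, squares_manual and fill_two are hypothetical names. Without -fopenmp the pragmas are ignored and the #ifdef _OPENMP fallback runs everything on one thread.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: split [0, n) into `parts` contiguous chunks whose sizes
// differ by at most one, and return chunk k as a half-open range [begin, end).
std::pair<std::size_t, std::size_t> chunk_range(std::size_t n, std::size_t parts,
                                                std::size_t k) {
    assert(parts > 0 && k < parts);
    const std::size_t base = n / parts;   // every chunk gets at least this many
    const std::size_t extra = n % parts;  // the first `extra` chunks get one more
    const std::size_t begin = k * base + (k < extra ? k : extra);
    return {begin, begin + base + (k < extra ? 1 : 0)};
}

// Manual partitioning: each thread computes its own chunk instead of using `omp for`.
std::vector<std::size_t> squares_manual(std::size_t n) {
    std::vector<std::size_t> out(n);
    #pragma omp parallel
    {
#ifdef _OPENMP
        const std::size_t tid = static_cast<std::size_t>(omp_get_thread_num());
        const std::size_t nth = static_cast<std::size_t>(omp_get_num_threads());
#else
        const std::size_t tid = 0, nth = 1;  // serial fallback without -fopenmp
#endif
        const auto [begin, end] = chunk_range(n, nth, tid);
        for (std::size_t i = begin; i < end; ++i) out[i] = i * i;
    }  // implicit barrier: the region ends only when every chunk is done
    return out;
}

// `nowait` removes the implicit barrier after the first loop. That is safe here
// only because the second loop never reads or writes `a`.
void fill_two(std::vector<int>& a, std::vector<int>& b) {
    const std::size_t na = a.size(), nb = b.size();
    #pragma omp parallel
    {
        #pragma omp for nowait
        for (std::size_t i = 0; i < na; ++i) a[i] = 1;
        #pragma omp for
        for (std::size_t i = 0; i < nb; ++i) b[i] = 2;
    }  // the region's closing barrier still waits for both loops
}
```

This split gives the first n % parts chunks one extra element. That roughly matches what schedule(static) without a chunk size tends to do, although the OpenMP specification leaves the exact assignment to the implementation.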
Install toolchain + build tools (pacman, e.g. Arch Linux):
# GCC
sudo pacman -S --needed base-devel gcc make
# or Clang
sudo pacman -S --needed base-devel clang make
# OpenMP runtime for clang (often needed; provides libomp)
sudo pacman -S --needed openmp
Note: With GCC, OpenMP support is typically available out of the box via -fopenmp. With Clang, you may need the LLVM OpenMP runtime (libomp) installed, and you still compile with -fopenmp.
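The tiny check below can confirm that a compiler really enabled OpenMP, for example clang++ with -fopenmp after installing the runtime. It is a sketch: openmp_spec_date and threads_in_region are hypothetical names and are not part of this repo. Without -fopenmp, _OPENMP is undefined and both functions return the serial answers.

```cpp
#include <cassert>  // for checking the results
#ifdef _OPENMP
#include <omp.h>
#endif

// Returns the _OPENMP date macro (e.g. 201511 means OpenMP 4.5),
// or 0 when the file was compiled without -fopenmp.
long openmp_spec_date() {
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}

// Number of threads a parallel region actually gets; this reflects
// OMP_NUM_THREADS when it is set.
int threads_in_region() {
    int n = 1;
#ifdef _OPENMP
    #pragma omp parallel
    {
        #pragma omp single
        n = omp_get_num_threads();  // one thread records the team size
    }
#endif
    return n;
}
```

Call both from a small main and run with OMP_NUM_THREADS=8. An OpenMP-enabled build should then typically report a nonzero date and 8 threads, while a build without -fopenmp reports 0 and 1.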
Build:
make -j
Useful overrides:
make -j CXX=clang++
make -j CXXFLAGS='-O3 -std=c++23 -Wall -Wextra -Wpedantic -Iinclude -fopenmp'
Clean:
make clean
Run omp_1:
./omp_1
Control threads:
export OMP_NUM_THREADS=8
./omp_1
omp_2 usage:
./omp_2 [size] [iters] [print]
# size default: 256
# iters default: 64
# print default: 1 (set 0 to disable output)
Examples:
# Compare all variants side-by-side (stdout) + timings (stderr)
OMP_NUM_THREADS=8 ./omp_2 256 64 1
# Benchmark-only (recommended): disable printing
OMP_NUM_THREADS=8 ./omp_2 50000000 20 0
Each iteration updates:
- for i = 0..size-2: v[i] = (v[i] + v[i+1]) / 2
- v[size-1] is unchanged
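This is a loop-carried dependence (an anti-dependence): iteration i must read the old value of v[i+1], which iteration i+1 overwrites. A serial loop running i upward satisfies this automatically, because v[i+1] has not been written yet when iteration i reads it. Running the iterations concurrently, or in a different order, gives no such guarantee.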
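The sketch below shows that order dependence on a single iteration. It assumes double elements, and step_serial and step_descending are hypothetical names. Only the ascending version computes the update as described; the descending one is there to show that the same formula gives different numbers when v[i+1] has already been overwritten.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial reference for one iteration, in place, with i ascending.
// Iteration i reads v[i+1] before iteration i+1 overwrites it, so every
// read sees the old value.
void step_serial(std::vector<double>& v) {
    if (v.size() < 2) return;                 // nothing to update
    for (std::size_t i = 0; i + 1 < v.size(); ++i)
        v[i] = (v[i] + v[i + 1]) / 2.0;       // v[i+1] is still the old value here
    // v[size-1] is left unchanged
}

// Same formula with i descending: v[i+1] has already been updated when
// iteration i reads it (except v[size-1], which never changes).
void step_descending(std::vector<double>& v) {
    if (v.size() < 2) return;
    for (std::size_t i = v.size() - 1; i-- > 0;)  // i = size-2, ..., 0
        v[i] = (v[i] + v[i + 1]) / 2.0;
}
```

Starting from {0, 2, 4, 6}, step_serial yields {1, 3, 5, 6}, while step_descending yields {1.75, 3.5, 5, 6}. A parallel in-place loop can produce any mix of the two behaviors at chunk boundaries.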
- WRONG in-place (racy under OpenMP)
  Parallelizes the in-place loop directly. At chunk boundaries, the thread handling iteration i may read v[i+1] after the thread handling iteration i+1 has already overwritten it, so the update mixes “old” and “new” values depending on timing. This is a data race (formally undefined behavior in C++). Result: nondeterministic output.
- RIGHT next+swap (OpenMP-safe)
  Reads from v and writes to a separate next buffer, then swaps the buffers. The swap itself is O(1); the real work is the per-element reads and writes. Deterministic.
- RIGHT ghost in-place (OpenMP-safe)
  Each thread updates a contiguous chunk in place. Before any thread writes, each one caches a single “ghost” boundary value: the old v[end], i.e. the first element of the neighboring chunk. Barriers separate the ghost reads from the writes, so no thread sees a neighbor's new values. Deterministic.
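The sketch below implements the two correct variants. It assumes double elements and an even contiguous split of the updated indices 0..size-2 across threads. run_next_swap and run_ghost_inplace are hypothetical names, not the code in src/omp2.cpp. In the ghost version, each iteration works like this:

- every thread reads its ghost value;
- all threads meet at a barrier;
- each thread updates its own chunk in ascending order;
- a second barrier keeps the next iteration's ghost reads from overlapping a neighbor's writes.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// next+swap: read only from `v`, write only to `next`, then swap the buffers.
void run_next_swap(std::vector<double>& v, int iters) {
    const std::size_t n = v.size();
    if (n < 2) return;
    std::vector<double> next(n);
    for (int it = 0; it < iters; ++it) {
        #pragma omp parallel for
        for (std::size_t i = 0; i < n - 1; ++i)
            next[i] = (v[i] + v[i + 1]) / 2.0;  // reads never touch `next`
        next[n - 1] = v[n - 1];                 // last element carried over unchanged
        std::swap(v, next);                     // O(1): exchanges internal pointers
    }
}

// Ghost in-place: each thread owns [begin, end) of the updated range [0, n-1)
// and saves the old v[end] before any thread writes.
void run_ghost_inplace(std::vector<double>& v, int iters) {
    const std::size_t n = v.size();
    if (n < 2) return;
    const std::size_t m = n - 1;  // indices 0..n-2 are updated
    #pragma omp parallel
    {
#ifdef _OPENMP
        const std::size_t tid = static_cast<std::size_t>(omp_get_thread_num());
        const std::size_t nth = static_cast<std::size_t>(omp_get_num_threads());
#else
        const std::size_t tid = 0, nth = 1;  // serial fallback without -fopenmp
#endif
        const std::size_t begin = tid * m / nth;      // contiguous chunk of [0, m)
        const std::size_t end = (tid + 1) * m / nth;
        for (int it = 0; it < iters; ++it) {
            // Phase 1: cache the old neighbor value (end <= n-1, so it is in bounds).
            const double ghost = (begin < end) ? v[end] : 0.0;
            #pragma omp barrier  // nobody writes until every ghost is read
            // Phase 2: ascending in-place update; only the last element uses the ghost.
            for (std::size_t i = begin; i < end; ++i)
                v[i] = (v[i] + (i + 1 == end ? ghost : v[i + 1])) / 2.0;
            #pragma omp barrier  // all writes finish before the next ghost reads
        }
    }
}
```

Both variants apply the same arithmetic to the same old values, so their results should match bit-for-bit for any thread count.

The trade-offs differ:
- next+swap needs a second buffer of the same size.
- ghost in-place needs only one extra value per thread, but pays for two barriers per iteration.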
- When print=1, omp_2 prints the three outputs side-by-side to stdout.
- It always prints timings to stderr:
Time WRONG in-place : ... s
Time RIGHT next+swap : ... s
Time RIGHT ghost in-place : ... s
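Writing timings to stderr keeps them visible when stdout is redirected. The sketch below shows that pattern using std::chrono::steady_clock; omp_get_wtime() would also work in an OpenMP build. time_seconds and report are hypothetical helpers, and the exact formatting code omp_2 uses is not shown in this README, so the output only approximates the lines above.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <utility>

// Run `f` once and return the elapsed wall-clock time in seconds.
template <class F>
double time_seconds(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();  // monotonic clock
    std::forward<F>(f)();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// One "Time <label> : <seconds> s" line on stderr, so it survives `> /dev/null`.
void report(const char* label, double seconds) {
    std::fprintf(stderr, "Time %-22s: %.6f s\n", label, seconds);
}
```

Typical use: report("RIGHT next+swap", time_seconds([&] { /* run the variant */ }));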
Tip: for benchmarking, disable printing and redirect stdout:
OMP_NUM_THREADS=8 ./omp_2 50000000 20 0 > /dev/null
Project layout:
include/demo/
argparse.hpp
omp1.hpp
omp2.hpp
omp_utils.hpp
src/
argparse.cpp
omp1.cpp
omp_1_main.cpp
omp2.cpp
omp_2_main.cpp
Makefile
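omp_2 takes three optional positional arguments, [size] [iters] [print], with defaults 256, 64 and 1. The sketch below shows one way to parse them. It is a hypothetical stand-in, not the contents of src/argparse.cpp. Omp2Args and parse_omp2_args are invented names, and non-numeric input simply makes std::stoull / std::stoi throw std::invalid_argument.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical options struct mirroring `./omp_2 [size] [iters] [print]`.
struct Omp2Args {
    std::size_t size = 256;  // documented default
    int iters = 64;          // documented default
    bool print = true;       // print=1 by default; 0 disables stdout output
};

// Parses optional positional arguments; missing ones keep their defaults.
Omp2Args parse_omp2_args(int argc, const char* const argv[]) {
    Omp2Args a;
    if (argc > 1) a.size = static_cast<std::size_t>(std::stoull(argv[1]));
    if (argc > 2) a.iters = std::stoi(argv[2]);
    if (argc > 3) a.print = std::stoi(argv[3]) != 0;
    return a;
}
```

A main(int argc, char** argv) can pass argv directly, because char** converts implicitly to const char* const*.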