EvaluateIPC

Here is the updated README.md content, incorporating your latest benchmark results for both the native macOS environment and the Linux Docker container.

EvaluateIPC

This repository evaluates the performance and scalability of various C++ concurrency and Inter-Process Communication (IPC) techniques. It provides a set of benchmarks comparing the C++ Standard Library, Boost.Asio, Facebook Folly, and the Seastar framework.

Benchmark Intent: The "Cache Calc" Pattern

The primary benchmark in this project simulates a high-concurrency scenario common in server-side applications: serialized computation interleaved with asynchronous IO waits.

The Workflow

Each benchmark iteration executes 1,000 tasks (default max_iter) with a concurrency level of 8 tasks (num_tasks). Each task follows this pattern:

Serialized Compute: Perform a calculation on a shared resource (an unordered_map cache) protected by a synchronization primitive such as a mutex, strand, or shard logic.
Simulated IO: Perform a non-blocking delay (8µs to 64µs) using asynchronous timers.
Serialized Compute: Perform a second calculation on the same shared resource.

Why this Pattern?

This pattern tests the efficiency of a framework's context-switching and task-scheduling capabilities. A high-performance framework should be able to overlap the "IO" periods of all 1,000 tasks, effectively parallelizing the wait time and ensuring the total benchmark time is dominated only by the serialized compute portions and framework overhead.

Implementation Details

C++ Standard Library:
BM_cachecalc_mutex: Uses manual std::thread management and a shared std::mutex.
BM_cachecalc_async: Uses std::async to launch tasks, highlighting the overhead of thread creation and destruction.
BM_cachecalc_simple_threadpool_class: Uses a custom ThreadPool class to manage worker lifecycles via condition variables.
Boost.Asio:
BM_cachecalc_boost_threadpool: Uses asio::thread_pool and asio::strand to serialize access to the cache map.
BM_cachecalc_boost_coroutine: Uses C++20 coroutines (asio::awaitable) to switch between the main pool and the strand for calculation and IO steps.
BM_cachecalc_boost_coroutine_strand: Runs both compute and simulated IO on the strand to reduce thread context switching.
BM_cachecalc_boost_coroutine_io_context: Runs everything on a single-thread io_context (no thread pool).
Facebook Folly:
BM_cachecalc_folly_future_parallel: Replicates the logic using folly::Future and StrandExecutor to chain asynchronous tasks.
BM_cachecalc_folly_parallel: Leverages folly::coro and StrandExecutor to achieve high-performance asynchronous execution.
Seastar:
BM_SeastarHash: Serializes all cache access to Shard 0 using smp::submit_to, simulating a central coordinator.
BM_SeastarShardedCacheParallel: A scalable, shared-nothing implementation using seastar::sharded where data is partitioned across all available cores, eliminating the central bottleneck.

Benchmark Results (MacBook Pro Apple M2)

The following results were captured on a MacBook Pro (Apple M2).

Native macOS (8 Cores)

These results show the overhead in a native environment without containerization.

Boost.Asio (`tests/boostbench`)

Benchmark	Time	CPU	Iterations
`BM_cachecalc_boost_threadpool/8`	2.17 ms	0.050 ms	10000
`BM_cachecalc_boost_threadpool/64`	2.06 ms	0.046 ms	10000
`BM_cachecalc_boost_coroutine/8`	2.93 ms	1.32 ms	631
`BM_cachecalc_boost_coroutine/64`	2.82 ms	1.17 ms	572
`BM_cachecalc_boost_coroutine_strand/8`	2.11 ms	0.154 ms	4584
`BM_cachecalc_boost_coroutine_strand/64`	2.09 ms	0.153 ms	4548
`BM_cachecalc_boost_coroutine_io_context/8`	2.03 ms	1.99 ms	358
`BM_cachecalc_boost_coroutine_io_context/64`	1.98 ms	1.95 ms	350

Facebook Folly (`tests/follybench`)

Benchmark	Time	CPU	Iterations
`BM_cachecalc_folly_future_parallel/8`	2.53 ms	0.226 ms	1000
`BM_cachecalc_folly_future_parallel/64`	2.48 ms	0.226 ms	3098
`BM_cachecalc_folly_parallel/8`	2.90 ms	0.006 ms	1000
`BM_cachecalc_folly_parallel/64`	2.93 ms	0.006 ms	1000

Standard Mutex & Async (`tests/mutexbench`)

Benchmark	Time	CPU	Iterations
`BM_sleep/8`	13.0 us	1.73 us	415551
`BM_sleep/64`	85.0 us	2.96 us	100000
`BM_sleep_devzero/8`	11.5 us	11.5 us	60083
`BM_sleep_devzero/64`	66.3 us	66.2 us	10573
`BM_cachecalc_only`	1.38 ms	1.38 ms	512
`BM_cachecalc/8`	13.1 ms	13.1 ms	53
`BM_cachecalc/64`	67.7 ms	67.6 ms	10
`BM_cachecalc_mutex/8`	7.30 ms	0.091 ms	1000
`BM_cachecalc_mutex/64`	13.7 ms	0.099 ms	1000
`BM_cachecalc_async/8`	11.3 ms	7.01 ms	92
`BM_cachecalc_async/64`	19.0 ms	9.10 ms	76
`BM_cachecalc_simple_threadpool_class/8`	8.46 ms	3.78 ms	192
`BM_cachecalc_simple_threadpool_class/64`	13.1 ms	3.34 ms	211

Docker Container (Linux, 4 Cores)

These results show the performance characteristics when constrained to 4 shards within a Docker container.

Seastar Framework (`tests/seastar/seastar_bench`)

Benchmark	Time	CPU	Iterations
`BM_SeastarHash/8`	1.25 ms	1.21 ms	586
`BM_SeastarHash/64`	1.31 ms	1.27 ms	569
`BM_SeastarShardedCacheParallel/8`	0.759 ms	0.720 ms	999
`BM_SeastarShardedCacheParallel/64`	0.773 ms	0.736 ms	957

Boost.Asio (`tests/boostbench`)

Benchmark	Time	CPU	Iterations
`BM_cachecalc_boost_threadpool/8`	2.58 ms	0.108 ms	1000
`BM_cachecalc_boost_threadpool/64`	2.33 ms	0.102 ms	6775
`BM_cachecalc_boost_coroutine/8`	3.32 ms	0.775 ms	908
`BM_cachecalc_boost_coroutine/64`	3.31 ms	0.797 ms	975
`BM_cachecalc_boost_coroutine_strand/8`	1.83 ms	0.239 ms	2861
`BM_cachecalc_boost_coroutine_strand/64`	1.84 ms	0.240 ms	2966
`BM_cachecalc_boost_coroutine_io_context/8`	1.68 ms	1.68 ms	400
`BM_cachecalc_boost_coroutine_io_context/64`	1.68 ms	1.68 ms	419

Facebook Folly (`tests/follybench`)

Benchmark	Time	CPU	Iterations
`BM_cachecalc_folly_future_parallel/8`	2.36 ms	0.450 ms	1549
`BM_cachecalc_folly_future_parallel/64`	2.34 ms	0.455 ms	1533
`BM_cachecalc_folly_parallel/8`	3.00 ms	0.027 ms	1000
`BM_cachecalc_folly_parallel/64`	2.95 ms	0.028 ms	1000

Standard Mutex & Async (`tests/mutexbench`)

Benchmark	Time	CPU	Iterations
`BM_sleep/8`	86.0 us	5.52 us	128021
`BM_sleep/64`	155 us	5.78 us	126059
`BM_sleep_devzero/8`	9.03 us	8.93 us	75674
`BM_sleep_devzero/64`	65.2 us	65.2 us	10784
`BM_cachecalc_only`	1.03 ms	1.03 ms	673
`BM_cachecalc/8`	10.1 ms	10.1 ms	68
`BM_cachecalc/64`	66.2 ms	66.1 ms	11
`BM_cachecalc_mutex/8`	4.50 ms	0.250 ms	1000
`BM_cachecalc_mutex/64`	21.0 ms	0.334 ms	1000
`BM_cachecalc_async/8`	14.5 ms	12.2 ms	48
`BM_cachecalc_async/64`	30.9 ms	15.7 ms	51
`BM_cachecalc_simple_threadpool_class/8`	5.91 ms	5.01 ms	150
`BM_cachecalc_simple_threadpool_class/64`	23.7 ms	20.0 ms	30

How to Run

Using Docker

From the root of this repository:

./bin/rundocker --build
./bin/rundocker

Inside the container:

/build# /workspace/bin/runcmakeindocker.sh
/build# make
/build# ./tests/mutexbench
/build# ./tests/boostbench
/build# ./tests/follybench
/build# ./tests/seastar/seastar_bench

Local Development (Ubuntu/OSX)

Ensure dependencies like libfast-float-dev, libgoogle-glog-dev, and liburing-dev are installed:

cd /workspace
./docker/cmake/install-dependencies.sh
cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_FOLLY=ON
cmake --build build

Debugging

Before debugging in GDB, disable verbose thread logs to keep the output readable:

set print thread-events off

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
bin		bin
docker/cmake		docker/cmake
folly		folly
ipcperf		ipcperf
julia		julia
src		src
tests		tests
.gitignore		.gitignore
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
README.md		README.md
WORKSPACE		WORKSPACE

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

EvaluateIPC

Benchmark Intent: The "Cache Calc" Pattern

The Workflow

Why this Pattern?

Implementation Details

Benchmark Results (MacBook Pro Apple M2)

Native macOS (8 Cores)

Boost.Asio (`tests/boostbench`)

Facebook Folly (`tests/follybench`)

Standard Mutex & Async (`tests/mutexbench`)

Docker Container (Linux, 4 Cores)

Seastar Framework (`tests/seastar/seastar_bench`)

Boost.Asio (`tests/boostbench`)

Facebook Folly (`tests/follybench`)

Standard Mutex & Async (`tests/mutexbench`)

How to Run

Using Docker

Local Development (Ubuntu/OSX)

Debugging

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

EvaluateIPC

Benchmark Intent: The "Cache Calc" Pattern

The Workflow

Why this Pattern?

Implementation Details

Benchmark Results (MacBook Pro Apple M2)

Native macOS (8 Cores)

Boost.Asio (tests/boostbench)

Facebook Folly (tests/follybench)

Standard Mutex & Async (tests/mutexbench)

Docker Container (Linux, 4 Cores)

Seastar Framework (tests/seastar/seastar_bench)

Boost.Asio (tests/boostbench)

Facebook Folly (tests/follybench)

Standard Mutex & Async (tests/mutexbench)

How to Run

Using Docker

Local Development (Ubuntu/OSX)

Debugging

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Boost.Asio (`tests/boostbench`)

Facebook Folly (`tests/follybench`)

Standard Mutex & Async (`tests/mutexbench`)

Seastar Framework (`tests/seastar/seastar_bench`)

Boost.Asio (`tests/boostbench`)

Facebook Folly (`tests/follybench`)

Standard Mutex & Async (`tests/mutexbench`)

Packages