Skip to content

Feat: Replaced the old lockfree queue with the moodycamel ConcurrentQ…#7056

Merged
hkaiser merged 1 commit into
TheHPXProject:masterfrom
toxicteddy00077:fea/thread-queue-opt
Mar 21, 2026
Merged

Feat: Replaced the old lockfree queue with the moodycamel ConcurrentQ…#7056
hkaiser merged 1 commit into
TheHPXProject:masterfrom
toxicteddy00077:fea/thread-queue-opt

Conversation

@toxicteddy00077
Copy link
Copy Markdown
Contributor

Description

Closes #7050

The staged queue in thread_queue.hpp stored task_description* as a pointer (virtue of boost::lockfree::queue requiring copyable types). Thus we are forced to have one malloc/free per normal-priority task the default path for hpx::async which can become expensive for larger task size.

In this PR, I replace the staged queue backend with the moodycamel::ConcurrentQueue (concurrentqueue_fifo), the pattern for which already exists in the thread_queue_mc.hpp

@StellarBot
Copy link
Copy Markdown
Collaborator

Can one of the admins verify this patch?

…ueue

Signed-off-by: Aryamaan Singh <aryamaan.singh05@gmail.com>
@hkaiser
Copy link
Copy Markdown
Contributor

hkaiser commented Mar 18, 2026

Are you able to compare the performance before and after your change? Is there a significant difference?

@toxicteddy00077
Copy link
Copy Markdown
Contributor Author

toxicteddy00077 commented Mar 19, 2026

@hkaiser I would say there is no extreme performance improvement , with varying speedups for different API, but mostly averaging around ~1.45. The improvements are there as you can see on the sent test output. on image left I have the 'before' performance, and on the right 'after'

Screenshot From 2026-03-19 09-55-31

@hkaiser
Copy link
Copy Markdown
Contributor

hkaiser commented Mar 19, 2026

@hkaiser I would say there is no extreme performance improvement , with varying speedups for different API, but mostly averaging around ~1.45. The improvements are there as you can see on the sent test output. on image left I have the 'before' performance, and on the right 'after'

Frankly, that's more than I expected. Thanks!

Copy link
Copy Markdown
Contributor

@hkaiser hkaiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@hkaiser hkaiser added this to the 2.0.0 milestone Mar 19, 2026
@Pansysk75
Copy link
Copy Markdown
Contributor

We did have some issues in the past when using the MoodyCamel queue in the scheduler. It does not guarantee FIFO ordering for items inserted by different threads, which can create some deadlocks (at least, in some of our tests).

I'm not sure if it will be an issue here, but I thought it would be worth bringing up.

Relevant comment with more explanation:
#6428 (comment)

@hkaiser
Copy link
Copy Markdown
Contributor

hkaiser commented Mar 19, 2026

We did have some issues in the past when using the MoodyCamel queue in the scheduler. It does not guarantee FIFO ordering for items inserted by different threads, which can create some deadlocks (at least, in some of our tests).

I'm not sure if it will be an issue here, but I thought it would be worth bringing up.

Relevant comment with more explanation: #6428 (comment)

That might not be a problem for the queue that holds the task descriptions, those don't depend on any strict ordering.

@hkaiser
Copy link
Copy Markdown
Contributor

hkaiser commented Mar 19, 2026

@hkaiser I would say there is no extreme performance improvement , with varying speedups for different API, but mostly averaging around ~1.45. The improvements are there as you can see on the sent test output. on image left I have the 'before' performance, and on the right 'after'

Frankly, that's more than I expected. Thanks!

@toxicteddy00077 May I ask you to run the test more than once (100 times?) and collect the averages?

@toxicteddy00077
Copy link
Copy Markdown
Contributor Author

@hkaiser I have distilled the output over 100 runs(see image). I am also attatching the csv files which store output per run.
improved.csv
basleline.csv

I have compared the averages as you can see below

image

@hkaiser hkaiser merged commit 05d7290 into TheHPXProject:master Mar 21, 2026
110 of 113 checks passed
@hkaiser
Copy link
Copy Markdown
Contributor

hkaiser commented Mar 21, 2026

@toxicteddy00077 Thanks for looking into this! The results are encouraging and unexpected!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

thread_queue: per-task heap allocation in staged queue could be further optimised

4 participants