
Add benchmarks for new OQueues#212

Open
aanyas72 wants to merge 23 commits into main from oqueue-bench

Conversation

@aanyas72
Contributor

I added benchmarks for mixed_bench, consume_bench, produce_bench, weak_obs_bench, and strong_obs_bench. I will add the fixes for the scaling benchmarks next.

@aanyas72 aanyas72 requested review from aneeshdurg and arthurp April 17, 2026 22:48
@aanyas72 aanyas72 requested a review from a team as a code owner April 17, 2026 22:48
Comment thread kernel/src/benchmarks/README.md Outdated
Contributor


I don't see how this is useful. If there are parameters, document them on the types or functions that take them. Or if that's not practical (because they show up similarly in multiple places) document them in the module docs.

Contributor

@arthurp arthurp left a comment


I noticed a few things that appear to be a problem from the previous code. @aneeshdurg can you look at them?

Given that there appear to be some inconsistencies, I think it's pretty important to factor out the boilerplate around the tests. If you have trouble getting the Rust lambdas to work for you, tell me and I can help. There are some potentially confusing trait bounds that come up around lambdas. The refactored boilerplate should be usable in all the benchmarks, new and old.

Comment thread kernel/src/benchmarks/oqueue.rs Outdated
Comment on lines +401 to +472
.cpu_affinity(cpu_set)
.spawn();
}
}
Contributor


I think you could extract the loop and other shared boilerplate (for creating and placing the threads and looping) into a function which takes an impl FnMut() as an argument and calls it inside the loop. The benchmarks would then pass a closure with their specific produce call. You might be able to use the same one for consume too.

(It would need to take a couple of extra parameters, like the prefix for the name in the printed message.)

Comment thread kernel/src/benchmarks/oqueue.rs Outdated
let producer = q.attach_ref_producer().unwrap();
move || {
barrier.fetch_sub(1, Ordering::Acquire);
while barrier.load(Ordering::Relaxed) > 0 {}
Contributor


I think this should actually have a call to spin_loop in it. It helps the CPU optimize spin loops in multiple different ways. @aneeshdurg? Did you leave this out intentionally?

Contributor


I can't remember if it was intentional or if I just didn't bother. Worth adding it and seeing what happens.

Contributor


It is unlikely to change all that much. But it is the "correct" way to do it. If we are trying to get the last 0.5% at some point we can try turning it off again.
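For reference, a minimal sketch of the barrier spin-wait with the hint added, shown with std::hint::spin_loop (the same intrinsic is core::hint::spin_loop in no-std code); wait_for_start is a hypothetical name:

```rust
use std::hint::spin_loop;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Check in at the barrier, then spin until every thread has checked in.
/// `spin_loop()` emits the CPU's spin-wait hint (e.g. PAUSE on x86), which
/// lets the core back off and save pipeline resources while waiting.
fn wait_for_start(barrier: &AtomicUsize) {
    barrier.fetch_sub(1, Ordering::Acquire);
    while barrier.load(Ordering::Relaxed) > 0 {
        spin_loop();
    }
}
```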

Comment thread kernel/src/benchmarks/oqueue.rs Outdated
let barrier = barrier.clone();
move || {
barrier.fetch_sub(1, Ordering::Acquire);
// ostd::task::Task::yield_now();
Contributor


Remove commented code unless there is a specific reason for it.

Comment thread kernel/src/benchmarks/oqueue.rs Outdated
.cpu_affinity(cpu_set)
.spawn();

// Start all consumers
Contributor


Comment is wrong. It's strong observers.

Comment on lines 805 to 812
let weak_observer = q.attach_consumer().unwrap();
let barrier = barrier.clone();
move || {
barrier.fetch_sub(1, Ordering::Acquire);
while barrier.load(Ordering::Relaxed) > 0 {}
let mut cnt = 0;
for _ in 0..(2 * N_MESSAGES_PER_THREAD) {
cnt += weak_observer.consume();
Contributor


@aneeshdurg. What's going on with the consumer named "weak_observer"? Was this in there for any of our benchmark results?

Contributor


This was so I could run the MPMC baseline in the same function. If you look at the if above, the OQueue implementations use weak observers while the normal MPMC queue uses a consumer.

Contributor


This could use a comment in the code, and probably a different variable name.

Contributor


Yea. Since the code isn't actually reused with a macro or anything, let's definitely use the correct name and a comment. This is, or maybe should be, the topic for another PR.

Comment thread kernel/src/benchmarks/oqueue.rs
Contributor

@aneeshdurg aneeshdurg left a comment


The benchmarks themselves look good!

input: Option<OQueueNewBenchmarkInput>,
}

struct OQueueNewBenchmarkInput {
Contributor


Instead of calling the "new" impl "new", I'd rather we called the old impl "legacy" for consistency.

Contributor


@aanyas72 You added "legacy" but didn't remove "new" as was implied here. The reason to do this is that when we remove legacy, we don't have to change the name of all the "new" stuff. This matches the naming approach in ostd.

Comment on lines +1392 to +1393
WeakObs,
StrongObs,
Contributor


s/Obs/Observer

Contributor


@aanyas72 Please address

// Start all producers
for tid in 0..n_threads_per_type {
let mut cpu_set = ostd::cpu::set::CpuSet::new_empty();
cpu_set.add(ostd::cpu::CpuId::try_from(tid + 1).unwrap());
Contributor


Ideally, we should assert in the benchmark setup that there are sufficient CPUs attached. Can be a separate PR though.
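A sketch of the kind of setup assertion meant here (assert_enough_cpus and its parameters are hypothetical; the real check would query ostd for the attached CPU count):

```rust
/// Fail fast if the machine has fewer CPUs than the benchmark pins threads
/// to: workers go on CPUs 1..=n_threads_per_type, leaving CPU 0 for the
/// driver, so we need strictly more CPUs than worker threads.
fn assert_enough_cpus(n_threads_per_type: usize, available: usize) {
    assert!(
        available > n_threads_per_type,
        "benchmark needs {} CPUs but only {} are available",
        n_threads_per_type + 1,
        available
    );
}
```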

arthurp and others added 8 commits April 25, 2026 00:20
This does NOT enable the same feature for LTP. Problems with its launcher prevented an easy way to do it, and the LTP is out of date, so a correct solution will wait until/if we update.
This removes our dependency on core2 and replaces it with no_std_io2.

Initially there was an upstream issue with libflate carrying a core2 dependency, but libflate fixed that with a new 2.3 release.

NOTE: For context, the maintainer of core2 yanked it from crates.io. This caused all unlocked builds with dependencies on it to fail, not just for us, but across the internet.
Previously, these error types had context, but did not display it. This meant enhanced logs didn't get stack traces even when they were being captured. This corrects that. A global via the `ostd_error` macro may be possible, but would not be trivial to implement and Snafu recommends explicit error messages anyway.
This changes the data capture machinery:
* Use [CBOR](https://cbor.io/). It is a self-describing format generally similar to JSON. It was selected not because the format is better than others, but because the Rust library supports no-std and is stable. I evaluated MessagePack and BSON. CBOR has a Python implementation. Notably, we are not generating CBOR lists; instead we are generating a series of CBOR values directly into the file.
* Update the output format to include a CBOR-based directory of the files in the partition image.
* Added a global function to create new files within the output images. This is provided for both new and legacy OQueues.
* Run environment support for the above.
* A python library for reading the data generated by all the above.
Comment on lines +451 to +456
mut setup: Setup,
on_complete: Done,
) where
Setup: FnMut() -> Work,
Work: FnMut() + Send + 'static,
Done: Fn() + Send + Sync + 'static,
Contributor


You can use an impl type here: `on_complete: impl Fn() + Send + Sync + 'static`. You may be able to do the same for Setup, but its return type may make that more complex.
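A small illustration of the two equivalent spellings (the function names here are made up):

```rust
use std::sync::{Arc, atomic::{AtomicUsize, Ordering}};

// Named-generic form, as in the PR:
fn run_named<Done>(on_complete: Done)
where
    Done: Fn() + Send + Sync + 'static,
{
    on_complete();
}

// Equivalent impl-Trait form, as suggested; shorter when the type
// parameter isn't referenced anywhere else in the signature:
fn run_impl(on_complete: impl Fn() + Send + Sync + 'static) {
    on_complete();
}

// Both accept the same callers:
fn demo() -> usize {
    let n = Arc::new(AtomicUsize::new(0));
    let n2 = n.clone();
    run_named(move || { n2.fetch_add(1, Ordering::Relaxed); });
    let n3 = n.clone();
    run_impl(move || { n3.fetch_add(1, Ordering::Relaxed); });
    n.load(Ordering::Relaxed)
}
```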

Comment thread kernel/src/benchmarks/oqueue.rs Outdated
Comment on lines +461 to +463
let barrier = barrier.clone();
let on_complete = on_complete.clone();
let completed = completed.clone();
Contributor


Why clone these? They are only used once.

Contributor Author


If I try to use the original values without cloning, it gives an error: `error[E0382]: use of moved value: on_complete`

Contributor


Oh. Huh. I don't see the other usage, but whatever. The clone is fine.
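The underlying issue, sketched with std threads: a `move` closure created inside a loop takes ownership of its captures, so anything shared across iterations must be cloned once per spawn; without the per-iteration clones, the second iteration triggers E0382 (spawn_readers is a hypothetical name):

```rust
use std::sync::Arc;
use std::thread;

/// Spawns `n` threads that each read a shared value through their own
/// Arc clone. Omitting the `shared.clone()` line would move `shared` into
/// the first closure and make every later iteration a use-of-moved-value.
fn spawn_readers(n: usize) -> Vec<u64> {
    let shared = Arc::new(42u64);
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let shared = shared.clone(); // one clone per thread
            thread::spawn(move || *shared)
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```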

@arthurp
Contributor

arthurp commented Apr 30, 2026

@aanyas72 I think you are going to need to clean up your merge. You conflict now. However, I think a local git rebase will correctly fix it:

git fetch                 # To get the current origin/main
git rebase origin/main    # To automatically rebase the current branch onto that

Git rebase is smart enough to drop the commits from main that you ended up putting in the middle of your history (it detected them as cherry-picks which is more or less correct).

Once you have the cleaned-up branch, I'll take a final look and approve it.

@aanyas72
Contributor Author

Sounds good, I'll get that done by tomorrow

aanyas72 and others added 3 commits May 1, 2026 04:26
#214)

This adds a scheduling event OQueue and support for capturing it. Unlike most OQueues this one must be enabled by a feature. See the documentation file.
This adds path metadata to OQueues in a way that it can be accessed via the OQueueRef.
Contributor

@arthurp arthurp left a comment


In the future do the rebase and force push again. This diff is confusing, and git rebase main fixes it entirely.

Overall, I'd like you to carefully read this and look for inconsistencies and things that could be cleaned up. I think you missed some things. See the comments, but also look for similar patterns elsewhere in the code.

Comment on lines +1392 to +1393
WeakObs,
StrongObs,
Contributor


@aanyas72 Please address

OQueueRef::new_anonymous(2 << 20)
}

fn get_consumable_oq(&self) -> ConsumableOQueueRef<u64> {
Contributor


_oqueue

input: Option<OQueueNewBenchmarkInput>,
}

struct OQueueNewBenchmarkInput {
Contributor


@aanyas72 You added "legacy" but didn't remove "new" as was implied here. The reason to do this is that when we remove legacy, we don't have to change the name of all the "new" stuff. This matches the naming approach in ostd.

}

fn strong_obs_bench(
fn weak_obs_bench<Q: ConsumableOQueue<u64>>(
Contributor


weak_observer_bench or weak_observer_benchmark

Comment on lines +875 to +891
let mut cpu_set = ostd::cpu::set::CpuSet::new_empty();
cpu_set.add(ostd::cpu::CpuId::try_from(tid + 1).unwrap());
ThreadOptions::new({
let completed = completed.clone();
let producer = q.attach_value_producer().unwrap();
let barrier = barrier.clone();
move || {
barrier.fetch_sub(1, Ordering::Acquire);
while barrier.load(Ordering::Relaxed) > 0 {}
for _ in 0..(2 * N_MESSAGES_PER_THREAD) {
producer.produce(0);
}
completed.fetch_add(1, Ordering::Relaxed);
}
})
.cpu_affinity(cpu_set)
.spawn();
Contributor


Why don't these and similar benchmarks in the file use the new thread-spawning and loop-handling utilities?
