Add enumerate to parallel iterators#127
Conversation
orxfun
left a comment
There was a problem hiding this comment.
Unfortunately, this does not provide the desired behavior. Please try the following test:
#[test]
fn enumerate_on_filter() {
let input: Vec<_> = (0..10).collect();
// # seq
use crate::*;
let par = input.iter().copied().filter(|x| x % 2 == 0);
let enum_par = par.enumerate();
let seq_result: Vec<_> = enum_par.collect();
// # rayon: does not compile
// use rayon::iter::*;
// let par = input.par_iter().copied().filter(|x| x % 2 == 0);
// let enum_par = par.enumerate();
// let rayon_result: Vec<_> = enum_par.collect();
// # orx
let par = input.par().copied().filter(|x| x % 2 == 0);
let enum_par = par.enumerate();
let orx_result: Vec<_> = enum_par.collect();
assert_eq!(orx_result, seq_result);
}|
Hi @TechnoPorg . Thanks a lot for the PR. But unfortunately, we cannot get the required behavior with this. There are two potential fixes. Why it doesn't workIt works whenever we are using
Potential Fix 1 (Hard)We can still post-compute the indices. This requires some performance optimizations such that:
I have some ideas on this and planning to test it out. But have to implement and see the benchmarks to make sure. Potential Fix 2 (Easy)We need another trait, say
In this case, indices from Notice that in the example test above, Wrap up:)I will work on Potential Fix 1, trying to plan for a time for it. If you are interested, you might take over Potential Fix 2. This could already enable |
|
And on wild-linker/wild#1413, firstly thank you for all your great work! |
9f384d8 to
5123bc6
Compare
|
I've redone this MR with Option 1 as proposed above and a significantly reduced scope. The new trait is called ExactSizeParIter, and |
|
Thanks a lot @TechnoPorg ! That's a very good point. I hadn't thought of this complexity of implementing it on On the other hand, it is still a limitation which might be a bit more problematic in some cases. One example would be not being able to I have one last comment on the name. It is actually not about knowing the exact size of the iterator that enables enumerating. It is the one-to-one match between input and output, even when we don't know the length of the input. For instance, the following works nicely on this branch: #[test]
fn enumerate_on_non_exact_size_iter() {
let input: Vec<_> = (0..5).collect();
// not an exact-sized iter
let iter = input.into_iter().filter(|x| *x != 2);
assert_ne!(iter.size_hint().0, iter.size_hint().1.unwrap());
// not an exact-sized par
let par = iter.iter_into_par();
// we can still enumerate
let values: Vec<_> = par.enumerate().collect();
assert_eq!(values, vec![(0, 0), (1, 1), (2, 3), (3, 4)]);
}Since it doesn't match the definition of the standard One thought: This is a minimal trait. We have similar traits in standard library names of which are simply verbs, such as |
|
Sorry for the delay in getting back to you on this! I like your name suggestion of As for the method chaining limitation, what the standard library and rayon both do is that they return known types ( |
|
For the naming, I am a bit more inclined towards Thanks a lot for pointing to the solution for extending enumeration from So yes, I fully agree with the direction you suggest. But as you said it will be a larger refactoring. I recommend to finish and merge this PR first to already enable functionality, and have the larger refactoring at a separate branch. |
5123bc6 to
64f3683
Compare
That makes sense. I've just pushed what I hope will be the final version of this PR implementing the basic functionality, and I'll open an issue now to track the eventual refactor. |
|
Happy to help! Unfortunately, though, I will not be able to contribute much for the next few months, though, as things have gotten quite busy on my end. |
|
You helped a lot @TechnoPorg, and it is straightforward to take over the rest of enumeration, thanks to your work. Feel free to visit again when you need a break, I think we'll have more and hopefully fun issues in this repo. |
Closes #35 as part of my ongoing effort to port wild to orx-parallel (wild-linker/wild#1413). Let me know if you think the tests I've added here are sufficient.
If you have time, I'd also appreciate your input on the PR I linked above, as performance seems to have gone down in some cases when using orx-parallel, so I'm likely not using it optimally.