Fixed #26 Use less-aggressive planning flag as default #27
Conversation
Review these changes at https://app.gitnotebooks.com/stumpy-dev/sliding_dot_product/pull/27
So, what do these plots tell us? All that I can conclude is that …
Should have added "WIP". Apologies. Was planning to update.
Right. And, in some of those cases, pocketfft is faster than …
Just sharing here before I forget: Section 5.3 "How Many Threads to Use?" of FFTW documentation says:
So, I think it is better to use …
Note that …
I wonder how Matlab is able to achieve this dynamically 🤔
I like this visualization. It makes it very easy to understand!
Based on these results alone, it seems that it's not even worth using multithreaded MATLAB, because having 8 cores doesn't even help it be anywhere close to 2x faster than pyfftw. Is that the correct interpretation? Also, may I ask where you got the MATLAB numbers from?
What happens when …
👍
I have been doing some work locally on this, and I tried different approaches:
The best approach is probably the first one, which is what you suggested: we just compute the proper number of threads for input sizes that are … So, for … Btw, I tried the last approach (i.e., using the hard-coded cutoff 2**15) and got the following figure.
So this result shows that "pyfftw_sdp" performs well when we use multithreading for large input sizes.
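The size-based cutoff described above can be sketched as a small helper. This is a hypothetical illustration, not the code from the PR; the 2**15 threshold is the hard-coded cutoff tried in the discussion, and the function and parameter names are mine:

```python
import os

# Hypothetical helper: pick a thread count for pyfftw based on input size.
# The 2**15 cutoff is the hard-coded value tried in the discussion above;
# below it, threading overhead tends to outweigh any FFT speedup.
def choose_n_threads(n, cutoff=2**15, max_threads=None):
    if max_threads is None:
        max_threads = os.cpu_count() or 1
    if n < cutoff:
        return 1          # small transforms: single-threaded is fastest
    return max_threads    # large transforms: let FFTW use more cores

print(choose_n_threads(2**10, max_threads=8))  # small input -> 1
print(choose_n_threads(2**20, max_threads=8))  # large input -> 8
```

A more refined version could map several size ranges to different thread counts, which is closer to the first approach mentioned above.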
Couldn't find anything before. Will provide an update if I find something.
Our comparison may not be fair from MATLAB's perspective because we are using rfft. The only reason I am comparing …
By running the MATLAB_spd code (see DAMP_2_0.m) on MATLAB Online, using the code provided in the attached zip file. It would be great if you could review the code that I used for timing! This can help us make sure there is no silly mistake in my numbers. After I collect the running times (saved as .mat and .npy files), I plot the performance-ratio figure using the following code:
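The actual timing script is in the attached zip file, but the general shape of such a timing loop can be sketched as follows. This uses numpy.fft as a stand-in (the real comparison timed pyfftw and MATLAB), and the function name is illustrative:

```python
import time
import numpy as np

def time_rfft(x, n_iter=10):
    # Median wall-clock time of one rfft call over n_iter runs.
    # Median (rather than mean) reduces the effect of timing outliers.
    times = []
    for _ in range(n_iter):
        t0 = time.perf_counter()
        np.fft.rfft(x)
        times.append(time.perf_counter() - t0)
    return float(np.median(times))

rng = np.random.default_rng(0)
x = rng.standard_normal(2**16)
t = time_rfft(x)
# A performance ratio is t_baseline / t_candidate; values > 1 mean the
# candidate (e.g. pyfftw's rfft) is faster than the baseline.
print(t > 0)
```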
Haven't thought about it yet! I was wondering if we should wrap up …
Do you mean that MATLAB is not doing …
Update: Please ignore the figures.

(R)FFT
Right. MATLAB does not have an rfft function. Although MATLAB's fft on a real-valued array is faster than its fft on a complex-valued array (probably because it uses some rfft trick under the hood), it is still less efficient than pyFFTW's rfft for large arrays. The following figure shows the performance ratio of pyFFTW's rfft (with 1 logical thread) relative to MATLAB's fft with 1 physical core (blue) and relative to MATLAB's fft with auto-adjusted 8 physical cores (red). The code is provided at the bottom of this comment. When a y value is larger than one, it means pyfftw's rfft is faster.
According to the figure, pyfftw's rfft performs better than MATLAB's fft for larger arrays. Results may change a bit from one run to another, but the overall trend remains the same. Now that I've provided the performance ratio for (r)fft, I think it is worth showing the plot for i(r)fft, the other component of SDP.

I(R)FFT

I can see a different trend for irfft. When a y value is larger than one, it means pyfftw's irfft is faster.
This shows that MATLAB's ifft is faster for longer arrays. Now, if we consider BOTH this i(r)fft plot and the previous (r)fft plot, it makes sense to some extent why we saw some cases with a performance ratio < 1 in the SDP plot (the last figure in this comment).

Code for (R)FFT …
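For context on how the two components fit together: the sliding dot product combines one rfft and one irfft. Below is a minimal sketch of that construction using numpy.fft as a stand-in for pyfftw. This is not the PR's implementation or the code attached to this comment; it only illustrates why both transforms appear in the SDP timings:

```python
import numpy as np

def sliding_dot_product(Q, T):
    # Dot product of query Q against every length-m sliding window of T,
    # computed as an FFT-based convolution with the reversed query.
    m, n = len(Q), len(T)
    X = np.fft.rfft(T, n)
    Y = np.fft.rfft(np.flipud(Q), n)  # reversed query, zero-padded to n
    QT = np.fft.irfft(X * Y, n)
    return QT[m - 1:]                 # keep only the n - m + 1 valid windows

Q = np.array([1.0, 2.0, 3.0])
T = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
print(sliding_dot_product(Q, T))
```

For this small example the result matches the three naive window dot products ([1,0,2]·Q, [0,2,1]·Q, [2,1,3]·Q), i.e. 7, 7, and 13.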
I think that the plots above confirm that "auto-adjusted 8 physical cores" doesn't really help (based on the fact that the blue and red lines are very similar). Having said that, I wouldn't put too much emphasis on your ability to leverage all 8 cores on MATLAB Online! I would guess that the hardware resources on MATLAB Online are shared and that, maybe, you only have access to 2 cores.
Certainly possible, but nearly impossible to confirm. I just don't want you to waste time trying to interpret results that may be highly variable and that you may have little control over. The max performance gain between 1 core and 2 cores is around 10-20%, and it appears to diminish as you add more cores.
Right! For instance …
I checked those pyfftw-vs-MATLAB plots again for FFT and for IFFT, and got a slightly different outcome, but the overall trends were almost the same. I also plotted that Q-T scatter plot (to check the correctness of the last plot in this comment); the results were slightly different, but the overall trend was the same:
Right! And, therefore, my suggestion is to just go with single-threaded pyfftw for now, as I think it is ALMOST ready to be used in DAMP. Later, if needed, we can try to enhance it to find the proper number of threads. To have a smaller search space, we can also limit the maximum number of threads to four (or even two?). We can discuss it further when the time comes.
@seanlaw |
Agreed!
Yes, please feel free to go ahead and merge when you think this is ready. I don't see anything wrong with the changed files.
This PR addresses #26. The goal is to use a less-aggressive planning flag for FFTW.
As mentioned in #26 (comment), MATLAB takes a similar approach by default:
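To illustrate the planner-effort trade-off: FFTW's planner flags range from FFTW_ESTIMATE (pick a plan from heuristics; fast setup) to FFTW_MEASURE and beyond (time candidate plans; slow setup, possibly faster transforms). The flag names below are real FFTW/pyfftw planner efforts, but the helper function is hypothetical and the guarded pyfftw call is only a sketch of how the flag would be passed, not this PR's code:

```python
# Hypothetical helper illustrating the planner-effort trade-off.
# FFTW_ESTIMATE = less-aggressive planning (heuristics only, fast setup);
# FFTW_MEASURE  = more-aggressive planning (times candidate plans).
def planner_effort(aggressive=False):
    return "FFTW_MEASURE" if aggressive else "FFTW_ESTIMATE"

print(planner_effort())      # prints: FFTW_ESTIMATE
print(planner_effort(True))  # prints: FFTW_MEASURE

# With pyfftw installed, the flag would be passed along these lines
# (guarded, since pyfftw may not be available in every environment):
try:
    import numpy as np
    import pyfftw
    x = np.random.standard_normal(1024)
    fft = pyfftw.builders.rfft(x, planner_effort=planner_effort(), threads=1)
    y = fft()
except ImportError:
    pass
```

Defaulting to FFTW_ESTIMATE avoids long plan-construction times on first use, which matches the MATLAB-like behavior referenced above.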