Star analysis performance improvement#257
Conversation
|
It seems that some of these new code paths are still quite scalar in nature. Have you explored using the |
Hello, I have explored this idea before my first PR with demosaicing as it seemed the rigth thing to do in the first place. Though I could not get a meaningful improvement compared with current scalar path, Could be because I didn't have good benchmaks at that time. Now with the right test and benchmark harness in place this idea can be pursued. Let me take a look. |
|
I have collected some data from test implementation for demosaicing as I had already worked on it. I am not sure what is the oldest CPU N.I.N.A. could have supported - probably limited with CPU that can run Windows 10? But one approach was using standard Vector that is SIMD-friendly and another one is using AVX intrinsics. I had benchmarks running with the following JIT knobs - DOTNET_EnableHWIntrinsic = 0/1 (0 for scalar path), DOTNET_EnableAVX2 = 0/1 and DOTNET_EnableAVX512F = 0/1 CaseName: Bayer16Aligned_IMX461_11656x8750_Demosaic Reference algorithm
Vector
AVX intrinsics
With the number of threads performance does not change much. I guess, I have saturated memory I/O on my laptop. What do you think? |
|
How did you verify that the edge detection will yield equivalent results. This is such a central component and crucial for star detection that I'm very hesitant to replace it. |
|
Considering NINA is ran most often on low-power CPUs, "typical" core count is going to be 4-8 cores. The |
I understand the question needs to be addressed better. In the last commits an image fixture has been created and used in bayer, gaussian and canny tests on different image formats - this fixture creates images with features like uniform black, uniform white, star field, diagonal line, step edge, checker board, etc. The change is mostly focused on improving performance of gaussian blur and canny edge detection (magnitude and edge orientation, non-maximums suppression, hysteresis is unchanged as it relies on sequential execution). I tried to keep the structure of the non blur canny the same is in the original file, so it is easier to compare side-by-side. The rest of the updates is benchmarking and test improvements. Anyway, I think I need to spend more time hardening the tests. The change was relatively easy to do, but testing is lot more important. |
This is also my line of thinking with Vector classes, it feels like keeping generic implementation is more beneficial for all kinds of PCs out there.. I will try it later on mini-PCs running N100 to see if we can still benefit. But I think this can be another round once we have good tests we can trust.
If I understood the question right, the parallel processing is not a problem as long as we can produce independent artifacts like debayered pixels or parts of canny pipeline like in this change.. |
|
I have improved tests by adding randomly generated star field with different image features to compare new canny behavior against the original (blured vs non-blured). Please find attached the example of the input images. |
Added back the actual reference implementation Bayer demosaicing and No Blur Canny Edge Detector to use in tests, accordingly tests are updated to use reference calculations (including Accord's ones) Added Benchmark project
…nnyEdgeDetector uses this core implementation along with a separate GaussianBlur Using deterministic image fixtures that are reused across all tests for all image formats
…generate a starfield with different features (100 images) that are used for both blured and non-blured canny paths
f8d537b to
056e70f
Compare



🚀 Purpose
Hello everyone,
I want to propose a performance improvement in the image analysis path.
As well as demosaicing of the image, the canny edge detectors could also benefit of parallelization and some local optimizations.
In order to achieve bit-exact results during the optimization, I have established different frame format tests and also benchmarks to compare against original algorithms. This is done by reusing original source file for no blur canny edge detector and using original Accord's canny edge detector. I also improved testing for demosaicing the same way, rather than relying on rewritten implementation in the test itself.
I have added a Benchmarks sub-project but I am not sure if it should be kept. Please let me know. See the benchmark results below.
🧪 How Was It Tested?
The code is unit tested with different real-world frame formats and output is compared against reference calculations that are preserved in the test folder or reused from Accord library.
✅ PR Checklist
Changelog.mdupdated (if applicable)🔗 Related Issues
Follow up on #147.
📸 Screenshots
📝 Additional Notes
CannyBlurredBenchmarks
CaseName: Gray8Aligned_IMX571_6224x4168
CannyNoBlurBenchmarks
CaseName: Gray8Aligned_IMX571_6224x4168