
Use fused multiply-add instructions when available #268

Draft: JSorngard wants to merge 7 commits into main from mul_add


@JSorngard (Owner) commented Nov 7, 2025

This PR makes the polynomial function use fused multiply-add instructions when the target CPU supports them. This speeds up function execution slightly in those cases.
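A minimal sketch of the idea, assuming a Horner-scheme polynomial evaluator: `polynomial` and `mul_add_if_fast` are hypothetical names, not the crate's actual functions. `f64::mul_add` always rounds only once, but on targets without hardware FMA it falls back to a slow software routine, so the sketch only uses it when the `fma` target feature is enabled at compile time (for example via `RUSTFLAGS="-C target-cpu=native"` on a CPU that has FMA).

```rust
/// Hypothetical helper: computes `a * b + c`, fused only when the target has FMA.
#[inline(always)]
fn mul_add_if_fast(a: f64, b: f64, c: f64) -> f64 {
    // `cfg!` is a compile-time constant, so the unused branch is optimized away.
    if cfg!(target_feature = "fma") {
        a.mul_add(b, c) // one rounding
    } else {
        a * b + c // two roundings (Rust does not contract this into an FMA)
    }
}

/// Hypothetical evaluator for `coeffs[0] + coeffs[1]*x + ... + coeffs[n]*x^n`
/// using Horner's scheme, starting from the highest-degree coefficient.
fn polynomial(x: f64, coeffs: &[f64]) -> f64 {
    coeffs
        .iter()
        .rev()
        .fold(0.0, |acc, &c| mul_add_if_fast(acc, x, c))
}

fn main() {
    let coeffs = [1.0, 2.0, 3.0]; // 1 + 2x + 3x^2
    println!("fma enabled: {}", cfg!(target_feature = "fma"));
    println!("p(2) = {}", polynomial(2.0, &coeffs));
}
```

Gating on the target feature rather than calling `mul_add` unconditionally avoids slowing down builds for CPUs without FMA.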

@JSorngard added the enhancement (New feature or request) label Nov 7, 2025
@JSorngard (Owner, Author) commented Nov 7, 2025

This caused some tests to fail, because the different round-off behavior shifts the exact errors of the results.

As a result, those tests were made slightly less stringent.
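A small illustration of why exact-error tests can shift: a fused multiply-add rounds once, while a separate multiply and add round twice. With `0.1 * 10.0 - 1.0`, the separate product first rounds to exactly `1.0`, whereas the fused version keeps the tiny representation error of `0.1` (which is 2^-54). The values here are just a textbook example, not taken from the library's tests.

```rust
fn main() {
    let (a, b, c) = (0.1_f64, 10.0_f64, -1.0_f64);

    // Separate multiply then add: a * b rounds to exactly 1.0 before adding c.
    let unfused = a * b + c;

    // Fused multiply-add: the exact product is added to c and rounded once.
    let fused = a.mul_add(b, c);

    println!("unfused = {unfused:e}");
    println!("fused   = {fused:e}");
    println!("f64::EPSILON / 4 = {:e}", f64::EPSILON / 4.0);
}
```

Both results are within rounding error of the true value, but they are not bit-for-bit equal, so tests that pin exact errors need some tolerance.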

@JSorngard (Owner, Author) commented Nov 7, 2025

In Fukushima's original Fortran code, the compiler is free to choose which instructions to use. For example, if the user compiles with `-march=native`, it may emit fused multiply-add instructions. Their use therefore shouldn't conflict with how the algorithm is designed. It is possible, though, that the minimax algorithm that determined the polynomial coefficients would have produced different coefficients had it been run with FMA instructions (I don't know whether it was).

@JSorngard added the tests (Related to the test suite of the library) label Nov 7, 2025
@JSorngard (Owner, Author)

Before I merge this, I will need to determine whether FMA instructions break the minimax approximation.
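One rough first check, sketched here under assumptions: evaluate the same polynomial with and without FMA over a dense grid and measure the largest relative difference. The coefficients below are placeholder values (a truncated Taylor series of `exp`), not Fukushima's, and the function names are hypothetical. A difference of a few machine epsilons would suggest FMA only perturbs results at the rounding level, which matters little if the minimax error bound is much larger; it does not by itself prove the coefficients remain optimal.

```rust
/// Horner evaluation with a separate multiply and add (two roundings per step).
fn horner_unfused(x: f64, coeffs: &[f64]) -> f64 {
    coeffs.iter().rev().fold(0.0, |acc, &c| acc * x + c)
}

/// Horner evaluation with fused multiply-add (one rounding per step).
fn horner_fused(x: f64, coeffs: &[f64]) -> f64 {
    coeffs.iter().rev().fold(0.0, |acc, &c| acc.mul_add(x, c))
}

/// Largest relative difference between the two evaluations on n + 1
/// evenly spaced points in [lo, hi]. Falls back to the absolute
/// difference where the unfused value is exactly zero.
fn max_rel_diff(coeffs: &[f64], lo: f64, hi: f64, n: usize) -> f64 {
    (0..=n)
        .map(|i| lo + (hi - lo) * i as f64 / n as f64)
        .map(|x| {
            let (f, u) = (horner_fused(x, coeffs), horner_unfused(x, coeffs));
            if u == 0.0 { (f - u).abs() } else { ((f - u) / u).abs() }
        })
        .fold(0.0, f64::max)
}

fn main() {
    // Placeholder coefficients (truncated Taylor series of exp), NOT Fukushima's.
    let coeffs = [1.0, 1.0, 0.5, 1.0 / 6.0, 1.0 / 24.0, 1.0 / 120.0];
    let d = max_rel_diff(&coeffs, 0.0, 1.0, 10_000);
    println!(
        "max relative difference: {d:e} ({:.2} machine epsilons)",
        d / f64::EPSILON
    );
}
```

A more thorough check would compare both variants against a higher-precision reference and confirm that the maximum error still stays within the bound the minimax fit was designed for.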

@JSorngard self-assigned this Nov 27, 2025
@JSorngard marked this pull request as draft December 4, 2025 20:29
@JSorngard removed their assignment Dec 10, 2025

Labels

enhancement (New feature or request), tests (Related to the test suite of the library)
