-
Notifications
You must be signed in to change notification settings - Fork 48
digest refactoring for performance #212
Description
While working #210, I noticed opportunities for opportunites for consolidating / refactoring code in digest.c.
In some sense, digest.c is providing a Facade around interacting with the various quirks of the imported hashing libraries. It does some setup / reinterpretation of the input from R, uses a series of cases to handle each of the different algorithms, and then returns a consistent output.
It seems over time those cases have been internally implemented in slightly different ways (and I imagine, the hasher interfaces have changed over time after initial implementation). There is now a fair bit of repetitious code and/or inconsistent naming/etc that could be consolidated. This would make the assorted "Consider X" issues smaller lifts to accomplish, present opportunities for a bit of unit testing on the C side, benchmarking, etc - all the normal benefits of DRY.
I propose the following refactoring objectives:
- refactor macros to functions (per discussion in expand hashing algorithms that support
raw = TRUEoption #210) - eliminate the duplicate code for streaming vs not approaches (e.g. by figuring out how to share the setup / return code, while switching between stream vs blob inputs)
- more clearly distinguish "jobs" of various functions (e.g. I imagine
digestwould be as a more of traffic manager, with the individual hasher interface logic being modularized out; then adding a new hasher means adding a new consistently-surfaced-function + another case to digest and presto) - additional linting (e.g. spaces around binary operators)
Thoughts? Additional objectives? I think some of these (and possibly additional targets added via discussion) are entangled, but there's some potential to isolate at least some of these as their own PRs - preferences one way or another?