Skip to content

digest refactoring for performance #212

@pearsonca

Description

@pearsonca

While working #210, I noticed opportunities for opportunites for consolidating / refactoring code in digest.c.

In some sense, digest.c is providing a Facade around interacting with the various quirks of the imported hashing libraries. It does some setup / reinterpretation of the input from R, uses a series of cases to handle each of the different algorithms, and then returns a consistent output.

It seems over time those cases have been internally implemented in slightly different ways (and I imagine, the hasher interfaces have changed over time after initial implementation). There is now a fair bit of repetitious code and/or inconsistent naming/etc that could be consolidated. This would make the assorted "Consider X" issues smaller lifts to accomplish, present opportunities for a bit of unit testing on the C side, benchmarking, etc - all the normal benefits of DRY.

I propose the following refactoring objectives:

  • refactor macros to functions (per discussion in expand hashing algorithms that support raw = TRUE option #210)
  • eliminate the duplicate code for streaming vs not approaches (e.g. by figuring out how to share the setup / return code, while switching between stream vs blob inputs)
  • more clearly distinguish "jobs" of various functions (e.g. I imagine digest would be as a more of traffic manager, with the individual hasher interface logic being modularized out; then adding a new hasher means adding a new consistently-surfaced-function + another case to digest and presto)
  • additional linting (e.g. spaces around binary operators)

Thoughts? Additional objectives? I think some of these (and possibly additional targets added via discussion) are entangled, but there's some potential to isolate at least some of these as their own PRs - preferences one way or another?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions