Conversation
ArcticDB Code Review SummaryPR: ASV benchmarks for column stats (#3018) Notes on previous review findingsThree issues flagged in the prior review have been addressed and remain addressed in this commit:
API & Compatibility
Memory & Safety
Correctness
Code Quality
Testing
Build & Dependencies
Security
PR Title & Description
Documentation
Overall: The extensions in this commit are well-designed and address all previously flagged issues. One minor item: restore a docstring for |
5cff615 to
45be6f5
Compare
python/benchmarks/column_stats.py
Outdated
| self.lib.read(self.symbol, columns=["uint64_col"], query_builder=q) | ||
|
|
||
|
|
||
| class ColumnStatsCreate: |
There was a problem hiding this comment.
Is this and ColumnStatsManagement worth benchmarking at this point? I think the current implementation will get ditched, and most stats will be generated at modification time, not through these explicit calls
There was a problem hiding this comment.
I think the APIs will survive - we will still need a way to create stats over old data that isn't being modified any more.
Evidence of stability from 3 runs: | Case | Run 1 | Run 2 | Run 3 | CV | |-----------------------------------|---------|---------|---------|------| | uint64 1M stats=T ordered=T | 5.97ms | 5.37ms | 5.37ms | 6.2% | | uint64 1M stats=F random=F | 3.67ms | 3.71ms | 3.97ms | 4.3% | | uint64 10M stats=T ordered=T | 13.2ms | 11.9ms | 12.4ms | 5.2% | | uint64 10M stats=F random=F | 16.8ms | 16.0ms | 17.2ms | 3.7% | | combined_or 1M stats=T ordered=T | 7.93ms | 7.38ms | 8.35ms | 6.2% | | combined_or 10M stats=T ordered=T| 23.5ms | 22.7ms | 24.0ms | 2.8% | | combined_or 10M stats=F random=F | 22.7ms | 22.3ms | 24.0ms | 3.9% |
45be6f5 to
079025f
Compare
Add benchmarks that run with and without column stats, on both ordered data (that should benefit a lot) and unordered data (that should not benefit). The unordered data helps us to measure the overhead of evaluating column stats when it cannot provide any benefit.
Since we're still waiting to merge #2958 the runs with and without column stats should be identical to start with. This will also help give us some nice evidence when we rebase and merge #2958.
Evidence of stability from 3 runs: