Conversation
ArcticDB Code Review SummaryAPI & Compatibility
Memory & Safety
Correctness
Code Quality
Testing
Build & Dependencies
Security
PR Title & Description
Documentation
SummaryThis PR adds comprehensive tests for groupby, resample, and concat on sparse Arrow data, fixes concat to accept Open issues (still unresolved): Medium — xfail markers missing Low — Combined sparse/row-count check in Informational — Latest push (f2fb26d):
Previously:
|
31c5dd5 to
f8bc36a
Compare
python/tests/unit/arcticdb/version_store/test_symbol_concatenation.py
Outdated
Show resolved
Hide resolved
python/tests/unit/arcticdb/version_store/test_symbol_concatenation.py
Outdated
Show resolved
Hide resolved
f8bc36a to
92d9f8f
Compare
| return pandas_meta; | ||
| } | ||
|
|
||
| // Pandas + Pandas |
There was a problem hiding this comment.
The Pandas + Pandas code in accumulate_norm_metadata is moved from the previous for loop in generate_norm_meta with the following minor changes:
- Variable renames to be more clear
- A couple more readability changes (non-behavior changing) highlighted in separate comments
| } | ||
| if (index.tz() != res_index->tz()) { | ||
| res_index->clear_tz(); | ||
| } |
There was a problem hiding this comment.
This comment was moved in favor of another comment in generate_norm_metadata explaining the interaction between non_mathching_name_indices and fake_field_pos
| res_index->fake_name()) { | ||
| res_index->set_name("index"); | ||
| res_index->set_is_int(false); | ||
| res_index->set_fake_name(true); |
There was a problem hiding this comment.
This was replaced by an:
if ((*res_index->mutable_timezone())[idx] != idx_timezone) {
(*res_index->mutable_timezone())[idx] = "";
}
92d9f8f to
fe96b45
Compare
This PR adds sparse arrow testing for groupby, resampling and concat. GroupBy works. Resampling does not. Monday ref: 11679866800 Concat is made to work in this PR and includes extra testing for concatenation between arrow and pandas
fe96b45 to
f2fb26d
Compare
Reference Issues/PRs
Monday ref: 18053022100
What does this implement or fix?
Concatenation is made to work with arrow written normalization metadata. Mixing arrow and pandas also works as long as
arrow.has_index() == pandas_index.is_physically_stored().This PR also adds sparse arrow testing for groupby, resampling and concat.
GroupBy and concat work as expected.
Resampling does not and is a non trivial fix. Monday ref to track: 11679866800
Any other comments?
Checklist
Checklist for code changes...