Several fixes and improvements#62
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #62 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 16 16
Lines 948 966 +18
=========================================
+ Hits 948 966 +18 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
5e2a7bf to
4f613b0
Compare
manodeep
left a comment
There was a problem hiding this comment.
(I know you haven't yet asked for a review - but I did one anyway :D)
Small suggestion to make the test clearer, if you do accept it, the change will need to be applied for the other instances of expected_regions that follow
| first_dataset = profiling_log.parse() | ||
| second_dataset = profiling_log.parse() | ||
|
|
||
| expected_regions = ["Region 1", "Region 1_2", "Region 2", "Region 1_3"] |
There was a problem hiding this comment.
| expected_regions = ["Region 1", "Region 1_2", "Region 2", "Region 1_3"] | |
| expected_unique_regions = ["Region 1", "Region 1_2", "Region 2", "Region 1_3"] |
There was a problem hiding this comment.
Good idea! Done.
4f613b0 to
9182992
Compare
9182992 to
b1021cc
Compare
|
@manodeep I think this is ready to go, just needs one final review. |
manodeep
left a comment
There was a problem hiding this comment.
With the exception of that one comment, everything else looks good!
For that comment, may be just explain why the double parsing is required
| first_dataset = profiling_log.parse() | ||
| second_dataset = profiling_log.parse() |
There was a problem hiding this comment.
Why are you parsing twice here?
There was a problem hiding this comment.
It's to increase the odds of catching a non-deterministic relabeling of the regions. Maybe not really worth the trouble though, as those kind of issues are always hard to catch anyway.
There was a problem hiding this comment.
Ahh right - yeah but I don't think it matters functionally speaking. There are input duplicate region names and those are automatically re-labelled by some algorithm. As long as the region and values don't get separated out, it should be all good. For example, the last "Region 1" could be relabelled as "Region 1_2" but as long as the value corresponding to that remains as "4.0", it should be all fine.
(Thankfully, the existing test would catch such a catastrophic scrambling of the keys and values)
There was a problem hiding this comment.
It does matter in some cases, when we aggregate or concatenate data for a given region over different experiments. For example, in a scaling test, it would be problematic if duplicated regions in two different runs would be renamed in a non-deterministic way. Taking the data from the test, if for run 1 the algorithm produced:
["Region 1", "Region 1_2", "Region 2", "Region 1_3"]
and for run 2 it produced:
["Region 1", "Region 1_3", "Region 2", "Region 1_2"]
one would later be concatenating profiling data for regions of code that are not the same.
In any case, I think a test with one dataset is enough. If for some reason the ordering becomes random, the test is bound to fail sooner or later, as the order in the expected values is fixed.
There was a problem hiding this comment.
@manodeep I've now changed the test to only include one dataset.
…naming the duplicated regions.
b1021cc to
abeea07
Compare
These fix issues found when updating the ESM1.6 scaling notebook.