Add Max/Min of differences for numeric variables in "Differences" table of comparedf #369

@sheramin

I'm using the comparedf function to cross-check two tables that usually contain calculated variables, such as BMI. Because the two tables were built by two different programmers, they may have used different rounding, so some values differ even though the differences are not really significant (just rounding and decimal differences). I think adding min.diff and max.diff columns to diffs.byvar.table for numeric variables would be really helpful in such cases. They would show the range of the differences, so if the range is reasonable we can move on, and otherwise we check the details. I know we can print the whole table by setting the control option and check it visually, but that is time-consuming when you have a ton of variables. Below is a visual example of what I mean. (The example data are admiral_vs from the admiral.test package, i.e. admiral.test::admiral_vs; BMI was calculated by reshaping the dataset.)
image
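
For anyone reading without the screenshot, here is a minimal sketch of the kind of situation I mean. The data frames df_x and df_y and their values are made up for illustration (they are not the admiral_vs data); the only arsenal calls used are comparedf() and its summary() method.

```r
library(arsenal)

# Hypothetical toy data: the same BMI values, rounded differently
# by two programmers (2 decimals vs. 1 decimal).
bmi_raw <- c(22.46, 27.13, 31.58, 19.92)
df_x <- data.frame(id = 1:4, bmi = bmi_raw)
df_y <- data.frame(id = 1:4, bmi = round(bmi_raw, 1))

# Compare row by row, matching on id, and summarise the result.
compareres <- summary(comparedf(df_x, df_y, by = "id"))

# One row per compared variable, with the count of differing values.
# This is the table where min.diff / max.diff would be useful.
compareres$diffs.byvar.table

# One row per differing value; values.x and values.y are list columns.
compareres$diffs.table
```

Every bmi value is flagged as a difference here, even though none of them is off by more than a rounding step, and the by-variable table alone does not show that.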

Below I tried to implement a solution in a very basic and simple way:

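library(dplyr)

# compareres is the result of summary(comparedf(...)).
# values.x and values.y are list columns that can mix types across variables,
# so the difference is computed row by row; unlist() on the whole column would
# coerce everything to character as soon as one non-numeric variable differs.
add_xy_diff <- compareres$diffs.table %>% 
  mutate(
    xy.diff = as.numeric(mapply(
      function(x, y) if (is.numeric(x) && is.numeric(y)) x - y else NA_real_,
      values.x, values.y
    ))
  )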

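xy_diff_smry <- add_xy_diff %>% 
  group_by(var.x, var.y) %>% 
  summarise(
    # Variables with no numeric differences get NA instead of Inf/-Inf plus a warning
    min.diff = if (all(is.na(xy.diff))) NA_real_ else min(xy.diff, na.rm = TRUE),
    max.diff = if (all(is.na(xy.diff))) NA_real_ else max(xy.diff, na.rm = TRUE),
    .groups = "drop"
  )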

compareres$diffs.byvar.table %>% 
  left_join(xy_diff_smry, by = c("var.x", "var.y"))
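
Putting the steps above together, this is roughly what I have in mind, as a rough sketch rather than a proposed implementation. add_diff_range() is a hypothetical helper name (it does not exist in arsenal), and the toy data are made up; it assumes the input is the object returned by summary(comparedf(...)).

```r
library(arsenal)
library(dplyr)

# Hypothetical helper, NOT part of arsenal: appends min.diff / max.diff
# to diffs.byvar.table for variables whose differing values are numeric.
add_diff_range <- function(smry) {
  rng <- smry$diffs.table %>%
    mutate(
      # Row-wise difference; non-numeric pairs become NA
      xy.diff = as.numeric(mapply(
        function(x, y) if (is.numeric(x) && is.numeric(y)) x - y else NA_real_,
        values.x, values.y
      ))
    ) %>%
    group_by(var.x, var.y) %>%
    summarise(
      min.diff = if (all(is.na(xy.diff))) NA_real_ else min(xy.diff, na.rm = TRUE),
      max.diff = if (all(is.na(xy.diff))) NA_real_ else max(xy.diff, na.rm = TRUE),
      .groups = "drop"
    )
  left_join(smry$diffs.byvar.table, rng, by = c("var.x", "var.y"))
}

# Made-up data: bmi differs only by rounding, sex has one real mismatch.
df_x <- data.frame(id = 1:3, bmi = c(22.46, 27.13, 31.58), sex = c("F", "M", "F"))
df_y <- data.frame(id = 1:3, bmi = c(22.5, 27.1, 31.6), sex = c("F", "M", "M"))

res <- add_diff_range(summary(comparedf(df_x, df_y, by = "id")))
res
```

With this, a numeric variable whose min.diff and max.diff are both tiny can be skipped at a glance, while character variables such as sex simply show NA in the two new columns.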
