Description
pairs_generator() in R/generate-pairs.R works by expanding a grid of all
name combinations and then keeping only those where the inequality holds between the
two name strings. Because R's < operator on character vectors compares strings
lexicographically (using the collation order of the current locale),
direction = "lt" currently means "keep pairs where name_a sorts before name_b
alphabetically", not "keep pairs where name_a was listed before name_b in the
call to pairwise()".
A concrete example:
pairwise(z_score, age, bmi)
# Declaration order: z_score (1st), age (2nd), bmi (3rd)
# Current behaviour with direction = "lt":
# Keeps pairs where name is alphabetically smaller on the left:
# → (age, bmi), (age, z_score), (bmi, z_score)
# What a user might reasonably expect ("lt" = earlier in my list):
# → (z_score, age), (z_score, bmi), (age, bmi)
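The current behaviour can be reproduced without the package. The following is a minimal base-R sketch, not the actual code of pairs_generator(): it assumes the function effectively compares the two name columns with <, as described above, and uses expand.grid() as a stand-in for the grid expansion.

```r
# Variables in the order they were declared in the example call
vars <- c("z_score", "age", "bmi")

# All ordered combinations of names (stand-in for the package's grid expansion)
grid <- expand.grid(name_a = vars, name_b = vars, stringsAsFactors = FALSE)

# Assumed current rule for direction = "lt": string comparison on the names
by_name <- grid[grid$name_a < grid$name_b, ]

# Print each kept pair as "left, right"
print(paste(by_name$name_a, by_name$name_b, sep = ", "))
```

Every kept pair has the alphabetically earlier name on the left, so z_score never appears on the left even though it was declared first.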
Two things are affected:
- The direction within each pair. If downstream code computes a directed
difference (e.g. mean(group_a) - mean(group_b)), the sign of the result
depends on which name ends up on the left. The current code always puts the
alphabetically earlier name on the left, regardless of what the user wrote.
- Predictability. Renaming a column (for example from bmi to BMI) can
silently change which name appears first in each pair, because uppercase and
lowercase letters may collate differently depending on the locale. This could
affect printed output or downstream comparisons.
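Both effects can be illustrated with a small base-R sketch. The data and the directed_diff() helper are made up for illustration and are not part of the package. The rename check temporarily switches to C collation, where uppercase letters sort before lowercase ones; other locales may order the two names differently.

```r
# Made-up data for illustration only
dat <- data.frame(age = c(30, 40, 50), z_score = c(-1, 0, 1))

# Hypothetical helper: difference of means, left column minus right column
directed_diff <- function(data, left, right) {
  mean(data[[left]]) - mean(data[[right]])
}

as_written   <- directed_diff(dat, "z_score", "age")  # order the user declared
alphabetical <- directed_diff(dat, "age", "z_score")  # order after name-based filtering

# Same magnitude, opposite sign
print(c(as_written = as_written, alphabetical = alphabetical))

# Renaming can flip a pair: compare the names under C collation
old_collate <- Sys.getlocale("LC_COLLATE")
invisible(Sys.setlocale("LC_COLLATE", "C"))
before_rename <- "age" < "bmi"  # is "age" on the left before the rename?
after_rename  <- "age" < "BMI"  # is "age" still on the left after the rename?
invisible(Sys.setlocale("LC_COLLATE", old_collate))

print(c(before_rename = before_rename, after_rename = after_rename))
```

Under C collation the pair changes from (age, bmi) to (BMI, age) purely because of the rename, and any directed difference computed on it changes sign.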
For a pure lower-triangle use case (where the set of pairs is all that matters and
direction within each pair is irrelevant), this is harmless. But it is worth
clarifying the intended semantics before the pairwise path is used more broadly.
Proposed solution / discussion point
One option is to work with integer positions rather than name strings inside
pairs_generator():
pairs_generator = function(x, direction = "lteq", simplify = TRUE) {
  idx = seq_along(x)
  pairs = tidyr::expand_grid(i = idx, j = idx) |>
    dplyr::filter(inequality(.data$i, .data$j, direction = direction))
  # then map back: x[pairs$i], x[pairs$j]
  ...
}
This would make "lt" mean "declared before in the list", which is probably what
most users expect.
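For discussion, here is a self-contained base-R version of the position-based idea. The names pairs_by_position() and compare_idx() are hypothetical: compare_idx() is only a stand-in for the package's inequality() helper, whose real signature and full set of direction values are not shown in this issue, so only "lt" and "lteq" are handled.

```r
# Hypothetical stand-in for the package's inequality() helper;
# only the two direction values mentioned in this issue are handled.
compare_idx <- function(i, j, direction) {
  switch(direction,
    lt   = i < j,
    lteq = i <= j,
    stop("unsupported direction: ", direction)
  )
}

# Hypothetical position-based generator: compares positions, not names
pairs_by_position <- function(x, direction = "lteq") {
  idx <- seq_along(x)
  grid <- expand.grid(i = idx, j = idx)
  keep <- grid[compare_idx(grid$i, grid$j, direction), ]
  # Map positions back to names
  data.frame(name_a = x[keep$i], name_b = x[keep$j], stringsAsFactors = FALSE)
}

res <- pairs_by_position(c("z_score", "age", "bmi"), direction = "lt")
print(res)
```

Under these assumptions the result is (z_score, age), (z_score, bmi), (age, bmi), which follows declaration order and does not depend on the locale or on how the columns are named.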
The alternative is to keep the current alphabetical behaviour but document it
explicitly, since it is at least deterministic. The key question is what
direction should express: position in the declaration list, or alphabetical
order of names?