bug: Quantity canonicalisation produces heterogeneous struct shapes, breaking repeat() over mixed-unit collections #2588

@piotrszul

Summary

When a FHIRPath collection contains Quantity values with different structural shapes (for example, an indefinite calendar duration alongside a UCUM quantity, or UCUM quantities with different canonical units that normalise differently), applying repeat($this) over that collection fails. The failure originates in Pathling's Quantity canonicalisation: structurally-different Quantity literals produce structs whose subfields differ in nullability, and repeat() subsequently casts those struct elements to VARIANT for deduplication, which rejects the heterogeneous shape.

This bug is independent of combine(). combine() is simply the cleanest way to build a collection that exposes it, because it concatenates without deduplicating away the mismatched structs. The same underlying canonicalisation shape mismatch is also what produces the wontfix "Indefinite calendar duration union deduplication" exclusion for (1 year | 12 months) in fhirpath/src/test/resources/fhirpath-js/config.yaml.

Reproduction

Both expressions fail in Pathling and are currently excluded from the fhirpath-js reference test suite (config.yaml:179-192):

(1 year).combine(12 months).repeat($this)
(3 'min').combine(180 seconds).repeat($this)

The first case mixes an indefinite calendar duration (year) with a definite one (month); the second mixes two UCUM units that canonicalise to different base representations.

Root cause (suspected)

Pathling's Quantity encoding uses a struct with fields for the original value/unit/system/code plus canonicalised value/unit fields used for equality and comparison. Indefinite calendar durations (year, month) leave the canonical subfields null, whereas UCUM quantities populate them. When two literals of these different classes end up in the same array, the resulting Spark array contains structs whose subfield nullabilities are inconsistent.

repeat()'s deduplication path casts the element struct to VARIANT, and Spark's VARIANT casting is strict about schema homogeneity within an array, so the cast fails.

Expected behaviour

repeat($this) over a heterogeneous Quantity collection should return the collection (each element is its own fixed point under $this). It should not raise a canonicalisation/cast error.
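The expected fixed-point behaviour can be illustrated with a minimal Python sketch of repeat() semantics. This is not Pathling's implementation: the `repeat` helper, its `equals` parameter and the tuple stand-ins for Quantity values are all hypothetical, and plain tuple equality is used here in place of FHIRPath Quantity equality.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def repeat(
    items: Iterable[T],
    projection: Callable[[T], Iterable[T]],
    equals: Callable[[T, T], bool],
) -> List[T]:
    """Apply `projection` repeatedly until it yields no new items.

    An item is "new" if `equals` does not match it against anything
    already in the output, so deduplication depends only on the
    supplied predicate, not on how the items are serialised.
    """
    output: List[T] = []
    queue = list(items)
    while queue:
        next_queue: List[T] = []
        for item in queue:
            for result in projection(item):
                if not any(equals(result, seen) for seen in output):
                    output.append(result)
                    next_queue.append(result)
        queue = next_queue
    return output


# Stand-ins for the two Quantity literals (hypothetical representation).
quantities = [(1, "year"), (12, "months")]

# Equivalent of repeat($this): every element projects to itself.
result = repeat(quantities, lambda q: [q], lambda a, b: a == b)
```

Under these assumptions `result` holds the same two elements as the input. Whether 1 year and 12 months should collapse into one element depends on the Quantity equality the real implementation applies, which this sketch does not model.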

Suggested fix directions

  1. Normalise Quantity struct shape at encoding time — emit a consistent schema where every subfield is always present (possibly with null values) regardless of whether the literal is a calendar duration or a UCUM quantity.
  2. Avoid VARIANT casting in repeat() for struct-typed collections — use a deduplication strategy that honours the collection's comparator instead, mirroring the approach already used by UnionOperator / CombiningLogic.dedupeArray for types with custom equality.

Option 1 would also unblock the related wontfix exclusion on indefinite calendar duration union deduplication.
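As a rough illustration of option 1, the Python sketch below pads every Quantity record to one fixed set of fields, so that calendar durations and UCUM quantities end up with the same shape. The field names, `QUANTITY_FIELDS` and `normalise_quantity` are assumptions made for this example; Pathling's actual Quantity schema and encoder may look different.

```python
from typing import Any, Dict, Optional

# Hypothetical field names; Pathling's actual Quantity schema may differ.
QUANTITY_FIELDS = (
    "value",
    "unit",
    "system",
    "code",
    "canonical_value",
    "canonical_unit",
)


def normalise_quantity(raw: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Return a record with every field present, in a fixed order.

    Fields missing from `raw` are filled with None, so two records
    always share one shape regardless of which fields were populated.
    """
    unknown = set(raw) - set(QUANTITY_FIELDS)
    if unknown:
        raise ValueError(f"unexpected Quantity fields: {sorted(unknown)}")
    return {name: raw.get(name) for name in QUANTITY_FIELDS}


# A calendar duration with no canonical form (assumed representation).
calendar = normalise_quantity({"value": 1, "unit": "year"})

# A UCUM quantity with canonical fields populated: 3 min = 180 s.
ucum = normalise_quantity(
    {
        "value": 3,
        "unit": "min",
        "system": "http://unitsofmeasure.org",
        "code": "min",
        "canonical_value": 180,
        "canonical_unit": "s",
    }
)
```

After normalisation both records expose the same keys in the same order, and the calendar duration simply carries None in its canonical fields. In Spark terms this would correspond to declaring every subfield nullable in a single shared struct type, which is an assumption about how the fix would be realised rather than a description of existing code.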

Impact

  • Currently blocks repeat() / deduplication-based operations over any collection that mixes structurally-different Quantity literals.
  • Surfaces via combine() (new in Support combining functions #2384) but is a latent issue in the Quantity encoder, not in the combining functions.

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working), fhirpath (Related to fhirpath reference implementation)
Status: In progress
Milestone: No milestone