Currently, Product.data is just a NumPy array. Its dimensional meaning (e.g. [y, x, channel] vs [record, sample]) is purely by convention and not encoded anywhere. This limits interoperability between Workers and makes plugin behavior harder to validate.
It would help if Product instances carried explicit axis semantics.
Background / Options
1. xarray integration
xarray provides labeled dimensions via DataArray and is widely used in scientific Python. However:
- It adds heavy dependencies (xarray itself, plus pandas).
- Its indexing overhead may hurt performance in hot acquisition/processing loops.
- We would use only a small slice of its functionality.
Given Dirigo’s real-time constraints, xarray is probably not suitable for the core pipeline.
2. Lightweight axis annotation (preferred)
Add an axes attribute to Product: a tuple of AxisName (a StrEnum) values, e.g.:
axes = (AxisName.Y, AxisName.X)
or for raw scanning:
axes = (AxisName.RECORD, AxisName.SAMPLE)
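Here is a minimal sketch of the enum. It assumes Python 3.11's `enum.StrEnum` and includes a small fallback for older interpreters. The lowercase string values are an assumption, because the proposal only names the members.

```python
from enum import Enum

try:
    from enum import StrEnum  # Python 3.11+
except ImportError:  # minimal fallback for older interpreters
    class StrEnum(str, Enum):
        def __str__(self) -> str:
            return str(self.value)


class AxisName(StrEnum):
    """Canonical dimension labels (values are an assumed lowercase convention)."""
    RECORD = "record"
    SAMPLE = "sample"
    X = "x"
    Y = "y"
    Z = "z"
    CHANNEL = "channel"
    TIME = "time"
    BIN = "bin"


# A processed image frame: rows are Y, columns are X.
image_axes = (AxisName.Y, AxisName.X)

# Raw scanner output: one row per record, one column per digitizer sample.
raw_axes = (AxisName.RECORD, AxisName.SAMPLE)

# Members are real strings, so they log and serialize without conversion.
print([str(a) for a in raw_axes])  # ['record', 'sample']
```

Because each member is also a `str`, comparisons such as `AxisName.Y == "y"` hold. This is convenient when reading axis labels back from file metadata.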
We would make this a required argument in _init_product_pool() so that every Worker explicitly declares the semantics of the data it produces.
This adds negligible overhead, no new dependencies, and makes Product meaning self-describing. Adding explicit axes enables:
- Better debugging/logging.
- More robust Writer plugins (TIFF, Zarr, HDF5) that can interpret dimensions correctly.
- Validation of Worker compatibility (e.g. a Processor expecting (Y, X) but receiving (RECORD, SAMPLE) could raise an error).
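The compatibility check could be as simple as comparing tuples when a Worker receives a Product. This sketch uses a reduced, hypothetical `Product` dataclass. The helper `check_axes` and the exception `AxisMismatchError` are also hypothetical names, and Dirigo's real classes will differ.

```python
from dataclasses import dataclass
from enum import Enum

import numpy as np

try:
    from enum import StrEnum  # Python 3.11+
except ImportError:  # minimal fallback for older interpreters
    class StrEnum(str, Enum):
        def __str__(self) -> str:
            return str(self.value)


class AxisName(StrEnum):
    RECORD = "record"
    SAMPLE = "sample"
    X = "x"
    Y = "y"
    CHANNEL = "channel"


@dataclass(eq=False)  # eq=False: don't compare NumPy buffers element-wise
class Product:
    """Hypothetical stand-in for Dirigo's Product, reduced to two fields."""
    data: np.ndarray
    axes: tuple[AxisName, ...]


class AxisMismatchError(ValueError):
    """Raised when a Worker receives data laid out differently than it expects."""


def check_axes(product: Product, expected: tuple[AxisName, ...], worker: str) -> None:
    """Raise if the incoming Product's axes differ from what `worker` expects."""
    if product.axes != expected:
        got = ", ".join(product.axes)  # StrEnum members join as plain strings
        want = ", ".join(expected)
        raise AxisMismatchError(f"{worker} expects ({want}) but received ({got})")


# Usage: raw digitizer records fed to a Processor that wants an image.
raw = Product(np.zeros((4, 1024), dtype=np.int16), (AxisName.RECORD, AxisName.SAMPLE))
try:
    check_axes(raw, (AxisName.Y, AxisName.X), "Processor")
except AxisMismatchError as err:
    print(err)  # Processor expects (y, x) but received (record, sample)

# A Writer could store the same labels as dimension-name metadata.
print([a.value for a in raw.axes])  # ['record', 'sample']
```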
Proposed Implementation
- Add an AxisName StrEnum with a small set of canonical dimension labels (RECORD, SAMPLE, X, Y, Z, CHANNEL, TIME, BIN, etc.).
- Add a new axes: tuple[AxisName, ...] attribute to Product.
- Modify _init_product_pool() to require an axes argument.
- Validate shape-axis consistency once at pool creation (i.e. len(axes) must equal data.ndim).
- In contrast to the Dirigo.units module, we won't add magic methods for automatic unit algebra; there is no obvious need for this yet.
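The implementation steps above could combine as follows. In this sketch of a pool initializer, `axes` is a required keyword-only argument, so omitting it raises `TypeError` at the call site. Shape/axes consistency is checked once, before any buffers are handed out. The names `init_product_pool` and `validate_axes`, the reduced `Product`, and the queue-based pool are all hypothetical stand-ins for `Worker._init_product_pool()` and Dirigo's actual Product.

```python
import queue
from dataclasses import dataclass
from enum import Enum

import numpy as np

try:
    from enum import StrEnum  # Python 3.11+
except ImportError:  # minimal fallback for older interpreters
    class StrEnum(str, Enum):
        def __str__(self) -> str:
            return str(self.value)


class AxisName(StrEnum):
    RECORD = "record"
    SAMPLE = "sample"
    X = "x"
    Y = "y"
    CHANNEL = "channel"


@dataclass(eq=False)
class Product:
    """Hypothetical reduced Product: a preallocated buffer plus its axis labels."""
    data: np.ndarray
    axes: tuple[AxisName, ...]


def validate_axes(shape: tuple[int, ...], axes) -> tuple[AxisName, ...]:
    """Check shape/axes consistency; runs once per pool, not per frame."""
    axes = tuple(AxisName(a) for a in axes)  # raises ValueError on unknown labels
    if len(axes) != len(shape):
        raise ValueError(
            f"shape {shape} has {len(shape)} dims but {len(axes)} axes were given: "
            f"{[a.value for a in axes]}"
        )
    if len(set(axes)) != len(axes):
        raise ValueError(f"duplicate axis labels: {[a.value for a in axes]}")
    return axes


def init_product_pool(n: int, shape: tuple[int, ...], dtype, *, axes) -> queue.Queue:
    """Hypothetical stand-in for Worker._init_product_pool(); `axes` has no default."""
    axes = validate_axes(tuple(shape), axes)
    pool = queue.Queue()
    for _ in range(n):
        # Every Product shares the same immutable axes tuple.
        pool.put(Product(np.empty(shape, dtype=dtype), axes))
    return pool


# Usage: a pool of two-channel 512 x 512 frames.
frames = init_product_pool(
    4, (512, 512, 2), np.uint16,
    axes=(AxisName.Y, AxisName.X, AxisName.CHANNEL),
)
frame = frames.get()
print(frame.data.shape, [a.value for a in frame.axes])  # (512, 512, 2) ['y', 'x', 'channel']

# Forgetting axes fails immediately at the call site.
try:
    init_product_pool(4, (512, 512), np.uint16)
except TypeError as err:
    print("TypeError:", err)

# A rank mismatch is caught at pool creation, before acquisition starts.
try:
    init_product_pool(4, (512, 512), np.uint16, axes=(AxisName.RECORD,))
except ValueError as err:
    print("ValueError:", err)
```

Validation happens once per pool, so the per-frame cost is effectively zero. Each Product only holds a reference to a tuple that was checked up front.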