
feat: add N-D raster dimension query and manipulation functions #750

Draft
james-willis wants to merge 9 commits into apache:main from james-willis:jw/nd-raster-functions


@james-willis commented Apr 3, 2026

Summary

Adds 8 new RS_* functions for querying and manipulating N-dimensional raster data. Depends on #749 (canonical N-D schema, traits, reader, builder, and RS_* migration). Now that #749 carries the schema and trait surface, this PR is purely additive: it adds three new rs_*.rs modules (rs_dimensions.rs, rs_slice.rs, rs_dim_band.rs), register/lib updates, tests, and doc stubs, with no schema or trait churn.

Dimension query functions (rs_dimensions.rs)

  • RS_NumDimensions(raster [, band]) → Int32 — number of dimensions
  • RS_DimNames(raster [, band]) → List<Utf8> — ordered dimension names
  • RS_DimSize(raster, dim_name [, band]) → Int64 — size of a named dimension (null if missing)
  • RS_Shape(raster [, band]) → List<Int64> — full shape array

When the band argument is omitted, each function defaults to band 0 and verifies that all bands agree; it returns an error if the bands have different dimensionality.
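
The default-band rule can be sketched as follows. This is a minimal illustration, not the PR's code: `BandDims`, `resolve_band`, `num_dimensions`, and `dim_size` are hypothetical names, band indexes are assumed to be 0-based, and "agree" is assumed to mean identical dimension names and sizes.

```rust
/// Hypothetical stand-in for a band's dimension metadata (not the PR's BandRef).
#[derive(Debug, Clone, PartialEq)]
pub struct BandDims {
    pub dim_names: Vec<String>,
    pub shape: Vec<i64>,
}

/// Pick the band a dimension query should inspect.
/// An explicit index is used as-is. Without one, band 0 is used, but only
/// after checking that every band carries the same dimensions (assumption:
/// same names and same sizes).
pub fn resolve_band(bands: &[BandDims], band: Option<usize>) -> Result<&BandDims, String> {
    match band {
        Some(i) => bands
            .get(i)
            .ok_or_else(|| format!("band index {i} out of range ({} bands)", bands.len())),
        None => {
            let first = bands
                .first()
                .ok_or_else(|| "raster has no bands".to_string())?;
            if bands.iter().any(|b| b != first) {
                return Err(
                    "bands have different dimensionality; specify a band index".to_string(),
                );
            }
            Ok(first)
        }
    }
}

/// Number of dimensions, in the spirit of RS_NumDimensions.
pub fn num_dimensions(bands: &[BandDims], band: Option<usize>) -> Result<i32, String> {
    Ok(resolve_band(bands, band)?.dim_names.len() as i32)
}

/// Size of a named dimension, or None when the dimension is missing,
/// in the spirit of RS_DimSize.
pub fn dim_size(
    bands: &[BandDims],
    dim_name: &str,
    band: Option<usize>,
) -> Result<Option<i64>, String> {
    let b = resolve_band(bands, band)?;
    Ok(b.dim_names
        .iter()
        .position(|d| d == dim_name)
        .and_then(|i| b.shape.get(i).copied()))
}
```

With one 3-D band and one 2-D band, calling `num_dimensions` without a band index returns an error, while passing an explicit index returns that band's dimension count.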

Slice functions (rs_slice.rs)

  • RS_Slice(raster, dim_name, index) → Raster — reduce a dimension by picking one index
  • RS_SliceRange(raster, dim_name, start, end) → Raster — narrow a dimension to [start, end)
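
A rough sketch of the two slice semantics on a row-major buffer, assuming a hypothetical `NdBand` type with `f32` data. The helper names, the treatment of empty ranges as errors, and the hard-coded spatial names "x" and "y" are all assumptions made for illustration.

```rust
/// Hypothetical in-memory N-D band: row-major data plus named dimensions.
#[derive(Debug, Clone, PartialEq)]
pub struct NdBand {
    pub dim_names: Vec<String>,
    pub shape: Vec<usize>,
    pub data: Vec<f32>,
}

/// Locate a non-spatial dimension by name.
fn axis_of(band: &NdBand, dim: &str) -> Result<usize, String> {
    // Assumption: the spatial dimensions are named "x" and "y" in this sketch.
    if dim == "x" || dim == "y" {
        return Err(format!("cannot slice spatial dimension '{dim}'"));
    }
    band.dim_names
        .iter()
        .position(|d| d == dim)
        .ok_or_else(|| format!("dimension '{dim}' not found"))
}

/// Narrow `dim` to the half-open range [start, end), copying into a new buffer.
/// The dimension is kept, with size end - start.
pub fn slice_range(band: &NdBand, dim: &str, start: usize, end: usize) -> Result<NdBand, String> {
    let axis = axis_of(band, dim)?;
    let size = band.shape[axis];
    // Assumption: empty or out-of-bounds ranges are rejected.
    if start >= end || end > size {
        return Err(format!("range [{start}, {end}) is invalid for size {size}"));
    }
    let outer: usize = band.shape[..axis].iter().product();
    let inner: usize = band.shape[axis + 1..].iter().product();
    let mut data = Vec::with_capacity(outer * (end - start) * inner);
    for o in 0..outer {
        // Each outer block holds `size * inner` elements; copy the kept rows.
        let base = o * size * inner;
        data.extend_from_slice(&band.data[base + start * inner..base + end * inner]);
    }
    let mut shape = band.shape.clone();
    shape[axis] = end - start;
    Ok(NdBand { dim_names: band.dim_names.clone(), shape, data })
}

/// Pick one index along `dim` and drop that dimension from the result.
pub fn slice(band: &NdBand, dim: &str, index: usize) -> Result<NdBand, String> {
    let end = index
        .checked_add(1)
        .ok_or_else(|| "index overflows usize".to_string())?;
    let mut out = slice_range(band, dim, index, end)?;
    let axis = axis_of(&out, dim)?;
    out.dim_names.remove(axis);
    out.shape.remove(axis);
    Ok(out)
}
```

For a band shaped [time=3, y=2, x=2], `slice(&band, "time", 1)` yields a [y=2, x=2] band, while `slice_range(&band, "time", 1, 3)` keeps the time dimension with size 2.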

Dimension ↔ band functions (rs_dim_band.rs)

  • RS_DimToBand(raster, dim_name) → Raster — promote a dimension into separate bands
  • RS_BandToDim(raster, dim_name) → Raster — collapse bands into a new dimension

All slice and dimension ↔ band functions return an error when dim_name is a spatial dimension (x_dim/y_dim). Phase 1 always materializes data as contiguous copies; view-composition fast paths are a follow-up.
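
The dimension ↔ band pair and its round trip can be sketched on plain row-major buffers. Everything here is illustrative: `NdBand`, `dim_to_band`, and `band_to_dim` are hypothetical, spatial names are assumed to be "x" and "y", and the new dimension is assumed to become the leading axis (the PR text does not say where it is placed).

```rust
/// Hypothetical in-memory N-D band: row-major data plus named dimensions.
#[derive(Debug, Clone, PartialEq)]
pub struct NdBand {
    pub dim_names: Vec<String>,
    pub shape: Vec<usize>,
    pub data: Vec<f32>,
}

fn reject_spatial(dim: &str) -> Result<(), String> {
    // Assumption: the spatial dimensions are named "x" and "y" in this sketch.
    if dim == "x" || dim == "y" {
        return Err(format!("'{dim}' is a spatial dimension"));
    }
    Ok(())
}

/// Promote `dim` into separate bands: one contiguous copy per index.
pub fn dim_to_band(band: &NdBand, dim: &str) -> Result<Vec<NdBand>, String> {
    reject_spatial(dim)?;
    let axis = band
        .dim_names
        .iter()
        .position(|d| d == dim)
        .ok_or_else(|| format!("dimension '{dim}' not found"))?;
    let size = band.shape[axis];
    let outer: usize = band.shape[..axis].iter().product();
    let inner: usize = band.shape[axis + 1..].iter().product();
    let mut dim_names = band.dim_names.clone();
    dim_names.remove(axis);
    let mut shape = band.shape.clone();
    shape.remove(axis);
    let bands = (0..size)
        .map(|i| {
            let mut data = Vec::with_capacity(outer * inner);
            for o in 0..outer {
                // Gather the `inner` run that belongs to index i of this outer block.
                let start = (o * size + i) * inner;
                data.extend_from_slice(&band.data[start..start + inner]);
            }
            NdBand { dim_names: dim_names.clone(), shape: shape.clone(), data }
        })
        .collect();
    Ok(bands)
}

/// Collapse all bands into one band with a new leading dimension `dim`.
pub fn band_to_dim(bands: &[NdBand], dim: &str) -> Result<NdBand, String> {
    reject_spatial(dim)?;
    let first = bands.first().ok_or_else(|| "no bands".to_string())?;
    if bands.iter().any(|b| b.dim_names != first.dim_names || b.shape != first.shape) {
        return Err("bands must share dimension names and shape".to_string());
    }
    if first.dim_names.iter().any(|d| d == dim) {
        return Err(format!("dimension '{dim}' already exists"));
    }
    let mut dim_names = vec![dim.to_string()];
    dim_names.extend(first.dim_names.iter().cloned());
    let mut shape = vec![bands.len()];
    shape.extend_from_slice(&first.shape);
    // Concatenating band buffers is row-major stacking along the new axis.
    let data = bands.iter().flat_map(|b| b.data.iter().copied()).collect();
    Ok(NdBand { dim_names, shape, data })
}
```

Under these assumptions the round trip only reproduces the original layout when the promoted dimension was the leading axis, as in the [time=3, y, x] example.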

None of these functions are GDAL-backed — none touch sedona-raster-gdal/.

Test plan

  • 31 new tests across 3 files (19 dimension queries + 12 slice/dim-band)
  • All 174 tests pass in sedona-raster-functions (143 existing + 31 new)
  • cargo clippy --all-targets -- -D warnings clean
  • cargo fmt --all --check clean
  • Round-trip test: RS_DimToBand then RS_BandToDim recovers original data

Add a crate-private parse_outdb_source helper that splits a SedonaDB
outdb URI into the underlying URI plus a 1-based source band index.
Two URI shapes are accepted, both private to the GDAL format driver:

- '<uri>#band=N' — SedonaDB convention for selecting band N.
- GDAL native subdataset URI ('HDF5:"x.h5":/var', 'GTIFF_DIR:N:foo.tif',
  ...) — passed through verbatim, defaulting to band 1.

Plain URIs default to band 1. Malformed '#band=' fragments (non-numeric,
zero, negative, > u32::MAX) return a clear Execution error.
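
A minimal sketch of the parsing rules described above, written independently of the PR. The function name `parse_outdb_source_sketch`, its `(String, u32)` return type, and the error strings are assumptions; the real helper returns an Execution error and may differ in detail.

```rust
/// Split an outdb URI into (underlying URI, 1-based band index).
/// URIs without a `#band=` fragment, including GDAL subdataset URIs,
/// pass through verbatim and default to band 1.
pub fn parse_outdb_source_sketch(uri: &str) -> Result<(String, u32), String> {
    match uri.rsplit_once("#band=") {
        None => Ok((uri.to_string(), 1)),
        Some((base, frag)) => {
            // Digits only: this rejects "-1", "abc", "" and also "+2",
            // which u32's FromStr would otherwise accept.
            if frag.is_empty() || !frag.bytes().all(|b| b.is_ascii_digit()) {
                return Err(format!("invalid band fragment '#band={frag}'"));
            }
            // Values above u32::MAX fail here with an overflow error.
            let band: u32 = frag
                .parse()
                .map_err(|_| format!("band index '{frag}' does not fit in u32"))?;
            if band == 0 {
                return Err("band index is 1-based; 0 is not valid".to_string());
            }
            Ok((base.to_string(), band))
        }
    }
}
```

For example, `"s3://bucket/a.tif#band=2"` splits into `("s3://bucket/a.tif", 2)`, while `HDF5:"x.h5":/var` is returned unchanged with band 1.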

Format-agnostic surfaces (incl. RS_BandPath) treat outdb_uri as opaque;
the parser is dispatched only when outdb_format routes to the GDAL
driver.

Replaces apache#787's 2D-only band schema with the canonical N-D schema:
spatial_dims/spatial_shape at the raster level; bands carry dim_names,
source_shape, nullable view, outdb_uri, outdb_format, plus the
non-nullable data buffer. Removes nodata_value, storage_type,
outdb_url, and outdb_band_id - every one is encodable in the new
schema:

- storage_type ↔ outdb_uri.is_null() (null = InDb, set = OutDbRef).
- outdb_url ↔ outdb_uri (renamed; same string).
- outdb_band_id ↔ encoded inside outdb_uri (#band=N or GDAL native
  subdataset URI), parsed only inside the GDAL format driver.
- nodata_value ↔ typed nodata: Binary (a null row means "no nodata").

Top-level adds spatial_dims: List<Utf8View> and spatial_shape:
List<Int64>; nullable view is List<Struct<source_axis, start, step,
steps: Int64>> where a null row encodes the canonical identity view.

Note: intermediate commits in this PR are not expected to build; only
the PR tip is CI-green. The trait, reader/builder, RS_* migration,
and GDAL loader port land in subsequent commits.

@james-willis force-pushed the jw/nd-raster-functions branch 2 times, most recently from 05eb8b0 to 1dca04b (May 4, 2026 23:31)

RasterRef and BandRef accessors over the canonical N-D schema:
spatial_dims/spatial_shape, transform, crs, num_bands, band(i), and
band-level dim_names, source_shape, shape (visible, derived from view),
view, data_type, nodata, outdb_uri, outdb_format, nd_buffer,
contiguous_data returning Cow<[u8]>.

validate_view enforces all view rules including i64-overflow on
start + (steps-1)*step. NdBuffer exposes raw buffer + shape + byte
strides + offset for zero-copy access (numpy / Arrow C Data Interface
boundary); VIEW → byte strides happens inside nd_buffer().
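
The overflow check and the view-to-stride step can be sketched as below. This is not the PR's validate_view: `ViewEntry`, `check_view`, and `byte_strides` are hypothetical, the bounds rules beyond the overflow check are assumed, and each view entry is assumed to map one visible axis onto one source axis of a row-major buffer.

```rust
/// One view entry per visible axis: visible index i maps to source index
/// start + i * step along `source_axis`, for i in 0..steps.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ViewEntry {
    pub source_axis: i64,
    pub start: i64,
    pub step: i64,
    pub steps: i64,
}

/// Check axis bounds and that the first and last touched source indices
/// are in range, using checked arithmetic for start + (steps - 1) * step.
pub fn check_view(view: &[ViewEntry], source_shape: &[i64]) -> Result<(), String> {
    for (j, v) in view.iter().enumerate() {
        let axis = usize::try_from(v.source_axis)
            .ok()
            .filter(|a| *a < source_shape.len())
            .ok_or_else(|| format!("view[{j}]: source_axis {} out of range", v.source_axis))?;
        if v.steps < 0 {
            return Err(format!("view[{j}]: steps must be non-negative"));
        }
        if v.steps == 0 {
            continue; // an empty axis touches no source element
        }
        let last = (v.steps - 1)
            .checked_mul(v.step)
            .and_then(|delta| v.start.checked_add(delta))
            .ok_or_else(|| format!("view[{j}]: start + (steps-1)*step overflows i64"))?;
        let size = source_shape[axis];
        for idx in [v.start, last] {
            if idx < 0 || idx >= size {
                return Err(format!("view[{j}]: index {idx} outside 0..{size}"));
            }
        }
    }
    Ok(())
}

/// Compose a validated view with row-major source strides.
/// Returns (byte stride per visible axis, byte offset of the first element).
pub fn byte_strides(view: &[ViewEntry], source_shape: &[i64], elem_size: i64) -> (Vec<i64>, i64) {
    let mut src = vec![elem_size; source_shape.len()];
    for a in (0..source_shape.len().saturating_sub(1)).rev() {
        src[a] = src[a + 1] * source_shape[a + 1];
    }
    let strides = view.iter().map(|v| v.step * src[v.source_axis as usize]).collect();
    let offset: i64 = view.iter().map(|v| v.start * src[v.source_axis as usize]).sum();
    (strides, offset)
}
```

A negative step simply yields a negative byte stride with an offset pointing at the last source row, which is how a reversed axis can be exposed without copying.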

Adds BandRef::is_2d() default method as the gate GDAL-backed paths
use to refuse N-D input cleanly: true iff dim_names == ["y","x"]
over the identity view.
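
A sketch of how such a gate might look, assuming the identity view is represented by a null (absent) view row. `BandMeta` and `require_2d` are hypothetical names, and the sketch treats any explicit view as non-identity, which may be stricter than the real check.

```rust
/// Hypothetical band metadata: dimension names plus whether a non-null
/// view row is present (null is the canonical identity view).
pub struct BandMeta<'a> {
    pub dim_names: &'a [&'a str],
    pub has_explicit_view: bool,
}

impl BandMeta<'_> {
    /// True iff the band is a plain ["y", "x"] raster over the identity view.
    pub fn is_2d(&self) -> bool {
        self.dim_names == ["y", "x"].as_slice() && !self.has_explicit_view
    }
}

/// Entry gate for a GDAL-backed code path: refuse N-D input up front.
pub fn require_2d(band: &BandMeta<'_>) -> Result<(), String> {
    if band.is_2d() {
        Ok(())
    } else {
        Err(format!(
            "GDAL-backed functions require a 2-D [y, x] band, got dims {:?}",
            band.dim_names
        ))
    }
}
```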

… reader/builder + RS_* migration

View-aware Arrow reader (RasterStructArray, BandRefImpl) with corruption-
surgery (negative steps, bad source_axis, length mismatch) that
round-trips an ArrowError. Builder exposes start_raster / start_band
for full N-D plus start_raster_2d / start_band_2d for legacy 2D, with
identity-view default written as a null view row. finish_raster
validates each band's visible shape against the raster's spatial_shape
along the spatial dims.

All 33 RS_* functions migrated mechanically; outputs on 2D inputs are
byte-identical to apache#787. RS_BandPath keeps its existing inline
fragment-stripping (format-agnostic display, untouched by the GDAL
parser). Test helpers in sedona-testing rewritten on the N-D builder
API.

Reads outdb_uri + parse_outdb_source instead of apache#787's storage_type /
outdb_url / outdb_band_id triplet. Each GDAL-backed SQL function gates
on BandRef::is_2d() at entry and returns an Execution error on N-D
input. VSI normalization, the dataset cache, and RasterIO bodies are
byte-for-byte unchanged from apache#787 - only the schema-read sites move.

In-db reads use BandRef::contiguous_data() and require Cow::Borrowed
so MEM datasets can point at the StructArray's backing buffer without
copying; for is_2d identity views this always holds.
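
The borrowed-versus-owned distinction can be illustrated with a one-dimensional stand-in. `contiguous_bytes` and `require_borrowed` are hypothetical helpers, and a single `step` replaces the full view; the point is only that an identity view can hand back the original buffer, while a strided view has to copy.

```rust
use std::borrow::Cow;

/// 1-D stand-in for a contiguous-data accessor: the identity view (step 1)
/// borrows the backing buffer, any other step materializes a copy.
/// Panics if `step` is 0.
pub fn contiguous_bytes(buf: &[u8], step: usize) -> Cow<'_, [u8]> {
    assert!(step > 0, "step must be at least 1");
    if step == 1 {
        Cow::Borrowed(buf)
    } else {
        Cow::Owned(buf.iter().step_by(step).copied().collect())
    }
}

/// Zero-copy gate: accept only borrowed data, so the caller can point an
/// external consumer at the original buffer for as long as it lives.
pub fn require_borrowed<'a>(data: Cow<'a, [u8]>) -> Result<&'a [u8], String> {
    match data {
        Cow::Borrowed(bytes) => Ok(bytes),
        Cow::Owned(_) => Err("expected borrowed data; the view required a copy".to_string()),
    }
}
```

Rejecting `Cow::Owned` rather than borrowing from it is deliberate in this sketch: an owned buffer would be dropped at the end of the call, leaving nothing valid to point at.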

Tests rebuilt to use RasterBuilder directly. Adds an N-D rejection
test for raster_ref_to_gdal_mem and the VRT path, plus an end-to-end
#band=2 selection test against a two-band GeoTIFF.

Re-enables non-null `view` rows in the N-D raster reader. PR-B treats a
non-null view as a hard read error because its only writer (start_band)
emits the canonical identity (null view row); the view → byte-stride
composition path is needed by the slice/manipulation functions
(RS_Slice, RS_SliceRange, RS_DimToBand, RS_BandToDim) that land on
top of this PR.

Reintroduced:

- traits.rs: validate_view() + 22 unit tests.
- builder.rs: start_band_with_view() builder API and ~12
  view-construction tests (slice, broadcast, transpose, negative-step,
  IPC round-trip, etc.).
- array.rs: view → byte-stride composition in nd_buffer(), view-aware
  contiguous_data() Cow::Owned strided-copy slow path, and the array
  reader tests for explicit views (negative steps, OOB axis, length
  mismatch, malformed view rejection).

Identity-view writer/reader behavior is unchanged; PR-B's error path on
non-null view rows is replaced by the composition path validated here.

New dimension query functions for N-D rasters:

- RS_NumDimensions(raster [, band]) → Int32
- RS_DimNames(raster [, band]) → List<Utf8>
- RS_DimSize(raster, dim_name [, band]) → Int64 (null if dim missing)
- RS_Shape(raster [, band]) → List<Int64>

All accept an optional band index. When it is omitted, they default to
band 0 and verify that all bands agree; if bands have different
dimensionality they return an error prompting the user to specify a band index.

19 new tests covering 2D/3D rasters, explicit band args, null
handling, nonexistent dimensions, and mixed-dimensionality errors.

New N-D raster manipulation functions:

- RS_Slice(raster, dim_name, index) — reduce a dimension by picking
  one index, removing it from the output
- RS_SliceRange(raster, dim_name, start, end) — narrow a dimension
  to [start, end), keeping it with reduced size
- RS_DimToBand(raster, dim_name) — promote a dimension into separate
  bands (e.g., 1 band [time=3,y,x] → 3 bands [y,x])
- RS_BandToDim(raster, dim_name) — collapse all bands into one band
  with a new dimension (inverse of DimToBand)

All error on spatial dimension names (x_dim/y_dim). Phase 1 always
materializes data (contiguous copies). 12 new tests including
a DimToBand→BandToDim round-trip.

@james-willis force-pushed the jw/nd-raster-functions branch 2 times, most recently from 524c359 to bfc891f (May 5, 2026 02:11)

Add .qmd doc stubs for RS_NumDimensions, RS_DimNames, RS_DimSize,
RS_Shape, RS_Slice, RS_SliceRange, RS_DimToBand, RS_BandToDim.
Required by the docs-and-deploy CI check which validates every
registered function has a documentation page.

@james-willis force-pushed the jw/nd-raster-functions branch from bfc891f to 2ee47a2 (May 5, 2026 02:53)