Discuss if we should change serialization format #37

This issue proposes revisiting whether HDF5 remains the best long-term container choice for OMX, or whether a modern alternative (or optional backend) would better support current and future use cases.

The goal is not to break OMX semantics, but to evaluate whether the container layer could evolve while preserving:

- Stable matrix semantics
- Efficient sparse and dense storage
- Long-term reproducibility

#### Background

OMX currently uses HDF5 as its underlying container, a design choice discussed early in the project’s history. At the time, HDF5 provided a mature, high-performance, cross-platform solution for large matrix storage.

Since then, the data ecosystem, especially in Python, has shifted significantly toward columnar, cross-language formats such as Apache Arrow and Parquet, which emphasize interoperability, cloud friendliness, and zero-copy data exchange.

#### Questions for discussion

1. Does HDF5 continue to meet OMX’s needs in modern Python and cloud-based workflows?
2. Are there known pain points with HDF5 (tooling, deployment, performance, maintenance)?
3. Could Arrow IPC / Feather or Parquet realistically serve as one of the following while preserving critical current OMX features (random access, slicing, determinism)?
   a. A replacement container
   b. An optional backend
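One way to make the determinism criterion concrete when comparing candidates is to serialize the same matrix several times and compare byte digests. The sketch below is a minimal, stdlib-only illustration; `toy_writer`, `digest`, and `is_deterministic` are hypothetical names, and `toy_writer` is only a stand-in for a real HDF5, Arrow IPC, or Parquet writer (it emits a shape header followed by row-major little-endian float64 values).

```python
import hashlib
import io
import struct
from typing import BinaryIO, Callable, Sequence

Matrix = Sequence[Sequence[float]]
Writer = Callable[[Matrix, BinaryIO], None]


def toy_writer(matrix: Matrix, out: BinaryIO) -> None:
    """Hypothetical stand-in for a container writer: shape header, then values."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    # Fixed byte order ("<") so the output does not depend on the host platform.
    out.write(struct.pack("<QQ", n_rows, n_cols))
    for row in matrix:
        out.write(struct.pack(f"<{n_cols}d", *row))


def digest(write_fn: Writer, matrix: Matrix) -> str:
    """Serialize into memory and return the SHA-256 hex digest of the bytes."""
    buf = io.BytesIO()
    write_fn(matrix, buf)
    return hashlib.sha256(buf.getvalue()).hexdigest()


def is_deterministic(write_fn: Writer, matrix: Matrix, runs: int = 3) -> bool:
    """True if every run produces byte-identical output."""
    return len({digest(write_fn, matrix) for _ in range(runs)}) == 1
```

A real evaluation would pass each candidate library's writer in place of `toy_writer`. Byte-level equality is stricter than value-level equality, so a format that fails this check may still round-trip values exactly; which of the two OMX should require is part of the question above.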

#### Prerequisites

We need to evolve our governance model before it can support a decision this significant.

#### Possible outcomes

1. Affirm HDF5 as the long-term container and document why
2. Support an alternative container backend while retaining OMX semantics
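To illustrate what outcome 2 could mean in practice, here is a minimal sketch of a backend abstraction, assuming a hypothetical `MatrixBackend` protocol (not part of any existing OMX API). Caller code depends only on the protocol, so an HDF5 backend and an Arrow or Parquet backend could be swapped without changing OMX semantics. `InMemoryBackend` is a toy implementation; like OMX, it requires every matrix in a file to share the file's shape.

```python
from typing import Dict, List, Protocol, Sequence, Tuple


class MatrixBackend(Protocol):
    """Hypothetical container-backend interface (illustrative only)."""

    def shape(self) -> Tuple[int, int]: ...
    def list_matrices(self) -> List[str]: ...
    def write_matrix(self, name: str, rows: Sequence[Sequence[float]]) -> None: ...
    def read_block(self, name: str, rows: slice, cols: slice) -> List[List[float]]: ...


class InMemoryBackend:
    """Toy backend storing matrices in a dict; a real HDF5 or Arrow
    backend would implement the same four methods."""

    def __init__(self, shape: Tuple[int, int]) -> None:
        self._shape = shape
        self._data: Dict[str, List[List[float]]] = {}

    def shape(self) -> Tuple[int, int]:
        return self._shape

    def list_matrices(self) -> List[str]:
        return sorted(self._data)

    def write_matrix(self, name: str, rows: Sequence[Sequence[float]]) -> None:
        n_rows, n_cols = self._shape
        # Enforce the shared-shape rule at write time.
        if len(rows) != n_rows or any(len(r) != n_cols for r in rows):
            raise ValueError(f"{name!r} does not match file shape {self._shape}")
        self._data[name] = [[float(v) for v in r] for r in rows]

    def read_block(self, name: str, rows: slice, cols: slice) -> List[List[float]]:
        # Random access / slicing: return only the requested sub-block.
        return [r[cols] for r in self._data[name][rows]]


def total_trips(backend: MatrixBackend, name: str, origin: int) -> float:
    """Row sum for one origin zone, written against the protocol only."""
    _, n_cols = backend.shape()
    return sum(backend.read_block(name, slice(origin, origin + 1), slice(0, n_cols))[0])
```

For example, after `b = InMemoryBackend((2, 2))` and `b.write_matrix("trips", [[1, 2], [3, 4]])`, calling `total_trips(b, "trips", 0)` sums the first row. Whether Arrow IPC or Parquet can serve `read_block` efficiently without loading whole matrices is exactly the open question in item 3 above.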
