Sparse matrix storage #42

@Jake-Moss

With the recent discussions surrounding OpenMatrix, I'd like to propose that OMX support the storage of sparse matrices.

As noted by Pedro in osPlanning/omx-python#12 (comment), we've been testing a version of omx-python internally with AequilibraE that supports storing sparse matrices in the CSR format. This has allowed us to store incredibly large OD matrices (500k × 500k) in a manner that doesn't require terabytes of disk or memory. We believe this functionality could be of use to the community at large.
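To put rough numbers on that, here is a back-of-the-envelope sketch. The 8-byte values, 8-byte indices, and the average of 100 non-zero cells per row are assumptions chosen for illustration, not measurements from our files.

```python
def dense_bytes(n_rows, n_cols, value_bytes=8):
    """Bytes needed to store every cell of a dense matrix."""
    return n_rows * n_cols * value_bytes


def csr_bytes(n_rows, nnz, value_bytes=8, index_bytes=8):
    """Bytes for the three CSR arrays: values, column indices, row pointers."""
    return nnz * value_bytes + nnz * index_bytes + (n_rows + 1) * index_bytes


n = 500_000
# Hypothetical density: an average of 100 non-zero destinations per origin.
nnz = n * 100

dense = dense_bytes(n, n)
sparse = csr_bytes(n, nnz)

print(f"dense: {dense / 1e12:.1f} TB")
print(f"CSR:   {sparse / 1e9:.2f} GB")
```

Under these assumptions the dense matrix needs about 2 TB before compression, while the CSR arrays need under 1 GB. The real ratio depends entirely on how sparse the matrix is.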

The code is not public at the moment, but the implementation itself is not particularly interesting. For each matrix it stores a set of three arrays (two integer, one floating point) that encode the CSR format under a single group. These per-matrix groups are then stored under a top-level group named sparse. While this makes the files non-conformant to the OMX specification, it is forward compatible: sparse matrices are simply invisible to other OMX software.
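For anyone unfamiliar with CSR, the three arrays can be sketched in plain Python as follows. This is not our implementation; the array names `data`, `indices` and `indptr` follow the SciPy convention and are an assumption here, as is the matrix name `trips`.

```python
def dense_to_csr(rows):
    """Encode a list-of-lists matrix as the three CSR arrays."""
    data = []      # non-zero values, row by row (floating point)
    indices = []   # column index of each stored value (integer)
    indptr = [0]   # indptr[i]:indptr[i + 1] slices row i out of data/indices (integer)
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(float(value))
                indices.append(col)
        indptr.append(len(data))
    return data, indices, indptr


def csr_row(data, indices, indptr, i):
    """Return the stored cells of row i as a {column: value} mapping."""
    start, end = indptr[i], indptr[i + 1]
    return dict(zip(indices[start:end], data[start:end]))


od = [
    [0, 5, 0, 0],
    [0, 0, 0, 2],
    [0, 0, 0, 0],
    [1, 0, 3, 0],
]
data, indices, indptr = dense_to_csr(od)

# Hypothetical file layout mirroring the description above:
# /sparse/<matrix name>/{data, indices, indptr}
layout = {"sparse": {"trips": {"data": data, "indices": indices, "indptr": indptr}}}
```

Reading a single origin's row only touches the slice `indptr[i]:indptr[i + 1]`, which is why CSR suits row-wise access to OD matrices.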

I believe that this can be achieved in a handful of ways. With the current HDF5 back end, an additional sparse group could be added to hold the sparse matrices. These could be encoded as CSR, or as COO with some additional metadata to specify the intended dimensions.
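A minimal sketch of why COO needs the extra dimension metadata, and how it maps onto CSR. The function name `coo_to_csr` and the `shape` field are hypothetical and only serve the illustration.

```python
def coo_to_csr(row, col, data, shape):
    """Convert COO triplets plus an explicit shape into CSR arrays."""
    n_rows, _n_cols = shape
    # Count the stored cells in each row, then accumulate into row pointers.
    counts = [0] * n_rows
    for r in row:
        counts[r] += 1
    indptr = [0]
    for c in counts:
        indptr.append(indptr[-1] + c)
    # Place each triplet into the next free slot of its row.
    next_slot = indptr[:-1].copy()
    indices = [0] * len(data)
    values = [0.0] * len(data)
    for r, c, v in zip(row, col, data):
        k = next_slot[r]
        indices[k] = c
        values[k] = v
        next_slot[r] += 1
    return values, indices, indptr


# COO stores one (row, col, value) triplet per non-zero cell, in any order.
row = [3, 0, 1, 3]
col = [0, 1, 3, 2]
data = [1.0, 5.0, 2.0, 3.0]

# The intended matrix is 5 x 5, but its last row and column are empty,
# so the coordinates alone would suggest a 4 x 4 matrix.
shape = (5, 5)
inferred = (max(row) + 1, max(col) + 1)

values, indices, indptr = coo_to_csr(row, col, data, shape)
```

Here `inferred` comes out as 4 × 4 while the declared shape is 5 × 5, which is the gap the metadata closes. CSR carries the row count implicitly through `indptr`, but still needs the column count stored somewhere.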

As columnar formats have been mentioned in other issues, I'll note that these would essentially switch the default storage format from dense to sparse (COO). Dense matrices would see an increase in explicitly stored information, as I don't believe the formats are intended to support thousands of columns.

This issue is related to #35 and #37.
